From: N. Brouard Date: Wed, 16 Jun 2004 12:05:30 +0000 (+0000) Subject: *** empty log message *** X-Git-Tag: imach-099s7~596 X-Git-Url: https://henry.ined.fr/git/?a=commitdiff_plain;h=538a9b8e5c50a5d119a54009404e95f99d670602;p=.git *** empty log message *** --- diff --git a/html/doc/imach.htm b/html/doc/imach.htm new file mode 100644 index 0000000..f11e984 --- /dev/null +++ b/html/doc/imach.htm @@ -0,0 +1,1311 @@ + + + + + + +Computing Health Expectancies using IMaCh + + + + + +IMaCh + + + + +
+ +

Computing Health +Expectancies using IMaCh

+ +

(a Maximum +Likelihood Computer Program using Interpolation of Markov Chains)

+ +

 

+ +

+ +

INED and EUROREVES

+ +

Version +0.8a, May 2002

+ +
+ +

Authors of the +program: Nicolas Brouard, senior researcher at the Institut +National d'Etudes Démographiques (INED, Paris) in the "Mortality, +Health and Epidemiology" Research Unit

+ +

and Agnès +Lièvre
+

+ +

Contribution to the mathematics: C. R. +Heathcote (Australian +National University, Canberra).

+ +

Contact: Agnès Lièvre (lievre@ined.fr)

+ +
+ + + +
+ +

Introduction

+ +

This program computes Healthy Life Expectancies from cross-longitudinal data using the methodology pioneered by Laditka and Wolf (1). Within the family of Health Expectancies (HE), Disability-free life expectancy (DFLE) is probably the most important index to monitor. In low mortality countries, there is a fear that when mortality declines, the increase in DFLE is not proportionate to the increase in total Life expectancy. This case is called the Expansion of morbidity. Most of the data collected today, in particular by the international REVES network on Health expectancy, and most HE indices based on these data, are cross-sectional. This means that the information collected comes from a single cross-sectional survey: people of various ages (but mostly old people) are surveyed on their health status at a single date. The proportion of people disabled at each age can then be measured at that date. This age-specific prevalence curve is then used to distinguish, within the stationary population (which, by definition, is the life table estimated from the vital statistics on mortality at the same date), the disabled population from the disability-free population. Life expectancy (LE) (the total population divided by the yearly number of births or deaths of this stationary population) is then decomposed into DFLE and DLE. This method of computing HE is usually called the Sullivan method (from the name of the author who first described it).

+ +

Age-specific proportions of disabled people are very difficult to forecast because each proportion corresponds to the historical conditions of the cohort and is the result of the historical flows of entering disability and recovering, from the past until today. The age-specific intensities (or incidence rates) of entering disability or recovering good health reflect current conditions and can therefore be used at each age to forecast the future of this cohort. For example, if a country is improving its prosthesis technology, the incidence of recovering the ability to walk will be higher at each (old) age, but the prevalence of disability will only slightly reflect an improvement because the prevalence is mostly affected by the history of the cohort and not by recent period effects. To measure the period improvement we have to simulate the future of a cohort of new-borns entering or leaving the disability state or dying at each age according to the incidence rates measured today on different cohorts. The proportion of people disabled at each age in this simulated cohort will be much lower (using the example of an improvement) than the proportions observed at each age in a cross-sectional survey. This new prevalence curve introduced in a life table will give a much more current and realistic HE level than the Sullivan method, which mostly measures the history of health conditions in this country.

+ +

Therefore, the main question is how to measure incidence rates from cross-longitudinal surveys. This is the goal of the IMaCh program. From your data and using IMaCh you can estimate period HE and not only Sullivan's HE. The standard errors of the HE are also computed.

+ +

A cross-longitudinal survey consists of a first survey ("cross") where individuals of different ages are interviewed on their health status or degree of disability. At least a second wave of interviews ("longitudinal") should measure each individual's new health status. Health expectancies are computed from the transitions observed between waves and are computed for each degree of severity of disability (number of life states). The more degrees you consider, the more time is necessary to reach the Maximum Likelihood of the parameters involved in the model. Considering only two states of disability (disabled and healthy) is generally enough, but the computer program also works with more health statuses.
+
+The simplest model is the multinomial logistic model where pij is the probability to be observed in state j at the second wave conditional on being observed in state i at the first wave. Therefore a simple model is: log(pij/pii) = aij + bij*age + cij*sex, where 'age' is age and 'sex' is a covariate. The advantage that this computer program claims is that if the delay between waves is not identical for each individual, or if some individual missed an interview, the information is not rounded or lost, but taken into account using an interpolation or extrapolation. hPijx is the probability to be observed in state j at age x+h conditional on the observed state i at age x. The delay 'h' can be split into an exact number (nh*stepm) of unobserved intermediate states. This elementary transition (by month, quarter, semester or year) is modeled as a multinomial logistic. The hPx matrix is simply the matrix product of nh*stepm elementary matrices and the contribution of each individual to the likelihood is simply hPijx.
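The construction of hPx as a product of elementary matrices can be sketched in a few lines of Python. This is only an illustration, not the IMaCh source code: the function names are hypothetical, the parameter values are the estimates quoted later in this manual for the two-live-state example, and evaluating each elementary matrix at the age reached at the start of the step is an assumption.

```python
import math

# Estimates (a_ij, b_ij) quoted in the results section of this manual.
# States 1 and 2 are alive; state 3 (death) is absorbing.
PARAMS = {
    (1, 2): (-12.290174, 0.092161),
    (1, 3): (-9.155590, 0.046627),
    (2, 1): (-2.629849, -0.022030),
    (2, 3): (-7.958519, 0.042614),
}


def elementary_matrix(age):
    """3x3 one-step transition matrix at `age` from log(pij/pii) = aij + bij*age."""
    matrix = []
    for i in (1, 2):
        odds = {j: math.exp(a + b * age)
                for (k, j), (a, b) in PARAMS.items() if k == i}
        denom = 1.0 + sum(odds.values())
        matrix.append([1.0 / denom if j == i else odds[j] / denom
                       for j in (1, 2, 3)])
    matrix.append([0.0, 0.0, 1.0])  # nobody leaves the death state
    return matrix


def matmul(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def h_p(age, n_steps, step_years=1.0 / 12):
    """Product of n_steps elementary matrices, starting at `age` (assumed monthly)."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for s in range(n_steps):
        result = matmul(result, elementary_matrix(age + s * step_years))
    return result


# Two waves 24 months apart for someone aged 70 at the first wave.
two_year = h_p(70.0, 24)
for row in two_year:
    print(" ".join(f"{p:.5f}" for p in row))
```

With such a matrix, an individual observed in state i at the first wave and in state j 24 months later would contribute the entry in row i, column j to the likelihood.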
+

+ +

The program presented in this manual is a quite general program named IMaCh (for Interpolated MArkov CHain), designed to analyse transition data from longitudinal surveys. The first step is the estimation of the parameters of a model of transition probabilities between an initial status and a final status. From there, the computer program produces some indicators such as observed and stationary prevalence, life expectancies and their variances, and graphs. Our transition model consists of absorbing and non-absorbing states with the possibility of return across the non-absorbing states. The main advantage of this package, compared to other programs for the analysis of transition data (for example, Proc Catmod of SAS®), is that the whole individual information is used even if an interview is missing, a status or a date is unknown, or the delay between waves is not identical for each individual. The program can be executed according to parameters: selection of a sub-sample, number of absorbing and non-absorbing states, number of waves taken into account (the user inputs the first and the last interview), a tolerance level for the maximization function, the periodicity of the transitions (we can compute annual, quarterly or monthly transitions), covariates in the model. It works on Windows or on Unix.
+

+ +
+ +

(1) Laditka, Sarah B. and Wolf, Douglas A. (1998), "New +Methods for Analyzing Active Life Expectancy". Journal of +Aging and Health. Vol 10, No. 2.

+ +
+ +

On what kind of data can +it be used?

+ +

The minimum data required for a transition model is the recording of a set of individuals interviewed at a first date and interviewed again at least one more time. From the observations of an individual, we obtain a follow-up over time of the occurrence of a specific event. In this documentation, the event is related to health status at older ages, but the program can be applied to many longitudinal studies in different contexts. To build the data file explained in the next section, you must have the month and year of each interview and the corresponding health status. In order to get age, the date of birth (month and year) is also required (missing values are allowed for the month). The date of death (month and year) is an important piece of information, also required if the individual is dead. Shorter steps (e.g. a month) will more closely take into account the survival time after the last interview.

+ +
+ +

The data file

+ +

In this example, 8,000 people have been interviewed in a cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990). Some people missed 1, 2 or 3 interviews. Health statuses are healthy (1) and disabled (2). The survey is not a real one. It is a simulation of the American Longitudinal Survey on Aging. The disability state is defined if the individual missed one of four ADL (Activities of daily living, like bathing, eating, walking). Therefore, even if the individuals interviewed in the sample are virtual, the information brought by this sample is close to the situation of the United States. Sex is not recorded in this sample.

+ +

Each line of the data set (named data1.txt in this first example) is an individual record whose fields are:

+ + + +

 

+ +

If your longitudinal survey does not include information about weights or covariates, you must fill the column with a number (e.g. 1) because a missing field is not allowed.

+ +
+ +

Your first example parameter file

+ +

#Imach version 0.8a, May 2002, +INED-EUROREVES

+ +

This is a comment. Comments start with a '#'.

+ +

First uncommented line

+ +
title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4
+ + + +

 

+ +

Second uncommented +line

+ +
ftol=1.e-08 stepm=1 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
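The two uncommented lines above are simple lists of key=value pairs. As a rough illustration of how such lines could be read outside IMaCh (for example to check a parameter file before a long run), here is a minimal Python sketch. The function name is hypothetical and this is not the parser used by the program itself.

```python
def parse_parameter_lines(text):
    """Collect key=value tokens from uncommented lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments start with '#'
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:
                settings[key] = value
    return settings


example = """\
#Imach version 0.8a, May 2002, INED-EUROREVES
title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4
ftol=1.e-08 stepm=1 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
"""

settings = parse_parameter_lines(example)
print(settings["datafile"], int(settings["stepm"]), float(settings["ftol"]))
```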
+ + + +

Covariates

+ +

Intercept and age are systematically included in the model. +Additional covariates can be included with the command:

+ +
model=list of covariates
+ + + +

In this example, we have two covariates in the data file (fields 2 and 3). The number of covariates included in the data file between the id and the date of birth is ncovcol=2 (it was named ncov in versions prior to 0.8). If you have 3 covariates in the datafile (fields 2, 3 and 4), you will set ncovcol=3. Then you can run the programme with a new parametrisation taking into account the third covariate. For example, model=V1+V3 estimates a model with the first and third covariates. More complicated models can be used, but they will take more time to converge. With a simple model (no covariates), the programme estimates 8 parameters. Adding covariates increases the number of parameters: 12 for model=V1, 16 for model=V1+V1*age and 20 for model=V1+V2+V3.

+ +

Guess values for optimization

+ +

You must write the initial guess values of the parameters for optimization. The number of parameters, N, depends on the number of absorbing and non-absorbing states and on the number of covariates.
+N is given by the formula N=(nlstate + ndeath - 1)*nlstate*ncovmodel.
+
+Thus in the simple case with 2 covariates (the model is log (pij/pii) = aij + bij * age where intercept and age are the two covariates), 2 health degrees (1 for disability-free and 2 for disability) and 1 absorbing state (3), you must enter 8 initial values: a12, b12, a13, b13, a21, b21, a23, b23. You can start with zeros as in this example, but if you have a more precise set (for example from an earlier run) you can enter it and it will speed up the convergence.
+Each of the four lines starts with indices "ij": ij aij bij

+ +
+
# Guess values of aij and bij in log (pij/pii) = aij + bij * age
+12 -14.155633  0.110794 
+13  -7.925360  0.032091 
+21  -1.890135 -0.029473 
+23  -6.234642  0.022315 
+
+ +

or, to simplify (in most cases it converges, but there is no guarantee!):

+ +
+
12 0.0 0.0
+13 0.0 0.0
+21 0.0 0.0
+23 0.0 0.0
+
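The count N and the all-zero guess block can be generated mechanically from nlstate, ndeath and ncovmodel. The following Python sketch applies the formula N=(nlstate + ndeath - 1)*nlstate*ncovmodel given above; the function names are hypothetical, and ncovmodel is taken to count the intercept and age as in the text.

```python
def n_parameters(nlstate, ndeath, ncovmodel):
    """Number of parameters: N = (nlstate + ndeath - 1) * nlstate * ncovmodel."""
    return (nlstate + ndeath - 1) * nlstate * ncovmodel


def zero_guess_lines(nlstate, ndeath, ncovmodel):
    """One line 'ij 0.0 ... 0.0' per transition from a live state i to a state j != i."""
    lines = []
    for i in range(1, nlstate + 1):
        for j in range(1, nlstate + ndeath + 1):
            if j != i:
                lines.append(f"{i}{j} " + " ".join(["0.0"] * ncovmodel))
    return lines


print(n_parameters(2, 1, 2))  # intercept and age only
for line in zero_guess_lines(2, 1, 2):
    print(line)
```

With ncovmodel=3 (model=V1) the same formula gives 12 parameters, and with ncovmodel=5 (model=V1+V2+V3) it gives 20, as stated in the section on covariates.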
+ +

In order to speed up the convergence you can make a first run with a large stepm, e.g. stepm=12 or 24, and then decrease the stepm until stepm=1 month. If newstepm is the new, shorter stepm and the old stepm is a multiple of it (old stepm = n × newstepm), then the following approximation holds, with the shorter step written as stepm and the longer one as n . stepm:

+ +
aij(stepm) = aij(n . stepm) - ln(n)
+
+ +

and

+ +
bij(stepm) = bij(n . stepm) .
+ +

For example if you already ran for a 6 months interval and +got:
+

+ +
# Parameters
+12 -13.390179  0.126133 
+13  -7.493460  0.048069 
+21   0.575975 -0.041322 
+23  -4.748678  0.030626 
+
+ +

If you now want to get the monthly estimates, you can guess the aij by subtracting ln(6) = 1.7917
+and running
+

+ +
12 -15.18193847  0.126133 
+13 -9.285219469  0.048069
+21 -1.215784469 -0.041322
+23 -6.540437469  0.030626
+
+ +

and get
+

+ +
12 -15.029768 0.124347 
+13 -8.472981 0.036599 
+21 -1.472527 -0.038394 
+23 -6.553602 0.029856 
+
+which is closer to the results. The approximation is probably useful
+only for very small intervals and we don't have enough experience to
+know if you will speed up the convergence or not.
+
+ +
         -ln(12)= -2.484
+ -ln(6/1)=-ln(6)= -1.791
+ -ln(3/1)=-ln(3)= -1.0986
+-ln(12/6)=-ln(2)= -0.693
+
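The rescaling of guess values from a long step to a shorter one is a one-line computation. The sketch below applies the approximation aij(stepm) = aij(n . stepm) - ln(n), bij unchanged, to the 6-month estimates quoted above; the function name is hypothetical.

```python
import math

# Estimates obtained with a 6-month step, as quoted in this section.
six_month = {
    "12": (-13.390179, 0.126133),
    "13": (-7.493460, 0.048069),
    "21": (0.575975, -0.041322),
    "23": (-4.748678, 0.030626),
}


def rescale_guess(params, n):
    """Shift each aij by -ln(n) and keep bij, for a step n times shorter."""
    return {ij: (a - math.log(n), b) for ij, (a, b) in params.items()}


monthly_guess = rescale_guess(six_month, 6)
for ij, (a, b) in monthly_guess.items():
    print(f"{ij} {a:.8f} {b:.6f}")
```

The printed aij values agree with the monthly guess block shown above, up to the number of decimals.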
+ +

Guess values for computing variances

+ +

This is an output if mle=1. But it can be +used as an input to get the various output data files (Health +expectancies, stationary prevalence etc.) and figures without +rerunning the rather long maximisation phase (mle=0).

+ +

The scales are small values for the evaluation of numerical derivatives. These derivatives are used to compute the Hessian matrix of the parameters, that is the inverse of the covariance matrix, and the variances of health expectancies. Each line consists of indices "ij" followed by the initial scales (zero to simplify) associated with aij and bij.

+ + + +

Covariance matrix of parameters

+ +

This is an output if mle=1. But it can be +used as an input to get the various output data files (Health +expectancies, stationary prevalence etc.) and figures without +rerunning the rather long maximisation phase (mle=0).
+Each line starts with indices "ijk" followed by the +covariances between aij and bij:
+

+ +
+   121 Var(a12) 
+   122 Cov(b12,a12)  Var(b12) 
+          ...
+   232 Cov(b23,a12)  Cov(b23,b12) ... Var (b23) 
+ + + +

Age range for calculation of stationary +prevalences and health expectancies

+ +
agemin=70 agemax=100 bage=50 fage=100
+ +
+Once we have obtained the estimated parameters, the program is able
+to calculate stationary prevalence, transition probabilities
+and life expectancies at any age. The choice of age range is useful
+for extrapolation. In our data file, ages vary from 70 to
+102. It is possible to get extrapolated stationary prevalence by
+age ranging from agemin to agemax.
+
+
+Setting bage=50 (begin age) and fage=100 (final age) makes
+the program compute life expectancy from age 'bage' to age
+'fage'. As we use a model, we can interestingly compute life
+expectancy on a wider age range than the age range of the data.
+But the model can be rather wrong on much larger intervals.
+The program is limited to around 120 for the upper age!
+
+ + + +

Computing the observed prevalence

+ +
begin-prev-date=1/1/1984 end-prev-date=1/6/1988 estepm=1
+ +
+Statements 'begin-prev-date' and 'end-prev-date' allow you to
+select the period in which we calculate the observed prevalences
+in each state. In this example, the prevalences are calculated on
+survey data collected between 1 January 1984 and 1 June 1988. 
+
+ + + +

Population- or status-based health +expectancies

+ +
pop_based=0
+ +

The program computes status-based health expectancies, i.e. health expectancies which depend on your initial health state. If you are healthy your healthy life expectancy (e11) is higher than if you were disabled (e21, with e11 > e21).
+To compute a healthy life expectancy independent of the initial status we have to weight e11 and e21 according to the probability to be in each state at the initial age or, in other words, according to the proportion of people in each state.
+We prefer computing a 'pure' period healthy life expectancy based only on the transition forces. Then the weights are simply the stationary prevalences or 'implied' prevalences at the initial age.
+Some other people would like to use the cross-sectional prevalences (the "Sullivan prevalences") observed at the initial age during a period of time defined just above.
+

+ + + +

Prevalence forecasting ( Experimental)

+ +
starting-proj-date=1/1/1989 final-proj-date=1/1/1992 mov_average=0 
+ +

Prevalence and population projections are only available if the interpolation unit is a month, i.e. stepm=1, and if there are no covariates. The programme estimates the prevalence in each state at a precise date expressed in day/month/year. The programme computes one forecasted prevalence a year from a starting date (1 January 1989 in this example) to a final date (1 January 1992). The statement mov_average allows you to compute smoothed forecasted prevalences with a five-age moving average centered at the mid-age of the five-age period.
+

+ + + +

Last uncommented line : Population +forecasting

+ +
popforecast=0 popfile=pyram.txt popfiledate=1/1/1989 last-popfiledate=1/1/1992
+ +

This command is available if the interpolation unit is a +month, i.e. stepm=1 and if popforecast=1. From a data file +including age and number of persons alive at the precise date +‘popfiledate’, you can forecast the number of persons +in each state until date ‘last-popfiledate’. In this +example, the popfile pyram.txt +includes real data which are the Japanese population in 1989.
+

+ + + +
+ +

Running Imach +with this example

+ +
We assume that you typed in your 1st_example
+parameter file as explained above. 
+
+To run the program you should either:
+
+ + + +
The time to converge depends on the step unit that you used (1
+month is CPU-intensive), on the number of cases, and on the
+number of variables.
+
+
+The program outputs many files. Most of them are files which
+will be plotted for better understanding.
+
+
+ +
+ +

Output of the program +and graphs

+ +

Once the optimization is finished, some graphics can be made with a grapher. We use Gnuplot, an interactive plotting program which is copyrighted but freely distributed. A gnuplot reference manual is available here.
+When the run is finished, the user should enter a character for plotting and output editing.
+These characters are:
+

+ + + +
Results files
+
+- Observed prevalence in each state (and at first pass): +prbiaspar.txt
+
+ +

The first line is the title and displays each field of the file. The first column is age. Fields 2 and 6 are the proportions of individuals in states 1 and 2 respectively as observed during the first exam. The other fields are the numbers of people in states 1, 2 or more. The number of columns increases if the number of states is higher than 2.
+The header of the file is

+ +
# Age Prev(1) N(1) N Age Prev(2) N(2) N
+70 1.00000 631 631 70 0.00000 0 631
+71 0.99681 625 627 71 0.00319 2 627 
+72 0.97125 1115 1148 72 0.02875 33 1148 
+ +

It means that at age 70, the prevalence in state 1 is 1.000 +and in state 2 is 0.00 . At age 71 the number of individuals in +state 1 is 625 and in state 2 is 2, hence the total number of +people aged 71 is 625+2=627.
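The prevalence columns are simply the counts in each state divided by the total at that age. The short Python sketch below rebuilds the three lines shown above from the counts N(1) and N(2) alone; the function name is hypothetical and the layout merely imitates the extract, it is not the program's own output routine.

```python
def prevalence_line(age, n1, n2):
    """Format one line 'Age Prev(1) N(1) N Age Prev(2) N(2) N' from state counts."""
    total = n1 + n2
    return (f"{age} {n1 / total:.5f} {n1} {total} "
            f"{age} {n2 / total:.5f} {n2} {total}")


counts = [(70, 631, 0), (71, 625, 2), (72, 1115, 33)]
lines = [prevalence_line(age, n1, n2) for age, n1, n2 in counts]
print("\n".join(lines))
```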
+

+ +
- Estimated parameters and +covariance matrix: rbiaspar.imach
+ +

This file contains all the maximisation results:

+ +
 -2 log likelihood= 21660.918613445392
+ Estimated parameters: a12 = -12.290174 b12 = 0.092161 
+                       a13 = -9.155590  b13 = 0.046627 
+                       a21 = -2.629849  b21 = -0.022030 
+                       a23 = -7.958519  b23 = 0.042614  
+ Covariance matrix: Var(a12) = 1.47453e-001
+                    Var(b12) = 2.18676e-005
+                    Var(a13) = 2.09715e-001
+                    Var(b13) = 3.28937e-005  
+                    Var(a21) = 9.19832e-001
+                    Var(b21) = 1.29229e-004
+                    Var(a23) = 4.48405e-001
+                    Var(b23) = 5.85631e-005 
+ 
+ +

By substitution of these parameters in the regression model, +we obtain the elementary transition probabilities:
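As a reminder of what this substitution involves, the model log(pij/pii) = aij + bij*age, together with the constraint that the probabilities leaving state i sum to one, leads to the following expressions. They are written here for the model without additional covariates, with x denoting age, i a live state and k running over all other states, death included; this is a derivation from the model stated in the introduction, not a transcription of the program.

```latex
\begin{aligned}
p_{ij}(x) &= \frac{\exp(a_{ij} + b_{ij}\,x)}{1 + \sum_{k \neq i} \exp(a_{ik} + b_{ik}\,x)}, \qquad j \neq i,\\[4pt]
p_{ii}(x) &= \frac{1}{1 + \sum_{k \neq i} \exp(a_{ik} + b_{ik}\,x)}.
\end{aligned}
```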

+ +

+ +
- Transition probabilities: +pijrbiaspar.txt
+ +

Here are the transition probabilities Pij(x, x+nh) where nh is a multiple of 2 years. The first column is the starting age x (from age 50 to 100), the second is age (x+nh) and the others are the transition probabilities p11, p12, p13, p21, p22, p23. For example, line 5 of the file is:

+ +
 100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513 
+ +

and this means:

+ +
p11(100,106)=0.02655
+p12(100,106)=0.17622
+p13(100,106)=0.79722
+p21(100,106)=0.01809
+p22(100,106)=0.13678
+p23(100,106)=0.84513 
+ +
- Stationary prevalence in each state: +plrbiaspar.txt
+ +
#Prevalence
+#Age 1-1 2-2
+
+#************ 
+70 0.90134 0.09866
+71 0.89177 0.10823 
+72 0.88139 0.11861 
+73 0.87015 0.12985 
+ +

At age 70 the stationary prevalence is 0.90134 in state 1 and 0.09866 in state 2. This stationary prevalence differs from the observed prevalence, and this is the point. The observed prevalence at age 70 results from the incidence of disability, incidence of recovery and mortality which occurred in the past of the cohort. Stationary prevalence results from a simulation with current incidences and mortality (estimated from this cross-longitudinal survey). It is the best predictive value of the prevalence in the future if "nothing changes in the future". This is exactly what demographers do with a Life table. Life expectancy is the expected mean time to survive if observed mortality rates (incidence of mortality) "remain constant" in the future.

+ +
- Standard deviation of +stationary prevalence: vplrbiaspar.txt
+ +

The stationary prevalence has to be compared with the observed prevalence by age. But both are statistical estimates and subject to stochastic errors due to the size of the sample, the design of the survey and, for the stationary prevalence, the model used and fitted. It is possible to compute the standard deviation of the stationary prevalence at each age.

+ +
-Observed and stationary prevalence in state (2=disabled) with confidence interval: vbiaspar21.gif
+ +

This graph exhibits the stationary prevalence in state (2) +with the confidence interval in red. The green curve is the +observed prevalence (or proportion of individuals in state (2)). +Without discussing the results (it is not the purpose here), we +observe that the green curve is rather below the stationary +prevalence. It suggests an increase of the disability prevalence +in the future.

+ +

+ +
-Convergence to the +stationary prevalence of disability: pbiaspar11.gif
+
+ +

This graph plots the conditional transition probabilities from an initial state (1=healthy in red at the bottom, or 2=disabled in green on top) at age x to the final state 2=disabled at age x+h. Conditional means conditional on being alive at age x+h, which has probability hP11x + hP12x from initial state 1 and hP21x + hP22x from initial state 2. The curves hP12x/(hP11x + hP12x) and hP22x/(hP21x + hP22x) converge with h to the stationary prevalence of disability. In order to get the stationary prevalence at age 70 we should start the process at an earlier age, e.g. 50. If the disability state is defined by severe disability criteria with little chance of recovery, then the incidence of recovery is low and the time to convergence is probably longer. But we don't have experience yet.
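The convergence can be reproduced numerically. The Python sketch below follows two cohorts from age 50 to age 70, one starting healthy and one starting disabled, and computes for each the proportion disabled among survivors. It is an illustration only: the function names are hypothetical, the parameters are the estimates quoted earlier in this manual, and the monthly step with age evaluated at the start of each step is an assumption.

```python
import math

# Estimates (a_ij, b_ij) quoted earlier in this manual; state 3 is death.
PARAMS = {
    (1, 2): (-12.290174, 0.092161),
    (1, 3): (-9.155590, 0.046627),
    (2, 1): (-2.629849, -0.022030),
    (2, 3): (-7.958519, 0.042614),
}


def live_probs(i, age):
    """One-step probabilities (p_i1, p_i2) from live state i at a given age."""
    odds = {j: math.exp(a + b * age)
            for (k, j), (a, b) in PARAMS.items() if k == i}
    denom = 1.0 + sum(odds.values())
    stay = 1.0 / denom
    return (stay, odds[2] / denom) if i == 1 else (odds[1] / denom, stay)


def conditional_disability(start_state, start_age, end_age, steps_per_year=12):
    """Proportion disabled among survivors at end_age, given the state at start_age."""
    dist = [1.0, 0.0] if start_state == 1 else [0.0, 1.0]
    n_steps = round((end_age - start_age) * steps_per_year)
    for s in range(n_steps):
        age = start_age + s / steps_per_year
        p1, p2 = live_probs(1, age), live_probs(2, age)
        dist = [dist[0] * p1[0] + dist[1] * p2[0],
                dist[0] * p1[1] + dist[1] * p2[1]]
    return dist[1] / (dist[0] + dist[1])


from_healthy = conditional_disability(1, 50, 70)
from_disabled = conditional_disability(2, 50, 70)
print(f"from healthy:  {from_healthy:.5f}")
print(f"from disabled: {from_disabled:.5f}")
```

The two printed values should be close to each other, which is the numerical counterpart of the two curves meeting on the graph.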

+ +
- Life expectancies by age +and initial health status with standard deviation: erbiaspar.txt
+ +
# Health expectancies 
+# Age 1-1 (SE) 1-2 (SE) 2-1 (SE) 2-2 (SE)
+70 10.4171 (0.1517)    3.0433 (0.4733)    5.6641 (0.1121)    5.6907 (0.3366)
+71 9.9325 (0.1409)    3.0495 (0.4234)    5.2627 (0.1107)    5.6384 (0.3129)
+72 9.4603 (0.1319)    3.0540 (0.3770)    4.8810 (0.1099)    5.5811 (0.2907)
+73 9.0009 (0.1246)    3.0565 (0.3345)    4.5188 (0.1098)    5.5187 (0.2702)
+
+ +
For example 70 10.4171 (0.1517) 3.0433 (0.4733) 5.6641 (0.1121) 5.6907 (0.3366) means:
+e11=10.4171 e12=3.0433 e21=5.6641 e22=5.6907 
+ +
+ +

For example, the life expectancy of a healthy individual at age 70 is 10.42 years in the healthy state and 3.04 in the disability state (=13.46 years). If he was disabled at age 70, his life expectancy will be shorter: 5.66 in the healthy state and 5.69 in the disability state (=11.35 years). The total life expectancy is a weighted mean of both, 13.46 and 11.35; the weight is the proportion of people disabled at age 70. In order to get a pure period index (i.e. based only on incidences) we use the computed or stationary prevalence at age 70 (i.e. computed from incidences at earlier ages) instead of the observed prevalence (for example at first exam) (see below).

+ +
- Variances of life +expectancies by age and initial health status: vrbiaspar.txt
+ +

For example, the covariances of life expectancies Cov(ei,ej) +at age 50 are (line 3)

+ +
   Cov(e1,e1)=0.4776  Cov(e1,e2)=0.0488=Cov(e2,e1)  Cov(e2,e2)=0.0424
+ +
-Variances of one-step +probabilities : probrbiaspar.txt
+ +

For example, at age 65

+ +
   p11=9.960e-001 standard deviation of p11=2.359e-004
+ +
- Health +expectancies +with standard errors in parentheses: trbiaspar.txt
+ +
#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) 
+ +
70 13.26 (0.22) 9.95 (0.20) 3.30 (0.14) 
+ +

Thus, at age 70 the total life expectancy, e..=13.26 years is +the weighted mean of e1.=13.46 and e2.=11.35 by the stationary +prevalence at age 70 which are 0.90134 in state 1 and 0.09866 in +state 2, respectively (the sum is equal to one). e.1=9.95 is the +Disability-free life expectancy at age 70 (it is again a weighted +mean of e11 and e21). e.2=3.30 is also the life expectancy at age +70 to be spent in the disability state.
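The weighting can be checked by hand from the figures already given in this manual. The Python sketch below combines the status-based expectancies at age 70 with the stationary prevalences at age 70; the variable names are hypothetical. With these rounded inputs, e.1 and e.2 agree with the file to two decimals, while the total comes out at about 13.25 rather than 13.26, a difference that is presumably due to rounding of the published figures.

```python
# Stationary prevalence at age 70 (from plrbiaspar.txt above).
prevalence = {1: 0.90134, 2: 0.09866}

# Status-based health expectancies at age 70 (from erbiaspar.txt above):
# e[(i, j)] is the expected time in state j for someone in state i at age 70.
e = {(1, 1): 10.4171, (1, 2): 3.0433, (2, 1): 5.6641, (2, 2): 5.6907}

# e.j is the prevalence-weighted mean of e1j and e2j.
e_dot = {j: sum(prevalence[i] * e[(i, j)] for i in (1, 2)) for j in (1, 2)}
total = e_dot[1] + e_dot[2]

print(f"e.1 = {e_dot[1]:.4f}")
print(f"e.2 = {e_dot[2]:.4f}")
print(f"e.. = {total:.4f}")
```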

+ +
-Total life expectancy by age and health expectancies in states (1=healthy) and (2=disabled): ebiaspar1.gif
+ +

This figure represents the health expectancies and the total life expectancy with the confidence interval as dashed curves.

+ +
        
+ +

Standard deviations (obtained from the information matrix of the model) of these quantities are very useful. Cross-longitudinal surveys are costly and do not involve huge samples, generally a few thousand; therefore it is very important to have an idea of the standard deviation of our estimates. It has been a big challenge to compute the Health Expectancy standard deviations. Don't be confused: life expectancy is, as any expected value, the mean of a distribution; but here we are not computing the standard deviation of the distribution, but the standard deviation of the estimate of the mean.

+ +

Our health expectancy estimates vary according to the sample size (and the standard deviations give confidence intervals of the estimate) but also according to the model fitted. Let us explain this in more detail.

+ +

Choosing a model means at least two kinds of choices. First we have to decide the number of disability states. Second we have to design, within the logit model family, the model: variables, covariables, confounding factors etc. to be included.

+ +

The more disability states we have, the better is our demographical approach of the disability process, but the smaller is the number of transitions between each state and the higher is the noise in the measurement. We do not have enough experience of the various models to summarize the advantages and disadvantages, but it is important to say that even if we had huge and unbiased samples, the total life expectancy computed from a cross-longitudinal survey varies with the number of states. If we define only two states, alive or dead, we find the usual life expectancy where it is assumed that at each age, people are at the same risk of dying. If we differentiate the alive state into healthy and disabled, and as the mortality from the disability state is higher than the mortality from the healthy state, we are introducing heterogeneity in the risk of dying. The total mortality at each age is the weighted mean of the mortality in each state by the prevalence in each state. Therefore if the proportion of people at each age and in each state is different from the stationary equilibrium, there is no reason to find the same total mortality at a particular age. Life expectancy, even if it is a very useful tool, rests on a very strong hypothesis of homogeneity of the population. Our main purpose is not to measure differential mortality but to measure the expected time in a healthy or disability state in order to maximise the former and minimize the latter. But the differential in mortality complicates the measurement.

+ +

Incidences of disability or recovery are not affected by the number of states if these states are independent. But incidence estimates are dependent on the specification of the model. The more covariates we add to the logit model, the better is the model, but some covariates are not well measured and some are confounding factors, as in any statistical model. The procedure to "fit the best model" is similar to logistic regression, which itself is similar to regression analysis. We haven't yet gone so far because we also have a severe limitation, which is the speed of the convergence. On a Pentium III, 500 MHz, even the simplest model, estimated by month on 8,000 people, may take 4 hours to converge. Also, the program is not yet a statistical package which permits a simple writing of the variables and the model to take into account in the maximisation. The current program only allows adding simple variables like age+sex or age+sex+age*sex and will never be general enough. But what is to be remembered is that incidences, or probabilities of change from one state to another, are affected by the variables specified in the model.

+ +

Also, the age range of the people interviewed has a link with the age range of the life expectancy which can be estimated by extrapolation. If your sample ranges from age 70 to 95, you can clearly estimate a life expectancy at age 70 and trust your confidence interval, which is mostly based on your sample size. But if you want to estimate the life expectancy at age 50, you have to rely on your model, and fitting a logistic model on an age range of 70-95 and estimating probabilities of transition outside this age range, say at age 50, is very dangerous. At least you should remember that the confidence intervals given by the standard deviation of the health expectancies are under the strong assumption that your model is the 'true model', which is probably not the case.

+ +
- Copy of the parameter +file: orbiaspar.txt
+ +

This copy of the parameter file can be useful to re-run the +program while saving the old output files.

+ +
- Prevalence forecasting: +frbiaspar.txt
+ +

First, we have estimated the observed prevalence between 1/1/1984 and 1/6/1988. The mean date of interview (weighted average of the interviews performed between 1/1/1984 and 1/6/1988) is estimated to be 13/9/1985, as written at the top of the file. Then we forecast the probability to be in each state.

+ +

Example, +at date 1/1/1989 :

+ +
# StartingAge FinalAge P.1 P.2 P.3
+# Forecasting at date 1/1/1989
+  73 0.807 0.078 0.115
+ +

Since the minimum age is 70 on 13/9/1985, the youngest forecasted age is 73. This means that a person aged 70 on 13/9/1985 has a probability of 0.807 to be in state 1 at age 73 on 1/1/1989. Similarly, the probability to be in state 2 is 0.078 and the probability to die is 0.115. Then, on 1/1/1989, the prevalence of disability at age 73 is estimated to be 0.088.
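The figure of 0.088 is the share of state 2 among survivors only, that is P.2 divided by P.1 + P.2. A minimal Python check using the three probabilities of the example line (the variable names are hypothetical):

```python
# Forecast probabilities at age 73 on 1/1/1989, from the example line above.
p_healthy, p_disabled, p_dead = 0.807, 0.078, 0.115

# Prevalence of disability among people still alive at that date.
prevalence_disability = p_disabled / (p_healthy + p_disabled)
print(f"{prevalence_disability:.3f}")
```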

+ +
- Population forecasting: +poprbiaspar.txt
+ +
# Age P.1 P.2 P.3 [Population]
+# Forecasting at date 1/1/1989 
+75 572685.22 83798.08 
+74 621296.51 79767.99 
+73 645857.70 69320.60 
+ +
# Forecasting at date 1/1/1990 
+76 442986.68 92721.14 120775.48
+75 487781.02 91367.97 121915.51
+74 512892.07 85003.47 117282.76 
+ +

From the population file, we estimate the number of people in each state. At age 73, 645857 persons are in state 1 and 69320 are in state 2. One year later, 512892 are still in state 1, 85003 are in state 2 and 117282 died before 1/1/1990.
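A simple consistency check on this output is that each cohort is conserved from one date to the next: the people alive at a given age on 1/1/1989 should equal the sum of the three states one year of age later on 1/1/1990. The Python sketch below performs this check on the figures shown above; the variable names are hypothetical.

```python
# (P.1, P.2) by age on 1/1/1989 and (P.1, P.2, P.3) by age on 1/1/1990,
# copied from the extract of poprbiaspar.txt above.
pop_1989 = {
    75: (572685.22, 83798.08),
    74: (621296.51, 79767.99),
    73: (645857.70, 69320.60),
}
pop_1990 = {
    76: (442986.68, 92721.14, 120775.48),
    75: (487781.02, 91367.97, 121915.51),
    74: (512892.07, 85003.47, 117282.76),
}

# Difference between the cohort alive in 1989 and the same cohort
# (alive in either state, or dead) one year later.
differences = {
    age: sum(pop_1989[age]) - sum(pop_1990[age + 1])
    for age in pop_1989
}
for age, diff in sorted(differences.items()):
    print(f"age {age} in 1989: difference {diff:.2f}")
```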

+ +
+ +

Trying an example

+ +

Since you know how to run the program, it is time to test it on your own computer. Try for example on a parameter file named imachpar.imach which is a copy of mypar.imach included in the subdirectory of imach, mytry. Edit it to change the name of the data file to ..\data\mydata.txt if you don't want to copy it into the same directory. The file mydata.txt is a smaller file of 3,000 people but still with 4 waves.

+ +

Click on the imach.exe icon to open a window. Answer the question: 'Enter the parameter file name:'

+ + + + + +
IMACH, Version 0.8a

Enter + the parameter file name: ..\mytry\imachpar.imach

+
+ +

Most of the data files or image files generated will use the 'imachpar' string in their names. The running time is about 2-3 minutes on a Pentium III. If the execution worked correctly, the output files are created in the current directory, and should be the same as the mypar files initially included in the directory mytry.

+ + + +

 

+ + + +

Once the run is finished, the program requires a character:

+ + + + + +
Type e to edit output files, g + to graph again, c to start again, and q for exiting:
+ +

First you should enter e to +edit the master file mypar.htm.

+ + + +

This software has been partly funded by Euro-REVES, a concerted action from the European Union. It will be copyrighted identically to a GNU software product, i.e. program and software can be distributed freely for non-commercial use. Sources are not widely distributed today. You can get them by asking us with a simple justification (name, email, institute) mailto:brouard@ined.fr and mailto:lievre@ined.fr .

+ +

Latest version (0.8a of May 2002) can be accessed at http://euroreves.ined.fr/imach
+

+ +