--- imach064/doc/imach.htm 2000/12/28 18:49:54 1.1
+++ imach064/doc/imach.htm 2001/05/09 14:09:37 1.3
@@ -1,1024 +1,1044 @@
- - - - - -Computing Health Expectancies using IMaCh - - - - -
- -

Computing Health -Expectancies using IMaCh

- -

(a Maximum -Likelihood Computer Program using Interpolation of Markov Chains)

- -

 

- -

- -

INED and EUROREVES

- -

March -2000

- -
- -

Authors of the -program: Nicolas Brouard, senior researcher at the Institut -National d'Etudes Démographiques (INED, Paris) in the "Mortality, -Health and Epidemiology" Research Unit

- -

and Agnès -Lièvre
-

- -

Contribution to the mathematics: C. R. -Heathcote (Australian -National University, Canberra).

- -

Contact: Agnès Lièvre (lievre@ined.fr)

- -
- - - -
- -

Introduction

- -

This program computes Healthy Life Expectancies from cross-longitudinal data. Within the family of Health Expectancies (HE), Disability-free life expectancy (DFLE) is probably the most important index to monitor. In low mortality countries, there is a fear that when mortality declines, the increase in DFLE is not proportionate to the increase in total Life expectancy. This case is called the Expansion of morbidity. Most of the data collected today, in particular by the international REVES network on Health expectancy, and most HE indices based on these data, are cross-sectional. This means that the information collected comes from a single cross-sectional survey: people of various ages (but mostly old people) are surveyed on their health status at a single date. The proportion of people disabled at each age can then be measured at that date. This age-specific prevalence curve is then used to distinguish, within the stationary population (which, by definition, is the life table estimated from the vital statistics on mortality at the same date), the disabled population from the disability-free population. Life expectancy (LE) (or total population divided by the yearly number of births or deaths of this stationary population) is then decomposed into DFLE and DLE. This method of computing HE is usually called the Sullivan method (from the name of the author who first described it).

- -

Age-specific proportions of disabled people are very difficult to forecast because each proportion reflects the historical conditions of the cohort: it results from the flows into disability and back to recovery from the past until today. The age-specific intensities (or incidence rates) of entering disability or recovering good health reflect current conditions and can therefore be used at each age to forecast the future of the cohort. For example, if a country improves its prosthesis technology, the incidence of recovering the ability to walk will be higher at each (old) age, but the prevalence of disability will only slightly reflect the improvement, because prevalence is mostly affected by the history of the cohort and not by recent period effects. To measure the period improvement we have to simulate the future of a cohort of new-borns entering or leaving the disability state, or dying, at each age according to the incidence rates measured today on different cohorts. The proportion of people disabled at each age in this simulated cohort will be much lower (in the example of an improvement) than the proportions observed at each age in a cross-sectional survey. This new prevalence curve, introduced in a life table, will give a much more current and realistic HE level than the Sullivan method, which mostly measures the history of health conditions in the country.

- -

Therefore, the main question is how to measure incidence rates from cross-longitudinal surveys. This is the goal of the IMaCh program. From your data and using IMaCh you can estimate period HE and not only Sullivan's HE. The standard errors of the HE are also computed.

- -

A cross-longitudinal survey consists of a first survey ("cross") where individuals of different ages are interviewed on their health status or degree of disability. At least a second wave of interviews ("longitudinal") should measure each individual's new health status. Health expectancies are computed from the transitions observed between waves and are computed for each degree of severity of disability (number of life states). The more degrees you consider, the more time is needed to reach the Maximum Likelihood of the parameters involved in the model. Considering only two states of disability (disabled and healthy) is generally enough, but the program also works with more health statuses.
-
-The simplest model is the multinomial logistic model where pij is the probability of being observed in state j at the second wave, conditional on being observed in state i at the first wave. A simple model is therefore: log(pij/pii)= aij + bij*age+ cij*sex, where 'age' is age and 'sex' is a covariate. The advantage claimed by this program is that if the delay between waves is not identical for each individual, or if some individuals missed an interview, the information is not rounded or lost, but taken into account using an interpolation or extrapolation. hPijx is the probability of being observed in state j at age x+h conditional on being observed in state i at age x. The delay 'h' can be split into an exact number (nh*stepm) of unobserved intermediate states. This elementary transition (by month, quarter, semester or year) is modeled as a multinomial logistic. The hPx matrix is simply the matrix product of nh*stepm elementary matrices and the contribution of each individual to the likelihood is simply hPijx.
-

- -

The program presented in this manual is a quite general program named IMaCh (for Interpolated MArkov CHain), designed to analyse transition data from longitudinal surveys. The first step is the estimation of the parameters of a model of transition probabilities between an initial status and a final status. From there, the program produces indicators such as observed and stationary prevalence, life expectancies and their variances, and graphs. Our transition model consists of absorbing and non-absorbing states with the possibility of return across the non-absorbing states. The main advantage of this package, compared to other programs for the analysis of transition data (for example: Proc Catmod of SAS®), is that the whole individual information is used even if an interview is missing, a status or a date is unknown, or the delay between waves is not identical for each individual. The program can be executed according to parameters: selection of a sub-sample, number of absorbing and non-absorbing states, number of waves taken into account (the user inputs the first and the last interview), a tolerance level for the maximization function, the periodicity of the transitions (we can compute annual, quarterly or monthly transitions), covariates in the model. It works on Windows or on Unix.
-

- -
- -

On what kind of data can -it be used?

- -

The minimum data required for a transition model is the recording of a set of individuals interviewed at a first date and interviewed again at least one more time. From the observations of an individual, we obtain a follow-up over time of the occurrence of a specific event. In this documentation, the event is related to health status at older ages, but the program can be applied to many longitudinal studies in different contexts. To build the data file explained in the next section, you must have the month and year of each interview and the corresponding health status. But in order to get age, date of birth (month and year) is required (missing values are allowed for month). Date of death (month and year) is important information and is also required if the individual is dead. Shorter steps (i.e. a month) will more closely take into account the survival time after the last interview.

- -
- -

The data file

- -

In this example, 8,000 people have been interviewed in a cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990). Some people missed 1, 2 or 3 interviews. Health statuses are healthy (1) and disabled (2). The survey is not a real one. It is a simulation of the American Longitudinal Survey on Aging. The disability state is defined if the individual missed one of four ADL (Activity of daily living, like bathing, eating, walking). Therefore, even if the individuals interviewed in the sample are virtual, the information brought by this sample is close to the situation of the United States. Sex is not recorded in this sample.

- -

Each line of the data set (named data1.txt in this first example) is an individual record whose fields are:

- - - -

 

- -

If your longitudinal survey does not include information about weights or covariates, you must fill the column with a number (e.g. 1) because a missing field is not allowed.

- -
- -

Your first example parameter file

- -

#Imach version 0.63, February 2000, -INED-EUROREVES

- -

This is a comment. Comments start with a '#'.

- -

First uncommented line

- -
title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4
- - - -

 

- -

Second uncommented -line

- -
ftol=1.e-08 stepm=1 ncov=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
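The two uncommented lines above are plain `key=value` pairs separated by spaces. As a minimal sketch (not part of IMaCh; the helper name `parse_parameter_line` is hypothetical), they could be read like this, for instance to check a parameter file before launching a long run:

```python
def parse_parameter_line(line):
    """Split a 'key=value key=value ...' line into a dict of strings."""
    return dict(token.split("=", 1) for token in line.split())


first = parse_parameter_line(
    "title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4"
)
second = parse_parameter_line(
    "ftol=1.e-08 stepm=1 ncov=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0"
)

# Numeric fields are converted explicitly; the others stay as text.
ftol = float(second["ftol"])
nlstate = int(second["nlstate"])
ndeath = int(second["ndeath"])
ncov = int(second["ncov"])

print(first["datafile"], ftol, nlstate, ndeath, ncov)
```

The sketch assumes that values never contain spaces, which holds for the example lines shown here.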
- - - -

Guess values for optimization

- -

You must write the initial guess values of the parameters for -optimization. The number of parameters, N depends on the -number of absorbing states and non-absorbing states and on the -number of covariates.
-N is given by the formula N=(nlstate + -ndeath-1)*nlstate*ncov .
-
-Thus in the simple case with 2 covariates (the model is log (pij/pii) = aij + bij * age where intercept and age are the two covariates), 2 health degrees (1 for disability-free and 2 for disability) and 1 absorbing state (3), you must enter 8 initial values: a12, b12, a13, b13, a21, b21, a23, b23. You can start with zeros as in this example, but if you have a more precise set (for example from an earlier run) you can enter it and it will speed up the convergence.
-Each of the four lines starts with indices "ij":
-
-ij aij bij

- -
-
# Guess values of aij and bij in log (pij/pii) = aij + bij * age
-12 -14.155633  0.110794 
-13  -7.925360  0.032091 
-21  -1.890135 -0.029473 
-23  -6.234642  0.022315 
-
- -

or, to simplify:

- -
-
12 0.0 0.0
-13 0.0 0.0
-21 0.0 0.0
-23 0.0 0.0
-
- -
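The count N=(nlstate + ndeath-1)*nlstate*ncov and the layout of the guess lines can be checked with a short sketch. The function name `guess_lines` is hypothetical and the code is only an illustration of the formula, not code taken from IMaCh:

```python
def guess_lines(nlstate, ndeath, ncov):
    """Build zero-valued guess lines 'ij 0.0 ... 0.0' for every transition
    from a live state i to any other state j (live or absorbing)."""
    nstates = nlstate + ndeath
    lines = []
    for i in range(1, nlstate + 1):
        for j in range(1, nstates + 1):
            if j == i:
                continue  # pii is the reference category of the logit
            lines.append(f"{i}{j} " + " ".join(["0.0"] * ncov))
    return lines


nlstate, ndeath, ncov = 2, 1, 2
n_params = (nlstate + ndeath - 1) * nlstate * ncov
lines = guess_lines(nlstate, ndeath, ncov)

print(n_params)
print("\n".join(lines))
```

With the values of the example (nlstate=2, ndeath=1, ncov=2) this gives 8 parameters on four lines indexed 12, 13, 21 and 23.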

Guess values for computing variances

- -

This is an output if mle=1. But it can be used as an input to get the various output data files (Health expectancies, stationary prevalence etc.) and figures without rerunning the rather long maximisation phase (mle=0).

- -

The scales are small values for the evaluation of numerical -derivatives. These derivatives are used to compute the hessian -matrix of the parameters, that is the inverse of the covariance -matrix, and the variances of health expectancies. Each line -consists in indices "ij" followed by the initial scales -(zero to simplify) associated with aij and bij.

- - - -
-
# Scales (for hessian or gradient estimation)
-12 0. 0. 
-13 0. 0. 
-21 0. 0. 
-23 0. 0. 
-
- - - -

Covariance matrix of parameters

- -

This is an output if mle=1. But it can be used as an input to get the various output data files (Health expectancies, stationary prevalence etc.) and figures without rerunning the rather long maximisation phase (mle=0).

- -

Each line starts with indices "ijk" followed by the -covariances between aij and bij:

- -
-   121 Var(a12) 
-   122 Cov(b12,a12)  Var(b12) 
-          ...
-   232 Cov(b23,a12)  Cov(b23,b12) ... Var (b23) 
- - - -
-
# Covariance matrix
-121 0.
-122 0. 0.
-131 0. 0. 0. 
-132 0. 0. 0. 0. 
-211 0. 0. 0. 0. 0. 
-212 0. 0. 0. 0. 0. 0. 
-231 0. 0. 0. 0. 0. 0. 0. 
-232 0. 0. 0. 0. 0. 0. 0. 0.
-
- - - -
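The covariance block is lower triangular: the row labelled "ijk" refers to the k-th parameter of transition ij and holds one more value than the previous row. A minimal sketch that generates such a zero-filled template (the name `covariance_template` is hypothetical, not an IMaCh function):

```python
def covariance_template(nlstate, ndeath, ncov):
    """Return the lines of a zero-filled lower-triangular covariance block."""
    nstates = nlstate + ndeath
    labels = [
        f"{i}{j}{k}"
        for i in range(1, nlstate + 1)
        for j in range(1, nstates + 1)
        if j != i
        for k in range(1, ncov + 1)
    ]
    # Row number r (starting at 1) carries r values: covariances with the
    # r-1 previous parameters, then the variance of the parameter itself.
    return [
        label + " " + " ".join(["0."] * (row + 1))
        for row, label in enumerate(labels)
    ]


template = covariance_template(2, 1, 2)
print("\n".join(template))
```

For the example model this reproduces the eight rows 121 to 232 shown above.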

last -uncommented line

- -
agemin=70 agemax=100 bage=50 fage=100
- -

Once we have obtained the estimated parameters, the program is able to calculate stationary prevalence, transition probabilities and life expectancies at any age. The choice of age ranges is useful for extrapolation. In our data file, ages vary from 70 to 102. Setting bage=50 and fage=100 makes the program compute life expectancy from age bage to age fage. As we use a model, we can compute life expectancy on a wider age range than the age range of the data. But the model can be rather wrong over large intervals.

- -

Similarly, it is possible to get extrapolated stationary prevalence by age ranging from agemin to agemax.

- - - -
- -

Running Imach -with this example

- -

We assume that you entered your 1st_example -parameter file as explained above. To -run the program you should click on the imach.exe icon and enter -the name of the parameter file which is for example C:\usr\imach\mle\biaspar.txt -(you also can click on the biaspar.txt icon located in
-C:\usr\imach\mle and put it with -the mouse on the imach window).
-

- -

The time to converge depends on the step unit that you used (1 -month is cpu consuming), on the number of cases, and on the -number of variables.

- -

The program outputs many files. Most of them are files which -will be plotted for better understanding.

- -
- -

Output of the program -and graphs

- -

Once the optimization is finished, some graphics can be made -with a grapher. We use Gnuplot which is an interactive plotting -program copyrighted but freely distributed. Imach outputs the -source of a gnuplot file, named 'graph.gp', which can be directly -input into gnuplot.
-When the run is finished, the user should enter a character for plotting and output editing.

- -

These characters are:

- - - -
Results files
-
-- Observed prevalence in each state (and at first pass): -prbiaspar.txt
-
- -

The first line is the title and displays each field of the file. The first column is age. Fields 2 and 6 are the proportions of individuals in states 1 and 2 respectively as observed during the first exam. Other fields are the numbers of people in states 1, 2 or more. The number of columns increases if the number of states is higher than 2.
-The header of the file is

- -
# Age Prev(1) N(1) N Age Prev(2) N(2) N
-70 1.00000 631 631 70 0.00000 0 631
-71 0.99681 625 627 71 0.00319 2 627 
-72 0.97125 1115 1148 72 0.02875 33 1148 
- -
# Age Prev(1) N(1) N Age Prev(2) N(2) N
-    70 0.95721 604 631 70 0.04279 27 631
- -

It means that at age 70, the prevalence in state 1 is 1.000 -and in state 2 is 0.00 . At age 71 the number of individuals in -state 1 is 625 and in state 2 is 2, hence the total number of -people aged 71 is 625+2=627.
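The prevalence columns are simply the counts divided by the total at each age. A small sketch that rebuilds the three rows quoted above from their counts (the variable names are illustrative; the counts are those of the example file):

```python
# (age, number observed in state 1, number observed in state 2)
counts = [(70, 631, 0), (71, 625, 2), (72, 1115, 33)]

rows = []
for age, n1, n2 in counts:
    n = n1 + n2  # total number of people observed at this age
    rows.append(
        f"{age} {n1 / n:.5f} {n1} {n} {age} {n2 / n:.5f} {n2} {n}"
    )

print("# Age Prev(1) N(1) N Age Prev(2) N(2) N")
print("\n".join(rows))
```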
-

- -
- Estimated parameters and -covariance matrix: rbiaspar.txt
- -

This file contains all the maximisation results:

- -
 Number of iterations=47
- -2 log likelihood=46553.005854373667  
- Estimated parameters: a12 = -12.691743 b12 = 0.095819 
-                       a13 = -7.815392   b13 = 0.031851 
-                       a21 = -1.809895 b21 = -0.030470 
-                       a23 = -7.838248  b23 = 0.039490  
- Covariance matrix: Var(a12) = 1.03611e-001
-                    Var(b12) = 1.51173e-005
-                    Var(a13) = 1.08952e-001
-                    Var(b13) = 1.68520e-005  
-                    Var(a21) = 4.82801e-001
-                    Var(b21) = 6.86392e-005
-                    Var(a23) = 2.27587e-001
-                    Var(b23) = 3.04465e-005 
- 
- -
- Transition probabilities: -pijrbiaspar.txt
- -

Here are the transitions probabilities Pij(x, x+nh) where nh -is a multiple of 2 years. The first column is the starting age x -(from age 50 to 100), the second is age (x+nh) and the others are -the transition probabilities p11, p12, p13, p21, p22, p23. For -example, line 5 of the file is:

- -
 100 106 0.03286 0.23512 0.73202 0.02330 0.19210 0.78460 
- -

and this means:

- -
p11(100,106)=0.03286
-p12(100,106)=0.23512
-p13(100,106)=0.73202
-p21(100,106)=0.02330
-p22(100,106)=0.19210 
-p23(100,106)=0.78460 
- -
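A quick consistency check on such a row: each starting state's three probabilities (stay or move between the live states, or die) should sum to one. The sketch below parses the example row and also derives, as an illustration, the probability of being disabled among those still alive at age x+nh. The variable names are hypothetical:

```python
row = "100 106 0.03286 0.23512 0.73202 0.02330 0.19210 0.78460"

fields = row.split()
age_start, age_end = int(fields[0]), int(fields[1])
values = [float(v) for v in fields[2:]]

# p[0] = [p11, p12, p13] (starting healthy), p[1] = [p21, p22, p23]
p = [values[0:3], values[3:6]]

row_sums = [sum(r) for r in p]

# Probability of being in state 2 given survival to age_end,
# for each starting state.
disabled_given_alive = [r[1] / (r[0] + r[1]) for r in p]

print(age_start, age_end, row_sums, disabled_given_alive)
```

On this row the two conditional probabilities are already close to each other (about 0.88 and 0.89), whatever the starting state.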
- Stationary prevalence in each state: -plrbiaspar.txt
- -
#Age 1-1 2-2 
-70 0.92274 0.07726 
-71 0.91420 0.08580 
-72 0.90481 0.09519 
-73 0.89453 0.10547
- -

At age 70 the stationary prevalence is 0.92274 in state 1 and -0.07726 in state 2. This stationary prevalence differs from -observed prevalence. Here is the point. The observed prevalence -at age 70 results from the incidence of disability, incidence of -recovery and mortality which occurred in the past of the cohort. -Stationary prevalence results from a simulation with actual -incidences and mortality (estimated from this cross-longitudinal -survey). It is the best predictive value of the prevalence in the -future if "nothing changes in the future". This is -exactly what demographers do with a Life table. Life expectancy -is the expected mean time to survive if observed mortality rates -(incidence of mortality) "remains constant" in the -future.

- -
- Standard deviation of -stationary prevalence: vplrbiaspar.txt
- -

The stationary prevalence has to be compared with the observed prevalence by age. But both are statistical estimates and subject to stochastic errors due to the size of the sample, the design of the survey and, for the stationary prevalence, the model used and fitted. It is possible to compute the standard deviation of the stationary prevalence at each age.

- -
Observed and stationary prevalence in state (2=disabled) with the confidence interval: vbiaspar2.gif
- -


-This graph exhibits the stationary prevalence in state (2) with -the confidence interval in red. The green curve is the observed -prevalence (or proportion of individuals in state (2)). Without -discussing the results (it is not the purpose here), we observe -that the green curve is rather below the stationary prevalence. -It suggests an increase of the disability prevalence in the -future.

- -

- -
Convergence to the -stationary prevalence of disability: pbiaspar1.gif
-
- -

This graph plots the conditional transition probabilities from an initial state (1=healthy in red at the bottom, or 2=disabled in green on top) at age x to the final state 2=disabled at age x+h. Conditional means conditional on being alive at age x+h, which has probability hP11x + hP12x for an individual starting in state 1 and hP21x + hP22x for an individual starting in state 2. The curves hP12x/(hP11x + hP12x) and hP22x/(hP21x + hP22x) converge with h to the stationary prevalence of disability. In order to get the stationary prevalence at age 70 we should start the process at an earlier age, for example 50. If the disability state is defined by severe disability criteria with only a small chance of recovery, then the incidence of recovery is low and the time to convergence is probably longer. But we don't have experience yet.

- -
- Life expectancies by age -and initial health status: erbiaspar.txt
- -
# Health expectancies 
-# Age 1-1 1-2 2-1 2-2 
-70 10.7297 2.7809 6.3440 5.9813 
-71 10.3078 2.8233 5.9295 5.9959 
-72 9.8927 2.8643 5.5305 6.0033 
-73 9.4848 2.9036 5.1474 6.0035 
- -
For example 70 10.7297 2.7809 6.3440 5.9813 means:
-e11=10.7297 e12=2.7809 e21=6.3440 e22=5.9813
- -
- -

For example, the life expectancy of a healthy individual at age 70 is 10.73 years in the healthy state and 2.78 in the disability state (=13.51 years). If the individual was disabled at age 70, the life expectancy will be shorter: 6.34 in the healthy state and 5.98 in the disability state (=12.32 years). The total life expectancy is a weighted mean of both, 13.51 and 12.32; the weight is the proportion of people disabled at age 70. In order to get a pure period index (i.e. based only on incidences) we use the computed or stationary prevalence at age 70 (i.e. computed from incidences at earlier ages) instead of the observed prevalence (for example at first exam) (see below).

- -
- Variances of life -expectancies by age and initial health status: vrbiaspar.txt
- -

For example, the covariances of life expectancies Cov(ei,ej) -at age 50 are (line 3)

- -
   Cov(e1,e1)=0.4667  Cov(e1,e2)=0.0605=Cov(e2,e1)  Cov(e2,e2)=0.0183
- -
- Health -expectancies -with standard errors in parentheses: trbiaspar.txt
- -
#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) 
- -
70 13.42 (0.18) 10.39 (0.15) 3.03 (0.10)
70 13.81 (0.18) 11.28 (0.14) 2.53 (0.09) 
- -

Thus, at age 70 the total life expectancy, e..=13.42 years is -the weighted mean of e1.=13.51 and e2.=12.32 by the stationary -prevalence at age 70 which are 0.92274 in state 1 and 0.07726 in -state 2, respectively (the sum is equal to one). e.1=10.39 is the -Disability-free life expectancy at age 70 (it is again a weighted -mean of e11 and e21). e.2=3.03 is also the life expectancy at age -70 to be spent in the disability state.
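The weighted means can be reproduced from the figures quoted in this manual: the state-specific expectancies at age 70 (file erbiaspar.txt) and the stationary prevalence at age 70 (file plrbiaspar.txt). The sketch below is only an arithmetic check with illustrative variable names:

```python
# e[(i, j)]: expected years spent in state j for someone in state i at age 70
e = {(1, 1): 10.7297, (1, 2): 2.7809, (2, 1): 6.3440, (2, 2): 5.9813}

# Stationary prevalence at age 70 in states 1 (healthy) and 2 (disabled)
w = {1: 0.92274, 2: 0.07726}

e_dot1 = w[1] * e[(1, 1)] + w[2] * e[(2, 1)]  # disability-free life expectancy
e_dot2 = w[1] * e[(1, 2)] + w[2] * e[(2, 2)]  # life expectancy with disability
e_total = e_dot1 + e_dot2                     # total life expectancy e..

print(round(e_total, 2), round(e_dot1, 2), round(e_dot2, 2))
```

The rounded results agree with the first row of trbiaspar.txt: 13.42, 10.39 and 3.03.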

- -
Total life expectancy by -age and health expectancies in states (1=healthy) and (2=disable): -ebiaspar.gif
- -

This figure represents the health expectancies and the total life expectancy, with the confidence interval shown as dashed curves.

- -
        
- -

Standard deviations (obtained from the information matrix of the model) of these quantities are very useful. Cross-longitudinal surveys are costly and do not involve huge samples, generally a few thousand people; therefore it is very important to have an idea of the standard deviation of our estimates. It has been a big challenge to compute the Health Expectancy standard deviations. Don't be confused: life expectancy is, like any expected value, the mean of a distribution; but here we are not computing the standard deviation of the distribution, but the standard deviation of the estimate of the mean.

- -

Our health expectancy estimates vary according to the sample size (and the standard deviations give confidence intervals of the estimate) but also according to the model fitted. Let us explain this in more detail.
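As an illustration of how a standard deviation translates into a confidence interval, the sketch below applies the usual normal approximation (estimate plus or minus 1.96 standard errors for a 95% interval) to the total life expectancy at age 70 quoted above. The normal approximation and the 1.96 factor are assumptions of this sketch; the manual does not state which interval the program draws.

```python
def normal_interval(estimate, std_error, z=1.96):
    """Approximate confidence interval under a normal approximation."""
    return estimate - z * std_error, estimate + z * std_error


# Total life expectancy at age 70 and its standard error (trbiaspar.txt)
low, high = normal_interval(13.42, 0.18)
print(f"{low:.2f} to {high:.2f} years")
```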

- -

Choosing a model means at least two kinds of choices. First we have to decide the number of disability states. Second we have to design, within the logit model family, the model: variables, covariables, confounding factors etc. to be included.

- -

The more disability states we have, the better our demographic description of the disability process, but the smaller the number of transitions between each pair of states and the higher the noise in the measurement. We do not have enough experience with the various models to summarize the advantages and disadvantages, but it is important to say that even if we had huge and unbiased samples, the total life expectancy computed from a cross-longitudinal survey varies with the number of states. If we define only two states, alive or dead, we find the usual life expectancy, where it is assumed that at each age people are at the same risk of dying. If we differentiate the alive state into healthy and disabled, and as the mortality from the disability state is higher than the mortality from the healthy state, we introduce heterogeneity in the risk of dying. The total mortality at each age is the weighted mean of the mortality in each state by the prevalence in each state. Therefore if the proportion of people at each age and in each state is different from the stationary equilibrium, there is no reason to find the same total mortality at a particular age. Life expectancy, even if it is a very useful tool, relies on a very strong hypothesis of homogeneity of the population. Our main purpose is not to measure differential mortality but to measure the expected time in a healthy or disability state in order to maximise the former and minimize the latter. But the differential in mortality complicates the measurement.

- -

Incidences of disability or recovery are not affected by the number of states if these states are independent. But incidence estimates depend on the specification of the model. The more covariates we add to the logit model, the better the model is, but some covariates are not well measured and some are confounding factors, as in any statistical model. The procedure to "fit the best model" is similar to logistic regression, which itself is similar to regression analysis. We haven't gone that far yet because we also have a severe limitation, which is the speed of convergence. On a Pentium III, 500 MHz, even the simplest model, estimated by month on 8,000 people, may take 4 hours to converge. Also, the program is not yet a statistical package that permits a simple writing of the variables and the model to take into account in the maximisation. The current program only allows adding simple variables without interactions, like age+sex but not age+sex+ age*sex . This can be done from the source code (you have to change three lines in the source code) but will never be general enough. What should be remembered is that incidences, or probabilities of change from one state to another, are affected by the variables specified in the model.

- -

Also, the age range of the people interviewed is linked to the age range of the life expectancy which can be estimated by extrapolation. If your sample ranges from age 70 to 95, you can clearly estimate a life expectancy at age 70 and trust your confidence interval, which is mostly based on your sample size. If you want to estimate the life expectancy at age 50, you have to rely on your model, but fitting a logistic model on an age range of 70-95 and estimating probabilities of transition outside this age range, say at age 50, is very dangerous. At least you should remember that the confidence intervals given by the standard deviations of the health expectancies hold under the strong assumption that your model is the 'true model', which is probably not the case.

- -
- Copy of the parameter -file: orbiaspar.txt
- -

This copy of the parameter file can be useful to re-run the -program while saving the old output files.

- -
- -

Trying an example

- -

Since you know how to run the program, it is time to test it -on your own computer. Try for example on a parameter file named imachpar.txt which is a -copy of mypar.txt -included in the subdirectory of imach, mytry. Edit it to change the name of -the data file to ..\data\mydata.txt -if you don't want to copy it on the same directory. The file mydata.txt is a smaller file of 3,000 -people but still with 4 waves.

- -

Click on the imach.exe icon to open a window. Answer to the -question:'Enter the parameter file name:'

- - - - - -
IMACH, Version 0.63

Enter - the parameter file name: ..\mytry\imachpar.txt

-
- -

Most of the data files or image files generated will use the 'imachpar' string in their names. The running time is about 2-3 minutes on a Pentium III. If the execution worked correctly, the output files are created in the current directory, and should be the same as the mypar files initially included in the directory mytry.

- - - -

 

- - - -

Once the run is finished, the program requires a character:

- - - - - -
Type g for plotting (available - if mle=1), e to edit output files, c to start again,

and - q for exiting:

-
- -

First you should enter g to -make the figures and then you can edit all the results by typing e. -

- - - -

This software has been partly funded by Euro-REVES, a concerted action of the European Union. It will be copyrighted identically to a GNU software product, i.e. program and software can be distributed freely for non commercial use. Sources are not widely distributed today. You can get them by asking us with a simple justification (name, email, institute) mailto:brouard@ined.fr and mailto:lievre@ined.fr .

- -

Latest version (0.63 of 16 march 2000) can be accessed at http://euroreves.ined.fr/imach
-

- - + + + + + +Computing Health Expectancies using IMaCh + + + + + +
+ +

Computing Health +Expectancies using IMaCh

+ +

(a Maximum +Likelihood Computer Program using Interpolation of Markov Chains)

+ +

 

+ +

+ +

INED and EUROREVES

+ +

Version +64b, May 2001

+ +
+ +

Authors of the +program: Nicolas Brouard, senior researcher at the Institut +National d'Etudes Démographiques (INED, Paris) in the "Mortality, +Health and Epidemiology" Research Unit

+ +

and Agnès +Lièvre
+

+ +

Contribution to the mathematics: C. R. +Heathcote (Australian +National University, Canberra).

+ +

Contact: Agnès Lièvre (lievre@ined.fr)

+ +
+ + + +
+ +

Introduction

+ +

This program computes Healthy Life Expectancies from cross-longitudinal data using the methodology pioneered by Laditka and Wolf (1). Within the family of Health Expectancies (HE), Disability-free life expectancy (DFLE) is probably the most important index to monitor. In low mortality countries, there is a fear that when mortality declines, the increase in DFLE is not proportionate to the increase in total Life expectancy. This case is called the Expansion of morbidity. Most of the data collected today, in particular by the international REVES network on Health expectancy, and most HE indices based on these data, are cross-sectional. This means that the information collected comes from a single cross-sectional survey: people of various ages (but mostly old people) are surveyed on their health status at a single date. The proportion of people disabled at each age can then be measured at that date. This age-specific prevalence curve is then used to distinguish, within the stationary population (which, by definition, is the life table estimated from the vital statistics on mortality at the same date), the disabled population from the disability-free population. Life expectancy (LE) (or total population divided by the yearly number of births or deaths of this stationary population) is then decomposed into DFLE and DLE. This method of computing HE is usually called the Sullivan method (from the name of the author who first described it).
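The decomposition described above can be sketched in a few lines. The life-table figures and prevalences below are hypothetical numbers chosen only to illustrate the arithmetic of the Sullivan method; they do not come from any survey, and the function name `sullivan` is an assumption of this sketch.

```python
def sullivan(survivors_at_start, person_years, prevalence):
    """Split life expectancy at the first age of the table into
    disability-free (DFLE) and disabled (DLE) years.

    person_years[k] : years lived in age group k by the stationary population
    prevalence[k]   : cross-sectional proportion disabled in age group k
    """
    le = sum(person_years) / survivors_at_start
    dle = sum(p * ly for p, ly in zip(prevalence, person_years)) / survivors_at_start
    return le, le - dle, dle


# Hypothetical abridged life table from age 70 (age groups 70-74 ... 85+)
survivors_at_70 = 1000.0
person_years = [4500.0, 3500.0, 2000.0, 800.0]
prevalence = [0.10, 0.20, 0.35, 0.50]

le, dfle, dle = sullivan(survivors_at_70, person_years, prevalence)
print(le, dfle, dle)
```

The point of the sketch is that the prevalences are observed at one date, so DFLE computed this way inherits the history of each cohort.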

+ +

Age-specific proportions of disabled people are very difficult to forecast because each proportion reflects the historical conditions of the cohort: it results from the flows into disability and back to recovery from the past until today. The age-specific intensities (or incidence rates) of entering disability or recovering good health reflect current conditions and can therefore be used at each age to forecast the future of the cohort. For example, if a country improves its prosthesis technology, the incidence of recovering the ability to walk will be higher at each (old) age, but the prevalence of disability will only slightly reflect the improvement, because prevalence is mostly affected by the history of the cohort and not by recent period effects. To measure the period improvement we have to simulate the future of a cohort of new-borns entering or leaving the disability state, or dying, at each age according to the incidence rates measured today on different cohorts. The proportion of people disabled at each age in this simulated cohort will be much lower (in the example of an improvement) than the proportions observed at each age in a cross-sectional survey. This new prevalence curve, introduced in a life table, will give a much more current and realistic HE level than the Sullivan method, which mostly measures the history of health conditions in the country.

+ +

Therefore, the main question is how to measure incidence rates from cross-longitudinal surveys. This is the goal of the IMaCh program. From your data and using IMaCh you can estimate period HE and not only Sullivan's HE. The standard errors of the HE are also computed.

+ +

A cross-longitudinal survey consists of a first survey ("cross") where individuals of different ages are interviewed on their health status or degree of disability. At least a second wave of interviews ("longitudinal") should measure each individual's new health status. Health expectancies are computed from the transitions observed between waves and are computed for each degree of severity of disability (number of life states). The more degrees you consider, the more time is needed to reach the Maximum Likelihood of the parameters involved in the model. Considering only two states of disability (disabled and healthy) is generally enough, but the program also works with more health statuses.
+
+The simplest model is the multinomial logistic model where pij +is the probability of being observed in state j at the second +wave conditional on being observed in state i at the first +wave. Therefore a simple model is: log(pij/pii)= aij + +bij*age+ cij*sex, where 'age' is age and 'sex' +is a covariate. The advantage claimed by this computer program is +that if the delay between waves is not identical for +each individual, or if some individuals missed an interview, the +information is not rounded or lost, but taken into account using +an interpolation or extrapolation. hPijx is the +probability of being observed in state j at age x+h +conditional on being observed in state i at age x. The +delay 'h' can be split into an exact number (nh*stepm) +of unobserved intermediate states. This elementary transition (by +month, quarter, semester or year) is modeled as a +multinomial logistic. The hPx matrix is simply the matrix +product of nh*stepm elementary matrices and the +contribution of each individual to the likelihood is simply hPijx. +
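The construction of hPx as a product of elementary matrices can be sketched in a few lines of Python. This is only an illustration, not the code of IMaCh: the function names are hypothetical, the (aij, bij) values are copied from the sample results file shown later in this manual, and the choice of 24 monthly steps with age advanced by 1/12 year per step is an assumption made for the example.

```python
import math

# Parameters (aij, bij) of log(pij/pii) = aij + bij*age, keyed by (i, j).
# Values copied from the sample output shown later in this manual.
PARAMS = {
    (1, 2): (-12.290174, 0.092161),
    (1, 3): (-9.155590, 0.046627),
    (2, 1): (-2.629849, -0.022030),
    (2, 3): (-7.958519, 0.042614),
}

def elementary_matrix(age, params=PARAMS, nlstate=2, ndeath=1):
    """Elementary (one-step) transition matrix at a given age."""
    n = nlstate + ndeath
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(1, nlstate + 1):
        # Odds pij/pii for every destination j different from i.
        odds = {j: math.exp(a + b * age)
                for (start, j), (a, b) in params.items() if start == i}
        denom = 1.0 + sum(odds.values())
        matrix[i - 1][i - 1] = 1.0 / denom
        for j, value in odds.items():
            matrix[i - 1][j - 1] = value / denom
    for d in range(nlstate, n):
        matrix[d][d] = 1.0  # absorbing state: nobody leaves it
    return matrix

def matmul(left, right):
    """Plain matrix product of two square matrices."""
    n = len(left)
    return [[sum(left[r][k] * right[k][c] for k in range(n))
             for c in range(n)] for r in range(n)]

def h_step_matrix(age, n_steps, step_in_years, params=PARAMS):
    """Product of n_steps elementary matrices, starting at the given age."""
    n = len(elementary_matrix(age, params))
    result = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for step in range(n_steps):
        result = matmul(result, elementary_matrix(age + step * step_in_years, params))
    return result

# Assumed example: two years from age 70, split into 24 monthly steps.
hP = h_step_matrix(70.0, 24, 1.0 / 12.0)
for row in hP:
    print(" ".join(f"{p:.5f}" for p in row))
```

Each row of the resulting matrix sums to one, and the element in row i, column j plays the role of hPijx, the contribution of an individual observed in state i at age x and in state j at age x+h.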
+

+ +

The program presented in this manual is a quite general +program named IMaCh (for Interpolated +MArkov CHain), designed to +analyse transition data from longitudinal surveys. The first step +is the estimation of the parameters of a model of transition probabilities +between an initial status and a final status. From there, the +computer program produces some indicators such as observed and +stationary prevalence, life expectancies and their variances, and +graphs. Our transition model consists of absorbing and +non-absorbing states with the possibility of return across the +non-absorbing states. The main advantage of this package, +compared to other programs for the analysis of transition data +(for example: Proc Catmod of SAS®), is that the whole +individual information is used even if an interview is missing, a +status or a date is unknown or when the delay between waves is +not identical for each individual. The program can be executed +according to parameters: selection of a sub-sample, number of +absorbing and non-absorbing states, number of waves taken into +account (the user inputs the first and the last interview), a +tolerance level for the maximization function, the periodicity of +the transitions (we can compute annual, quarterly or monthly +transitions), covariates in the model. It works on Windows or on +Unix.
+

+ +
+ +

(1) Laditka, Sarah B. and Wolf, Douglas A. (1998), "New +Methods for Analyzing Active Life Expectancy". Journal of +Aging and Health. Vol 10, No. 2.

+ +
+ +

On what kind of data can +it be used?

+ +

The minimum data required for a transition model is the +recording of a set of individuals interviewed at a first date and +interviewed again at least one more time. From the +observations of an individual, we obtain a follow-up over time of +the occurrence of a specific event. In this documentation, the +event is related to health status at older ages, but the program +can be applied to many longitudinal studies in different +contexts. To build the data file explained in the next section, +you must have the month and year of each interview and the +corresponding health status. In order to get age, the date of +birth (month and year) is also required (a missing value is allowed for +the month). The date of death (month and year) is an important +piece of information, also required if the individual is dead. Shorter +steps (e.g. a month) will more closely take into account the +survival time after the last interview.

+ +
+ +

The data file

+ +

In this example, 8,000 people have been interviewed in a +cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990). +Some people missed 1, 2 or 3 interviews. Health statuses are +healthy (1) and disabled (2). The survey is not a real one. It is +a simulation of the American Longitudinal Survey on Aging. The +disability state is defined if the individual missed one of four +ADL (Activity of daily living, like bathing, eating, walking). +Therefore, even if the individuals interviewed in the sample are +virtual, the information brought with this sample is close to the +situation of the United States. Sex is not recorded in this +sample.

+ +

Each line of the data set (named data1.txt +in this first example) is an individual record whose fields are:

+ + + +

 

+ +

If your longitudinal survey does not include information about +weights or covariates, you must fill the column with a number +(e.g. 1) because a missing field is not allowed.

+ +
+ +

Your first example parameter file

+ +

#Imach version 0.64b, May 2001, +INED-EUROREVES

+ +

This is a comment. Comments start with a '#'.

+ +

First uncommented line

+ +
title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4
+ + + +

 

+ +

Second uncommented +line

+ +
ftol=1.e-08 stepm=1 ncov=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
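Both uncommented lines are sequences of name=value pairs separated by blanks. As a rough illustration of how such a line can be read, here is a small Python sketch; the function name is hypothetical and this is not how the IMaCh program itself reads the file.

```python
def parse_parameter_line(line):
    """Split a 'name=value name=value ...' line into a dictionary of strings."""
    settings = {}
    for token in line.split():
        key, _, value = token.partition("=")
        settings[key] = value
    return settings

first = parse_parameter_line(
    "title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4")
second = parse_parameter_line(
    "ftol=1.e-08 stepm=1 ncov=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0")

print(first["datafile"], int(first["lastpass"]))
print(float(second["ftol"]), int(second["nlstate"]), int(second["ndeath"]))
```

Values are kept as strings and converted only when needed, since the title and the data file name are not numbers.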
+ + + +

Covariates

+ +

Intercept and age are systematically included in the model. +Additional covariates can be included with the command

+ +
model=list of covariates
+ + + +

Guess values for optimization

+ +

You must write the initial guess values of the parameters for +optimization. The number of parameters, N depends on the +number of absorbing states and non-absorbing states and on the +number of covariates.
+N is given by the formula N=(nlstate + +ndeath-1)*nlstate*ncov .
+
+Thus in the simple case with 2 covariates (the model is log +(pij/pii) = aij + bij * age where intercept and age are the two +covariates), and 2 health degrees (1 for disability-free and 2 +for disability) and 1 absorbing state (3), you must enter 8 +initial values, a12, b12, a13, b13, a21, b21, a23, b23. You can +start with zeros as in this example, but if you have a more +precise set (for example from an earlier run) you can enter it +and it will speed up the convergence.
+Each of the four lines starts with indices "ij":
+
+ij aij bij

+ +
+
# Guess values of aij and bij in log (pij/pii) = aij + bij * age
+12 -14.155633  0.110794 
+13  -7.925360  0.032091 
+21  -1.890135 -0.029473 
+23  -6.234642  0.022315 
+
+ +

or, to simplify:

+ +
+
12 0.0 0.0
+13 0.0 0.0
+21 0.0 0.0
+23 0.0 0.0
+
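The count of guess values and the layout of the zero-filled block can be checked with a short Python sketch. The function names are hypothetical; only the formula N=(nlstate+ndeath-1)*nlstate*ncov and the "ij" line layout come from this manual.

```python
def number_of_parameters(nlstate, ndeath, ncov):
    """N = (nlstate + ndeath - 1) * nlstate * ncov, as stated in this manual."""
    return (nlstate + ndeath - 1) * nlstate * ncov

def zero_guess_lines(nlstate, ndeath, ncov):
    """One line per transition i -> j (j different from i), filled with zeros."""
    lines = []
    for i in range(1, nlstate + 1):
        for j in range(1, nlstate + ndeath + 1):
            if j != i:
                lines.append(f"{i}{j} " + " ".join(["0.0"] * ncov))
    return lines

print(number_of_parameters(nlstate=2, ndeath=1, ncov=2))
print("\n".join(zero_guess_lines(nlstate=2, ndeath=1, ncov=2)))
```

With nlstate=2, ndeath=1 and ncov=2 this gives the 8 parameters and the four lines 12, 13, 21 and 23 of the simplified example.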
+ +

Guess values for computing variances

+ +

This is an output if mle=1. But it can be +used as an input to get the various output data files (Health +expectancies, stationary prevalence etc.) and figures without +rerunning the rather long maximisation phase (mle=0).

+ +

The scales are small values for the evaluation of numerical +derivatives. These derivatives are used to compute the hessian +matrix of the parameters, that is the inverse of the covariance +matrix, and the variances of health expectancies. Each line +consists of indices "ij" followed by the initial scales +(zero to simplify) associated with aij and bij.

+ + + +
+
# Scales (for hessian or gradient estimation)
+12 0. 0. 
+13 0. 0. 
+21 0. 0. 
+23 0. 0. 
+
+ + + +

Covariance matrix of parameters

+ +

This is an output if mle=1. But it can be +used as an input to get the various output data files (Health +expectancies, stationary prevalence etc.) and figures without +rerunning the rather long maximisation phase (mle=0).

+ +

Each line starts with indices "ijk" followed by the +covariances between aij and bij:

+ +
+   121 Var(a12) 
+   122 Cov(b12,a12)  Var(b12) 
+          ...
+   232 Cov(b23,a12)  Cov(b23,b12) ... Var (b23) 
+ + + +
+
# Covariance matrix
+121 0.
+122 0. 0.
+131 0. 0. 0. 
+132 0. 0. 0. 0. 
+211 0. 0. 0. 0. 0. 
+212 0. 0. 0. 0. 0. 0. 
+231 0. 0. 0. 0. 0. 0. 0. 
+232 0. 0. 0. 0. 0. 0. 0. 0.
+
+ + + +

last +uncommented line

+ +
agemin=70 agemax=100 bage=50 fage=100
+ +

Once we have obtained the estimated parameters, the program is able +to calculate stationary prevalence, transition probabilities +and life expectancies at any age. The choice of age ranges is useful +for extrapolation. In our data file, ages vary from 70 to +102. Setting bage=50 and fage=100 makes the program compute +life expectancy from age bage to age fage. As we use a model, we +can compute life expectancy on a wider age range than the age +range of the data. But the model can be rather wrong over large +intervals.

+ +

Similarly, it is possible to get extrapolated stationary +prevalence by age ranging from agemin to agemax.

+ + + +
+ +

Running Imach +with this example

+ +

We assume that you entered your 1st_example +parameter file as explained above. To +run the program you should click on the imach.exe icon and enter +the name of the parameter file, which is for example C:\usr\imach\mle\biaspar.txt +(you can also click on the biaspar.txt icon located in
+C:\usr\imach\mle and drag it with +the mouse onto the imach window).
+

+ +

The time to converge depends on the step unit that you used (1 +month is CPU-intensive), on the number of cases, and on the +number of variables.

+ +

The program outputs many files. Most of them are files which +will be plotted for better understanding.

+ +
+ +

Output of the program +and graphs

+ +

Once the optimization is finished, some graphics can be made +with a grapher. We use Gnuplot which is an interactive plotting +program copyrighted but freely distributed. A gnuplot reference +manual is available here.
+When the run is finished, the user should enter a character +for plotting and output editing.

+ +

These characters are:

+ + + +
Results files
+
+- Observed prevalence in each state (and at first pass): +prbiaspar.txt
+
+ +

The first line is the title and displays each field of the +file. The first column is age. Fields 2 and 6 are the +proportions of individuals in states 1 and 2 respectively, as +observed during the first exam. The other fields are the numbers of +people in states 1, 2 or more. The number of columns increases if +the number of states is higher than 2.
+The header of the file is

+ +
# Age Prev(1) N(1) N Age Prev(2) N(2) N
+70 1.00000 631 631 70 0.00000 0 631
+71 0.99681 625 627 71 0.00319 2 627 
+72 0.97125 1115 1148 72 0.02875 33 1148 
+ +

It means that at age 70, the prevalence in state 1 is 1.000 +and in state 2 is 0.000. At age 71 the number of individuals in +state 1 is 625 and in state 2 is 2, hence the total number of +people aged 71 is 625+2=627.
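The prevalence columns follow directly from the counts. The Python sketch below recomputes them for the three ages listed above; the variable and function names are only illustrative.

```python
# Counts N(1) and N(2) by age, copied from the lines of the file shown above.
counts = {70: (631, 0), 71: (625, 2), 72: (1115, 33)}

def observed_prevalence(n1, n2):
    """Return (Prev(1), Prev(2), N) from the counts in states 1 and 2."""
    total = n1 + n2
    return n1 / total, n2 / total, total

for age, (n1, n2) in sorted(counts.items()):
    prev1, prev2, total = observed_prevalence(n1, n2)
    print(f"{age} {prev1:.5f} {n1} {total} {age} {prev2:.5f} {n2} {total}")
```

At age 71, for example, 625/627 gives 0.99681 and 2/627 gives 0.00319, as in the file.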
+

+ +
- Estimated parameters and +covariance matrix: rbiaspar.txt
+ +

This file contains all the maximisation results:

+ +
 -2 log likelihood= 21660.918613445392
+ Estimated parameters: a12 = -12.290174 b12 = 0.092161 
+                       a13 = -9.155590  b13 = 0.046627 
+                       a21 = -2.629849  b21 = -0.022030 
+                       a23 = -7.958519  b23 = 0.042614  
+ Covariance matrix: Var(a12) = 1.47453e-001
+                    Var(b12) = 2.18676e-005
+                    Var(a13) = 2.09715e-001
+                    Var(b13) = 3.28937e-005  
+                    Var(a21) = 9.19832e-001
+                    Var(b21) = 1.29229e-004
+                    Var(a23) = 4.48405e-001
+                    Var(b23) = 5.85631e-005 
+ 
+ +

By substitution of these parameters in the regression model, +we obtain the elementary transition probabilities:
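For reference, the substitution can be written explicitly. The expressions below follow from the model log(pij/pii) = aij + bij*age together with the constraint that the probabilities of leaving or staying in state i sum to one; the notation x for age is an assumption of this note.

```latex
p_{ij}(x) = \frac{\exp(a_{ij} + b_{ij}\,x)}{1 + \sum_{k \neq i} \exp(a_{ik} + b_{ik}\,x)}, \quad j \neq i,
\qquad
p_{ii}(x) = \frac{1}{1 + \sum_{k \neq i} \exp(a_{ik} + b_{ik}\,x)}
```

For example, starting from state 1, the sum runs over k=2 and k=3 and uses the estimates a12, b12, a13 and b13 listed above.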

+ +

+ +
- Transition probabilities: +pijrbiaspar.txt
+ +

Here are the transition probabilities Pij(x, x+nh) where nh +is a multiple of 2 years. The first column is the starting age x +(from age 50 to 100), the second is age (x+nh) and the others are +the transition probabilities p11, p12, p13, p21, p22, p23. For +example, line 5 of the file is:

+ +
 100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513 
+ +

and this means:

+ +
p11(100,106)=0.02655
+p12(100,106)=0.17622
+p13(100,106)=0.79722
+p21(100,106)=0.01809
+p22(100,106)=0.13678
+p23(100,106)=0.84513 
+ +
- Stationary prevalence in each state: +plrbiaspar.txt
+ +
#Prevalence
+#Age 1-1 2-2
+
+#************ 
+70 0.90134 0.09866
+71 0.89177 0.10823 
+72 0.88139 0.11861 
+73 0.87015 0.12985 
+ +

At age 70 the stationary prevalence is 0.90134 in state 1 and +0.09866 in state 2. This stationary prevalence differs from +observed prevalence. Here is the point. The observed prevalence +at age 70 results from the incidence of disability, incidence of +recovery and mortality which occurred in the past of the cohort. +Stationary prevalence results from a simulation with actual +incidences and mortality (estimated from this cross-longitudinal +survey). It is the best predictive value of the prevalence in the +future if "nothing changes in the future". This is +exactly what demographers do with a Life table. Life expectancy +is the expected mean time to survive if observed mortality rates +(incidence of mortality) "remains constant" in the +future.

+ +
- Standard deviation of +stationary prevalence: vplrbiaspar.txt
+ +

The stationary prevalence has to be compared with the observed +prevalence by age. But both are statistical estimates and +subject to stochastic errors due to the size of the sample, the +design of the survey, and, for the stationary prevalence, to the +model used and fitted. It is possible to compute the standard +deviation of the stationary prevalence at each age.

+ +
-Observed and stationary +prevalence in state (2=disabled) with the confidence interval: +vbiaspar21.gif
+ +

This graph exhibits the stationary prevalence in state (2) +with the confidence interval in red. The green curve is the +observed prevalence (or proportion of individuals in state (2)). +Without discussing the results (it is not the purpose here), we +observe that the green curve is rather below the stationary +prevalence. It suggests an increase of the disability prevalence +in the future.

+ +

+ +
-Convergence to the +stationary prevalence of disability: pbiaspar11.gif
+
+ +

This graph plots the conditional transition probabilities from +an initial state (1=healthy in red at the bottom, or 2=disabled in +green on top) at age x to the final state 2=disabled at +age x+h. Conditional means conditional on being alive +at age x+h, which has probability hP11x + hP12x when starting from state 1 and hP21x + hP22x when starting from state 2. The +curves hP12x/(hP11x + hP12x) and hP22x/(hP21x ++ hP22x) converge with h to the stationary +prevalence of disability. In order to get the stationary +prevalence at age 70 we should start the process at an earlier +age, e.g. 50. If the disability state is defined by severe +disability criteria with little chance of recovery, then the +incidence of recovery is low and the time to convergence is +probably longer. But we do not have experience of this yet.
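As a small numerical illustration of "conditional on being alive", the Python sketch below takes the transition probabilities of the sample line "100 106" shown earlier in this manual and divides the probability of ending in the disability state by the probability of being alive at the end, for each initial state. It is only a sketch of the calculation behind the two curves, with illustrative variable names.

```python
# p11, p12, p13, p21, p22, p23 from the sample line "100 106" of pijrbiaspar.txt.
p11, p12, p13, p21, p22, p23 = 0.02655, 0.17622, 0.79722, 0.01809, 0.13678, 0.84513

# Probability of being disabled at the end, given alive at the end.
from_healthy = p12 / (p11 + p12)    # initial state 1
from_disabled = p22 / (p21 + p22)   # initial state 2

print(f"{from_healthy:.3f} {from_disabled:.3f}")
```

The two values are close to each other (about 0.87 and 0.88), which is the convergence that the graph displays as h increases.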

+ +
- Life expectancies by age +and initial health status: erbiaspar.txt
+ +
# Health expectancies 
+# Age 1-1 1-2 2-1 2-2 
+70 10.9226 3.0401 5.6488 6.2122 
+71 10.4384 3.0461 5.2477 6.1599 
+72 9.9667 3.0502 4.8663 6.1025 
+73 9.5077 3.0524 4.5044 6.0401 
+ +
For example 70 10.9226 3.0401 5.6488 6.2122 means:
+e11=10.9226 e12=3.0401 e21=5.6488 e22=6.2122
+ +
+ +

For example, life expectancy of a healthy individual at age 70 +is 10.92 in the healthy state and 3.04 in the disability state +(=13.96 years). If he was disabled at age 70, his life expectancy +will be shorter, 5.65 in the healthy state and 6.21 in the +disability state (=11.86 years). The total life expectancy is a +weighted mean of both, 13.96 and 11.86; the weight is the proportion +of people disabled at age 70. In order to get a pure period index +(i.e. based only on incidences) we use the computed or +stationary prevalence at age 70 (i.e. computed from +incidences at earlier ages) instead of the observed prevalence +(for example at first exam) (see +below).

+ +
- Variances of life +expectancies by age and initial health status: vrbiaspar.txt
+ +

For example, the covariances of life expectancies Cov(ei,ej) +at age 50 are (line 3)

+ +
   Cov(e1,e1)=0.4776  Cov(e1,e2)=0.0488=Cov(e2,e1)  Cov(e2,e2)=0.0424
+ +
- Health +expectancies +with standard errors in parentheses: trbiaspar.txt
+ +
#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) 
+ +
70 13.76 (0.22) 10.40 (0.20) 3.35 (0.14) 
+ +

Thus, at age 70 the total life expectancy, e..=13.76 years, is +the weighted mean of e1.=13.96 and e2.=11.86 by the stationary +prevalence at age 70, which is 0.90134 in state 1 and 0.09866 in +state 2 (the sum is equal to one). e.1=10.40 is the +Disability-free life expectancy at age 70 (it is again a weighted +mean of e11 and e21). e.2=3.35 is the life expectancy at age +70 to be spent in the disability state.
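The weighted means can be verified with a few lines of Python, using the health expectancies at age 70 from erbiaspar.txt and the stationary prevalence at age 70 from plrbiaspar.txt as quoted in this manual. The variable and function names are only illustrative.

```python
# Health expectancies at age 70 by initial state (file erbiaspar.txt).
e11, e12, e21, e22 = 10.9226, 3.0401, 5.6488, 6.2122
# Stationary prevalence at age 70 in states 1 and 2 (file plrbiaspar.txt).
w1, w2 = 0.90134, 0.09866

def weighted_mean(from_state1, from_state2):
    """Average over the initial state, weighted by the stationary prevalence."""
    return w1 * from_state1 + w2 * from_state2

e_dot1 = weighted_mean(e11, e21)                    # disability-free life expectancy
e_dot2 = weighted_mean(e12, e22)                    # life expectancy with disability
e_total = weighted_mean(e11 + e12, e21 + e22)       # total life expectancy

print(f"{e_total:.2f} {e_dot1:.2f} {e_dot2:.2f}")
```

Rounded to two decimals, the three results are 13.76, 10.40 and 3.35, the values of the line for age 70 in trbiaspar.txt.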

+ +
-Total life expectancy by +age and health expectancies in states (1=healthy) and (2=disabled): +ebiaspar1.gif
+ +

This figure represents the health expectancies and the total +life expectancy with the confidence interval as a dashed curve.

+ +
        
+ +

Standard deviations (obtained from the information matrix of +the model) of these quantities are very useful. +Cross-longitudinal surveys are costly and do not involve huge +samples, generally a few thousand people; therefore it is very +important to have an idea of the standard deviation of our +estimates. It has been a big challenge to compute the Health +Expectancy standard deviations. Do not be confused: life expectancy +is, as any expected value, the mean of a distribution; but here +we are not computing the standard deviation of the distribution, +but the standard deviation of the estimate of the mean.

+ +

Our health expectancy estimates vary according to the sample +size (and the standard deviations give confidence intervals of +the estimate) but also according to the model fitted. Let us +explain this in more detail.

+ +

Choosing a model means at least two kinds of choices. First we +have to decide the number of disability states. Second we have to +design, within the logit model family, the model: variables, +covariables, confounding factors etc. to be included.

+ +

The more disability states we have, the better our demographic +description of the disability process, but the smaller the number of +transitions between each pair of states and the higher the noise in the +measurement. We do not have enough experience with the various +models to summarize the advantages and disadvantages, but it is +important to say that even if we had huge and unbiased samples, +the total life expectancy computed from a cross-longitudinal +survey varies with the number of states. If we define only two +states, alive or dead, we find the usual life expectancy where it +is assumed that at each age, people are at the same risk of dying. +If we differentiate the alive state into healthy and +disabled, and as the mortality from the disability state is higher +than the mortality from the healthy state, we are introducing +heterogeneity in the risk of dying. The total mortality at each +age is the weighted mean of the mortality in each state by the +prevalence in each state. Therefore if the proportion of people +at each age and in each state is different from the stationary +equilibrium, there is no reason to find the same total mortality +at a particular age. Life expectancy, even if it is a very useful +tool, relies on a very strong hypothesis of homogeneity of the +population. Our main purpose is not to measure differential +mortality but to measure the expected time in a healthy or +disability state in order to maximise the former and minimize the +latter. But the differential in mortality complicates the +measurement.

+ +

Incidences of disability or recovery are not affected by the +number of states if these states are independent. But incidence +estimates depend on the specification of the model. The more +covariates we add to the logit model, the better the model, but +some covariates are not well measured and some are confounding +factors, as in any statistical model. The procedure to "fit +the best model" is similar to logistic regression, which itself is +similar to regression analysis. We have not gone that far yet because +we also have a severe limitation, which is the speed of +convergence. On a Pentium III, 500 MHz, even the simplest model, +estimated by month on 8,000 people, may take 4 hours to converge. +Also, the program is not yet a statistical package that permits +a simple writing of the variables and the model to take into +account in the maximisation. The current program only allows +adding simple variables like age+sex or age+sex+ age*sex and will +never be general enough. What should be remembered is that +incidences, or probabilities of change from one state to another, are +affected by the variables specified in the model.

+ +

Also, the age range of the people interviewed has a link with +the age range of the life expectancy which can be estimated by +extrapolation. If your sample ranges from age 70 to 95, you can +clearly estimate a life expectancy at age 70 and trust your +confidence interval, which is mostly based on your sample size. +If you want to estimate the life expectancy at age 50, you +have to rely on your model, but fitting a logistic model on an age +range of 70-95 and estimating probabilities of transition outside +this age range, say at age 50, is very dangerous. At least you +should remember that the confidence intervals given by the +standard deviations of the health expectancies are computed under the +strong assumption that your model is the 'true model', which is +probably not the case.

+ +
- Copy of the parameter +file: orbiaspar.txt
+ +

This copy of the parameter file can be useful to re-run the +program while saving the old output files.

+ +
+ +

Trying an example

+ +

Since you know how to run the program, it is time to test it +on your own computer. Try for example on a parameter file named imachpar.txt which is a copy of mypar.txt included in the +subdirectory of imach, mytry. +Edit it to change the name of the data file to ..\data\mydata.txt if you don't want to +copy it into the same directory. The file mydata.txt +is a smaller file of 3,000 people but still with 4 waves.

+ +

Click on the imach.exe icon to open a window. Answer the +question: 'Enter the parameter file name:'

+ + + + + +
IMACH, Version 0.64b

Enter + the parameter file name: ..\mytry\imachpar.txt

+
+ +

Most of the data files or image files generated will use the +'imachpar' string in their names. The running time is about 2-3 +minutes on a Pentium III. If the execution worked correctly, the +output files are created in the current directory, and should be +the same as the mypar files initially included in the directory mytry.

+ + + +

 

+ + + +

Once the run is finished, the program +asks for a character:

+ + + + + +
Type e to edit output files, c + to start again, and q for exiting:
+ +

First you should enter e to +edit the master file mypar.htm.

+ + + +

This software has been partly funded by Euro-REVES, a concerted +action from the European Union. It will be copyrighted +identically to a GNU software product, i.e. program and software +can be distributed freely for non commercial use. Sources are not +widely distributed today. You can get them by asking us with a +simple justification (name, email, institute) mailto:brouard@ined.fr and mailto:lievre@ined.fr .

+ +

The latest version (0.64b of May 2001) can be accessed at http://euroreves.ined.fr/imach
+

+ +