From: N. Brouard Date: Wed, 16 Jun 2004 12:05:30 +0000 (+0000) Subject: *** empty log message *** X-Git-Tag: imach-099s7~596 X-Git-Url: https://henry.ined.fr/git/?a=commitdiff_plain;h=538a9b8e5c50a5d119a54009404e95f99d670602;p=.git *** empty log message *** --- diff --git a/html/doc/imach.htm b/html/doc/imach.htm new file mode 100644 index 0000000..f11e984 --- /dev/null +++ b/html/doc/imach.htm @@ -0,0 +1,1311 @@ + + + + + + +Computing Health Expectancies using IMaCh + + + + + +IMaCh + + + + +

+ +

Computing Health +Expectancies using IMaCh

+ +

(a Maximum +Likelihood Computer Program using Interpolation of Markov Chains)

+ +

INED and EUROREVES

+ +

Version +0.8a, May 2002

+ +

Authors of the +program: Nicolas Brouard, senior researcher at the Institut +National d'Etudes Démographiques (INED, Paris) in the "Mortality, +Health and Epidemiology" Research Unit

+ +

and Agnès +Lièvre
+

+ +

Contribution to the mathematics: C. R. +Heathcote (Australian +National University, Canberra).

+ +

Contact: Agnès Lièvre (lievre@ined.fr)

+ +

Introduction
On what kind of data can it be used?
The data file
The parameter file
Running Imach
Output files and graphs
Example

+ +

Introduction

+ +

This program computes Healthy Life Expectancies from cross-longitudinal data using the methodology pioneered by Laditka and Wolf (1). Within the family of Health Expectancies (HE), Disability-free life expectancy (DFLE) is probably the most important index to monitor. In low mortality countries, there is a fear that when mortality declines, the increase in DFLE is not proportionate to the increase in total Life expectancy. This case is called the Expansion of morbidity. Most of the data collected today, in particular by the international REVES network on Health expectancy, and most HE indices based on these data, are cross-sectional. This means that the information collected comes from a single cross-sectional survey: people of various ages (but mostly old people) are surveyed on their health status at a single date. The proportion of people disabled at each age can then be measured at that date. This age-specific prevalence curve is then used to distinguish, within the stationary population (which, by definition, is the life table estimated from the vital statistics on mortality at the same date), the disabled population from the disability-free population. Life expectancy (LE) (or total population divided by the yearly number of births or deaths of this stationary population) is then decomposed into DFLE and DLE (life expectancy with disability). This method of computing HE is usually called the Sullivan method (from the name of the author who first described it).

+ +

Age-specific proportions of disabled people are very difficult to forecast because each proportion corresponds to the historical conditions of the cohort and is the result of the historical flows of entering disability and recovering, from the past until today. The age-specific intensities (or incidence rates) of entering disability or recovering good health reflect current conditions and can therefore be used at each age to forecast the future of this cohort. For example, if a country is improving its prosthesis technology, the incidence of recovering the ability to walk will be higher at each (old) age, but the prevalence of disability will only slightly reflect the improvement, because the prevalence is mostly affected by the history of the cohort and not by recent period effects. To measure the period improvement we have to simulate the future of a cohort of new-borns entering or leaving the disability state at each age, or dying, according to the incidence rates measured today on different cohorts. The proportion of people disabled at each age in this simulated cohort will be much lower (using the example of an improvement) than the proportions observed at each age in a cross-sectional survey. This new prevalence curve introduced in a life table will give a much more current and realistic HE level than the Sullivan method, which mostly measures the history of health conditions in this country.

+ +

Therefore, the main question is how to measure incidence rates from cross-longitudinal surveys. This is the goal of the IMaCh program. From your data and using IMaCh you can estimate period HE and not only Sullivan's HE. The standard errors of the HE are also computed.

+ +

A cross-longitudinal survey consists of a first survey ("cross") where individuals of different ages are interviewed on their health status or degree of disability. At least a second wave of interviews ("longitudinal") should measure each individual's new health status. Health expectancies are computed from the transitions observed between waves and are computed for each degree of severity of disability (number of life states). The more degrees you consider, the more time is necessary to reach the Maximum Likelihood of the parameters involved in the model. Considering only two states of disability (disabled and healthy) is generally enough, but the computer program also works with more health statuses.
+
+The simplest model is the multinomial logistic model where pij is the probability to be observed in state j at the second wave conditional on being observed in state i at the first wave. Therefore a simple model is: log(pij/pii) = aij + bij*age + cij*sex, where 'age' is age and 'sex' is a covariate. The advantage that this computer program claims is that if the delay between waves is not identical for each individual, or if some individual missed an interview, the information is not rounded or lost, but taken into account using an interpolation or extrapolation. hPijx is the probability to be observed in state j at age x+h conditional on the observed state i at age x. The delay 'h' can be split into an exact number (nh*stepm) of unobserved intermediate states. This elementary transition (by month, quarter, semester or year) is modeled as a multinomial logistic. The hPx matrix is simply the matrix product of nh*stepm elementary matrices and the contribution of each individual to the likelihood is simply hPijx.
+
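The matrix product described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's own code: the coefficients are the guess values quoted later in this manual (not fitted results), covariates other than age are left out, the helper names (elementary_matrix, matmul, h_step_matrix) are hypothetical, and evaluating each elementary matrix at the age reached at the start of the step is an assumption.

```python
import math

# Illustrative coefficients (aij, bij) for log(pij/pii) = aij + bij*age.
# They are the guess values quoted in this manual, not fitted results.
PARAMS = {
    (1, 2): (-14.155633, 0.110794),
    (1, 3): (-7.925360, 0.032091),
    (2, 1): (-1.890135, -0.029473),
    (2, 3): (-6.234642, 0.022315),
}
NLSTATE, NDEATH = 2, 1  # two alive states, one absorbing state (death)


def elementary_matrix(age):
    """Elementary (one step) transition matrix at a given age, in years."""
    n = NLSTATE + NDEATH
    mat = [[0.0] * n for _ in range(n)]
    for i in range(1, NLSTATE + 1):
        # Odds pij/pii for every destination j different from i.
        odds = {j: math.exp(a + b * age)
                for (k, j), (a, b) in PARAMS.items() if k == i}
        denom = 1.0 + sum(odds.values())
        mat[i - 1][i - 1] = 1.0 / denom
        for j, value in odds.items():
            mat[i - 1][j - 1] = value / denom
    for d in range(NLSTATE, n):
        mat[d][d] = 1.0  # absorbing states are never left
    return mat


def matmul(a, b):
    """Plain product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def h_step_matrix(age, steps, stepm=1):
    """Product of `steps` elementary matrices, ageing stepm months per step."""
    n = NLSTATE + NDEATH
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for s in range(steps):
        result = matmul(result, elementary_matrix(age + s * stepm / 12.0))
    return result


# Two-year transition probabilities from age 70 with a monthly step.
hp = h_step_matrix(age=70.0, steps=24, stepm=1)
for row in hp:
    print(" ".join(f"{p:.5f}" for p in row))
```

Each row of the resulting matrix sums to one, and the entry in row i, column j plays the role of hPijx for h equal to two years.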

+ +

The program presented in this manual is a quite general program named IMaCh (for Interpolated MArkov CHain), designed to analyse transition data from longitudinal surveys. The first step is the estimation of the parameters of a model of transition probabilities between an initial status and a final status. From there, the computer program produces indicators such as observed and stationary prevalence, life expectancies and their variances, and graphs. Our transition model consists of absorbing and non-absorbing states with the possibility of return across the non-absorbing states. The main advantage of this package, compared to other programs for the analysis of transition data (for example Proc Catmod of SAS®), is that the whole individual information is used even if an interview is missing, a status or a date is unknown, or the delay between waves is not identical for each individual. The program can be executed according to parameters: selection of a sub-sample, number of absorbing and non-absorbing states, number of waves taken into account (the user inputs the first and the last interview), a tolerance level for the maximization function, the periodicity of the transitions (we can compute annual, quarterly or monthly transitions), and covariates in the model. It works on Windows or on Unix.
+

+ +

(1) Laditka, Sarah B. and Wolf, Douglas A. (1998), "New +Methods for Analyzing Active Life Expectancy". Journal of +Aging and Health. Vol 10, No. 2.

+ +

On what kind of data can +it be used?

+ +

The minimum data required for a transition model is the recording of a set of individuals interviewed at a first date and interviewed again at least once more. From the observations of an individual, we obtain a follow-up over time of the occurrence of a specific event. In this documentation, the event is related to health status at older ages, but the program can be applied to many longitudinal studies in different contexts. To build the data file explained in the next section, you must have the month and year of each interview and the corresponding health status. In order to compute age, the date of birth (month and year) is also required (a missing value is allowed for the month). The date of death (month and year) is important information, also required if the individual is dead. Shorter steps (e.g. a month) will more closely take into account the survival time after the last interview.

+ +

The data file

+ +

In this example, 8,000 people have been interviewed in a cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990). Some people missed 1, 2 or 3 interviews. Health statuses are healthy (1) and disabled (2). The survey is not a real one. It is a simulation of the American Longitudinal Survey on Aging. The disability state is assigned if the individual fails one of four ADLs (activities of daily living, like bathing, eating, walking). Therefore, even if the individuals interviewed in the sample are virtual, the information brought by this sample is close to the situation of the United States. Sex is not recorded in this sample.

+ +

Each line of the data set (named data1.txt in this first example) is an individual record whose fields are:

+ +

Index number: positive number (field 1)
First covariate: positive number (field 2)
Second covariate: positive number (field 3)
Weight: positive number (field 4). In most surveys individuals are weighted according to the stratification of the sample.
Date of birth: coded as mm/yyyy. Missing dates are coded as 99/9999 (field 5)
Date of death: coded as mm/yyyy. Missing dates are coded as 99/9999 (field 6)
Date of first interview: coded as mm/yyyy. Missing dates are coded as 99/9999 (field 7)
Status at first interview: positive number. Missing values are coded -1. (field 8)
Date of second interview: coded as mm/yyyy. Missing dates are coded as 99/9999 (field 9)
Status at second interview: positive number. Missing values are coded -1. (field 10)
Date of third interview: coded as mm/yyyy. Missing dates are coded as 99/9999 (field 11)
Status at third interview: positive number. Missing values are coded -1. (field 12)
Date of fourth interview: coded as mm/yyyy. Missing dates are coded as 99/9999 (field 13)
Status at fourth interview: positive number. Missing values are coded -1. (field 14)
etc

+ +

If your longitudinal survey does not include information about weights or covariates, you must fill the column with a number (e.g. 1) because a missing field is not allowed.
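The record layout listed above can be read with a short Python sketch. It is only a reading aid under stated assumptions: fields are assumed to be separated by whitespace, the sample line is invented for illustration (it is not taken from data1.txt), and the function names parse_date and parse_record are hypothetical.

```python
def parse_date(text):
    """Split mm/yyyy into (month, year); the codes 99 and 9999 become None."""
    month, year = text.split("/")
    return (None if month == "99" else int(month),
            None if year == "9999" else int(year))


def parse_record(line, ncovcol=2):
    """Turn one line of the data file into a dictionary.

    ncovcol is the number of covariate columns placed before the weight.
    """
    fields = line.split()
    record = {
        "id": int(fields[0]),
        "covariates": [float(v) for v in fields[1:1 + ncovcol]],
        "weight": float(fields[1 + ncovcol]),
        "birth": parse_date(fields[2 + ncovcol]),
        "death": parse_date(fields[3 + ncovcol]),
        "waves": [],
    }
    rest = fields[4 + ncovcol:]
    # The remaining fields come in (interview date, status) pairs.
    for date_text, status_text in zip(rest[0::2], rest[1::2]):
        status = int(status_text)
        record["waves"].append(
            (parse_date(date_text), None if status == -1 else status))
    return record


# Invented record: alive, third interview missed.
sample = "1 0 1 1.0 6/1910 99/9999 4/1984 1 5/1986 2 99/9999 -1 6/1990 1"
record = parse_record(sample)
print(record["birth"], record["death"], record["waves"])
```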

+ +

Your first example parameter file

+ +

#Imach version 0.8a, May 2002, +INED-EUROREVES

+ +

This is a comment. Comments start with a '#'.

+ +

First uncommented line

+ +

title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4

+ +

title= 1st_example is title of the run.
datafile= data1.txt is the name of the data set. + Our example is a six years follow-up survey. It consists + in a baseline followed by 3 reinterviews.
lastobs= 8600 the program is able to run on a subsample where the last observation number is lastobs. It can be set to a number larger than the real number of observations (e.g. 100000). In this example, maximisation will be done on the first 8600 records.
firstpass=1 , lastpass=4 In case of more + than two interviews in the survey, the program can be run + on selected transitions periods. firstpass=1 means the + first interview included in the calculation is the + baseline survey. lastpass=4 means that the information + brought by the 4th interview is taken into account.

+ +

Second uncommented +line

+ +

ftol=1.e-08 stepm=1 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0

+ +

ftol=1e-8 Convergence tolerance on the function value in the maximisation of the likelihood. Choosing a correct value for ftol is difficult. 1e-8 is a reasonable value for a 32-bit computer.
stepm=1 Time unit in months for interpolation. + Examples:
- If stepm=1, the unit is a month
- If stepm=3, the unit is a quarter (three months)
- If stepm=12, the unit is a year
- If stepm=24, the unit is two years
- ...
+
ncovcol=2 Number of covariate columns in the datafile which precede the date of birth. Here you can put variables that won't necessarily be used during the run. It is not the number of covariates that will be specified by the model. The 'model' syntax describes the covariates to take into account.
nlstate=2 Number of non-absorbing (alive) states. + Here we have two alive states: disability-free is coded 1 + and disability is coded 2.
ndeath=1 Number of absorbing states. The absorbing + state death is coded 3.
maxwav=4 Number of waves in the datafile.
mle=1 Option for the Maximum Likelihood Estimation.
- If mle=1 the program does the maximisation and + the calculation of health expectancies
- If mle=0 the program only does the calculation of + the health expectancies.
+
weight=0 Possibility to add weights.
- If weight=0 no weights are included
- If weight=1 the maximisation integrates the + weights which are in field 4
+
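Because the uncommented lines shown above are plain sequences of key=value pairs, they are easy to inspect from a script. The sketch below is a reading aid only and says nothing about how the program itself parses its parameter file; parse_param_line is a hypothetical helper, and it assumes there is no space around the '=' sign.

```python
def parse_param_line(line):
    """Split 'key=value key=value ...' into a dictionary of strings."""
    return dict(item.split("=", 1) for item in line.split())


first = parse_param_line(
    "title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4")
second = parse_param_line(
    "ftol=1.e-08 stepm=1 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0")

# Values are kept as strings; convert them where a number is expected.
print(first["datafile"], int(first["lastobs"]))
print(float(second["ftol"]), int(second["nlstate"]) + int(second["ndeath"]))
```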

+ +

Covariates

+ +

Intercept and age are systematically included in the model. +Additional covariates can be included with the command:

+ +

model=list of covariates

+ +

if model=. then no covariates are + included
if model=V1 the model includes the first + covariate (field 2)
if model=V2 the model includes the + second covariate (field 3)
if model=V1+V2 the model includes the + first and the second covariate (fields 2 and 3)
if model=V1*V2 the model includes the + product of the first and the second covariate (fields 2 + and 3)
if model=V1+V1*age the model includes + the product covariate*age

+ +

In this example, we have two covariates in the data file (fields 2 and 3). The number of covariates included in the data file between the id and the date of birth is ncovcol=2 (it was named ncov in versions prior to 0.8). If you have 3 covariates in the datafile (fields 2, 3 and 4), you will set ncovcol=3. Then you can run the programme with a new parametrisation taking into account the third covariate. For example, model=V1+V3 estimates a model with the first and third covariates. More complicated models can be used, but they will take more time to converge. With a simple model (no covariates), the programme estimates 8 parameters. Adding covariates increases the number of parameters: 12 for model=V1, 16 for model=V1+V1*age and 20 for model=V1+V2+V3.

+ +

Guess values for optimization

+ +

You must write the initial guess values of the parameters for +optimization. The number of parameters, N depends on the +number of absorbing states and non-absorbing states and on the +number of covariates.
+N is given by the formula N=(nlstate + +ndeath-1)*nlstate*ncovmodel .
+
+Thus in the simple case with 2 covariates (the model is log(pij/pii) = aij + bij * age, where intercept and age are the two covariates), 2 health degrees (1 for disability-free and 2 for disability) and 1 absorbing state (3), you must enter 8 initial values: a12, b12, a13, b13, a21, b21, a23, b23. You can start with zeros as in this example, but if you have a more precise set (for example from an earlier run) you can enter it and it will speed up the convergence.
+Each of the four lines starts with indices "ij": ij +aij bij

+ +

# Guess values of aij and bij in log (pij/pii) = aij + bij * age
+12 -14.155633  0.110794 
+13  -7.925360  0.032091 
+21  -1.890135 -0.029473 
+23  -6.234642  0.022315

+ +

or, to simplify (in most cases it converges, but there is no guarantee!):

+ +

12 0.0 0.0
+13 0.0 0.0
+21 0.0 0.0
+23 0.0 0.0
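The formula N=(nlstate + ndeath-1)*nlstate*ncovmodel given earlier can be checked with a small Python sketch, which also writes the all-zero guess block for any number of states. The function names are hypothetical, and ncovmodel is taken here to count the intercept and age together with any additional covariate term, which is the reading that reproduces the parameter counts quoted in this manual.

```python
def n_parameters(nlstate, ndeath, ncovmodel):
    """Number of parameters N = (nlstate + ndeath - 1) * nlstate * ncovmodel."""
    return (nlstate + ndeath - 1) * nlstate * ncovmodel


def zero_guess_lines(nlstate, ndeath, ncovmodel):
    """One 'ij 0.0 0.0 ...' line per transition from alive state i to state j."""
    lines = []
    for i in range(1, nlstate + 1):
        for j in range(1, nlstate + ndeath + 1):
            if j != i:
                lines.append(f"{i}{j} " + " ".join(["0.0"] * ncovmodel))
    return lines


# Intercept + age only, then one, two and three extra terms.
for ncovmodel in (2, 3, 4, 5):
    print(ncovmodel, n_parameters(2, 1, ncovmodel))

print("\n".join(zero_guess_lines(2, 1, 2)))
```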

+ +

In order to speed up the convergence you can make a first run with a large stepm, e.g. stepm=12 or 24, and then decrease stepm until stepm=1 month. If newstepm is the new, shorter step and the previous stepm is a multiple of it, i.e. stepm = n * newstepm, then the following approximation holds:

+ +

aij(newstepm) = aij(n * newstepm) - ln(n)
+

+ +

and

+ +

bij(newstepm) = bij(n * newstepm).

+ +

For example if you already ran for a 6 months interval and +got:
+

+ +

# Parameters
+12 -13.390179  0.126133 
+13  -7.493460  0.048069 
+21   0.575975 -0.041322 
+23  -4.748678  0.030626 
+

+ +

If you now want to get the monthly estimates, you can guess the aij by subtracting ln(6) = 1.7918
+and running
+

+ +

12 -15.18193847  0.126133 
+13 -9.285219469  0.048069
+21 -1.215784469 -0.041322
+23 -6.540437469  0.030626
+

+ +

and get
+

+ +

12 -15.029768 0.124347 
+13 -8.472981 0.036599 
+21 -1.472527 -0.038394 
+23 -6.553602 0.029856 
+
+which is closer to the results. The approximation is probably useful
+only for very small intervals and we don't have enough experience to
+know if you will speed up the convergence or not.
+

+ +

         -ln(12)= -2.485
+ -ln(6/1)=-ln(6)= -1.792
+ -ln(3/1)=-ln(3)= -1.0986
+-ln(12/6)=-ln(2)= -0.693
+
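The rescaling of guess values described above amounts to subtracting ln(n) from each intercept aij and leaving each slope bij unchanged. A minimal Python sketch, using the 6-month estimates quoted in this section (rescale_guess is a hypothetical helper name):

```python
import math


def rescale_guess(params, n):
    """Guess values for a step n times shorter: a - ln(n), b unchanged."""
    return {ij: (a - math.log(n), b) for ij, (a, b) in params.items()}


# Estimates obtained with a 6-month step, as quoted in this section.
six_month = {
    "12": (-13.390179, 0.126133),
    "13": (-7.493460, 0.048069),
    "21": (0.575975, -0.041322),
    "23": (-4.748678, 0.030626),
}

monthly_guess = rescale_guess(six_month, 6)
for ij, (a, b) in monthly_guess.items():
    print(f"{ij} {a:.6f} {b:.6f}")
```

The printed lines can be pasted as the guess block of a parameter file with stepm=1.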

+ +

Guess values for computing variances

+ +

This is an output if mle=1. But it can be +used as an input to get the various output data files (Health +expectancies, stationary prevalence etc.) and figures without +rerunning the rather long maximisation phase (mle=0).

+ +

The scales are small values for the evaluation of numerical derivatives. These derivatives are used to compute the Hessian matrix of the parameters, that is the inverse of the covariance matrix, and the variances of health expectancies. Each line consists of the indices "ij" followed by the initial scales (zero to simplify) associated with aij and bij.

+ +

If mle=1 you can enter zeros:

# Scales (for hessian or gradient estimation)
+12 0. 0. 
+13 0. 0. 
+21 0. 0. 
+23 0. 0.

If mle=0 you must enter a covariance matrix (usually + obtained from an earlier run).

+ +

Covariance matrix of parameters

+ +

+   121 Var(a12) 
+   122 Cov(b12,a12)  Var(b12) 
+          ...
+   232 Cov(b23,a12)  Cov(b23,b12) ... Var (b23)

+ +

If mle=1 you can enter zeros.

# Covariance matrix
+121 0.
+122 0. 0.
+131 0. 0. 0. 
+132 0. 0. 0. 0. 
+211 0. 0. 0. 0. 0. 
+212 0. 0. 0. 0. 0. 0. 
+231 0. 0. 0. 0. 0. 0. 0. 
+232 0. 0. 0. 0. 0. 0. 0. 0.

If mle=0 you must enter a covariance matrix (usually + obtained from an earlier run).

+ +

Age range for calculation of stationary +prevalences and health expectancies

+ +

agemin=70 agemax=100 bage=50 fage=100

+ +

+Once we have obtained the estimated parameters, the program is able
+to calculate stationary prevalences, transition probabilities
+and life expectancies at any age. The choice of age range is useful
+for extrapolation. In our data file, ages vary from age 70 to
+102. It is possible to get extrapolated stationary prevalence by
+age ranging from agemin to agemax.
+
+
+Setting bage=50 (begin age) and fage=100 (final age) makes
+the program compute life expectancy from age 'bage' to age
+'fage'. As we use a model, we can interestingly compute life
+expectancy on a wider age range than the age range from the data.
+But the model can be rather wrong on much larger intervals.
+The program is limited to around 120 for the upper age!
+

+ +

agemin= Minimum age for calculation of the + stationary prevalence
agemax= Maximum age for calculation of the + stationary prevalence
bage= Minimum age for calculation of the health + expectancies
fage= Maximum age for calculation of the health + expectancies

+ +

Computing the observed prevalence

+ +

begin-prev-date=1/1/1984 end-prev-date=1/6/1988 estepm=1

+ +

+Statements 'begin-prev-date' and 'end-prev-date' allow you to
+select the period in which we calculate the observed prevalences
+in each state. In this example, the prevalences are calculated on
+survey data collected between 1 January 1984 and 1 June 1988. 
+

+ +

begin-prev-date= Starting date + (day/month/year)
end-prev-date= Final date + (day/month/year)
estepm= Unit (in months). We compute the life expectancy from trapezoids spaced every estepm months. This is mainly to measure the difference between two models: for example if stepm=24 months, pijx are given only every 2 years, and by summing them we are calculating an estimate of the Life Expectancy assuming a linear progression in between, thus overestimating or underestimating according to the curvature of the survival function. If, for the same date, we estimate the model with stepm=1 month, we can keep estepm at 24 months to compare the new estimate of Life expectancy under the same linear hypothesis. A more precise result, taking into account a more precise curvature, will be obtained if estepm is as small as stepm.
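The effect of estepm can be illustrated with the trapezoid rule alone. The Python sketch below does not use the program's output: the survival curve is a hypothetical exponential chosen only because its exact integral is known, and the function names are assumptions. With a convex survival curve such as this one, widely spaced trapezoids overestimate the area, which is the curvature effect mentioned above.

```python
import math


def survival(months):
    """Hypothetical survival curve (exponential decay), for illustration only."""
    return math.exp(-months / 120.0)


def life_expectancy(surv, horizon_months, estepm):
    """Trapezoid estimate, in years, of the area under surv up to the horizon.

    The horizon is assumed to be a multiple of estepm.
    """
    total_months = 0.0
    t = 0
    while t < horizon_months:
        total_months += 0.5 * (surv(t) + surv(t + estepm)) * estepm
        t += estepm
    return total_months / 12.0


horizon = 480  # 40 years, expressed in months
exact = 120.0 * (1.0 - math.exp(-horizon / 120.0)) / 12.0
le_24 = life_expectancy(survival, horizon, estepm=24)
le_1 = life_expectancy(survival, horizon, estepm=1)
print(f"exact={exact:.4f} estepm=24: {le_24:.4f} estepm=1: {le_1:.4f}")
```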

+ +

Population- or status-based health +expectancies

+ +

pop_based=0

+ +

The program computes status-based health expectancies, i.e. health expectancies which depend on your initial health state. If you are healthy your healthy life expectancy (e11) is higher than if you were disabled (e21, with e11 > e21).
+To compute a healthy life expectancy independent of the initial status we have to weight e11 and e21 according to the probability of being in each state at the initial age or, in other words, according to the proportion of people in each state.
+We prefer computing a 'pure' period healthy life expectancy based only on the transition forces. Then the weights are simply the stationary prevalences or 'implied' prevalences at the initial age.
+Some other people would like to use the cross-sectional prevalences (the "Sullivan prevalences") observed at the initial age during a period of time defined just above.
+

+ +

popbased= 0 Health expectancies are + computed at each age from stationary prevalences + 'expected' at this initial age.
popbased= 1 Health expectancies are computed at each age from cross-sectional 'observed' prevalence at this initial age. As the whole population is not observed at the same exact date, we define a short period where the observed prevalence is computed.

+ +

Prevalence forecasting ( Experimental)

+ +

starting-proj-date=1/1/1989 final-proj-date=1/1/1992 mov_average=0

+ +

Prevalence and population projections are only available if the interpolation unit is a month, i.e. stepm=1, and if there are no covariates. The programme estimates the prevalence in each state at a precise date expressed in day/month/year. The programme computes one forecasted prevalence a year from a starting date (1 January 1989 in this example) to a final date (1 January 1992). The statement mov_average allows you to compute smoothed forecasted prevalences with a five-age moving average centered at the mid-age of the five-age period.
+

+ +

starting-proj-date= starting date + (day/month/year) of forecasting
final-proj-date= final date + (day/month/year) of forecasting
mov_average= smoothing with a five-age + moving average centered at the mid-age of the five-age + period. The command mov_average takes + value 1 if the prevalences are smoothed and 0 otherwise.

+ +

Last uncommented line : Population +forecasting

+ +

popforecast=0 popfile=pyram.txt popfiledate=1/1/1989 last-popfiledate=1/1/1992

+ +

This command is available if the interpolation unit is a +month, i.e. stepm=1 and if popforecast=1. From a data file +including age and number of persons alive at the precise date +‘popfiledate’, you can forecast the number of persons +in each state until date ‘last-popfiledate’. In this +example, the popfile pyram.txt +includes real data which are the Japanese population in 1989.
+

+ +

popforecast= + 0 Option for population forecasting. If + popforecast=1, the programme does the forecasting.
popfile= + name of the population file
popfiledate= date of the population given in the population file
last-popfiledate= + date of the last population projection

+ +

Running Imach +with this example

+ +

We assume that you typed in your 1st_example
+parameter file as explained above. 
+
+To run the program you should either:
+

+ +

click on the imach.exe icon and enter the name of the parameter file, which is for example C:\usr\imach\mle\biaspar.imach
You can also locate the biaspar.imach icon in C:\usr\imach\mle with your mouse and drag it onto the imach window.
With the latest versions (0.7 and higher), if you set up Windows to recognise the ".imach" extension, you can right-click the biaspar.imach icon and either edit the parameter file with Notepad or execute it with imach.

+ +

The time to converge depends on the step unit that you used (1
+month is CPU-intensive), on the number of cases, and on the
+number of variables.
+
+
+The program outputs many files. Most of them are files which
+will be plotted for better understanding.
+
+

+ +

Output of the program +and graphs

+ +

Once the optimization is finished, some graphics can be made with a grapher. We use Gnuplot, which is an interactive plotting program, copyrighted but freely distributed. A gnuplot reference manual is available here.
+When the run is finished, the user should enter a character for plotting and output editing.
+These characters are:
+

+ +

'c' to start again the program from the beginning.
'e' opens the biaspar.htm + file to edit the output files and graphs.
'g' to graph again
'q' for exiting.

+ +

Results files
+
+- Observed prevalence in each state (and at first pass): +prbiaspar.txt
+

+ +

The first line is the title and displays each field of the file. The first column is age. Fields 2 and 6 are the proportions of individuals in states 1 and 2 respectively as observed during the first exam. Other fields are the numbers of people in states 1, 2 or more. The number of columns increases if the number of states is higher than 2.
+The header of the file is

+ +

# Age Prev(1) N(1) N Age Prev(2) N(2) N
+70 1.00000 631 631 70 0.00000 0 631
+71 0.99681 625 627 71 0.00319 2 627 
+72 0.97125 1115 1148 72 0.02875 33 1148

+ +

It means that at age 70, the prevalence in state 1 is 1.000 +and in state 2 is 0.00 . At age 71 the number of individuals in +state 1 is 625 and in state 2 is 2, hence the total number of +people aged 71 is 625+2=627.
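The relation between the columns of this file can be verified directly: each prevalence is the count in the state divided by the total at that age. The Python sketch below uses only the three rows quoted above; the column positions follow the header line, and read_row is a hypothetical helper name.

```python
ROWS = """70 1.00000 631 631 70 0.00000 0 631
71 0.99681 625 627 71 0.00319 2 627
72 0.97125 1115 1148 72 0.02875 33 1148"""


def read_row(line):
    """Return age, (Prev(1), N(1)), (Prev(2), N(2)) and the total N."""
    f = line.split()
    return int(f[0]), (float(f[1]), int(f[2])), (float(f[5]), int(f[6])), int(f[3])


checks = []
for line in ROWS.splitlines():
    age, (prev1, n1), (prev2, n2), total = read_row(line)
    # Recompute both prevalences from the counts.
    checks.append((age, n1 / total, prev1, n2 / total, prev2, n1 + n2 == total))
    print(age, f"{n1 / total:.5f}", f"{n2 / total:.5f}")
```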
+

+ +

- Estimated parameters and +covariance matrix: rbiaspar.imach

+ +

This file contains all the maximisation results:

+ +

 -2 log likelihood= 21660.918613445392
+ Estimated parameters: a12 = -12.290174 b12 = 0.092161 
+                       a13 = -9.155590  b13 = 0.046627 
+                       a21 = -2.629849  b21 = -0.022030 
+                       a23 = -7.958519  b23 = 0.042614  
+ Covariance matrix: Var(a12) = 1.47453e-001
+                    Var(b12) = 2.18676e-005
+                    Var(a13) = 2.09715e-001
+                    Var(b13) = 3.28937e-005  
+                    Var(a21) = 9.19832e-001
+                    Var(b21) = 1.29229e-004
+                    Var(a23) = 4.48405e-001
+                    Var(b23) = 5.85631e-005 
+

+ +

By substitution of these parameters in the regression model, +we obtain the elementary transition probabilities:

+ +

- Transition probabilities: +pijrbiaspar.txt

+ +

Here are the transitions probabilities Pij(x, x+nh) where nh +is a multiple of 2 years. The first column is the starting age x +(from age 50 to 100), the second is age (x+nh) and the others are +the transition probabilities p11, p12, p13, p21, p22, p23. For +example, line 5 of the file is:

+ +

 100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513

+ +

and this means:

+ +

p11(100,106)=0.02655
+p12(100,106)=0.17622
+p13(100,106)=0.79722
+p21(100,106)=0.01809
+p22(100,106)=0.13678
+p23(100,106)=0.84513

+ +

- Stationary prevalence in each state: +plrbiaspar.txt

+ +

#Prevalence
+#Age 1-1 2-2
+
+#************ 
+70 0.90134 0.09866
+71 0.89177 0.10823 
+72 0.88139 0.11861 
+73 0.87015 0.12985

+ +

At age 70 the stationary prevalence is 0.90134 in state 1 and 0.09866 in state 2. This stationary prevalence differs from the observed prevalence. Here is the point. The observed prevalence at age 70 results from the incidence of disability, incidence of recovery and mortality which occurred in the past of the cohort. Stationary prevalence results from a simulation with current incidences and mortality (estimated from this cross-longitudinal survey). It is the best predictive value of the prevalence in the future if "nothing changes in the future". This is exactly what demographers do with a Life table. Life expectancy is the expected mean time to survive if observed mortality rates (incidence of mortality) "remain constant" in the future.

+ +

- Standard deviation of +stationary prevalence: vplrbiaspar.txt

+ +

The stationary prevalence has to be compared with the observed prevalence by age. But both are statistical estimates and subject to stochastic errors due to the size of the sample, the design of the survey and, for the stationary prevalence, the model used and fitted. It is possible to compute the standard deviation of the stationary prevalence at each age.

+ +

-Observed and stationary +prevalence in state (2=disable) with confidence interval: +vbiaspar21.gif

+ +

This graph exhibits the stationary prevalence in state (2) +with the confidence interval in red. The green curve is the +observed prevalence (or proportion of individuals in state (2)). +Without discussing the results (it is not the purpose here), we +observe that the green curve is rather below the stationary +prevalence. It suggests an increase of the disability prevalence +in the future.

+ +

-Convergence to the +stationary prevalence of disability: pbiaspar11.gif
+

+ +

This graph plots the conditional transition probabilities from an initial state (1=healthy in red at the bottom, or 2=disabled in green on top) at age x to the final state 2=disabled at age x+h. Conditional means conditional on being alive at age x+h, which has probability hP11x + hP12x when starting from state 1 and hP21x + hP22x when starting from state 2. The curves hP12x/(hP11x + hP12x) and hP22x/(hP21x + hP22x) converge with h to the stationary prevalence of disability. In order to get the stationary prevalence at age 70 we should start the process at an earlier age, e.g. 50. If the disability state is defined by severe disability criteria with little chance of recovery, then the incidence of recovery is low and the time to convergence is probably longer. But we don't have experience yet.

+ +

- Life expectancies by age +and initial health status with standard deviation: erbiaspar.txt

+ +

# Health expectancies 
+# Age 1-1 (SE) 1-2 (SE) 2-1 (SE) 2-2 (SE)
+70 10.4171 (0.1517)    3.0433 (0.4733)    5.6641 (0.1121)    5.6907 (0.3366)
+71 9.9325 (0.1409)    3.0495 (0.4234)    5.2627 (0.1107)    5.6384 (0.3129)
+72 9.4603 (0.1319)    3.0540 (0.3770)    4.8810 (0.1099)    5.5811 (0.2907)
+73 9.0009 (0.1246)    3.0565 (0.3345)    4.5188 (0.1098)    5.5187 (0.2702)
+

+ +

For example 70 10.4171 (0.1517) 3.0433 (0.4733) 5.6641 (0.1121) 5.6907 (0.3366) means:
+e11=10.4171 e12=3.0433 e21=5.6641 e22=5.6907

+ +

For example, the life expectancy of a healthy individual at age 70 is 10.42 years in the healthy state and 3.04 in the disability state (=13.46 years). If he was disabled at age 70, his life expectancy will be shorter: 5.66 in the healthy state and 5.69 in the disability state (=11.35 years). The total life expectancy is a weighted mean of both, 13.46 and 11.35; the weight is the proportion of people disabled at age 70. In order to get a pure period index (i.e. based only on incidences) we use the computed or stationary prevalence at age 70 (i.e. computed from incidences at earlier ages) instead of the observed prevalence (for example at first exam) (see below).

- Variances of life expectancies by age and initial health status: vrbiaspar.txt

For example, the covariances of life expectancies Cov(ei,ej) at age 50 are (line 3)

   Cov(e1,e1)=0.4776  Cov(e1,e2)=0.0488=Cov(e2,e1)  Cov(e2,e2)=0.0424

- Variances of one-step probabilities: probrbiaspar.txt

For example, at age 65

   p11=9.960e-001 standard deviation of p11=2.359e-004

- Health expectancies with standard errors in parentheses: trbiaspar.txt

#Total LEs with variances: e.. (std) e.1 (std) e.2 (std)

70 13.26 (0.22) 9.95 (0.20) 3.30 (0.14)

Thus, at age 70 the total life expectancy, e..=13.26 years, is the mean of e1.=13.46 and e2.=11.35 weighted by the stationary prevalences at age 70, which are 0.90134 in state 1 and 0.09866 in state 2 (the sum is equal to one). e.1=9.95 is the disability-free life expectancy at age 70 (it is again a weighted mean, of e11 and e21). Similarly, e.2=3.30 is the life expectancy at age 70 to be spent in the disability state.
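The weighting can be checked by hand. The sketch below uses only numbers quoted in this document (the age-70 line of erbiaspar.txt and the stationary prevalences); the variable names are ours. It reproduces the reported values to within about 0.01 year, the small differences presumably coming from rounding of the printed inputs.

```python
# Status-based health expectancies at age 70 (from erbiaspar.txt above).
e11, e12, e21, e22 = 10.4171, 3.0433, 5.6641, 5.6907
# Stationary prevalences at age 70 in state 1 (healthy) and 2 (disabled).
w1, w2 = 0.90134, 0.09866

e1_total = e11 + e12                      # life expectancy if healthy at 70
e2_total = e21 + e22                      # life expectancy if disabled at 70
e_dot1 = w1 * e11 + w2 * e21              # disability-free life expectancy
e_dot2 = w1 * e12 + w2 * e22              # life expectancy with disability
e_total = w1 * e1_total + w2 * e2_total   # total life expectancy

print(f"e1.={e1_total:.2f} e2.={e2_total:.2f}")
print(f"e..={e_total:.2f} e.1={e_dot1:.2f} e.2={e_dot2:.2f}")
```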

-Total life expectancy by age and health expectancies in states (1=healthy) and (2=disable): ebiaspar1.gif

This figure represents the health expectancies and the total life expectancy, with the confidence interval drawn as dashed curves.

Standard deviations (obtained from the information matrix of the model) of these quantities are very useful. Cross-longitudinal surveys are costly and do not involve huge samples, generally a few thousand individuals; therefore it is very important to have an idea of the standard deviation of our estimates. It has been a big challenge to compute the standard deviations of the health expectancies. Do not be confused: life expectancy is, like any expected value, the mean of a distribution; but here we are not computing the standard deviation of that distribution, but the standard deviation of the estimate of the mean.

Our estimates of health expectancies vary according to the sample size (and the standard deviations give confidence intervals of the estimates) but also according to the model fitted. Let us explain this in more detail.
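As a rough illustration of how a standard deviation translates into a confidence interval, the sketch below applies the usual normal approximation (estimate plus or minus 1.96 standard deviations for a 95% interval) to the total life expectancy at age 70 reported in trbiaspar.txt. The normal approximation and the function name are assumptions of this sketch, not something the program prints.

```python
def normal_confidence_interval(estimate, std_error, z=1.96):
    """Approximate confidence interval under a normal approximation.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# Total life expectancy at age 70: 13.26 years, standard deviation 0.22.
low, high = normal_confidence_interval(13.26, 0.22)
print(f"95% interval: [{low:.2f}, {high:.2f}]")
```

Such an interval reflects sampling variability only; it says nothing about whether the fitted model is appropriate.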

Choosing a model involves at least two kinds of choices. First, we have to decide on the number of disability states. Second, we have to design, within the logit model family, the model itself: variables, covariates, confounding factors etc. to be included.

The more disability states we have, the better our demographic description of the disability process, but the smaller the number of transitions between each pair of states and the higher the noise in the measurement. We do not have enough experience with the various models to summarize their advantages and disadvantages, but it is important to say that even if we had huge and unbiased samples, the total life expectancy computed from a cross-longitudinal survey varies with the number of states. If we define only two states, alive or dead, we find the usual life expectancy, where it is assumed that at each age people are at the same risk of dying. If we differentiate the alive state into healthy and disabled, and as the mortality from the disability state is higher than the mortality from the healthy state, we are introducing heterogeneity in the risk of dying. The total mortality at each age is the mean of the mortality in each state, weighted by the prevalence in each state. Therefore, if the proportion of people at each age and in each state is different from the stationary equilibrium, there is no reason to find the same total mortality at a particular age. Life expectancy, even if it is a very useful tool, relies on a very strong hypothesis of homogeneity of the population. Our main purpose is not to measure differential mortality but to measure the expected time in a healthy or disability state, in order to maximise the former and minimise the latter. But the differential in mortality complicates the measurement.
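A short numerical sketch may make the heterogeneity argument concrete. The death probabilities and prevalences below are hypothetical; only the weighting rule (total mortality as the prevalence-weighted mean of the state-specific mortalities) comes from the paragraph above.

```python
def total_mortality(prevalence_disabled, q_healthy, q_disabled):
    """Total death probability at a given age as the prevalence-weighted
    mean of the death probabilities in the two living states."""
    prevalence_healthy = 1.0 - prevalence_disabled
    return prevalence_healthy * q_healthy + prevalence_disabled * q_disabled


# Hypothetical death probabilities at some age, identical in both populations.
q_healthy, q_disabled = 0.04, 0.15

# Two populations that differ only in their prevalence of disability.
q_at_low_prevalence = total_mortality(0.10, q_healthy, q_disabled)
q_at_high_prevalence = total_mortality(0.30, q_healthy, q_disabled)
print(q_at_low_prevalence, q_at_high_prevalence)
```

The state-specific risks are the same in both cases, yet the total mortality differs because the mix of healthy and disabled people differs.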

Incidences of disability or recovery are not affected by the number of states if these states are independent. But incidence estimates depend on the specification of the model. The more covariates we add to the logit model, the better the model, but some covariates are not well measured and some are confounding factors, as in any statistical model. The procedure to "fit the best model" is similar to logistic regression, which itself is similar to regression analysis. We have not gone that far yet because we also have a severe limitation, which is the speed of convergence. On a Pentium III, 500 MHz, even the simplest model, estimated by month on 8,000 people, may take 4 hours to converge. Also, the program is not yet a statistical package that allows the variables and the model used in the maximisation to be written simply. The current program only allows simple variables to be added, like age+sex or age+sex+age*sex, and will never be general enough. But what should be remembered is that the incidences, or probabilities of change from one state to another, are affected by the variables specified in the model.
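To show how the specified variables drive the transition probabilities, here is a minimal sketch of a multinomial logit with age as the only covariate. It assumes the form log(pij/pii) = aij + bij*age, and it reads the two numbers printed for each transition in the "# Parameters" block of the example run in this document as (aij, bij). Both the functional form and that reading are assumptions of this sketch, and the probabilities refer to one model time step.

```python
import math

# (origin, destination): (constant, age coefficient), copied from the
# "# Parameters" block of the example run; interpretation is an assumption.
PARAMS = {
    (1, 2): (-12.966061, 0.135117),
    (1, 3): (-7.401109, 0.067831),
    (2, 1): (-0.672648, -0.006627),
    (2, 3): (-5.051297, 0.051271),
}


def transition_probabilities(origin, age, params=PARAMS):
    """Probabilities of moving from `origin` to each state at a given age,
    under the assumed multinomial logit with the origin as reference."""
    scores = {
        dest: math.exp(a + b * age)
        for (orig, dest), (a, b) in params.items()
        if orig == origin
    }
    denominator = 1.0 + sum(scores.values())
    probs = {dest: s / denominator for dest, s in scores.items()}
    probs[origin] = 1.0 / denominator  # probability of staying in place
    return probs


for age in (70, 80, 90):
    print(age, transition_probabilities(1, age))
```

Adding a covariate such as sex would add one more coefficient per transition inside the exponent; nothing else in the sketch would change.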

Also, the age range of the people interviewed is linked to the age range over which life expectancy can be estimated by extrapolation. If your sample ranges from age 70 to 95, you can clearly estimate a life expectancy at age 70 and trust your confidence interval, which is mostly based on your sample size. If you want to estimate the life expectancy at age 50, you have to rely on your model, but fitting a logistic model on an age range of 70-95 and estimating probabilities of transition outside this age range, say at age 50, is very dangerous. At the least, you should remember that the confidence intervals given by the standard deviations of the health expectancies hold under the strong assumption that your model is the 'true model', which is probably not the case.

- Copy of the parameter file: orbiaspar.txt

This copy of the parameter file can be useful to re-run the program while saving the old output files.

- Prevalence forecasting: frbiaspar.txt

First, we have estimated the observed prevalence between 1/1/1984 and 1/6/1988. The mean date of interview (weighted average of the interviews performed between 1/1/1984 and 1/6/1988) is estimated to be 13/9/1985, as written at the top of the file. Then we forecast the probability to be in each state.

Example, at date 1/1/1989:

# StartingAge FinalAge P.1 P.2 P.3
# Forecasting at date 1/1/1989
  73 0.807 0.078 0.115

Since the minimum age is 70 on 13/9/1985, the youngest forecasted age is 73. This means that a person aged 70 on 13/9/1985 has a probability of 0.807 of being in state 1 at age 73 on 1/1/1989. Similarly, the probability of being in state 2 is 0.078 and the probability of having died is 0.115. Then, on 1/1/1989, the prevalence of disability at age 73 is estimated to be 0.088, i.e. 0.078/(0.807+0.078).
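The forecasted prevalence is simply the probability of being disabled divided by the probability of being alive. A minimal sketch using the three probabilities quoted above (the function name is ours):

```python
def prevalence_among_survivors(p_healthy, p_disabled):
    """Share of survivors who are disabled; the dead are excluded."""
    return p_disabled / (p_healthy + p_disabled)


# Forecast at 1/1/1989 for age 73: P.1=0.807, P.2=0.078, P.3=0.115 (dead).
p1, p2, p3 = 0.807, 0.078, 0.115
prevalence = prevalence_among_survivors(p1, p2)
print(f"{prevalence:.3f}")
```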

- Population forecasting: poprbiaspar.txt

# Age P.1 P.2 P.3 [Population]
# Forecasting at date 1/1/1989
75 572685.22 83798.08
74 621296.51 79767.99
73 645857.70 69320.60

# Forecasting at date 1/1/1990
76 442986.68 92721.14 120775.48
75 487781.02 91367.97 121915.51
74 512892.07 85003.47 117282.76

From the population file, we estimate the number of people in each state. At age 73, 645857 persons are in state 1 and 69320 are in state 2. One year later, 512892 are still in state 1, 85003 are in state 2 and 117282 died before 1/1/1990.
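One way to read these two blocks is to follow a cohort from one date to the next: the people aged 73 on 1/1/1989 are the people aged 74 (alive or dead) on 1/1/1990. The sketch below, which uses only the figures printed above and hypothetical variable names, checks that the cohort total is conserved and derives the prevalence of disability among survivors at both dates.

```python
# Cohort aged 73 on 1/1/1989 (states 1 and 2 only).
pop_1989 = {"healthy": 645857.70, "disabled": 69320.60}
# Same cohort, aged 74 on 1/1/1990 (state 3 = died during the year).
pop_1990 = {"healthy": 512892.07, "disabled": 85003.47, "dead": 117282.76}

total_1989 = sum(pop_1989.values())
total_1990 = sum(pop_1990.values())

alive_1990 = pop_1990["healthy"] + pop_1990["disabled"]
prevalence_1989 = pop_1989["disabled"] / total_1989
prevalence_1990 = pop_1990["disabled"] / alive_1990

print(f"cohort size: {total_1989:.2f} then {total_1990:.2f}")
print(f"prevalence of disability: {prevalence_1989:.3f} then {prevalence_1990:.3f}")
```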

Trying an example

Since you know how to run the program, it is time to test it on your own computer. Try, for example, a parameter file named imachpar.imach, which is a copy of mypar.imach included in the mytry subdirectory of imach. Edit it to change the name of the data file to ..\data\mydata.txt if you don't want to copy the data file into the same directory. The file mydata.txt is a smaller file of 3,000 people but still with 4 waves.

Click on the imach.exe icon to open a window. Answer the question 'Enter the parameter file name:'

IMACH, Version 0.8a

Enter the parameter file name: ..\mytry\imachpar.imach

Most of the data files or image files generated will use the 'imachpar' string in their names. The running time is about 2-3 minutes on a Pentium III. If the execution worked correctly, the output files are created in the current directory, and should be the same as the mypar files initially included in the directory mytry.

Output on the screen

The output screen looks like this (Log file):

#
title=MLE datafile=..\data\mydata.txt lastobs=3000 firstpass=1 lastpass=3
ftol=1.000000e-008 stepm=24 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0

Total number of individuals= 2965, Agemin = 70.00, Agemax= 100.92

Warning, no any valid information for:126 line=126
Warning, no any valid information for:2307 line=2307
Delay (in months) between two waves Min=21 Max=51 Mean=24.495826

These lines give some warnings on the data file and also some raw statistics on frequencies of transitions.

Age 70 1.=230 loss[1]=3.5% 2.=16 loss[2]=12.5% 1.=222 prev[1]=94.1% 2.=14 prev[2]=5.9% 1-1=8 11=200 12=7 13=15 2-1=2 21=6 22=7 23=1
Age 102 1.=0 loss[1]=NaNQ% 2.=0 loss[2]=NaNQ% 1.=0 prev[1]=NaNQ% 2.=0
Maximisation with the Powell algorithm. 8 directions are given corresponding to the 8 parameters. This can be rather long to get convergence.

Powell iter=1 -2*LL=11531.405658264877 1 0.000000000000 2 0.000000000000 3
0.000000000000 4 0.000000000000 5 0.000000000000 6 0.000000000000 7
0.000000000000 8 0.000000000000
1..........2.................3..........4.................5.........
6................7........8...............
Powell iter=23 -2*LL=6744.954108371555 1 -12.967632334283
2 0.135136681033 3 -7.402109728262 4 0.067844593326
5 -0.673601538129 6 -0.006615504377 7 -5.051341616718
8 0.051272038506
1..............2...........3..............4...........
5..........6................7...........8.........
#Number of iterations = 23, -2 Log likelihood = 6744.954042573691
# Parameters
12 -12.966061 0.135117
13 -7.401109 0.067831
21 -0.672648 -0.006627
23 -5.051297 0.051271

Calculation of the hessian matrix. Wait...
12345678.12.13.14.15.16.17.18.23.24.25.26.27.28.34.35.36.37.38.45.46.47.48.56.57.58.67.68.78

Inverting the hessian to get the covariance matrix. Wait...

#Hessian matrix#
3.344e+002 2.708e+004 -4.586e+001 -3.806e+003 -1.577e+000 -1.313e+002 3.914e-001 3.166e+001
2.708e+004 2.204e+006 -3.805e+003 -3.174e+005 -1.303e+002 -1.091e+004 2.967e+001 2.399e+003
-4.586e+001 -3.805e+003 4.044e+002 3.197e+004 2.431e-002 1.995e+000 1.783e-001 1.486e+001
-3.806e+003 -3.174e+005 3.197e+004 2.541e+006 2.436e+000 2.051e+002 1.483e+001 1.244e+003
-1.577e+000 -1.303e+002 2.431e-002 2.436e+000 1.093e+002 8.979e+003 -3.402e+001 -2.843e+003
-1.313e+002 -1.091e+004 1.995e+000 2.051e+002 8.979e+003 7.420e+005 -2.842e+003 -2.388e+005
3.914e-001 2.967e+001 1.783e-001 1.483e+001 -3.402e+001 -2.842e+003 1.494e+002 1.251e+004
3.166e+001 2.399e+003 1.486e+001 1.244e+003 -2.843e+003 -2.388e+005 1.251e+004 1.053e+006
# Scales
12 1.00000e-004 1.00000e-006
13 1.00000e-004 1.00000e-006
21 1.00000e-003 1.00000e-005
23 1.00000e-004 1.00000e-005
# Covariance
  1 5.90661e-001
  2 -7.26732e-003 8.98810e-005
  3 8.80177e-002 -1.12706e-003 5.15824e-001
  4 -1.13082e-003 1.45267e-005 -6.50070e-003 8.23270e-005
  5 9.31265e-003 -1.16106e-004 6.00210e-004 -8.04151e-006 1.75753e+000
  6 -1.15664e-004 1.44850e-006 -7.79995e-006 1.04770e-007 -2.12929e-002 2.59422e-004
  7 1.35103e-003 -1.75392e-005 -6.38237e-004 7.85424e-006 4.02601e-001 -4.86776e-003 1.32682e+000
  8 -1.82421e-005 2.35811e-007 7.75503e-006 -9.58687e-008 -4.86589e-003 5.91641e-005 -1.57767e-002 1.88622e-004
# agemin agemax for lifexpectancy, bage fage (if mle==0 ie no data nor Max likelihood).


agemin=70 agemax=100 bage=50 fage=100
Computing prevalence limit: result on file 'plrmypar.txt'
Computing pij: result on file 'pijrmypar.txt'
Computing Health Expectancies: result on file 'ermypar.txt'
Computing Variance-covariance of DFLEs: file 'vrmypar.txt'
Computing Total LEs with variances: file 'trmypar.txt'
Computing Variance-covariance of Prevalence limit: file 'vplrmypar.txt'
End of Imach
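The "# Covariance" block printed in the log is a lower-triangular covariance matrix of the 8 estimated parameters. As a minimal sketch of how it can be read, the code below takes its first two rows and derives the standard errors (square roots of the diagonal) and the correlation between parameters 1 and 2. That these two parameters are the constant and the age coefficient of the 1-2 transition is our assumption, based on the order of the "# Parameters" block.

```python
import math

# First two rows of the lower-triangular "# Covariance" block in the log.
cov_lower = [
    [5.90661e-001],
    [-7.26732e-003, 8.98810e-005],
]

# Standard error of each parameter: square root of its variance.
std_errors = [math.sqrt(row[i]) for i, row in enumerate(cov_lower)]

# Correlation between parameters 1 and 2.
correlation = cov_lower[1][0] / (std_errors[0] * std_errors[1])

print(f"standard errors: {std_errors[0]:.4f}, {std_errors[1]:.6f}")
print(f"correlation: {correlation:.4f}")
```

A correlation close to -1 between a constant and a slope is common in this kind of regression when the covariate (here age, 70 and over) is far from zero.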

Once the run is finished, the program waits for a character:

Type e to edit output files, g to graph again, c to start again, and q for exiting:

First you should enter e to edit the master file mypar.htm.

Outputs files

 - Copy of the parameter file: ormypar.txt
 - Gnuplot file name: mypar.gp.txt
 - Observed prevalence in each state: prmypar.txt
 - Stationary prevalence in each state: plrmypar.txt
 - Transition probabilities: pijrmypar.txt
 - Life expectancies by age and initial health status (estepm=24 months): ermypar.txt
 - Parameter file with estimated parameters and the covariance matrix: rmypar.txt
 - Variance of one-step probabilities: probrmypar.txt
 - Variances of life expectancies by age and initial health status (estepm=24 months): vrmypar.txt
 - Health expectancies with their variances: trmypar.txt
 - Standard deviation of stationary prevalences: vplrmypar.txt
 No population forecast: popforecast = 0 (instead of 1) or stepm = 24 (instead of 1) or model=. (instead of .)

Graphs

 -One-step transition probabilities
 -Convergence to the stationary prevalence
 -Observed and stationary prevalence in state (1) with the confident interval
 -Observed and stationary prevalence in state (2) with the confident interval
 -Health life expectancies by age and initial health state (1)
 -Health life expectancies by age and initial health state (2)
 -Total life expectancy by age and health expectancies in states (1) and (2).

This software has been partly funded by Euro-REVES, a concerted action of the European Union. It will be copyrighted identically to a GNU software product, i.e. program and software can be distributed freely for non commercial use. Sources are not widely distributed today. You can get them by asking us, with a simple justification (name, email, institute), at mailto:brouard@ined.fr and mailto:lievre@ined.fr .

Latest version (0.8a of May 2002) can be accessed at http://euroreves.ined.fr/imach
