From d9cc8ef617766ab4a9b583ad9cc7acc2e2b8ebad Mon Sep 17 00:00:00 2001 From: =?utf8?q?Agn=C3=A8s=20Li=C3=A8vre?= Date: Fri, 10 Jun 2005 08:54:06 +0000 Subject: [PATCH] Documentation corrected by Carol --- html/doc/imach.htm | 2341 ++++++++++++++++++-------------------------- 1 file changed, 971 insertions(+), 1370 deletions(-) diff --git a/html/doc/imach.htm b/html/doc/imach.htm index 71dfce3..2b22678 100644 --- a/html/doc/imach.htm +++ b/html/doc/imach.htm @@ -1,632 +1,468 @@ - - - - - -Computing Health Expectancies using IMaCh - - - - -IMaCh - - - - -

- -

Computing Health -Expectancies using IMaCh

- -

(a Maximum -Likelihood Computer Program using Interpolation of Markov Chains)

- -

INED and EUROREVES

- -

Version -0.97, June 2004

- -

Authors of the -program: Nicolas Brouard, senior researcher at the Institut -National d'Etudes Démographiques (INED, Paris) in the "Mortality, -Health and Epidemiology" Research Unit

- -

and Agnès -Lièvre
-

- -

Contribution to the mathematics: C. R. -Heathcote (Australian -National University, Canberra).

- -

Contact: Agnès Lièvre (lievre@ined.fr)

- -

Introduction
On what kind of data can it be used?
The data file
The parameter file
Running Imach
Output files and graphs
Exemple

- -

Introduction

- -

This program computes Healthy Life Expectancies from cross-longitudinal -data using the methodology pioneered by Laditka and Wolf (1). -Within the family of Health Expectancies (HE), Disability-free -life expectancy (DFLE) is probably the most important index to -monitor. In low mortality countries, there is a fear that when -mortality declines, the increase in DFLE is not proportionate to -the increase in total Life expectancy. This case is called the Expansion -of morbidity. Most of the data collected today, in -particular by the international REVES -network on Health expectancy, and most HE indices based on these -data, are cross-sectional. It means that the information -collected comes from a single cross-sectional survey: people from -various ages (but mostly old people) are surveyed on their health -status at a single date. Proportion of people disabled at each -age, can then be measured at that date. This age-specific -prevalence curve is then used to distinguish, within the -stationary population (which, by definition, is the life table -estimated from the vital statistics on mortality at the same -date), the disable population from the disability-free -population. Life expectancy (LE) (or total population divided by -the yearly number of births or deaths of this stationary -population) is then decomposed into DFLE and DLE. This method of -computing HE is usually called the Sullivan method (from the name -of the author who first described it).

- -

Age-specific proportions of people disabled (prevalence of -disability) are dependent on the historical flows from entering -disability and recovering in the past until today. The age-specific -forces (or incidence rates), estimated over a recent period of time -(like for period forces of mortality), of entering disability or -recovering a good health, are reflecting current conditions and -therefore can be used at each age to forecast the future of this -cohortif nothing changes in the future, i.e to forecast the -prevalence of disability of each cohort. Our finding (2) is that the period -prevalence of disability (computed from period incidences) is lower -than the cross-sectional prevalence. For example if a country is -improving its technology of prosthesis, the incidence of recovering -the ability to walk will be higher at each (old) age, but the -prevalence of disability will only slightly reflect an improve because -the prevalence is mostly affected by the history of the cohort and not -by recent period effects. To measure the period improvement we have to -simulate the future of a cohort of new-borns entering or leaving at -each age the disability state or dying according to the incidence -rates measured today on different cohorts. The proportion of people -disabled at each age in this simulated cohort will be much lower that -the proportions observed at each age in a cross-sectional survey. This -new prevalence curve introduced in a life table will give a more -realistic HE level than the Sullivan method which mostly measured the -History of health conditions in this country.

- -

Therefore, the main question is how to measure incidence rates -from cross-longitudinal surveys? This is the goal of the IMaCH -program. From your data and using IMaCH you can estimate period -HE and not only Sullivan's HE. Also the standard errors of the HE -are computed.

- -

A cross-longitudinal survey consists in a first survey -("cross") where individuals from different ages are -interviewed on their health status or degree of disability. At -least a second wave of interviews ("longitudinal") -should measure each new individual health status. Health -expectancies are computed from the transitions observed between -waves and are computed for each degree of severity of disability -(number of life states). More degrees you consider, more time is -necessary to reach the Maximum Likelihood of the parameters -involved in the model. Considering only two states of disability -(disable and healthy) is generally enough but the computer -program works also with more health statuses.
-
-The simplest model is the multinomial logistic model where pij -is the probability to be observed in state j at the second -wave conditional to be observed in state i at the first -wave. Therefore a simple model is: log(pij/pii)= aij + -bij*age+ cij*sex, where 'age' is age and 'sex' -is a covariate. The advantage that this computer program claims, -comes from that if the delay between waves is not identical for -each individual, or if some individual missed an interview, the -information is not rounded or lost, but taken into account using -an interpolation or extrapolation. hPijx is the -probability to be observed in state i at age x+h -conditional to the observed state i at age x. The -delay 'h' can be split into an exact number (nh*stepm) -of unobserved intermediate states. This elementary transition (by -month or quarter trimester, semester or year) is modeled as a -multinomial logistic. The hPx matrix is simply the matrix -product of nh*stepm elementary matrices and the -contribution of each individual to the likelihood is simply hPijx. -
-

- -

The program presented in this manual is a quite general -program named IMaCh (for Interpolated -MArkov CHain), designed to -analyse transition data from longitudinal surveys. The first step -is the parameters estimation of a transition probabilities model -between an initial status and a final status. From there, the -computer program produces some indicators such as observed and -stationary prevalence, life expectancies and their variances and -graphs. Our transition model consists in absorbing and -non-absorbing states with the possibility of return across the -non-absorbing states. The main advantage of this package, -compared to other programs for the analysis of transition data -(For example: Proc Catmod of SAS^®) is that the whole -individual information is used even if an interview is missing, a -status or a date is unknown or when the delay between waves is -not identical for each individual. The program can be executed -according to parameters: selection of a sub-sample, number of -absorbing and non-absorbing states, number of waves taken in -account (the user inputs the first and the last interview), a -tolerance level for the maximization function, the periodicity of -the transitions (we can compute annual, quarterly or monthly -transitions), covariates in the model. It works on Windows or on -Unix.
-

- -

(1) Laditka, Sarah B. and Wolf, Douglas A. (1998), "New -Methods for Analyzing Active Life Expectancy". Journal of -Aging and Health. Vol 10, No. 2.

(2) Lièvre A., Brouard N. and Heathcote Ch. (2003) Estimating Health Expectancies -from Cross-longitudinal surveys. Mathematical Population Studies.- 10(4), pp. 211-248 - -

- -

On what kind of data can -it be used?

- -

The minimum data required for a transition model is the -recording of a set of individuals interviewed at a first date and -interviewed again at least one another time. From the -observations of an individual, we obtain a follow-up over time of -the occurrence of a specific event. In this documentation, the -event is related to health status at older ages, but the program -can be applied on a lot of longitudinal studies in different -contexts. To build the data file explained into the next section, -you must have the month and year of each interview and the -corresponding health status. But in order to get age, date of -birth (month and year) is required (missing values is allowed for -month). Date of death (month and year) is an important -information also required if the individual is dead. Shorter -steps (i.e. a month) will more closely take into account the -survival time after the last interview.

- -

The data file

- -

In this example, 8,000 people have been interviewed in a -cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990). Some -people missed 1, 2 or 3 interviews. Health statuses are healthy (1) -and disable (2). The survey is not a real one. It is a simulation of -the American Longitudinal Survey on Aging. The disability state is -defined if the individual missed one of four ADL (Activity of daily -living, like bathing, eating, walking). Therefore, even if the -individuals interviewed in the sample are virtual, the information -brought with this sample is close to the situation of the United -States. Sex is not recorded is this sample. The LSOA survey is biased -in the sense that people living in an institution were not surveyed at -first pass in 1984. Thus the prevalence of disability in 1984 is -biased downwards at old ages. But when people left their household to -an institution, they have been surveyed in their institution in 1986, -1988 or 1990. Thus incidences are not biased. But cross-sectional -prevalences of disability at old ages are thus artificially increasing -in 1986, 1988 and 1990 because of a higher weight of people -institutionalized in the sample. Our article shows the -opposite: the period prevalence is lower at old ages than the -adjusted cross-sectional prevalence proving important current progress -against disability.

- -

Each line of the data set (named data1.txt -in this first example) is an individual record. Fields are separated -by blanks:

- -

Index number: positive number (field 1)
First covariate positive number (field 2)
Second covariate positive number (field 3)
Weight: positive number - (field 4) . In most surveys individuals are weighted - according to the stratification of the sample.
Date of birth: coded as mm/yyyy. Missing dates are - coded as 99/9999 (field 5)
Date of death: coded as mm/yyyy. Missing dates are - coded as 99/9999 (field 6)
Date of first interview: coded as mm/yyyy. Missing - dates are coded as 99/9999 (field 7)
Status at first interview: positive number. - Missing values ar coded -1. (field 8)
Date of second interview: coded as mm/yyyy. - Missing dates are coded as 99/9999 (field 9)
Status at second interview positive - number. Missing values ar coded -1. (field 10)
Date of third interview: coded as mm/yyyy. Missing - dates are coded as 99/9999 (field 11)
Status at third interview positive - number. Missing values ar coded -1. (field 12)
Date of fourth interview: coded as mm/yyyy. - Missing dates are coded as 99/9999 (field 13)
Status at fourth interview positive - number. Missing values are coded -1. (field 14)
etc

- -

If your longitudinal survey does not include information about -weights or covariates, you must fill the column with a number -(e.g. 1) because a missing field is not allowed.

- -

Your first example parameter file

- -

#Imach version 0.97b, June 2004, -INED-EUROREVES

- -

This first line was a comment. Comments line start with a '#'.

- -

First uncommented line

- -

title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4

- -

title= 1st_example is title of the run.
datafile= data1.txt is the name of the data set. - Our example is a six years follow-up survey. It consists - in a baseline followed by 3 reinterviews.
lastobs= 8600 the program is able to run on a - subsample where the last observation number is lastobs. - It can be set a bigger number than the real number of - observations (e.g. 100000). In this example, maximisation - will be done on the 8600 first records.
firstpass=1 , lastpass=4 In case of more - than two interviews in the survey, the program can be run - on selected transitions periods. firstpass=1 means the - first interview included in the calculation is the - baseline survey. lastpass=4 means that the information - brought by the 4th interview is taken into account.

- -

Second uncommented -line

- -

ftol=1.e-08 stepm=1 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0

- -

ftol=1e-8 Convergence tolerance on the function - value in the maximisation of the likelihood. Choosing a - correct value for ftol is difficult. 1e-8 is a correct - value for a 32 bits computer.
stepm=1 Time unit in months for interpolation. - Examples:
- If stepm=1, the unit is a month
- If stepm=4, the unit is a trimester
- If stepm=12, the unit is a year
- If stepm=24, the unit is two years
- ...
-
ncovcol=2 Number of covariate columns included in the - datafile before the column of the date of birth. You can have -covariates that won't necessary be used during the - run. It is not the number of covariates that will be - specified by the model. The 'model' syntax describes the - covariates to be taken into account during the run.
nlstate=2 Number of non-absorbing (alive) states. - Here we have two alive states: disability-free is coded 1 - and disability is coded 2.
ndeath=1 Number of absorbing states. The absorbing - state death is coded 3.
maxwav=4 Number of waves in the datafile.
mle=1 Option for the - Maximisation Likelihood Estimation.
- If mle=1 the program does the maximisation and - the calculation of health expectancies
- If mle=0 the program only does the calculation of - the health expectancies and other indices and graphs -but without the maximization..
-
weight=0 Possibility to add weights.
- If weight=0 no weights are included
- If weight=1 the maximisation integrates the - weights which are in field 4
-

- -

Covariates

- -

Intercept and age are systematically included in the model. -Additional covariates can be included with the command:

- -

model=list of covariates

- -

if model=. then no covariates are - included
if model=V1 the model includes the first - covariate (field 2)
if model=V2 the model includes the - second covariate (field 3)
if model=V1+V2 the model includes the - first and the second covariate (fields 2 and 3)
if model=V1*V2 the model includes the - product of the first and the second covariate (fields 2 - and 3)
if model=V1+V1*age the model includes - the product covariate*age

- -

In this example, we have two covariates in the data file -(fields 2 and 3). The number of covariates included in the data -file between the id and the date of birth is ncovcol=2 (it was -named ncov in version prior to 0.8). If you have 3 covariates in -the datafile (fields 2, 3 and 4), you will set ncovcol=3. Then -you can run the programme with a new parametrisation taking into -account the third covariate. For example, model=V1+V3 estimates -a model with the first and third covariates. More complicated -models can be used, but it will takes more time to converge. With -a simple model (no covariates), the programme estimates 8 -parameters. Adding covariates increases the number of parameters -: 12 for model=V1, 16 for model=V1+V1*age -and 20 for model=V1+V2+V3.

- -

Guess values for optimization

- -

You must write the initial guess values of the parameters for -optimization. The number of parameters, N depends on the -number of absorbing states and non-absorbing states and on the -number of covariates.
-N is given by the formula N=(nlstate + -ndeath-1)*nlstate*ncovmodel .
-
-Thus in the simple case with 2 covariates (the model is log -(pij/pii) = aij + bij * age where intercept and age are the two -covariates), and 2 health degrees (1 for disability-free and 2 -for disability) and 1 absorbing state (3), you must enter 8 -initials values, a12, b12, a13, b13, a21, b21, a23, b23. You can -start with zeros as in this example, but if you have a more -precise set (for example from an earlier run) you can enter it -and it will speed up them
-Each of the four lines starts with indices "ij": ij -aij bij

- -

# Guess values of aij and bij in log (pij/pii) = aij + bij * age + + +Computing Health Expectancies using IMaCh + + + +

+ +

Computing Health Expectancies using +IMaCh

(a Maximum Likelihood Computer +Program using Interpolation of Markov Chains)

INED and EUROREVES

Version 0.97, June +2004

+ +

Authors of the program: +Nicolas Brouard, senior researcher at the Institut National +d'Etudes Démographiques (INED, +Paris) in the "Mortality, Health and Epidemiology" Research Unit +

and Agnès Lièvre

Contribution to the mathematics: C. R. Heathcote +(Australian National University, +Canberra).

Contact: Agnès Lièvre (lievre@ined.fr)

+ +

Introduction

This program computes Healthy Life Expectancies from +cross-longitudinal data using the methodology pioneered by Laditka and +Wolf (1). Within the family of Health Expectancies (HE), disability-free life +expectancy (DFLE) is probably the most important index to monitor. In low +mortality countries, there is a fear that when mortality declines (and therefore total life expectancy improves), the increase in DFLE will not be as great, leading to an Expansion of morbidity. Most of the data collected today, +in particular by the international REVES +network on Health Expectancy and the disability process, and most HE indices based on these data, are +cross-sectional. This means that the information collected comes from a +single cross-sectional survey: people of various ages (but often old people) +are surveyed on their health status at a single date. The proportion of people +disabled at each age can then be estimated at that date. This age-specific +prevalence curve is used to distinguish, within the stationary population +(which, by definition, is the life table estimated from the vital statistics on +mortality at the same date), the disabled population from the disability-free +population. Life expectancy (LE) (or total population divided by the yearly +number of births or deaths of this stationary population) is then decomposed +into disability-free life expectancy (DFLE) and disability life +expectancy (DLE). This method of computing HE is usually called the Sullivan +method (after the author who first described it).

The age-specific proportions of people disabled (prevalence of disability) are +dependent upon the historical flows of entering disability and recovering in the past. The age-specific forces (or incidence rates) of entering +disability or recovering good health, estimated over a recent period of time (as period forces of mortality), reflect current conditions and +therefore can be used at each age to forecast the future of this cohort if +nothing changes in the future, i.e. to forecast the prevalence of disability of each cohort. Our finding (2) is that the period prevalence of disability +(computed from period incidences) is lower than the cross-sectional prevalence. +For example, if a country is improving its prosthesis technology, the +incidence of recovering the ability to walk will be higher at each (old) age, +but the prevalence of disability will only slightly reflect an improvement because +the prevalence is mostly affected by the history of the cohort and not by recent +period effects. To measure the period improvement we have to simulate the future +of a cohort of new-borns entering or leaving the disability state or +dying at each age according to the incidence rates measured today on different cohorts. The +proportion of people disabled at each age in this simulated cohort will be much +lower than the proportions observed at each age in a cross-sectional survey. +This new prevalence curve introduced in a life table will give a more realistic +HE level than the Sullivan method, which mostly reflects the history of health +conditions in a country.

Therefore, the main question is: how can we measure incidence rates from +cross-longitudinal surveys? This is the goal of the IMaCh program. From your +data and using IMaCh you can estimate period HE as well as the Sullivan HE. In addition, the standard errors of the HE are computed.

A cross-longitudinal survey consists of a first survey ("cross") where +individuals of different ages are interviewed about their health status or degree +of disability. At least a second wave of interviews ("longitudinal") should +measure each individual's new health status. Health expectancies are computed from +the transitions observed between waves (interviews) and are computed for each degree of +severity of disability (number of health states). The more degrees of severity considered, the more +time is necessary to reach the Maximum Likelihood of the parameters involved in +the model. Considering only two states of disability (disabled and healthy) is +generally enough, but the computer program also works with more health +states.

The simplest model for the transition probabilities is the multinomial logistic model where +pij is the probability of being observed in state j at the second +wave conditional on being observed in state i at the first wave. Therefore +a simple model is: log(pij/pii)= aij + bij*age+ cij*sex, where +'age' is age and 'sex' is a covariate. The advantage that this +computer program claims is that if the delay between waves is not +identical for each individual, or if some individual missed an interview, the +information is not rounded or lost, but taken into account using an +interpolation or extrapolation. hPijx is the probability of being observed +in state j at age x+h conditional on the observed state i +at age x. The delay 'h' can be split into an exact number +(nh*stepm) of unobserved intermediate states. This elementary transition +(by month or quarter, trimester, semester or year) is modeled as the above multinomial +logistic. The hPx matrix is simply the matrix product of nh*stepm +elementary matrices and the contribution of each individual to the likelihood is +simply hPijx.
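
The matrix product described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code IMaCh itself runs: the function names are hypothetical, age is assumed to be measured in years and advanced by stepm/12 at each elementary step, no covariate other than age is used, and the (aij, bij) values are the guess values listed later in this document.

```python
import math

# Guess values (aij, bij) for log(pij/pii) = aij + bij * age,
# copied from the example parameter file in this document.
GUESS = {
    (1, 2): (-14.155633, 0.110794),
    (1, 3): (-7.925360, 0.032091),
    (2, 1): (-1.890135, -0.029473),
    (2, 3): (-6.234642, 0.022315),
}


def elementary_matrix(age, params, nlstate=2, ndeath=1):
    """One elementary transition matrix at a given age (multinomial logistic)."""
    nstates = nlstate + ndeath
    matrix = []
    for i in range(1, nstates + 1):
        if i > nlstate:
            # Absorbing state: the individual stays there with probability 1.
            matrix.append([1.0 if j == i else 0.0 for j in range(1, nstates + 1)])
            continue
        odds = {}
        for j in range(1, nstates + 1):
            if j != i:
                a, b = params[(i, j)]
                odds[j] = math.exp(a + b * age)  # pij / pii
        pii = 1.0 / (1.0 + sum(odds.values()))
        matrix.append([pii if j == i else odds[j] * pii for j in range(1, nstates + 1)])
    return matrix


def matmul(left, right):
    """Plain square-matrix product, to keep the sketch dependency-free."""
    size = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def h_step_matrix(age, n_steps, stepm, params, nlstate=2, ndeath=1):
    """Product of n_steps elementary matrices, each covering stepm months."""
    size = nlstate + ndeath
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for step in range(n_steps):
        step_age = age + step * stepm / 12.0  # assumption: age in years
        result = matmul(result, elementary_matrix(step_age, params, nlstate, ndeath))
    return result


# Two years (24 monthly steps) starting at age 70.
P = h_step_matrix(70.0, 24, 1, GUESS)
```

Here P[i-1][j-1] plays the role of hPijx for a delay of 24 months: each row sums to one and the row for the absorbing state stays at (0, 0, 1).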

The program presented in this manual is a general program named +IMaCh (for Interpolated +MArkov CHain), designed to analyse transitions from longitudinal surveys. The first step is the estimation of the set of parameters of a model for the +transition probabilities between an initial state and a final state. +From there, the computer program produces indicators such as the observed and +stationary prevalence, life expectancies and their variances both numerically and graphically. Our +transition model consists of absorbing and non-absorbing states assuming the +possibility of return across the non-absorbing states. The main advantage of +this package, compared to other programs for the analysis of transition data +(for example: Proc Catmod of SAS^®) is that the whole individual +information is used even if an interview is missing, a state or a date is +unknown or when the delay between waves is not identical for each individual. +The program depends upon a set of parameters supplied by the user: selection of a sub-sample, +number of absorbing and non-absorbing states, number of waves to be taken into account, a tolerance level for the +maximization function, the periodicity of the transitions (we can compute +annual, quarterly or monthly transitions), covariates in the model. IMaCh works on +Windows or Unix platforms.

+ +

(1) Laditka S. B. and Wolf, D. (1998), New Methods for Analyzing +Active Life Expectancy. Journal of Aging and Health. Vol 10, No. 2.

(2) Lièvre +A., Brouard N. and Heathcote Ch. (2003) Estimating Health Expectancies from +Cross-longitudinal surveys. Mathematical Population Studies.- 10(4), +pp. 211-248 +

+ +

What kind of data is required?

The minimum data required for a transition model is the recording of a set of +individuals interviewed at a first date and interviewed at least once more. From the observations of an individual, we obtain a follow-up over +time of the occurrence of a specific event. In this documentation, the event is +related to health state, but the program can be applied to many +longitudinal studies in different contexts. To build the data file +as explained +in the next section, you must have the month and year of each interview and +the corresponding health state. In order to get age, the date of birth (month +and year) is required (missing values are allowed for month). The date of death +(month and year) is important information also required if the individual is +dead. Shorter steps (e.g. a month) will more closely take into account the +survival time after the last interview.

+ +

The data file

In this example, 8,000 people have been interviewed in a cross-longitudinal +survey of 4 waves (1984, 1986, 1988, 1990). Some people missed 1, 2 or 3 +interviews. Health states are healthy (1) and disabled (2). The survey is not a +real one but a simulation of the American Longitudinal Survey on Aging (LSOA). The +disability state is defined as dependence in at least one of four ADLs (Activities +of daily living, like bathing, eating, walking). Therefore, even if the +individuals interviewed in the sample are virtual, the information in +this sample is close to reality for the United States. Sex is not recorded +in this sample. The LSOA survey is biased in the sense that people +living in an institution were not included in the first interview in +1984. Thus the prevalence of disability observed in 1984 is lower than +the true prevalence at old ages. However, when people moved into an +institution, they were interviewed there in 1986, 1988 and 1990. Thus +the incidences of disabilities are not biased. Cross-sectional +prevalences of disability at old ages are thus artificially increasing in 1986, +1988 and 1990 because of a greater proportion of the sample +institutionalized. Our article (Lièvre A., Brouard N. and Heathcote +Ch. (2003)) shows the opposite: the period prevalence based on the +incidences is lower at old +ages than the adjusted cross-sectional prevalence, illustrating that +there has been significant progress against disability.

Each line of the data set (named data1.txt in this first +example) is an individual record. Fields are separated by blanks:

Index number: positive number (field 1) +
First covariate positive number (field 2) +
Second covariate positive number (field 3) +
Weight: positive number (field 4) . In most + surveys individuals are weighted to account for stratification of the + sample. +
Date of birth: coded as mm/yyyy. Missing dates are coded as 99/9999 + (field 5) +
Date of death: coded as mm/yyyy. Missing dates are coded as 99/9999 + (field 6) +
Date of first interview: coded as mm/yyyy. Missing dates are coded + as 99/9999 (field 7) +
Status at first interview: positive number. Missing values are coded + -1. (field 8) +
Date of second interview: coded as mm/yyyy. Missing dates are coded + as 99/9999 (field 9) +
Status at second interview: positive number. Missing + values are coded -1. (field 10) +
Date of third interview: coded as mm/yyyy. Missing dates are coded + as 99/9999 (field 11) +
Status at third interview: positive number. Missing values + are coded -1. (field 12) +
Date of fourth interview: coded as mm/yyyy. Missing dates are coded + as 99/9999 (field 13) +
Status at fourth interview positive number. Missing + values are coded -1. (field 14) +
etc

If you do not wish to include information on weights or +covariates, you must fill the column with a number (e.g. 1) since all +fields must be present.
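
As an illustration of the record layout above, the following Python sketch splits one blank-separated line into its fields. It is not part of IMaCh; the function names and the sample record are hypothetical, and it assumes that a missing month alone is coded 99 and a missing year alone 9999, which the field list above only states for whole dates (99/9999).

```python
def parse_date(token):
    """Return (month, year) from 'mm/yyyy'; None marks a missing component."""
    month_text, year_text = token.split("/")
    month, year = int(month_text), int(year_text)
    return (None if month == 99 else month, None if year == 9999 else year)


def parse_record(line, ncovcol=2, maxwav=4):
    """Split one individual record into id, covariates, weight, dates and waves."""
    fields = line.split()
    # id + covariates + weight + birth + death + (date, status) per wave
    expected = 1 + ncovcol + 1 + 2 + 2 * maxwav
    if len(fields) != expected:
        raise ValueError("expected %d fields, found %d" % (expected, len(fields)))
    record = {
        "id": int(fields[0]),
        "covariates": [float(value) for value in fields[1:1 + ncovcol]],
        "weight": float(fields[1 + ncovcol]),
        "birth": parse_date(fields[2 + ncovcol]),
        "death": parse_date(fields[3 + ncovcol]),
        "waves": [],
    }
    for wave in range(maxwav):
        date_token = fields[4 + ncovcol + 2 * wave]
        status = int(fields[5 + ncovcol + 2 * wave])
        # A status of -1 means the state was not observed at this wave.
        record["waves"].append((parse_date(date_token), None if status == -1 else status))
    return record


# Hypothetical record: born 6/1920, alive, third interview missed.
SAMPLE = "1 0 1 1.0 6/1920 99/9999 4/1984 1 5/1986 2 99/9999 -1 3/1990 2"
record = parse_record(SAMPLE)
```

The length check mirrors the rule that every field must be present: a line with a blank weight or covariate column is rejected instead of being silently shifted.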

+ +

Your first example parameter file

#Imach version 0.97b, June 2004, INED-EUROREVES

This first line is a comment. Comment lines start with a '#'.

First uncommented line

title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4

title= 1st_example is the title of the run. +
datafile= data1.txt is the name of the data set. Our example is a + six-year follow-up survey. It consists of a baseline followed by 3 + reinterviews. +
lastobs= 8600 the program is able to run on a subsample where the + last observation number is lastobs. It can be set to a number larger than the + real number of observations (e.g. 100000). In this example, maximisation will + be done on the first 8600 records. +
firstpass=1 , lastpass=4 If there are more than two interviews + in the survey, the program can be run on selected transition periods. + firstpass=1 means the first interview included in the calculation is the + baseline survey. lastpass=4 means that the last interview to be + included is the 4th.

Second uncommented +line

ftol=1.e-08 stepm=1 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0

ftol=1e-8 Convergence tolerance on the function value in the + maximisation of the likelihood. Choosing a correct value for ftol is + difficult. 1e-8 is a suitable value for a 32-bit computer. +
stepm=1 The time unit in months for interpolation. Examples: +
- If stepm=1, the unit is a month +
- If stepm=4, the unit is a trimester +
- If stepm=12, the unit is a year +
- If stepm=24, the unit is two years +
- ...
+
ncovcol=2 Number of covariate columns included in the datafile + before the column for the date of birth. You can include covariates + that will not be used in the model as this number is not the number of covariates that will + be specified by the model. The 'model' syntax describes the covariates to be + taken into account during the run. +
nlstate=2 Number of non-absorbing (alive) states. Here we have two + alive states: disability-free is coded 1 and disability is coded 2. +
ndeath=1 Number of absorbing states. The absorbing state death is + coded 3. +
maxwav=4 Number of waves in the datafile. +
mle=1 Option for the Maximum Likelihood + Estimation. +
- If mle=1 the program does the maximisation and the calculation of health + expectancies +
- If mle=0 the program only does the calculation of the health + expectancies and other indices and graphs but without the maximization. + There are also other possible values: +
  - If mle=-1 you get a template for the number of parameters + and the size of the variance-covariance matrix. This is useful if the model is + complex with many covariates. +
  - If mle=-3 IMaCh computes the mortality but without any health status + (May 2004) +
  - If mle=2 IMach likelihood corresponds to a linear interpolation +
  - If mle=3 IMach likelihood corresponds to an exponential + inter-extrapolation +
  - If mle=4 IMach likelihood corresponds to no inter-extrapolation, thus biasing the results. +
  - If mle=5 IMach likelihood corresponds to no inter-extrapolation, and + before the correction of the Jackson's bug (avoid this).
+
weight=0 Provides the possibility of adding weights. +
- If weight=0 no weights are included +
- If weight=1 the maximisation integrates the weights which are in field + 4 +
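
For readers who want to inspect a parameter file outside IMaCh, the two uncommented lines above can be read as plain key=value pairs. The sketch below is a hypothetical helper, not IMaCh's own parser; it assumes there is no space around '=' and it does not handle the model line or the guess values that follow in a full parameter file.

```python
def parse_parameter_lines(text):
    """Collect key=value settings from the uncommented lines of a parameter file."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment lines start with '#'
        for token in line.split():
            key, _, value = token.partition("=")
            settings[key] = value
    return settings


EXAMPLE = """#Imach version 0.97b, June 2004, INED-EUROREVES
title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4
ftol=1.e-08 stepm=1 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
"""

settings = parse_parameter_lines(EXAMPLE)
ftol = float(settings["ftol"])    # convergence tolerance
stepm = int(settings["stepm"])    # time unit in months
```

Values are kept as strings and converted only where needed, so an unexpected key does not make the helper fail.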

Covariates

Intercept and age are automatically included in the model. Additional +covariates can be included with the command:

model=list of covariates

if model=. then no covariates are included +
if model=V1 the model includes the first covariate (field + 2) +
if model=V2 the model includes the second covariate + (field 3) +
if model=V1+V2 the model includes the first and the + second covariate (fields 2 and 3) +
if model=V1*V2 the model includes the product of the + first and the second covariate (fields 2 and 3) +
if model=V1+V1*age the model includes the product + covariate*age

In this example, we have two covariates in the data file (fields 2 and 3). +The number of covariates included in the data file between the id and the date +of birth is ncovcol=2 (it was named ncov in versions prior to 0.8). If you have 3 +covariates in the datafile (fields 2, 3 and 4), you will set ncovcol=3. Then you +can run the programme with a new parametrisation taking into account the third +covariate. For example, model=V1+V3 estimates a model with the +first and third covariates. More complicated models can be used, but they will +take more time to converge. With a simple model (no covariates), the programme +estimates 8 parameters. Adding covariates increases the number of parameters: +12 for model=V1, 16 for model=V1+V1*age and 20 +for model=V1+V2+V3.

Guess values for optimization +

You must write the initial guess values of the parameters for optimization. +The number of parameters, N, depends on the number of absorbing states +and non-absorbing states and on the number of covariates in the model (ncovmodel).
N is +given by the formula N=(nlstate + +ndeath-1)*nlstate*ncovmodel .
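The formula can be checked against the parameter counts quoted in this section with a minimal Python sketch. The function name `n_parameters` is hypothetical and not part of IMaCh. It assumes, as the text states for the simple case, that ncovmodel counts the intercept and age as well as each additional model term.

```python
def n_parameters(nlstate: int, ndeath: int, ncovmodel: int) -> int:
    """N = (nlstate + ndeath - 1) * nlstate * ncovmodel."""
    return (nlstate + ndeath - 1) * nlstate * ncovmodel


# Two alive states, one absorbing state (death), as in the example.
# ncovmodel = 2 (intercept, age) plus one per additional model term.
examples = [(".", 2), ("V1", 3), ("V1+V1*age", 4), ("V1+V2+V3", 5)]
for model, ncovmodel in examples:
    print(f"model={model}: N={n_parameters(2, 1, ncovmodel)}")
```

With nlstate=2 and ndeath=1 this gives 8, 12, 16 and 20 parameters, matching the counts given above for the four models.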

Thus in +the simple case with 2 covariates in the model (the model is log (pij/pii) = aij + bij * age +where intercept and age are the two covariates), and 2 health states (1 for +disability-free and 2 for disability) and 1 absorbing state (3), you must enter +8 initial values, a12, b12, a13, b13, a21, b21, a23, b23. You can start with +zeros as in this example, but if you have a more precise set (for example from +an earlier run) you can enter it and it will speed up the convergence.
Each of the four +lines starts with indices "ij": ij aij bij

# Guess values of aij and bij in log (pij/pii) = aij + bij * age
 12 -14.155633  0.110794 
 13  -7.925360  0.032091 
 21  -1.890135 -0.029473 
-23  -6.234642  0.022315

- -

or, to simplify (in most of cases it converges but there is no -warranty!):

- -

12 0.0 0.0
+23  -6.234642  0.022315

or, to simplify (in most cases it converges, but there is no guarantee!): +

12 0.0 0.0
 13 0.0 0.0
 21 0.0 0.0
-23 0.0 0.0

- -

In order to speed up the convergence you can make a first run -with a large stepm i.e stepm=12 or 24 and then decrease the stepm -until stepm=1 month. If newstepm is the new shorter stepm and -stepm can be expressed as a multiple of newstepm, like newstepm=n -stepm, then the following approximation holds:

- -

aij(stepm) = aij(n . stepm) - ln(n)
-

- -

and

- -

bij(stepm) = bij(n . stepm) .

- -

For example if you already ran for a 6 months interval and -got:
-

- -

# Parameters
+23 0.0 0.0

In order to speed up the convergence you can make a first run with a large +stepm, e.g. stepm=12 or 24, and then decrease the stepm until stepm=1 month. If +newstepm is the new shorter stepm and stepm can be expressed as a multiple of +newstepm, like stepm=n newstepm, then the following approximation holds:

aij(stepm) = aij(n . stepm) - ln(n)
+

and

bij(stepm) = bij(n . stepm) .

For example if you already ran with stepm=6 (a 6 months interval) and got:

# Parameters
 12 -13.390179  0.126133 
 13  -7.493460  0.048069 
 21   0.575975 -0.041322 
 23  -4.748678  0.030626 
-

- -

If you now want to get the monthly estimates, you can guess -the aij by substracting ln(6)= 1,7917
-and running
-

- -

12 -15.18193847  0.126133 
+

If you now want to get the monthly estimates, you can guess the aij by +subtracting ln(6)= 1.7917
and running with

12 -15.18193847  0.126133 
 13 -9.285219469  0.048069
 21 -1.215784469 -0.041322
 23 -6.540437469  0.030626
-

- -

and get
-

- -

12 -15.029768 0.124347 
+

and get

12 -15.029768 0.124347 13 -8.472981 0.036599 21 -1.472527 -0.038394 23 -6.553602 0.029856 -which is closer to the results. The approximation is probably useful +

which is closer to the results. The approximation is probably useful only for very small intervals and we don't have enough experience to -know if you will speed up the convergence or not. -

- -

         -ln(12)= -2.484
+know if you will speed up the convergence or not.

+

         -ln(12)= -2.484
  -ln(6/1)=-ln(6)= -1.791
  -ln(3/1)=-ln(3)= -1.0986
 -ln(12/6)=-ln(2)= -0.693
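The rescaling of the 6-month estimates into monthly guess values shown earlier can be reproduced with a short Python sketch. The helper `rescale_guess` and the dictionary layout are hypothetical conveniences, not part of IMaCh. Only the approximation aij(stepm) = aij(n . stepm) - ln(n), with bij unchanged, comes from the text.

```python
import math


def rescale_guess(params, n):
    """Shift each aij by -ln(n) and keep bij, for a step n times shorter."""
    return {ij: (a - math.log(n), b) for ij, (a, b) in params.items()}


# Estimates obtained with stepm=6, copied from the example above.
six_month = {
    "12": (-13.390179, 0.126133),
    "13": (-7.493460, 0.048069),
    "21": (0.575975, -0.041322),
    "23": (-4.748678, 0.030626),
}

# Guess values for stepm=1, i.e. n = 6/1.
monthly_guess = rescale_guess(six_month, 6)
for ij, (a, b) in monthly_guess.items():
    print(f"{ij} {a:.9f} {b:.6f}")
```

The printed aij agree with the guess values listed above (for example -15.18193847 for transition 12). They are only starting points for the maximisation, not estimates.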
-

- -In version 0.9 and higher you can still have valuable results even if -your stepm parameter is bigger than a month. The idea is to run with -bigger stepm in order to have a quicker convergence at the price of a -small bias. Once you know which model you want to fit, you can put -stepm=1 and wait hours or days to get the convergence! - -To get unbiased results even with large stepm we introduce the idea of -pseudo likelihood by interpolating two exact likelihoods. Let us -detail this: -

-If the interval of d months between two waves is not a -mutliple of 'stepm', but is comprised between (n-1) stepm and -n stepm then both exact likelihoods are computed (the -contribution to the likelihood at n stepm requires one matrix -product more) (let us remember that we are modelling the probability -to be observed in a particular state after d months being -observed at a particular state at 0). The distance, (bh in -the program), from the month of interview to the rounded date of n -stepm is computed. It can be negative (interview occurs before -n stepm) or positive if the interview occurs after n -stepm (and before (n+1)stepm). -
-Then the final contribution to the total likelihood is a weighted -average of these two exact likelihoods at n stepm (out) and -at (n-1)stepm(savm). We did not want to compute the third -likelihood at (n+1)stepm because it is too costly in time, so -we used an extrapolation if bh is positive.
Formula of -inter/extrapolation may vary according to the value of parameter mle: -

-mle=1	  lli= log((1.+bbh)*out[s1][s2]- bbh*savm[s1][s2]); /* linear interpolation */
-
-mle=2	lli= (savm[s1][s2]>(double)1.e-8 ? \
+

In version 0.9 and higher you can still have valuable results even if your +stepm parameter is bigger than a month. The idea is to run with bigger stepm in +order to have a quicker convergence at the price of a small bias. Once you know +which model you want to fit, you can put stepm=1 and wait hours or days to get +the convergence! To get unbiased results even with large stepm we introduce the +idea of pseudo likelihood by interpolating two exact likelihoods. In +more detail: +

If the interval of d months between two waves is not a multiple of +'stepm', but is between (n-1) stepm and n stepm then +both exact likelihoods are computed (the contribution to the likelihood at n +stepm requires one matrix product more) (let us remember that we are +modelling the probability to be observed in a particular state after d +months being observed at a particular state at 0). The distance, (bh in +the program), from the month of interview to the rounded date of n +stepm is computed. It can be negative (interview occurs before n +stepm) or positive if the interview occurs after n stepm (and +before (n+1)stepm).
Then the final contribution to the total +likelihood is a weighted average of these two exact likelihoods at n +stepm (out) and at (n-1)stepm(savm). We did not want to compute +the third likelihood at (n+1)stepm because it is too costly in time, so +we used an extrapolation if bh is positive.
The formula +for the inter/extrapolation may vary according to the value of parameter mle:

mle=1	  lli= log((1.+bbh)*out[s1][s2]- bbh*savm[s1][s2]); /* linear interpolation */
+ 
+mle=2	lli= (savm[s1][s2]>(double)1.e-8 ? \
           log((1.+bbh)*out[s1][s2]- bbh*(savm[s1][s2])): \
           log((1.+bbh)*out[s1][s2])); /* linear interpolation */
-mle=3	lli= (savm[s1][s2]>1.e-8 ? \
+mle=3	lli= (savm[s1][s2]>1.e-8 ? \
           (1.+bbh)*log(out[s1][s2])- bbh*log(savm[s1][s2]): \
           log((1.+bbh)*out[s1][s2])); /* exponential inter-extrapolation */
 
 mle=4   lli=log(out[s[mw[mi][i]][i]][s[mw[mi+1][i]][i]]); /* No interpolation  */
         no need to save previous likelihood into memory.
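To make the difference between the mle options concrete, the expressions above can be transcribed into a small Python sketch. This is an illustration only, not the IMaCh source: the function name is hypothetical, `out` and `savm` are taken as scalar probabilities for a single transition (at n stepm and (n-1) stepm respectively), and `bbh` is assumed to be the distance bh expressed as a fraction of stepm.

```python
import math


def log_lik_contribution(out, savm, bbh, mle=1):
    """Log-likelihood contribution of one observed transition.

    out  : probability of the transition at n*stepm
    savm : probability of the transition at (n-1)*stepm
    bbh  : signed distance to n*stepm, as a fraction of stepm (assumption)
    """
    if mle == 1:  # linear interpolation
        return math.log((1.0 + bbh) * out - bbh * savm)
    if mle == 2:  # linear interpolation, guarded against tiny savm
        if savm > 1e-8:
            return math.log((1.0 + bbh) * out - bbh * savm)
        return math.log((1.0 + bbh) * out)
    if mle == 3:  # exponential inter-extrapolation
        if savm > 1e-8:
            return (1.0 + bbh) * math.log(out) - bbh * math.log(savm)
        return math.log((1.0 + bbh) * out)
    if mle == 4:  # no interpolation
        return math.log(out)
    raise ValueError("mle must be 1, 2, 3 or 4 in this sketch")


# Interview half a step before n*stepm: bbh = -0.5.
for option in (1, 2, 3, 4):
    print(option, log_lik_contribution(0.30, 0.20, -0.5, mle=option))
```

With bbh=-0.5 the linear options use the arithmetic mean of the two probabilities and the exponential option uses their geometric mean, while mle=4 ignores `savm` altogether. When bbh=0 all four options give log(out).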
-

-If the death occurs between first and second pass, and for example -more precisely between n stepm and (n+1)stepm the -contribution of this people to the likelihood is simply the difference -between the probability of dying before n stepm and the -probability of dying before (n+1)stepm. There was a bug in -version 0.8 and death was treated as any other state, i.e. as if it -was an observed death at second pass. This was not precise but -correct, but when information on the precise month of death came -(death occuring prior to second pass) we did not change the likelihood -accordingly. Thanks to Chris Jackson for correcting us. In earlier + +

If the death occurs between the first and second pass, and for example more +precisely between n stepm and (n+1)stepm the contribution of +these people to the likelihood is simply the difference between the probability +of dying before n stepm and the probability of dying before +(n+1)stepm. There was a bug in version 0.8 and death was treated as any +other state, i.e. as if it was an observed death at second pass. This was not +precise but correct, although when information on the precise month of +death came (death occurring prior to second pass) we did not change the +likelihood accordingly. We thank Chris Jackson for correcting it. In earlier versions (fortunately before first publication) the total mortality -was overestimated (people were dying too early) of about 10%. Version -0.95 and higher are correct. - -

Our suggested choice is mle=1 . If stepm=1 there is no difference -between various mle options (methods of interpolation). If stepm is -big, like 12 or 24 or 48 and mle=4 (no interpolation) the bias may be -very important if the mean duration between two waves is not a -multiple of stepm. See the appendix in our main publication concerning -the sine curve of biases. - - -

Guess values for computing variances

- -

These values are output by the maximisation of the likelihood mle=1. These valuse can be used as an input of a -second run in order to get the various output data files (Health -expectancies, period prevalence etc.) and figures without rerunning -the long maximisation phase (mle=0).

- -

These 'scales' are small values needed for the computing of -numerical derivatives. These derivatives are used to compute the -hessian matrix of the parameters, that is the inverse of the -covariance matrix. They are often used for estimating variances and -confidence intervals. Each line consists in indices "ij" -followed by the initial scales (zero to simplify) associated with aij -and bij.

- -

If mle=1 you can enter zeros:

# Scales (for hessian or gradient estimation) +was thus overestimated (people were dying too early) by about 10%. Version +0.95 and higher are correct. + +

Our suggested choice is mle=1. If stepm=1 there is no difference between +various mle options (methods of interpolation). If stepm is big, like 12 or 24 +or 48 and mle=4 (no interpolation) the bias may be very large if the mean +duration between two waves is not a multiple of stepm. See the appendix in our +main publication concerning the sine curve of biases. +

Guess values for computing variances

These values are output by the maximisation of the likelihood mle=1 and +can be used as an input for a second run in order to get the various output data +files (Health expectancies, period prevalence etc.) and figures without +rerunning the long maximisation phase (mle=0).

The 'scales' are small values needed for the computing of numerical +derivatives. These derivatives are used to compute the hessian matrix of the +parameters, that is the inverse of the covariance matrix. They are often used +for estimating variances and confidence intervals. Each line consists of indices +"ij" followed by the initial scales (zero to simplify) associated with aij and +bij.

If mle=1 you can enter zeros: +

# Scales (for hessian or gradient estimation)
 12 0. 0. 
 13 0. 0. 
 21 0. 0. 
-23 0. 0.

If mle=0 (no maximisation of Likelihood) you must enter a covariance matrix (usually - obtained from an earlier run).

- -

Covariance matrix of parameters

- -

The covariance matrix is output if mle=1. But it can be -also used as an input to get the various output data files (Health -expectancies, period prevalence etc.) and figures without -rerunning the maximisation phase (mle=0).
-Each line starts with indices "ijk" followed by the -covariances between aij and bij:
-

- -

-   121 Var(a12) 
+23 0. 0.

If mle=0 (no maximisation of Likelihood) you must enter a covariance + matrix (usually obtained from an earlier run).

Covariance matrix of parameters

The covariance matrix is output if mle=1. But it can +also be used as an input to get the various output data files (Health +expectancies, period prevalence etc.) and figures without rerunning +the maximisation phase (mle=0).
Each line starts with indices +"ijk" followed by the covariances between aij and bij:
+

   121 Var(a12) 
    122 Cov(b12,a12)  Var(b12) 
           ...
-   232 Cov(b23,a12)  Cov(b23,b12) ... Var (b23)

- -

If mle=1 you can enter zeros.
```
# Covariance matrix
+   232 Cov(b23,a12)  Cov(b23,b12) ... Var (b23) 
```
+
- If mle=1 you can enter zeros. +
- ```
# Covariance matrix
 121 0.
 122 0. 0.
 131 0. 0. 0. 
@@ -634,306 +470,203 @@ covariances between aij and bij:

 211 0. 0. 0. 0. 0. 
 212 0. 0. 0. 0. 0. 0. 
 231 0. 0. 0. 0. 0. 0. 0. 
-232 0. 0. 0. 0. 0. 0. 0. 0.
```
  -
- If mle=0 you must enter a covariance matrix (usually - obtained from an earlier run).
- -
Age range for calculation of stationary -prevalences and health expectancies
- -
```
agemin=70 agemax=100 bage=50 fage=100
```
- -
-Once we obtained the estimated parameters, the program is able -to calculate period prevalence, transitions probabilities -and life expectancies at any age. Choice of age range is useful -for extrapolation. In this example, age of people interviewed varies -from 69 to 102 and the model is estimated using their exact ages. But -if you are interested in the age-specific period prevalence you can -start the simulation at an exact age like 70 and stop at 100. Then the -program will draw at least two curves describing the forecasted -prevalences of two cohorts, one for healthy people at age 70 and the second -for disabled people at the same initial age. And according to the -mixing property (ergodicity) and because of recovery, both prevalences -will tend to be identical at later ages. Thus if you want to compute -the prevalence at age 70, you should enter a lower agemin value. - -
-Setting bage=50 (begin age) and fage=100 (final age), let -the program compute life expectancy from age 'bage' to age -'fage'. As we use a model, we can interessingly compute life -expectancy on a wider age range than the age range from the data. -But the model can be rather wrong on much larger intervals. -Program is limited to around 120 for upper age! - - -
- agemin= Minimum age for calculation of the - period prevalence
- agemax= Maximum age for calculation of the - period prevalence
- bage= Minimum age for calculation of the health - expectancies
- fage= Maximum age for calculation of the health - expectancies
- -
Computing the cross-sectional prevalence
- -
```
begin-prev-date=1/1/1984 end-prev-date=1/6/1988 estepm=1
```
- -
-Statements 'begin-prev-date' and 'end-prev-date' allow to -select the period in which we calculate the observed prevalences -in each state. In this example, the prevalences are calculated on -data survey collected between 1 january 1984 and 1 june 1988. -
- -
- begin-prev-date= Starting date - (day/month/year)
- end-prev-date= Final date - (day/month/year)
- estepm= Unit (in months).We compute the - life expectancy from trapezoids spaced every estepm - months. This is mainly to measure the difference between - two models: for example if stepm=24 months pijx are given - only every 2 years and by summing them we are calculating - an estimate of the Life Expectancy assuming a linear - progression inbetween and thus overestimating or - underestimating according to the curvature of the - survival function. If, for the same date, we estimate the - model with stepm=1 month, we can keep estepm to 24 months - to compare the new estimate of Life expectancy with the - same linear hypothesis. A more precise result, taking - into account a more precise curvature will be obtained if - estepm is as small as stepm.
- -
Population- or status-based health -expectancies
- -
```
pop_based=0
```
- -
The program computes status-based health expectancies, i.e health -expectancies which depend on the initial health state. If you are -healthy, your healthy life expectancy (e11) is higher than if you were -disabled (e21, with e11 > e21).
To compute a healthy life -expectancy 'independent' of the initial status we have to weight e11 -and e21 according to the probability to be in each state at initial -age which are corresponding to the proportions of people in each health -state (cross-sectional prevalences).
- -We could also compute e12 and e12 and get e.2 by weighting them -according to the observed cross-sectional prevalences at initial age. -
In a similar way we could compute the total life expectancy by -summing e.1 and e.2 . -
-The main difference between 'population based' and 'implied' or -'period' consists in the weights used. 'Usually', cross-sectional -prevalences of disability are higher than period prevalences -particularly at old ages. This is true if the country is improving its -health system by teaching people how to prevent disability as by -promoting better screening, for example of people needing cataracts -surgeryand for many unknown reasons that this program may help to -discover. Then the proportion of disabled people at age 90 will be -lower than the current observed proportion. -
-Thus a better Health Expectancy and even a better Life Expectancy -value is given by forecasting not only the current lower mortality at -all ages but also a lower incidence of disability and higher recovery. -
Using the period prevalences as weight instead of the -cross-sectional prevalences we are computing indices which are more -specific to the current situations and therefore more useful to -predict improvements or regressions in the future as to compare -different policies in various countries. - -
- popbased= 0 Health expectancies are computed - at each age from period prevalences 'expected' at this initial - age.
- popbased= 1 Health expectancies are - computed at each age from cross-sectional 'observed' prevalence at - this initial age. As all the population is not observed at the - same exact date we define a short period were the observed - prevalence can be computed.
  - - We simply sum all people surveyed within these two exact dates - who belong to a particular age group (single year) at the date of - interview and being in a particular health state. Then it is easy to -get the proportion of people of a particular health status among all -people of the same age group.
  - -If both dates are spaced and are covering two waves or more, people -being interviewed twice or more are counted twice or more. The program -takes into account the selection of individuals interviewed between -firstpass and lastpass too (we don't know if it can be useful). -
- -
Prevalence forecasting (Experimental)
- -
```
starting-proj-date=1/1/1989 final-proj-date=1/1/1992 mov_average=0 
```
- -
Prevalence and population projections are only available if -the interpolation unit is a month, i.e. stepm=1 and if there are -no covariate. The programme estimates the prevalence in each -state at a precise date expressed in day/month/year. The -programme computes one forecasted prevalence a year from a -starting date (1 january of 1989 in this example) to a final date -(1 january 1992). The statement mov_average allows to compute -smoothed forecasted prevalences with a five-age moving average -centered at the mid-age of the five-age period.
-
- -
Population forecasting (Experimental)
- -
- starting-proj-date= starting date - (day/month/year) of forecasting
- final-proj-date= final date - (day/month/year) of forecasting
- mov_average= smoothing with a five-age - moving average centered at the mid-age of the five-age - period. The command mov_average takes - value 1 if the prevalences are smoothed and 0 otherwise.
- - -
- popforecast= - 0 Option for population forecasting. If - popforecast=1, the programme does the forecasting.
- popfile= - name of the population file
- popfiledate= - date of the population population
- last-popfiledate= - date of the last population projection
- -
- -
Running Imach -with this example
- -
We assume that you already typed your 1st_example -parameter file as explained above. - -To run the program under Windows you should either: -
- -
- click on the imach.exe icon and either: -
  - enter the name of the - parameter file which is for example -C:\home\myname\lsoa\biaspar.imach"
  - or locate the biaspar.imach icon in your folder such as - C:\home\myname\lsoa - and drag it, with your mouse, on the already open imach window.
  - -
- With version (0.97b) if you ran setup at installation, Windows is - supposed to understand the ".imach" extension and you can - right click the biaspar.imach icon and either edit with wordpad - (better than notepad) the parameter file or execute it with - IMaCh.
- -
The time to converge depends on the step unit that you used (1 -month is more precise but more cpu consuming), on the number of cases, -and on the number of variables (covariates). - -
-The program outputs many files. Most of them are files which will be -plotted for better understanding. - -
-To run under Linux it is mostly the same. -
-It is neither more difficult to run it under a MacIntosh. -
- -
Output of the program -and graphs
- -
Once the optimization is finished (once the convergence is -reached), many tables and graphics are produced.
-The IMaCh program will create a subdirectory of the same name as your -parameter file (here mypar) where all the tables and figures will be -stored.
- -Important files like the log file and the output parameter file (which -contains the estimates of the maximisation) are stored at the main -level not in this subdirectory. File with extension .log and .txt can -be edited with a standard editor like wordpad or notepad or even can be -viewed with a browser like Internet Explorer or Mozilla. - -
The main html file is also named with the same name biaspar.htm. You can click on it by holding -your shift key in order to open it in another window (Windows). -
- Our grapher is Gnuplot, it is an interactive plotting program (GPL) which - can also work in batch. A gnuplot reference manual is available here.
When the run is - finished, and in order that the window doesn't disappear, the user - should enter a character like q for quitting.
These - characters are:
-
-
- 'e' for opening the main result html file biaspar.htm file to edit - the output files and graphs.
- 'g' to graph again
- 'c' to start again the program from the beginning.
- 'q' for exiting.
- -The main gnuplot file is named biaspar.gp and can be edited (right -click) and run again. -
Gnuplot is easy and you can use it to make more complex -graphs. Just click on gnuplot and type plot sin(x) to see how easy it -is. - - -
Results files
-
-- cross-sectional prevalence in each state (and at first pass): -biaspar/prbiaspar.txt
-
- -
The first line is the title and displays each field of the -file. First column corresponds to age. Fields 2 and 6 are the -proportion of individuals in states 1 and 2 respectively as -observed at first exam. Others fields are the numbers of -people in states 1, 2 or more. The number of columns increases if -the number of states is higher than 2.
-The header of the file is
- -
```
# Age Prev(1) N(1) N Age Prev(2) N(2) N
+232 0. 0. 0. 0. 0. 0. 0. 0.
```
+
If mle=0 you must enter a covariance matrix (usually obtained from an + earlier run).

Age range for calculation of stationary prevalences and +health expectancies

agemin=70 agemax=100 bage=50 fage=100

Once we have obtained the estimated parameters, the program is able to calculate +period prevalence, transition probabilities and life expectancies at any age. +Choice of the age range is useful for extrapolation. In this example, +the age of people interviewed varies from 69 to 102 and the model is +estimated using their exact ages. But if you are interested in the +age-specific period prevalence you can start the simulation at an +exact age like 70 and stop at 100. Then the program will draw at +least two curves describing the forecasted prevalences of two cohorts, +one for healthy people at age 70 and the second for disabled people at +the same initial age. And according to the mixing property +(ergodicity) and because of recovery, both prevalences will tend to be +identical at later ages. Thus if you want to compute the prevalence at +age 70, you should enter a lower agemin value. +

Setting bage=50 (begin age) and fage=100 (final age) lets the program compute +life expectancy from age 'bage' to age 'fage'. As we use a model, we can +interestingly compute life expectancy on a wider age range than the age range +from the data. But the model can be rather wrong on much larger intervals. +The program is limited to around 120 for upper age!

agemin= Minimum age for calculation of the period prevalence +
agemax= Maximum age for calculation of the period prevalence +
bage= Minimum age for calculation of the health expectancies +
fage= Maximum age for calculation of the health expectancies +

Computing the cross-sectional prevalence

begin-prev-date=1/1/1984 end-prev-date=1/6/1988 estepm=1

Statements 'begin-prev-date' and 'end-prev-date' allow the user to +select the period in which the observed prevalences in each state are calculated. In +this example, the prevalences are calculated on survey data collected +between 1 January 1984 and 1 June 1988.

begin-prev-date= Starting date (day/month/year) +
end-prev-date= Final date (day/month/year) +
estepm= Unit (in months). We compute the life expectancy + from trapezoids spaced every estepm months. This is mainly to measure the + difference between two models: for example if stepm=24 months pijx are given + only every 2 years and by summing them we are calculating an estimate of the + Life Expectancy assuming a linear progression in between and thus + overestimating or underestimating according to the curvature of the survival + function. If, for the same date, we estimate the model with stepm=1 month, we + can keep estepm at 24 months to compare the new estimate of Life expectancy + with the same linear hypothesis. A more precise result, taking into account a + more precise curvature will be obtained if estepm is as small as stepm. +

Population- or status-based health +expectancies

pop_based=0

The program computes status-based health expectancies, i.e. health +expectancies which depend on the initial health state. If you are healthy, your +healthy life expectancy (e11) is higher than if you were disabled (e21, with e11 +> e21).
To compute a healthy life expectancy 'independent' of the initial +status we have to weight e11 and e21 according to the probability of +being in each state at initial age which correspond to the proportions +of people in each health state (cross-sectional prevalences). +

We could also compute e12 and e22 and get e.2 by weighting them according to +the observed cross-sectional prevalences at initial age. +

In a similar way we could compute the total life expectancy by summing e.1 +and e.2 .
The main difference between 'population based' and 'implied' or +'period' is in the weights used. 'Usually', cross-sectional prevalences of +disability are higher than period prevalences particularly at old ages. This is +true if the country is improving its health system by teaching people how to +prevent disability by promoting better screening, for example of people +needing cataract surgery. Then the proportion of disabled people at +age 90 will be lower than the current observed proportion. +

Thus a better Health Expectancy and even a better Life Expectancy value is +given by forecasting not only the current lower mortality at all ages but also a +lower incidence of disability and higher recovery.
Using the period +prevalences as weight instead of the cross-sectional prevalences we are +computing indices which are more specific to the current situations and +therefore more useful to predict improvements or regressions in the future and to +compare different policies in various countries. +

popbased= 0 Health expectancies are computed at each age + from period prevalences 'expected' at this initial age. +
popbased= 1 Health expectancies are computed at each age + from cross-sectional 'observed' prevalence at the initial age. As all the + population is not observed at the same exact date we define a short period + where the observed prevalence can be computed as follows:
we simply sum all people + surveyed within these two exact dates who belong to a particular age group + (single year) at the date of interview and are in a particular health state. + Then it is easy to get the proportion of people in a particular + health state as a percentage of all people of the same age group.
If both dates are spaced and are + covering two waves or more, people being interviewed twice or more are counted + twice or more. The program takes into account the selection of individuals + interviewed between firstpass and lastpass too (we don't know if + this is useful).

Prevalence forecasting (Experimental)

starting-proj-date=1/1/1989 final-proj-date=1/1/1992 mov_average=0

Prevalence and population projections are only available if the interpolation +unit is a month, i.e. stepm=1 and if there are no covariates. The programme +estimates the prevalence in each state at a precise date expressed in +day/month/year. The programme computes one forecasted prevalence a year from a +starting date (1 January 1989 in this example) to a final date (1 January +1992). The statement mov_average allows computation of smoothed forecasted +prevalences with a five-age moving average centered at the mid-age of the +five-year age period.

Population forecasting (Experimental)

starting-proj-date= starting date (day/month/year) of + forecasting +
final-proj-date= final date (day/month/year) of + forecasting +
mov_average= smoothing with a five-age moving average + centered at the mid-age of the five-year age period. The command + mov_average takes value 1 if the prevalences are smoothed and 0 + otherwise.

popforecast= 0 Option for population forecasting. If popforecast=1, + the programme does the forecasting. +
popfile= name of the population file +
popfiledate= date of the population +
last-popfiledate= date of the last population projection +

+ +

Running Imach with this +example

We assume that you have already typed your 1st_example parameter +file as explained above. To run +the program under Windows you should either:

click on the imach.exe icon and either: +
- enter the name of the parameter file which is for example + C:\home\myname\lsoa\biaspar.imach +
- or locate the biaspar.imach icon in your folder such as + C:\home\myname\lsoa and drag it, with your mouse, on the already + open imach window.
+
With version (0.97b) if you ran setup at installation, Windows is supposed + to understand the ".imach" extension and you can right click the biaspar.imach + icon and either edit with wordpad (better than notepad) the parameter file or + execute it with IMaCh.

The time to converge depends on the step unit used (1 month is more +precise but more cpu time consuming), on the number of cases, and on the number of +variables (covariates). +

The program outputs many files. Most of them are files which will be plotted +for better understanding.

Running under Linux is mostly the same. +

It is no more difficult to run IMaCh on a Macintosh. +

+ +

Output of the program and graphs +

Once the optimization is finished (once the convergence is reached), many +tables and graphics are produced. +

The IMaCh program will create a subdirectory with the same name as your +parameter file (here mypar) where all the tables and figures will be +stored.
Important files like the log file and the output parameter file +(the latter contains the maximum likelihood estimates) are stored at +the main level not in this subdirectory. Files with extension .log and +.txt can be edited with a standard editor like wordpad or notepad or +even can be viewed with a browser like Internet Explorer or Mozilla. +

The main html file is also named with the same name biaspar.htm. You can +click on it by holding your shift key in order to open it in another window +(Windows). +

Our grapher is Gnuplot, an interactive plotting program (GPL) which can +also work in batch mode. A gnuplot reference manual is available here.
When the run is finished, and in +order that the window doesn't disappear, the user should enter a character like +q for quitting.
These characters are:

'e' for opening the main result html file biaspar.htm + file to edit the output files and graphs. +
'g' to graph again +
'c' to start again the program from the beginning. +
'q' for exiting.

The main gnuplot file is named +biaspar.gp and can be edited (right click) and run again. +

Gnuplot is easy and you can use it to make more complex graphs. Just click on +gnuplot and type plot sin(x) to see how easy it is. +

Results files

- cross-sectional prevalence in each state (and at first pass): biaspar/prbiaspar.txt

The first line is the title and displays each field of the file. First column +corresponds to age. Fields 2 and 6 are the proportions of individuals in states 1 +and 2 respectively as observed at first exam. Other fields are the numbers of +people in states 1, 2 or more. The number of columns increases if the number of +states is higher than 2.
The header of the file is

# Age Prev(1) N(1) N Age Prev(2) N(2) N
 70 1.00000 631 631 70 0.00000 0 631
 71 0.99681 625 627 71 0.00319 2 627 
-72 0.97125 1115 1148 72 0.02875 33 1148

- -

It means that at age 70 (between 70 and 71), the prevalence in state 1 is 1.000 -and in state 2 is 0.00 . At age 71 the number of individuals in -state 1 is 625 and in state 2 is 2, hence the total number of -people aged 71 is 625+2=627.
-

- -

- Estimated parameters and -covariance matrix: rbiaspar.imach

- -

This file contains all the maximisation results:

- -

 -2 log likelihood= 21660.918613445392
+72 0.97125 1115 1148 72 0.02875 33 1148

It means that at age 70 (between 70 and 71), the prevalence in state 1 is +1.000 and in state 2 is 0.00. At age 71 the number of individuals in state 1 is +625 and in state 2 is 2, hence the total number of people aged 71 is 625+2=627. +
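The prevalence columns can be recomputed from the counts in the excerpt. The following is a minimal sketch, not part of IMaCh; the helper name `observed_prevalence` is hypothetical and the counts are copied from the three lines shown above.

```python
# Counts copied from the prbiaspar.txt excerpt: age -> (N(1), N(2)).
counts = {70: (631, 0), 71: (625, 2), 72: (1115, 33)}

def observed_prevalence(n1, n2):
    """Return (Prev(1), Prev(2)): proportions among the people observed alive."""
    total = n1 + n2
    return n1 / total, n2 / total

# Print the rows in the same column order as the file header.
for age, (n1, n2) in sorted(counts.items()):
    p1, p2 = observed_prevalence(n1, n2)
    total = n1 + n2
    print(f"{age} {p1:.5f} {n1} {total} {age} {p2:.5f} {n2} {total}")
```

Each printed row follows the header order Age, Prev(1), N(1), N, Age, Prev(2), N(2), N.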

- Estimated parameters and covariance +matrix: rbiaspar.imach

This file contains all the maximisation results:

 -2 log likelihood= 21660.918613445392
  Estimated parameters: a12 = -12.290174 b12 = 0.092161 
                        a13 = -9.155590  b13 = 0.046627 
                        a21 = -2.629849  b21 = -0.022030 
@@ -946,126 +679,103 @@ covariance matrix: rbiaspar.imach
-
-By substitution of these parameters in the regression model,
-we obtain the elementary transition probabilities:
-
-
-
-- Transition probabilities:
-biaspar/pijrbiaspar.txt
-
-Here are the transitions probabilities Pij(x, x+nh). The second
-column is the starting age x (from age 95 to 65), the third is age
-(x+nh) and the others are the transition probabilities p11, p12, p13,
-p21, p22, p23. The first column indicates the value of the covariate
-(without any other variable than age it is equal to 1) For example, line 5 of the file
-is: 
-
-1 100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513 
-
-and this means: 
-
-p11(100,106)=0.02655
+ 
+By substitution of these parameters in the regression model, we obtain the 
+elementary transition probabilities:
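As an illustration of this substitution, the sketch below assumes the multinomial logit form ln(pij/pii) = aij + bij*age, which is not restated in this section and should be checked against the model description. It uses only the estimates for transitions out of state 1 shown above; the function name `probs_from_state1` is hypothetical.

```python
import math

# Estimates copied from the rbiaspar.imach excerpt (transitions out of state 1).
a12, b12 = -12.290174, 0.092161
a13, b13 = -9.155590, 0.046627

def probs_from_state1(age):
    """Elementary probabilities (p11, p12, p13) at a given age.

    Assumption: multinomial logit, ln(p1j / p11) = a1j + b1j * age for j = 2, 3.
    """
    e12 = math.exp(a12 + b12 * age)
    e13 = math.exp(a13 + b13 * age)
    denom = 1.0 + e12 + e13
    return 1.0 / denom, e12 / denom, e13 / denom

p11, p12, p13 = probs_from_state1(70)
print(f"p11={p11:.5f} p12={p12:.5f} p13={p13:.5f}")
```

Under this assumption the three probabilities sum to one by construction, and the positive b12 makes the probability of becoming disabled rise with age.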
+
+- Transition probabilities: 
+biaspar/pijrbiaspar.txt
+Here are the transition probabilities Pij(x, x+nh). The second column is the 
+starting age x (from age 95 to 65), the third is age (x+nh) and the others are 
+the transition probabilities p11, p12, p13, p21, p22, p23. The first column 
+indicates the value of the covariate (without any other variable than age it is 
+equal to 1). For example, line 5 of the file is: 
1 100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513 
+and this means: 
p11(100,106)=0.02655
 p12(100,106)=0.17622
 p13(100,106)=0.79722
 p21(100,106)=0.01809
 p22(100,106)=0.13678
-p22(100,106)=0.84513 
-
-- Period prevalence in each state:
-biaspar/plrbiaspar.txt
-
-#Prevalence
+p23(100,106)=0.84513 
+- Period 
+prevalence in each state: biaspar/plrbiaspar.txt
#Prevalence
 #Age 1-1 2-2
 
 #************ 
 70 0.90134 0.09866
 71 0.89177 0.10823 
 72 0.88139 0.11861 
-73 0.87015 0.12985 
-
-At age 70 the period prevalence is 0.90134 in state 1 and 0.09866
-in state 2. This period prevalence differs from the cross-sectional
-prevalence. Here is the point. The cross-sectional prevalence at age
-70 results from the incidence of disability, incidence of recovery and
-mortality which occurred in the past of the cohort.  Period prevalence
-results from a simulation with current incidences of disability,
-recovery and mortality estimated from this cross-longitudinal
-survey. It is a good predictin of the prevalence in the
-future if "nothing changes in the future". This is exactly
-what demographers do with a period life table. Life expectancy is the
-expected mean survival time if current mortality rates (age-specific incidences
-of mortality) "remain constant" in the future. 
-
-- Standard deviation of
-period prevalence: biaspar/vplrbiaspar.txt
-
-The period prevalence has to be compared with the cross-sectional
-prevalence. But both are statistical estimates and therefore
-have confidence intervals.
-
For the cross-sectional prevalence we generally need information on
-the design of the surveys. It is usually not enough to consider the
-number of people surveyed at a particular age and to estimate a
-Bernouilli confidence interval based on the prevalence at that
-age. But you can do it to have an idea of the randomness. At least you
-can get a visual appreciation of the randomness by looking at the
-fluctuation over ages.
-
-
 For the period prevalence it is possible to estimate the
-confidence interval from the Hessian matrix (see the publication for
-details). We are supposing that the design of the survey will only
-alter the weight of each individual. IMaCh is scaling the weights of
-individuals-waves contributing to the likelihood by making the sum of
-the weights equal to the sum of individuals-waves contributing: a
-weighted survey doesn't increase or decrease the size of the survey,
-it only give more weights to some individuals and thus less to the
-others.
-
-
-cross-sectional and period
-prevalence in state (2=disable) with confidence interval:
-biaspar/vbiaspar21.png
-
-This graph exhibits the period prevalence in state (2) with the
-confidence interval in red. The green curve is the observed prevalence
-(or proportion of individuals in state (2)).  Without discussing the
-results (it is not the purpose here), we observe that the green curve
-is rather below the period prevalence. It the data where not biased by
-the non inclusion of people living in institutions we would have
-concluded that the prevalence of disability will increase in the
-future (see the main publication if you are interested in real data
-and results which are opposite).
-
-
-
--Convergence to the
-period prevalence of disability: biaspar/pbiaspar11.png

- 
-
-This graph plots the conditional transition probabilities from
-an initial state (1=healthy in red at the bottom, or 2=disable in
-green on top) at age x to the final state 2=disable at
-age x+h. Conditional means at the condition to be alive
-at age x+h which is hP12x + hP22x. The
-curves hP12x/(hP12x + hP22x) and hP22x/(hP12x
-+ hP22x) converge with h, to the period
-prevalence of disability. In order to get the period
-prevalence at age 70 we should start the process at an earlier
-age, i.e.50. If the disability state is defined by severe
-disability criteria with only a few chance to recover, then the
-incidence of recovery is low and the time to convergence is
-probably longer. But we don't have experience yet.
-
-- Life expectancies by age
-and initial health status with standard deviation: biaspar/erbiaspar.txt
-
-# Health expectancies 
+73 0.87015 0.12985 
+At age 70 the period prevalence is 0.90134 in state 1 and 0.09866 in state 2. 
+This period prevalence differs from the cross-sectional prevalence, and
+we explain why. The cross-sectional prevalence at age 70 results from
+the incidence of disability, incidence of recovery and mortality which
+occurred in the past for the cohort. Period prevalence results from a
+simulation with current incidences of disability, recovery and
+mortality estimated from this cross-longitudinal survey. It is a good
+prediction of the prevalence in the future if "nothing changes in the
+future". This is exactly what demographers do with a period life
+table. Life expectancy is the expected mean survival time if current
+mortality rates (age-specific incidences of mortality) "remain
+constant" in the future. 
+
+- Standard deviation of period 
+prevalence: biaspar/vplrbiaspar.txt
+The period prevalence has to be compared with the cross-sectional prevalence. 
+But both are statistical estimates and therefore have confidence intervals. 
+
For the cross-sectional prevalence we generally need information on the 
+design of the surveys. It is usually not enough to consider the number of people 
+surveyed at a particular age and to estimate a Bernoulli confidence interval 
+based on the prevalence at that age. But you can do it to have an idea of the 
+randomness. At least you can get a visual appreciation of the randomness by 
+looking at the fluctuation over ages. 
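For a rough idea of that randomness, a Bernoulli (normal approximation) interval can be computed by hand. This is only a sketch that ignores the survey design, as cautioned above; the function name `bernoulli_ci` is hypothetical, and the figures (33 people in state 2 out of 1148 at age 72) are taken from the cross-sectional prevalence file shown earlier.

```python
import math

def bernoulli_ci(p, n, z=1.96):
    """Approximate 95% interval for a proportion p observed on n people.

    Assumes simple random sampling (no design effect, no weights).
    """
    se = math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)

# Age 72 in the cross-sectional prevalence file: 33 disabled out of 1148.
low, high = bernoulli_ci(33 / 1148, 1148)
print(f"prevalence in state 2 at age 72: {33 / 1148:.5f} [{low:.5f}, {high:.5f}]")
```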
+
For the period prevalence it is possible to estimate the confidence interval 
+from the Hessian matrix (see the publication for details). We are supposing that 
+the design of the survey will only alter the weight of each individual. IMaCh 
+scales the weights of individuals-waves contributing to the likelihood by 
+making the sum of the weights equal to the sum of individuals-waves 
+contributing: a weighted survey doesn't increase or decrease the size of the 
+survey, it only gives more weight to some individuals and thus less to the 
+others. 
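The rescaling described above can be sketched as follows. This is an illustration of the principle only, not IMaCh's code; the function name `rescale_weights` and the raw weights are hypothetical.

```python
def rescale_weights(weights):
    """Rescale weights so that their sum equals the number of contributions."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]

# Hypothetical raw survey weights for four individual-waves.
raw = [0.5, 1.0, 1.5, 3.0]
scaled = rescale_weights(raw)
print(scaled, sum(scaled))
```

The ratios between weights are preserved, so the relative importance of each individual is unchanged while the effective sample size stays equal to the number of contributing individuals-waves.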
+
-cross-sectional and period prevalence in state 
+(2=disabled) with confidence interval: biaspar/vbiaspar21.png
+This graph exhibits the period prevalence in state (2) with the confidence 
+interval in red. The green curve is the observed prevalence (or proportion of 
+individuals in state (2)). Without discussing the results (it is not the purpose 
+here), we observe that the green curve is somewhat below the period
+prevalence. If the data were not biased by the non-inclusion of people
+living in institutions we would have concluded that the prevalence of
+disability will increase in the future (see the main publication if
+you are interested in real data and results which are opposite).
+
+-Convergence to the period prevalence of 
+disability: biaspar/pbiaspar11.png
 
+This graph plots the conditional transition probabilities from an initial 
+state (1=healthy in red at the bottom, or 2=disabled in green on the top) at age 
+x to the final state 2=disabled at age x+h
+ where conditional means conditional on being alive at age x+h, which is 
+hP12x + hP22x. The curves hP12x/(hP12x + hP22x) 
+and hP22x/(hP12x + hP22x) converge, as h increases, to the 
+period prevalence of disability. In order to get the period prevalence 
+at age 70 we should start the process at an earlier age, e.g. 50. If the 
+disability state is defined by severe disability criteria with only a
+small chance of recovering, then the incidence of recovery is low and the time to convergence is 
+probably longer. But we don't have experience of this yet.
+- Life expectancies by age and initial health 
+status with standard deviation: biaspar/erbiaspar.txt
# Health expectancies 
 # Age 1-1 (SE) 1-2 (SE) 2-1 (SE) 2-2 (SE)
  70   11.0180 (0.1277)    3.1950 (0.3635)    4.6500 (0.0871)    4.4807 (0.2187)
  71   10.4786 (0.1184)    3.2093 (0.3212)    4.3384 (0.0875)    4.4820 (0.2076)
@@ -1079,304 +789,215 @@ href="biaspar/erbiaspar.txt">biaspar/erbiaspar.txt
  79    6.7464 (0.0867)    3.3220 (0.1124)    2.3794 (0.1112)    4.4646 (0.1364)
  80    6.3538 (0.0868)    3.3354 (0.1014)    2.1949 (0.1168)    4.4587 (0.1331)
  81    5.9775 (0.0873)    3.3484 (0.0933)    2.0222 (0.1230)    4.4520 (0.1320)
-
-
-For example  70  11.0180 (0.1277) 3.1950 (0.3635) 4.6500 (0.0871)  4.4807 (0.2187)
+
For example  70  11.0180 (0.1277) 3.1950 (0.3635) 4.6500 (0.0871)  4.4807 (0.2187)
 means
-e11=11.0180 e12=3.1950 e21=4.6500 e22=4.4807 
-
-
-
-For example, life expectancy of a healthy individual at age 70
-is 11.0 in the healthy state and 3.2 in the disability state
-(total of 14.2 years). If he was disable at age 70, his life expectancy
-will be shorter, 4.65 years in the healthy state and 4.5 in the
-disability state (=9.15 years). The total life expectancy is a
-weighted mean of both, 14.2 and 9.15. The weight is the proportion
-of people disabled at age 70. In order to get a period index
-(i.e. based only on incidences) we use the stable or
-period prevalence at age 70 (i.e. computed from
-incidences at earlier ages) instead of the cross-sectional prevalence
-(observed for example at first medical exam) (see
-below).
-
-- Variances of life
-expectancies by age and initial health status: biaspar/vrbiaspar.txt
-
-For example, the covariances of life expectancies Cov(ei,ej)
-at age 50 are (line 3) 
-
-   Cov(e1,e1)=0.4776  Cov(e1,e2)=0.0488=Cov(e2,e1)  Cov(e2,e2)=0.0424
-
--Variances of one-step
-probabilities : biaspar/probrbiaspar.txt
-
-For example, at age 65
-
-   p11=9.960e-001 standard deviation of p11=2.359e-004
-
-- Health
-expectancies
-with standard errors in parentheses: biaspar/trbiaspar.txt
-
-#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) 
-
-70 13.26 (0.22) 9.95 (0.20) 3.30 (0.14) 
-
-Thus, at age 70 the total life expectancy, e..=13.26 years is
-the weighted mean of e1.=13.46 and e2.=11.35 by the period
-prevalences at age 70 which are 0.90134 in state 1 and 0.09866 in
-state 2 respectively (the sum is equal to one). e.1=9.95 is the
-Disability-free life expectancy at age 70 (it is again a weighted
-mean of e11 and e21). e.2=3.30 is also the life expectancy at age
-70 to be spent in the disability state.
-
--Total life expectancy by
-age and health expectancies in states (1=healthy) and (2=disable):
-biaspar/ebiaspar1.png
-
-This figure represents the health expectancies and the total
-life expectancy with a confidence interval (dashed line). 
-
-        
-
-Standard deviations (obtained from the information matrix of
-the model) of these quantities are very useful.
-Cross-longitudinal surveys are costly and do not involve huge
-samples, generally a few thousands; therefore it is very
-important to have an idea of the standard deviation of our
-estimates. It has been a big challenge to compute the Health
-Expectancy standard deviations. Don't be confuse: life expectancy
-is, as any expected value, the mean of a distribution; but here
-we are not computing the standard deviation of the distribution,
-but the standard deviation of the estimate of the mean.
-
-Our health expectancies estimates vary according to the sample
-size (and the standard deviations give confidence intervals of
-the estimates) but also according to the model fitted. Let us
-explain it in more details.
-
-Choosing a model means at least two kind of choices. At first we
-have to decide the number of disability states. And at second we have to
-design, within the logit model family, the model itself: variables,
-covariables, confounding factors etc. to be included.
-
-More disability states we have, better is our demographical
-approach of the disability process, but smaller are the number of
-transitions between each state and higher is the noise in the
-measurement. We do not have enough experiments of the various
-models to summarize the advantages and disadvantages, but it is
-important to say that even if we had huge and unbiased samples,
-the total life expectancy computed from a cross-longitudinal
-survey, varies with the number of states. If we define only two
-states, alive or dead, we find the usual life expectancy where it
-is assumed that at each age, people are at the same risk to die.
-If we are differentiating the alive state into healthy and
-disable, and as the mortality from the disability state is higher
-than the mortality from the healthy state, we are introducing
-heterogeneity in the risk of dying. The total mortality at each
-age is the weighted mean of the mortality in each state by the
-prevalence in each state. Therefore if the proportion of people
-at each age and in each state is different from the period
-equilibrium, there is no reason to find the same total mortality
-at a particular age. Life expectancy, even if it is a very useful
-tool, has a very strong hypothesis of homogeneity of the
-population. Our main purpose is not to measure differential
-mortality but to measure the expected time in a healthy or
-disability state in order to maximise the former and minimize the
-latter. But the differential in mortality complexifies the
-measurement.
-
-Incidences of disability or recovery are not affected by the number
-of states if these states are independent. But incidences estimates
-are dependent on the specification of the model. More covariates we
-added in the logit model better is the model, but some covariates are
-not well measured, some are confounding factors like in any
-statistical model. The procedure to "fit the best model' is
-similar to logistic regression which itself is similar to regression
-analysis. We haven't yet been sofar because we also have a severe
-limitation which is the speed of the convergence. On a Pentium III,
-500 MHz, even the simplest model, estimated by month on 8,000 people
-may take 4 hours to converge.  Also, the IMaCh program is not a
-statistical package, and does not allow sophisticated design
-variables. If you need sophisticated design variable you have to them
-your self and and add them as ordinary variables. IMaCX allows up to 8
-variables. The current version of this program allows only to add
-simple variables like age+sex or age+sex+ age*sex but will never be
-general enough. But what is to remember, is that incidences or
-probability of change from one state to another is affected by the
-variables specified into the model.
-
-Also, the age range of the people interviewed is linked 
-the age range of the life expectancy which can be estimated by
-extrapolation. If your sample ranges from age 70 to 95, you can
-clearly estimate a life expectancy at age 70 and trust your
-confidence interval because it is mostly based on your sample size,
-but if you want to estimate the life expectancy at age 50, you
-should rely in the design of your model. Fitting a logistic model on a age
-range of 70 to 95 and estimating probabilties of transition out of
-this age range, say at age 50, is very dangerous. At least you
-should remember that the confidence interval given by the
-standard deviation of the health expectancies, are under the
-strong assumption that your model is the 'true model', which is
-probably not the case outside the age range of your sample.
-
-- Copy of the parameter
-file: orbiaspar.txt
-
-This copy of the parameter file can be useful to re-run the
-program while saving the old output files. 
-
-- Prevalence forecasting:
-biaspar/frbiaspar.txt
-
-
-
-First,
-we have estimated the observed prevalence between 1/1/1984 and
-1/6/1988 (June, European syntax of dates). The mean date of all interviews (weighted average of the
-interviews performed between 1/1/1984 and 1/6/1988) is estimated
-to be 13/9/1985, as written on the top on the file. Then we
-forecast the probability to be in each state. 
-
-
-For example on 1/1/1989 : 
-
-# StartingAge FinalAge P.1 P.2 P.3
+e11=11.0180 e12=3.1950 e21=4.6500 e22=4.4807 

+For example, life expectancy of a healthy individual at age 70 is 11.0 in the 
+healthy state and 3.2 in the disability state (total of 14.2 years). If he were 
+disabled at age 70, his life expectancy would be shorter, 4.65 years in the 
+healthy state and 4.5 in the disability state (=9.15 years). The total life 
+expectancy is a weighted mean of both, 14.2 and 9.15. The weight is the 
+proportion of people disabled at age 70. In order to get a period index (i.e. 
+based only on incidences) we use the stable 
+or period prevalence at age 70 (i.e. computed from incidences at earlier 
+ages) instead of the cross-sectional 
+prevalence (observed for example at first interview) (see 
+below).
+- Variances of life expectancies by age and 
+initial health status: biaspar/vrbiaspar.txt
+For example, the covariances of life expectancies Cov(ei,ej) at age 50 are 
+(line 3) 
   Cov(e1,e1)=0.4776  Cov(e1,e2)=0.0488=Cov(e2,e1)  Cov(e2,e2)=0.0424
+-Variances of one-step probabilities 
+: biaspar/probrbiaspar.txt
+For example, at age 65
   p11=9.960e-001 standard deviation of p11=2.359e-004
+- Health 
+expectancies with standard errors 
+in parentheses: biaspar/trbiaspar.txt
#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) 
70 13.26 (0.22) 9.95 (0.20) 3.30 (0.14) 
+Thus, at age 70 the total life expectancy, e..=13.26 years is the weighted 
+mean of e1.=13.46 and e2.=11.35 by the period prevalences at age 70 which are 
+0.90134 in state 1 and 0.09866 in state 2 respectively (the sum is equal to 
+one). e.1=9.95 is the Disability-free life expectancy at age 70 (it is again a 
+weighted mean of e11 and e21). e.2=3.30 is also the life expectancy at age 70 to 
+be spent in the disability state.
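The weighted mean can be checked by hand. A minimal sketch using only the figures quoted above (variable names are illustrative); because the printed inputs are rounded, the result is only expected to agree with the reported 13.26 to about two decimals.

```python
# Period prevalences at age 70 (from the period prevalence file).
w1, w2 = 0.90134, 0.09866
# Total life expectancies by initial state, as quoted in the text.
e1_total, e2_total = 13.46, 11.35

# Total life expectancy as the prevalence-weighted mean of e1. and e2.
e_total = w1 * e1_total + w2 * e2_total
print(f"e.. = {e_total:.2f} years")
```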
+-Total life expectancy by age and health 
+expectancies in states (1=healthy) and (2=disabled): biaspar/ebiaspar1.png
+This figure represents the health expectancies and the total life expectancy 
+with a confidence interval (dashed line). 
        
+Standard deviations (obtained from the information matrix of the model) of 
+these quantities are very useful. Cross-longitudinal surveys are costly and do 
+not involve huge samples, generally a few thousand; therefore it is very 
+important to have an idea of the standard deviation of our estimates. It has 
+been a big challenge to compute the Health Expectancy standard deviations. Don't 
+be confused: life expectancy is, as any expected value, the mean of a 
+distribution; but here we are not computing the standard deviation of the 
+distribution, but the standard deviation of the estimate of the mean.
+Our health expectancy estimates vary according to the sample size (and the 
+standard deviations give confidence intervals of the estimates) but also 
+according to the model fitted. We explain this in more detail.
+Choosing a model means at least two kinds of choices. First we have to 
+decide the number of disability states. And second we have to design, within 
+the logit model family, the model itself: variables, covariates, confounding 
+factors etc. to be included.
+The more disability states we have, the better is our demographical
+approximation of the disability process, but the smaller the number of
+transitions between each state and the higher the noise in the
+measurement. We have not experimented enough with the various models
+to summarize the advantages and disadvantages, but it is important to
+note that even if we had huge unbiased samples, the total life
+expectancy computed from a cross-longitudinal survey would vary with
+the number of states. If we define only two states, alive or dead, we
+find the usual life expectancy where it is assumed that at each age,
+people are at the same risk of dying. If we are differentiating the
+alive state into healthy and disabled, and as mortality from the
+disabled state is higher than mortality from the healthy state, we are
+introducing heterogeneity in the risk of dying. The total mortality at
+each age is the weighted mean of the mortality from each state by the
+prevalence of each state. Therefore if the proportion of people at each age and 
+in each state is different from the period equilibrium, there is no reason to 
+find the same total mortality at a particular age. Life expectancy, even if it 
+is a very useful tool, has a very strong hypothesis of homogeneity of the 
+population. Our main purpose is not to measure differential mortality but to 
+measure the expected time in a healthy or disabled state in order to maximise 
+the former and minimize the latter. But the differential in mortality 
+complicates the measurement.
+Incidences of disability or recovery are not affected by the number of states 
+if these states are independent. But incidence estimates are dependent on the 
+specification of the model. The more covariates we add in the logit
+model the better 
+is the model, but some covariates are not well measured, some are confounding 
+factors like in any statistical model. The procedure to "fit the best model" is 
+similar to logistic regression which itself is similar to regression analysis. 
+We haven't gone that far yet because we also have a severe limitation which is the 
+speed of the convergence. On a Pentium III, 500 MHz, even the simplest model, 
+estimated by month on 8,000 people may take 4 hours to converge. Also, the IMaCh 
+program is not a statistical package, and does not allow sophisticated design 
+variables. If you need sophisticated design variables you have to build them yourself 
+and add them as ordinary variables. IMaCh allows up to 8 variables. The 
+current version of this program only allows simple variables like age+sex 
+or age+sex+age*sex to be added, and will never be general enough. What should be 
+remembered is that the incidences, or probabilities of change from one state to 
+another, are affected by the variables specified in the model.
+Also, the age range of the people interviewed is linked to the age range of the 
+life expectancy which can be estimated by extrapolation. If your sample ranges 
+from age 70 to 95, you can clearly estimate a life expectancy at age 70 and 
+trust your confidence interval because it is mostly based on your sample size, 
+but if you want to estimate the life expectancy at age 50, you have to rely on 
+the design of your model. Fitting a logistic model on an age range of 70 to 95 
+and estimating probabilities of transition out of this age range, say at age 50, 
+is very dangerous. At least you should remember that the confidence intervals 
+given by the standard deviations of the health expectancies hold only under the strong 
+assumption that your model is the 'true model', which is probably not the case 
+outside the age range of your sample.
+- Copy of the parameter file: 
+orbiaspar.txt
+This copy of the parameter file can be useful to re-run the program while 
+saving the old output files. 
+- Prevalence forecasting: biaspar/frbiaspar.txt
+First, we have estimated the observed prevalence between 1/1/1984 and 
+1/6/1988 (June, European syntax of dates). The mean date of all interviews 
+(weighted average of the interviews performed between 1/1/1984 and 1/6/1988) is 
+estimated to be 13/9/1985, as written at the top of the file. Then we forecast 
+the probability of being in each state. 
+For example on 1/1/1989 : 
# StartingAge FinalAge P.1 P.2 P.3
 # Forecasting at date 1/1/1989
-  73 0.807 0.078 0.115
-
-
-
-Since the minimum age is 70 on the 13/9/1985, the youngest forecasted
-age is 73. This means that at age a person aged 70 at 13/9/1989 has a
-probability to enter state1 of 0.807 at age 73 on 1/1/1989.
-Similarly, the probability to be in state 2 is 0.078 and the
-probability to die is 0.115. Then, on the 1/1/1989, the prevalence of
-disability at age 73 is estimated to be 0.088.
-
-- Population forecasting:
-biaspar/poprbiaspar.txt
-
-# Age P.1 P.2 P.3 [Population]
+  73 0.807 0.078 0.115
+Since the minimum age is 70 on 13/9/1985, the youngest forecasted age is 
+73. This means that a person aged 70 on 13/9/1985 has a probability of 0.807 of 
+being in state 1 at age 73 on 1/1/1989. Similarly, the probability of being in 
+state 2 is 0.078 and the probability of having died is 0.115. Then, on 1/1/1989, the 
+prevalence of disability at age 73 is estimated to be 0.088.
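The forecasted prevalence of disability is the share of state 2 among the survivors. A short check using only the three probabilities in the row for age 73 (variable names are illustrative):

```python
# Forecast for age 73 on 1/1/1989: P.1 (healthy), P.2 (disabled), P.3 (dead).
p1, p2, p3 = 0.807, 0.078, 0.115

# Prevalence of disability among people still alive at that date.
prevalence_disabled = p2 / (p1 + p2)
print(f"{prevalence_disabled:.3f}")
```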
+- Population forecasting: biaspar/poprbiaspar.txt
# Age P.1 P.2 P.3 [Population]
 # Forecasting at date 1/1/1989 
 75 572685.22 83798.08 
 74 621296.51 79767.99 
-73 645857.70 69320.60 
-
-# Forecasting at date 1/1/19909 
+73 645857.70 69320.60 
# Forecasting at date 1/1/1990 
 76 442986.68 92721.14 120775.48
 75 487781.02 91367.97 121915.51
-74 512892.07 85003.47 117282.76 
-
-From the population file, we estimate the number of people in
-each state. At age 73, 645857 persons are in state 1 and 69320
-are in state 2. One year latter, 512892 are still in state 1,
-85003 are in state 2 and 117282 died before 1/1/1990.
-
-
-
-Trying an example
-
-Since you know how to run the program, it is time to test it
-on your own computer. Try for example on a parameter file named imachpar.imach which is a copy
-of mypar.imach included
-in the subdirectory of imach, mytry.
-Edit it and change the name of the data file to mydata.txt if you don't want to
-copy it on the same directory. The file mydata.txt
-is a smaller file of 3,000 people but still with 4 waves. 
-
-Right click on the .imach file and a window will popup with the
-string 'Enter the parameter file name:'
-
-
-    
-        
-    
-IMACH, Version 0.97bEnter
-        the parameter file name: imachpar.imach
-        
-
-Most of the data files or image files generated, will use the
-'imachpar' string into their name. The running time is about 2-3
-minutes on a Pentium III. If the execution worked correctly, the
-outputs files are created in the current directory, and should be
-the same as the mypar files initially included in the directory mytry.
-
-
-    Output on the screen The output screen looks like biaspar.log
+74 512892.07 85003.47 117282.76 
+From the population file, we estimate the number of people in each state. At 
+age 73, 645857 persons are in state 1 and 69320 are in state 2. One year later, 
+512892 are still in state 1, 85003 are in state 2 and 117282 died before 
+1/1/1990.
+
+
+Trying an example
+Since you know how to run the program, it is time to test it on your own 
+computer. Try for example on a parameter file named imachpar.imach 
+which is a copy of mypar.imach included 
+in the subdirectory of imach, mytry. Edit 
+it and change the name of the data file to mydata.txt if you don't want to copy it to the same directory. The 
+file mydata.txt is a smaller file of 3,000 
+people but still with 4 waves. 
+Right click on the .imach file and a window will pop up with the string 
+'Enter the parameter file name:'
+
+  
+  
+    IMACH, Version 0.97b
+      Enter the parameter file name: 
+  imachpar.imach
+Most of the data files or image files generated will use the 'imachpar' 
+string in their names. The running time is about 2-3 minutes on a Pentium III. 
+If the execution worked correctly, the output files are created in the current 
+directory, and should be the same as the mypar files initially included in the 
+directory mytry.
+
+  Output on the screen The output screen looks like biaspar.log
 #
 title=MLE datafile=mydaiata.txt lastobs=3000 firstpass=1 lastpass=3
-ftol=1.000000e-008 stepm=24 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
-    
-    Total number of individuals= 2965, Agemin = 70.00, Agemax= 100.92
+ftol=1.000000e-008 stepm=24 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
+  
Total number of individuals= 2965, Agemin = 70.00, Agemax= 100.92
 
 Warning, no any valid information for:126 line=126
 Warning, no any valid information for:2307 line=2307
 Delay (in months) between two waves Min=21 Max=51 Mean=24.495826
-These lines give some warnings on the data file and also some raw statistics on frequencies of transitions.
+These lines give some warnings on the data file and also some raw statistics on frequencies of transitions.
 Age 70 1.=230 loss[1]=3.5% 2.=16 loss[2]=12.5% 1.=222 prev[1]=94.1% 2.=14
  prev[2]=5.9% 1-1=8 11=200 12=7 13=15 2-1=2 21=6 22=7 23=1
-Age 102 1.=0 loss[1]=NaNQ% 2.=0 loss[2]=NaNQ% 1.=0 prev[1]=NaNQ% 2.=0 
-    
-
-It includes some warnings or errors which are very important for
-you. Be careful with such warnings because your results may be biased
-if, for example, you have people who accepted to be interviewed at
-first pass but never after. Or if you don't have the exact month of
-death. In such cases IMaCh doesn't take any initiative, it does only
-warn you. It is up to you to decide what to do with these
-people. Excluding them is usually a wrong decision. It is better to
-decide that the month of death is at the mid-interval between the last
-two waves for example.
-
-If you survey suffers from severe attrition, you have to analyse the
-characteristics of the lost people and overweight people with same
-characteristics for example.
-

-By default, IMaCH warns and excludes these problematic people, but you
-have to be careful with such results.
-
-
 
-
-
-    Maximisation with the Powell algorithm. 8 directions are
-        given corresponding to the 8 parameters. this can be
-        rather long to get convergence.

-        

-        Powell iter=1 -2*LL=11531.405658264877 1 0.000000000000 2
-        0.000000000000 3

-        0.000000000000 4 0.000000000000 5 0.000000000000 6
-        0.000000000000 7 

-        0.000000000000 8 0.000000000000

-        1..........2.................3..........4.................5.........

-        6................7........8...............

-        Powell iter=23 -2*LL=6744.954108371555 1 -12.967632334283
-        

-        2 0.135136681033 3 -7.402109728262 4 0.067844593326 

-        5 -0.673601538129 6 -0.006615504377 7 -5.051341616718 

-        8 0.051272038506

-        1..............2...........3..............4...........

-        5..........6................7...........8.........

-        #Number of iterations = 23, -2 Log likelihood =
-        6744.954042573691

-        # Parameters

-        12 -12.966061 0.135117 

-        13 -7.401109 0.067831 

-        21 -0.672648 -0.006627 

-        23 -5.051297 0.051271 

-        
-    Calculation of the hessian matrix. Wait...
+Age 102 1.=0 loss[1]=NaNQ% 2.=0 loss[2]=NaNQ% 1.=0 prev[1]=NaNQ% 2.=0 
It 
+includes some warnings or errors which are very important for you. Be careful 
+with such warnings because your results may be biased if, for example, you have 
+people who agreed to be interviewed at the first pass but never afterwards, or if you 
+don't have the exact month of death. In such cases IMaCh doesn't take any 
+initiative; it only warns you. It is up to you to decide what to do with 
+these people. Excluding them is usually the wrong decision. It is better to decide, 
+for example, that the month of death is at the mid-interval between the last two 
+waves.
+If your survey suffers from severe attrition, you have to analyse the 
+characteristics of the lost people and, for example, give more weight to people 
+with the same characteristics. 
+
By default, IMaCh warns and excludes these problematic people, but you have 
+to be careful with such results. 
+
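The raw frequency line for age 70 shown in the log above can be reproduced by hand. The sketch below is a reading of that line, not IMaCh code: it assumes that `1-1` and `2-1` count people lost to follow-up, that `loss[i]` is their share among people starting in state i, and that `prev[i]` is computed among people still observed. The variable names are hypothetical.

```python
# Transition counts for age 70, copied from the log line
# "1-1=8 11=200 12=7 13=15 2-1=2 21=6 22=7 23=1".
# Assumption: destination "lost" (printed as -1) means lost to follow-up.
from_state1 = {"lost": 8, "1": 200, "2": 7, "3": 15}
from_state2 = {"lost": 2, "1": 6, "2": 7, "3": 1}

start1 = sum(from_state1.values())  # printed as "1.=230"
start2 = sum(from_state2.values())  # printed as "2.=16"

# Share of each starting group that was lost before the next wave.
loss1 = from_state1["lost"] / start1
loss2 = from_state2["lost"] / start2

# People with a known next status, and prevalence among them.
observed1 = start1 - from_state1["lost"]  # printed as "1.=222"
observed2 = start2 - from_state2["lost"]  # printed as "2.=14"
prev1 = observed1 / (observed1 + observed2)
prev2 = observed2 / (observed1 + observed2)

print(f"loss[1]={100 * loss1:.1f}% loss[2]={100 * loss2:.1f}%")
print(f"prev[1]={100 * prev1:.1f}% prev[2]={100 * prev2:.1f}%")
```

Under these assumptions the computed percentages match the ones printed in the log, which is a quick way to check that a data file has been read as intended.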
 
+
+  Maximisation with the Powell algorithm. 8 directions are given, 
+  corresponding to the 8 parameters. Convergence can take rather 
+  long.

+Powell iter=1 -2*LL=11531.405658264877 1 0.000000000000 2 0.000000000000 3
+0.000000000000 4 0.000000000000 5 0.000000000000 6 0.000000000000 7
+0.000000000000 8 0.000000000000
+1..........2.................3..........4.................5.........
+6................7........8...............
+Powell iter=23 -2*LL=6744.954108371555 1 -12.967632334283
+2 0.135136681033 3 -7.402109728262 4 0.067844593326
+5 -0.673601538129 6 -0.006615504377 7 -5.051341616718
+8 0.051272038506
+1..............2...........3..............4...........
+5..........6................7...........8.........
+#Number of iterations = 23, -2 Log likelihood = 6744.954042573691
+# Parameters
+12 -12.966061 0.135117
+13 -7.401109 0.067831
+21 -0.672648 -0.006627
+23 -5.051297 0.051271
+
+Calculation of the hessian matrix. Wait...
 12345678.12.13.14.15.16.17.18.23.24.25.26.27.28.34.35.36.37.38.45.46.47.48.56.57.58.67.68.78
 
 Inverting the hessian to get the covariance matrix. Wait...
@@ -1415,86 +1036,66 @@ Computing Variance-covariance of DFLEs: file 'vrmypar.txt'
 Computing Total LEs with variances: file 'trmypar.txt' 
 Computing Variance-covariance of Prevalence limit: file 'vplrmypar.txt' 
 End of Imach
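To get a feeling for what the estimated parameters in the log mean, the sketch below turns them into transition probabilities. It rests on an assumption that is not stated in this section: that each line `ij a b` gives the intercept and age slope of a multinomial logit, log(pij/pii) = a + b*age, for one step of stepm months. The function name and data layout are hypothetical and are not part of IMaCh.

```python
import math

# Estimated parameters copied from the "# Parameters" block of the log:
# (origin, destination) -> (intercept a, age slope b).
params = {
    (1, 2): (-12.966061, 0.135117),
    (1, 3): (-7.401109, 0.067831),
    (2, 1): (-0.672648, -0.006627),
    (2, 3): (-5.051297, 0.051271),
}


def transition_probs(origin, age):
    """Probabilities of leaving `origin` at a given age.

    Assumes log(p_ij / p_ii) = a_ij + b_ij * age, with the origin state
    itself as the reference category.
    """
    odds = {
        dest: math.exp(a + b * age)
        for (start, dest), (a, b) in params.items()
        if start == origin
    }
    denom = 1.0 + sum(odds.values())
    probs = {dest: value / denom for dest, value in odds.items()}
    probs[origin] = 1.0 / denom  # probability of staying in the same state
    return probs


for age in (70, 80, 90):
    print(age, {dest: round(p, 4) for dest, p in sorted(transition_probs(1, age).items())})
```

If the assumption holds, the positive age slopes on the `12`, `13` and `23` lines mean that the chances of becoming disabled or dying within one step rise with age.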
-
-    
-
-
-Once the running is finished, the program
-requires a character:
-
-
-    
-        
-    
-Type e to edit output files, g
-        to graph again, c to start again, and q for exiting:
-
-In order to have an idea of the time needed to reach convergence,
-IMaCh gives an estimation if the convergence needs 10, 20 or 30
-iterations. It might be useful.
-
-First you should enter e to
-edit the master file mypar.htm. 
-
-
-    Outputs files 

-        

-        - Copy of the parameter file: ormypar.txt

-        - Gnuplot file name: mypar.gp.txt

-        - Cross-sectional prevalence in each state: prmypar.txt 

-        - Period prevalence in each state: plrmypar.txt 

-        - Transition probabilities: pijrmypar.txt

-        - Life expectancies by age and initial health status
-        (estepm=24 months): ermypar.txt
-        

-        - Parameter file with estimated parameters and the
-        covariance matrix: rmypar.txt 

-        - Variance of one-step probabilities: probrmypar.txt 

-        - Variances of life expectancies by age and initial
-        health status (estepm=24 months): vrmypar.txt

-        - Health expectancies with their variances: trmypar.txt 

-        - Standard deviation of period prevalences: vplrmypar.txt 

-        No population forecast: popforecast = 0 (instead of 1) or
-        stepm = 24 (instead of 1) or model=. (instead of .)

-        

-        
-    Graphs 

-        

-        -One-step transition
-        probabilities

-        -Convergence to the
-        period prevalence

-        -Cross-sectional and period
-        prevalence in state (1) with the confident interval 

-        -Cross-sectional and period
-        prevalence in state (2) with the confident interval 

-        -Health life
-        expectancies by age and initial health state (1) 

-        -Health life
-        expectancies by age and initial health state (2) 

-        -Total life expectancy by
-        age and health expectancies in states (1) and (2). 
-
-
-This software have been partly granted by Euro-REVES, a concerted action
-from the European Union. Since 2003 it is also partly granted by the
-French Institute on Longevity. It will be copyrighted identically to a
-GNU software product, i.e. program and software can be distributed
-freely for non commercial use. Sources are not widely distributed
-today because some part of the codes are copyrighted by Numerical
-Recipes in C. You can get our GPL codes by asking us with a simple
-justification (name, email, institute) mailto:brouard@ined.fr and mailto:lievre@ined.fr .
-
-Latest version (0.97b of June 2004) can be accessed at http://euroreves.ined.fr/imach

-
-
-
+

+Once the run is finished, the program waits for a single-character command:


+Type e to edit output files, g to graph again, c to start again, and q for exiting:

+To give an idea of the time needed to reach convergence, IMaCh estimates how
+long the run would take if convergence needed 10, 20 or 30 iterations. This might be useful.

+First you should enter e to edit the master file mypar.htm.

+Output files
+
+- Copy of the parameter file: ormypar.txt
+- Gnuplot file name: mypar.gp.txt
+- Cross-sectional prevalence in each state: prmypar.txt
+- Period prevalence in each state: plrmypar.txt
+- Transition probabilities: pijrmypar.txt
+- Life expectancies by age and initial health status (estepm=24 months): ermypar.txt
+- Parameter file with estimated parameters and the covariance matrix: rmypar.txt
+- Variance of one-step probabilities: probrmypar.txt
+- Variances of life expectancies by age and initial health status (estepm=24 months): vrmypar.txt
+- Health expectancies with their variances: trmypar.txt
+- Standard deviation of period prevalences: vplrmypar.txt
+No population forecast: popforecast = 0 (instead of 1) or stepm = 24 (instead of 1) or model=. (instead of .)

+
+Graphs
+
+-One-step transition probabilities
+-Convergence to the period prevalence
+-Cross-sectional and period prevalence in state (1) with the confidence interval
+-Cross-sectional and period prevalence in state (2) with the confidence interval
+-Health life expectancies by age and initial health state (1)
+-Health life expectancies by age and initial health state (2)
+-Total life expectancy by age and health expectancies in states (1) and (2).

+This software has been partly funded by Euro-REVES, a concerted action from the
+European Union. It will be copyrighted identically to a GNU software product,
+i.e. program and software can be distributed freely for non-commercial use.
+Sources are not widely distributed today. You can get them by asking us with a
+simple justification (name, email, institute) mailto:brouard@ined.fr and mailto:lievre@ined.fr .

Latest version (0.97b of June 2004) can be accessed at http://euroreves.ined.fr/imach

-- 2.43.0

Computing Health -Expectancies using IMaCh

(a Maximum -Likelihood Computer Program using Interpolation of Markov Chains)

INED and EUROREVES

Contribution to the mathematics: C. R. -Heathcote (Australian -National University, Canberra).

Contact: Agnès Lièvre (lievre@ined.fr)

Your first example parameter file

#Imach version 0.97b, June 2004, -INED-EUROREVES

First uncommented line

Covariates

Guess values for optimization

Computing Health Expectancies using +IMaCh

(a Maximum Likelihood Computer +Program using Interpolation of Markov Chains)

INED and EUROREVES

Contribution to the mathematics: C. R. Heathcote +(Australian National University, +Canberra).

Contact: Agnès Lièvre (lievre@ined.fr)

Your first example parameter file

#Imach version 0.97b, June 2004, INED-EUROREVES

First uncommented line

Covariates

Guess values for optimization +

Guess values for computing variances

Guess values for computing variances

Covariance matrix of parameters

Covariance matrix of parameters

Age range for calculation of stationary -prevalences and health expectancies

Computing the cross-sectional prevalence

Population- or status-based health -expectancies

Prevalence forecasting (Experimental)

Population forecasting (Experimental)

Running Imach -with this example

Results files - -- cross-sectional prevalence in each state (and at first pass): -biaspar/prbiaspar.txt -

Age range for calculation of stationary prevalences and +health expectancies

Computing the cross-sectional prevalence

Population- or status-based health +expectancies

Prevalence forecasting (Experimental)

Population forecasting (Experimental)

Running Imach with this +example

Results files - cross-sectional prevalence in each state (and at first pass): biaspar/prbiaspar.txt

- Estimated parameters and -covariance matrix: rbiaspar.imach

- Estimated parameters and covariance +matrix: rbiaspar.imach

- Transition probabilities: +biaspar/pijrbiaspar.txt

- Period prevalence in each state: -biaspar/plrbiaspar.txt

- Period +prevalence in each state: biaspar/plrbiaspar.txt

- Standard deviation of -period prevalence: biaspar/vplrbiaspar.txt

-cross-sectional and period -prevalence in state (2=disable) with confidence interval: -biaspar/vbiaspar21.png

-Convergence to the -period prevalence of disability: biaspar/pbiaspar11.png -

- Life expectancies by age -and initial health status with standard deviation: biaspar/erbiaspar.txt

- Standard deviation of period +prevalence: biaspar/vplrbiaspar.txt

-cross-sectional and period prevalence in state +(2=disable) with confidence interval: biaspar/vbiaspar21.png

-Convergence to the period prevalence of +disability: biaspar/pbiaspar11.png

- Life expectancies by age and initial health +status with standard deviation: biaspar/erbiaspar.txt

- Variances of life -expectancies by age and initial health status: biaspar/vrbiaspar.txt

-Variances of one-step -probabilities : biaspar/probrbiaspar.txt

- Health -expectancies -with standard errors in parentheses: biaspar/trbiaspar.txt

-Total life expectancy by -age and health expectancies in states (1=healthy) and (2=disable): -biaspar/ebiaspar1.png

- Copy of the parameter -file: orbiaspar.txt

- Prevalence forecasting: -biaspar/frbiaspar.txt

- Variances of life expectancies by age and +initial health status: biaspar/vrbiaspar.txt

-Variances of one-step probabilities +: biaspar/probrbiaspar.txt

- Health +expectancies with standard errors +in parentheses: biaspar/trbiaspar.txt

-Total life expectancy by age and health +expectancies in states (1=healthy) and (2=disable): biaspar/ebiaspar1.png

- Copy of the parameter file: +orbiaspar.txt

- Prevalence forecasting: biaspar/frbiaspar.txt

- Population forecasting: -biaspar/poprbiaspar.txt

- Population forecasting: biaspar/poprbiaspar.txt

Trying an example

Trying an example

Results files
-
-- cross-sectional prevalence in each state (and at first pass): -biaspar/prbiaspar.txt
-

Results files

- cross-sectional prevalence in each state (and at first pass): biaspar/prbiaspar.txt

-Convergence to the -period prevalence of disability: biaspar/pbiaspar11.png
-