
<!-- $Id: imach.htm,v 1.4 2005/05/17 22:13:25 brouard Exp $ -->
<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<title>Computing Health Expectancies using IMaCh</title>
</head>

<body bgcolor="#FFFFFF">

<hr size="3" color="#EC5E5E">

<h1 align="center"><font color="#00006A">Computing Health
Expectancies using IMaCh</font></h1>

<h1 align="center"><font color="#00006A" size="5">(a Maximum
Likelihood Computer Program using Interpolation of Markov Chains)</font></h1>

<p align="center">&nbsp;</p>

<p align="center"><a href="http://www.ined.fr/"><img
src="logo-ined.gif" border="0" width="151" height="76"></a><img
src="euroreves2.gif" width="151" height="75"></p>

<h3 align="center"><a href="http://www.ined.fr/"><font
color="#00006A">INED</font></a><font color="#00006A"> and </font><a
href="http://euroreves.ined.fr"><font color="#00006A">EUROREVES</font></a></h3>

<p align="center"><font color="#00006A" size="4"><strong>Version
0.97, June 2004</strong></font></p>

<hr size="3" color="#EC5E5E">

<p align="center"><font color="#00006A"><strong>Authors of the
program: </strong></font><a href="http://sauvy.ined.fr/brouard"><font
color="#00006A"><strong>Nicolas Brouard</strong></font></a><font
color="#00006A"><strong>, senior researcher at the </strong></font><a
href="http://www.ined.fr"><font color="#00006A"><strong>Institut
National d'Etudes Démographiques</strong></font></a><font
color="#00006A"><strong> (INED, Paris) in the &quot;Mortality,
Health and Epidemiology&quot; Research Unit </strong></font></p>

<p align="center"><font color="#00006A"><strong>and Agnès
Lièvre<br clear="left">
</strong></font></p>

<h4><font color="#00006A">Contribution to the mathematics: C. R.
Heathcote </font><font color="#00006A" size="2">(Australian
National University, Canberra).</font></h4>

<h4><font color="#00006A">Contact: Agnès Lièvre (</font><a
href="mailto:lievre@ined.fr"><font color="#00006A"><i>lievre@ined.fr</i></font></a><font
color="#00006A">) </font></h4>

<hr>

<ul>
    <li><a href="#intro">Introduction</a> </li>
    <li><a href="#data">On what kind of data can it be used?</a></li>
    <li><a href="#datafile">The data file</a> </li>
    <li><a href="#biaspar">The parameter file</a> </li>
    <li><a href="#running">Running Imach</a> </li>
    <li><a href="#output">Output files and graphs</a> </li>
    <li><a href="#example">Exemple</a> </li>
</ul>

<hr>

<h2><a name="intro"><font color="#00006A">Introduction</font></a></h2>

<p>This program computes <b>Healthy Life Expectancies</b> from <b>cross-longitudinal
data</b> using the methodology pioneered by Laditka and Wolf (1).
Within the family of Health Expectancies (HE), Disability-free
life expectancy (DFLE) is probably the most important index to
monitor. In low mortality countries, there is a fear that when
mortality declines, the increase in DFLE is not proportionate to
the increase in total Life expectancy. This case is called the <em>Expansion
of morbidity</em>. Most of the data collected today, in
particular by the international <a href="http://www.reves.org">REVES</a>
network on Health expectancy, and most HE indices based on these
data, are <em>cross-sectional</em>. This means that the information
collected comes from a single cross-sectional survey: people of
various ages (but mostly old people) are surveyed on their health
status at a single date. The proportion of people disabled at each
age can then be measured at that date. This age-specific
prevalence curve is then used to distinguish, within the
stationary population (which, by definition, is the life table
estimated from the vital statistics on mortality at the same
date), the disabled population from the disability-free
population. Life expectancy (LE) (or total population divided by
the yearly number of births or deaths of this stationary
population) is then decomposed into DFLE and DLE. This method of
computing HE is usually called the Sullivan method (after the
author who first described it).</p>

<p>Age-specific proportions of people disabled (prevalence of
disability) depend on the historical flows into disability and out of
it (recoveries) from the past until today. The age-specific
forces (or incidence rates) of entering disability or recovering
good health, estimated over a recent period of time
(as for period forces of mortality), reflect current conditions and
can therefore be used at each age to forecast the future of a
cohort <em>if nothing changes in the future</em>, i.e. to forecast the
prevalence of disability of each cohort. Our finding (2) is that the period
prevalence of disability (computed from period incidences) is lower
than the cross-sectional prevalence. For example, if a country is
improving its prosthesis technology, the incidence of recovering
the ability to walk will be higher at each (old) age, but the
prevalence of disability will reflect this improvement only slightly because
the prevalence is mostly shaped by the history of the cohort and not
by recent period effects. To measure the period improvement we have to
simulate the future of a cohort of new-borns entering or leaving the
disability state, or dying, at each age according to the incidence
rates measured today on different cohorts. The proportion of people
disabled at each age in this simulated cohort will be much lower than
the proportions observed at each age in a cross-sectional survey. This
new prevalence curve introduced in a life table will give a more
realistic HE level than the Sullivan method, which mostly measures the
history of health conditions in the country.</p>

<p>Therefore, the main question is how to measure incidence rates
from cross-longitudinal surveys. This is the goal of the IMaCh
program. From your data and using IMaCh you can estimate period
HE and not only Sullivan's HE. The standard errors of the HE
are also computed.</p>

<p>A cross-longitudinal survey consists of a first survey
(&quot;cross&quot;) where individuals of different ages are
interviewed on their health status or degree of disability. At
least a second wave of interviews (&quot;longitudinal&quot;)
should measure each individual's new health status. Health
expectancies are computed from the transitions observed between
waves and are computed for each degree of severity of disability
(number of life states). The more degrees you consider, the more time is
necessary to reach the Maximum Likelihood of the parameters
involved in the model. Considering only two states of disability
(disabled and healthy) is generally enough, but the computer
program also works with more health statuses.<br>
<br>
The simplest model is the multinomial logistic model where <i>pij</i>
is the probability to be observed in state <i>j</i> at the second
wave conditional on being observed in state <em>i</em> at the first
wave. Therefore a simple model is: log<em>(pij/pii)= aij +
bij*age+ cij*sex,</em> where '<i>age</i>' is age and '<i>sex</i>'
is a covariate. The advantage claimed for this computer program
is that if the delay between waves is not identical for
each individual, or if some individuals missed an interview, the
information is not rounded or lost, but taken into account using
an interpolation or extrapolation. <i>hPijx</i> is the
probability to be observed in state <i>j</i> at age <i>x+h</i>
conditional on the observed state <i>i</i> at age <i>x</i>. The
delay '<i>h</i>' can be split into an exact number (<i>nh*stepm</i>)
of unobserved intermediate states. This elementary transition (by
month, quarter, semester or year) is modeled as a
multinomial logistic. The <i>hPx</i> matrix is simply the matrix
product of <i>nh*stepm</i> elementary matrices and the
contribution of each individual to the likelihood is simply <i>hPijx</i>.
<br>
</p>
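<p>To make these mechanics concrete, here is a minimal, purely illustrative C
sketch (not the program's own code; the state indexing, the two-live-state
layout and the way age advances at each elementary step are assumptions) of
how an elementary transition matrix can be built from the logistic parameters
and multiplied <i>nh</i> times to obtain <i>hPx</i>:</p>

<pre>
#include &lt;math.h>
#include &lt;stdio.h>

#define NS 3  /* states: 0=healthy, 1=disabled, 2=death (absorbing) */

/* One elementary (stepm-month) matrix from log(pij/pii) = aij + bij*age */
static void elem(double p[NS][NS], double a[NS][NS], double b[NS][NS], double age)
{
  int i, j;
  for (i = 0; i &lt; NS - 1; i++) {            /* live states only */
    double denom = 1.0;
    for (j = 0; j &lt; NS; j++)
      if (j != i) denom += exp(a[i][j] + b[i][j] * age);
    for (j = 0; j &lt; NS; j++)
      p[i][j] = (j == i) ? 1.0 / denom : exp(a[i][j] + b[i][j] * age) / denom;
  }
  for (j = 0; j &lt; NS; j++) p[NS-1][j] = (j == NS-1);   /* death is absorbing */
}

/* hPx as the product of nh elementary matrices, age moving forward by stepm months */
static void hpx(double out[NS][NS], double a[NS][NS], double b[NS][NS],
                double age, int stepm, int nh)
{
  double cur[NS][NS], tmp[NS][NS];
  int i, j, l, k;
  for (i = 0; i &lt; NS; i++) for (j = 0; j &lt; NS; j++) out[i][j] = (i == j);
  for (k = 0; k &lt; nh; k++) {
    elem(cur, a, b, age + k * stepm / 12.0);
    for (i = 0; i &lt; NS; i++) for (j = 0; j &lt; NS; j++) {
      tmp[i][j] = 0.0;
      for (l = 0; l &lt; NS; l++) tmp[i][j] += out[i][l] * cur[l][j];
    }
    for (i = 0; i &lt; NS; i++) for (j = 0; j &lt; NS; j++) out[i][j] = tmp[i][j];
  }
}

int main(void)
{ /* illustrative parameter values taken from the estimates shown later in this manual */
  double a[NS][NS] = {{0, -12.290174, -9.155590}, {-2.629849, 0, -7.958519}, {0, 0, 0}};
  double b[NS][NS] = {{0,   0.092161,  0.046627}, {-0.022030, 0,  0.042614}, {0, 0, 0}};
  double P[NS][NS];
  hpx(P, a, b, 70.0, 1, 24);    /* 24 monthly steps: two years starting at age 70 */
  printf("2-year p11 from age 70: %f\n", P[0][0]);
  return 0;
}
</pre>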

<p>The program presented in this manual is a quite general
program named <strong>IMaCh</strong> (for <strong>I</strong>nterpolated
<strong>MA</strong>rkov <strong>CH</strong>ain), designed to
analyse transition data from longitudinal surveys. The first step
is the estimation of the parameters of a model of transition probabilities
between an initial status and a final status. From there, the
computer program produces indicators such as observed and
stationary prevalence, life expectancies and their variances, and
graphs. Our transition model consists of absorbing and
non-absorbing states with the possibility of return across the
non-absorbing states. The main advantage of this package,
compared to other programs for the analysis of transition data
(for example, Proc Catmod of SAS<sup>®</sup>), is that the whole
individual information is used even if an interview is missing, a
status or a date is unknown, or when the delay between waves is
not identical for each individual. The program can be executed
according to parameters: selection of a sub-sample, number of
absorbing and non-absorbing states, number of waves taken into
account (the user inputs the first and the last interview), a
tolerance level for the maximization function, the periodicity of
the transitions (we can compute annual, quarterly or monthly
transitions), and covariates in the model. It works on Windows and on
Unix.<br>
</p>

<hr>

<p>(1) Laditka, Sarah B. and Wolf, Douglas A. (1998), &quot;New
Methods for Analyzing Active Life Expectancy&quot;. <i>Journal of
Aging and Health</i>. Vol 10, No. 2. </p>
<p>(2) <a
href="http://taylorandfrancis.metapress.com/app/home/contribution.asp?wasp=1f99bwtvmk5yrb7hlhw3&referrer=parent&backto=issue,1,2;journal,2,5;linkingpublicationresults,1:300265,1">Lièvre
A., Brouard N. and Heathcote Ch. (2003) Estimating Health Expectancies
from Cross-longitudinal Surveys. <em>Mathematical Population Studies</em>, 10(4), pp. 211-248.</a></p>

<hr>

<h2><a name="data"><font color="#00006A">On what kind of data can
it be used?</font></a></h2>

<p>The minimum data required for a transition model is the
recording of a set of individuals interviewed at a first date and
interviewed again at least once more. From the
observations of an individual, we obtain a follow-up over time of
the occurrence of a specific event. In this documentation, the
event is related to health status at older ages, but the program
can be applied to many longitudinal studies in different
contexts. To build the data file described in the next section,
you must have the month and year of each interview and the
corresponding health status. But in order to get age, the date of
birth (month and year) is required (a missing value is allowed for
the month). The date of death (month and year) is also
required if the individual died. Shorter
steps (i.e. a month) will more closely take into account the
survival time after the last interview.</p>

<hr>

<h2><a name="datafile"><font color="#00006A">The data file</font></a></h2>

<p>In this example, 8,000 people have been interviewed in a
cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990).  Some
people missed 1, 2 or 3 interviews. Health statuses are healthy (1)
and disabled (2). The survey is not a real one. It is a simulation of
the American Longitudinal Survey on Aging (LSOA). The disability state is
defined as the inability to perform at least one of four ADLs (Activities of
daily living, like bathing, eating, walking).  Therefore, even if the
individuals interviewed in the sample are virtual, the information
carried by this sample is close to the situation of the United
States. Sex is not recorded in this sample. The LSOA survey is biased
in the sense that people living in an institution were not surveyed at
first pass in 1984. Thus the prevalence of disability in 1984 is
biased downwards at old ages. But when people left their household for
an institution, they were surveyed in that institution in 1986,
1988 or 1990. Thus incidences are not biased. But cross-sectional
prevalences of disability at old ages are artificially increasing
in 1986, 1988 and 1990 because of the higher weight of
institutionalized people in the sample. Our article shows the
opposite: the period prevalence is lower at old ages than the
adjusted cross-sectional prevalence, proving important current progress
against disability.</p>

<p>Each line of the data set (named <a href="data1.txt">data1.txt</a>
in this first example) is an individual record. Fields are separated
by blanks: </p>

<ul>
    <li><b>Index number</b>: positive number (field 1) </li>
    <li><b>First covariate</b>: positive number (field 2) </li>
    <li><b>Second covariate</b>: positive number (field 3) </li>
    <li><a name="Weight"><b>Weight</b></a>: positive number
        (field 4). In most surveys individuals are weighted
        according to the stratification of the sample.</li>
    <li><b>Date of birth</b>: coded as mm/yyyy. Missing dates are
        coded as 99/9999 (field 5) </li>
    <li><b>Date of death</b>: coded as mm/yyyy. Missing dates are
        coded as 99/9999 (field 6) </li>
    <li><b>Date of first interview</b>: coded as mm/yyyy. Missing
        dates are coded as 99/9999 (field 7) </li>
    <li><b>Status at first interview</b>: positive number.
        Missing values are coded -1. (field 8) </li>
    <li><b>Date of second interview</b>: coded as mm/yyyy.
        Missing dates are coded as 99/9999 (field 9) </li>
    <li><strong>Status at second interview</strong>: positive
        number. Missing values are coded -1. (field 10) </li>
    <li><b>Date of third interview</b>: coded as mm/yyyy. Missing
        dates are coded as 99/9999 (field 11) </li>
    <li><strong>Status at third interview</strong>: positive
        number. Missing values are coded -1. (field 12) </li>
    <li><b>Date of fourth interview</b>: coded as mm/yyyy.
        Missing dates are coded as 99/9999 (field 13) </li>
    <li><strong>Status at fourth interview</strong>: positive
        number. Missing values are coded -1. (field 14) </li>
    <li>etc.</li>
</ul>

<p>&nbsp;</p>

<p>If your longitudinal survey does not include information about
weights or covariates, you must fill the corresponding column with a
constant value (e.g. 1) because missing fields are not allowed.</p>
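<p>For illustration only (this is not an actual line of <tt>data1.txt</tt>),
a record following the layout above could look like:</p>

<pre>1 0 0 1.000000 06/1920 99/9999 03/1984 1 05/1986 1 04/1988 2 06/1990 2</pre>

<p>i.e. an individual born in June 1920, still alive, healthy at the first
two interviews and disabled at the last two.</p>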

<hr>

<h2><font color="#00006A">Your first example parameter file</font><a
href="http://euroreves.ined.fr/imach"></a><a name="uio"></a></h2>

<h2><a name="biaspar"></a>#Imach version 0.97b, June 2004,
INED-EUROREVES </h2>

<p>This first line is a comment. Comment lines start with a '#'.</p>

<h4><font color="#FF0000">First uncommented line</font></h4>

<pre>title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4</pre>

<ul>
    <li><b>title=</b> 1st_example is the title of the run. </li>
    <li><b>datafile=</b> data1.txt is the name of the data set.
        Our example is a six-year follow-up survey. It consists
        of a baseline followed by 3 re-interviews. </li>
    <li><b>lastobs=</b> 8600 the program is able to run on a
        subsample where the last observation number is lastobs.
        It can be set to a number bigger than the real number of
        observations (e.g. 100000). In this example, the maximisation
        will be done on the first 8600 records. </li>
    <li><b>firstpass=1</b> , <b>lastpass=4 </b>In case of more
        than two interviews in the survey, the program can be run
        on selected transition periods. firstpass=1 means the
        first interview included in the calculation is the
        baseline survey. lastpass=4 means that the information
        brought by the 4th interview is taken into account.</li>
</ul>

<p>&nbsp;</p>

<h4><a name="biaspar-2"><font color="#FF0000">Second uncommented
line</font></a></h4>

<pre>ftol=1.e-08 stepm=1 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0</pre>

<ul>
    <li><b>ftol=1e-8</b> Convergence tolerance on the function
        value in the maximisation of the likelihood. Choosing a
        correct value for ftol is difficult. 1e-8 is a correct
        value for a 32-bit computer.</li>
    <li><b>stepm=1</b> Time unit in months for interpolation.
        Examples:<ul>
            <li>If stepm=1, the unit is a month </li>
            <li>If stepm=3, the unit is a quarter (trimester)</li>
            <li>If stepm=12, the unit is a year </li>
            <li>If stepm=24, the unit is two years</li>
            <li>... </li>
        </ul>
    </li>
    <li><b>ncovcol=2</b> Number of covariate columns included in the
        datafile before the column of the date of birth. You can have
        covariates that won't necessarily be used during the
        run. It is not the number of covariates that will be
        specified by the model. The 'model' syntax describes the
        covariates to be taken into account during the run. </li>
    <li><b>nlstate=2</b> Number of non-absorbing (alive) states.
        Here we have two alive states: disability-free is coded 1
        and disability is coded 2. </li>
    <li><b>ndeath=1</b> Number of absorbing states. The absorbing
        state death is coded 3. </li>
    <li><b>maxwav=4</b> Number of waves in the datafile.</li>
    <li><a name="mle"><b>mle</b></a><b>=1</b> Option for the
        Maximisation Likelihood Estimation. <ul>
            <li>If mle=1 the program does the maximisation and
                the calculation of health expectancies </li>
            <li>If mle=0 the program only does the calculation of
                the health expectancies and other indices and graphs
but without the maximization.. </li>
               There also other possible values:
          <ul>
            <li>If mle=-1 you get a template which can be useful if
your model is complex with many covariates.</li>
            <li> If mle=-3 IMaCh computes the mortality but without
            any health status (May 2004)</li> <li>If mle=2 IMach
            likelihood corresponds to a linear interpolation</li> <li>
            If mle=3 IMach likelihood corresponds to an exponential
            inter-extrapolation</li> 
            <li> If mle=4 IMach likelihood
            corresponds to no inter-extrapolation, and thus biasing
            the results. </li> 
            <li> If mle=5 IMach likelihood
            corresponds to no inter-extrapolation, and before the
            correction of the Jackson's bug (avoid this).</li>
            </ul>
        </ul>
    </li>
    <li><b>weight=0</b> Possibility to add weights. <ul>
            <li>If weight=0 no weights are included </li>
            <li>If weight=1 the maximisation integrates the
                weights which are in field <a href="#Weight">4</a></li>
        </ul>
    </li>
</ul>

<h4><font color="#FF0000">Covariates</font></h4>

<p>Intercept and age are systematically included in the model.
Additional covariates can be included with the command: </p>

<pre>model=<em>list of covariates</em></pre>

<ul>
    <li>if<strong> model=. </strong>then no covariates are
        included</li>
    <li>if <strong>model=V1</strong> the model includes the first
        covariate (field 2)</li>
    <li>if <strong>model=V2 </strong>the model includes the
        second covariate (field 3)</li>
    <li>if <strong>model=V1+V2 </strong>the model includes the
        first and the second covariate (fields 2 and 3)</li>
    <li>if <strong>model=V1*V2 </strong>the model includes the
        product of the first and the second covariate (fields 2
        and 3)</li>
    <li>if <strong>model=V1+V1*age</strong> the model includes the
        first covariate and its product with age (V1*age)</li>
</ul>

<p>In this example, we have two covariates in the data file
(fields 2 and 3). The number of covariates included in the data
file between the id and the date of birth is ncovcol=2 (it was
named ncov in versions prior to 0.8). If you have 3 covariates in
the datafile (fields 2, 3 and 4), you will set ncovcol=3. Then
you can run the programme with a new parametrisation taking into
account the third covariate. For example, <strong>model=V1+V3 </strong>estimates
a model with the first and third covariates. More complicated
models can be used, but they will take more time to converge. With
a simple model (no covariates), the programme estimates 8
parameters. Adding covariates increases the number of parameters:
12 for <strong>model=V1, </strong>16 for <strong>model=V1+V1*age
</strong>and 20 for <strong>model=V1+V2+V3.</strong></p>
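<p>These counts follow directly from the formula <em>N</em>=(<em>nlstate</em>+<em>ndeath</em>-1)*<em>nlstate</em>*<em>ncovmodel</em>
given in the next section, where <em>ncovmodel</em> counts the intercept, age
and the covariates of the model:</p>

<pre>model=.          ncovmodel=2 (1, age)              N = (2+1-1)*2*2 =  8
model=V1         ncovmodel=3 (1, age, V1)          N = (2+1-1)*2*3 = 12
model=V1+V1*age  ncovmodel=4 (1, age, V1, V1*age)  N = (2+1-1)*2*4 = 16
model=V1+V2+V3   ncovmodel=5 (1, age, V1, V2, V3)  N = (2+1-1)*2*5 = 20</pre>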

<h4><font color="#FF0000">Guess values for optimization</font><font
color="#00006A"> </font></h4>

<p>You must write the initial guess values of the parameters for
the optimization. The number of parameters, <em>N</em>, depends on the
number of absorbing and non-absorbing states and on the
number of covariates. <br>
<em>N</em> is given by the formula <em>N</em>=(<em>nlstate</em> +
<em>ndeath</em>-1)*<em>nlstate</em>*<em>ncovmodel</em>&nbsp;. <br>
<br>
Thus in the simple case with 2 covariates (the model is log
(pij/pii) = aij + bij * age where intercept and age are the two
covariates), 2 health states (1 for disability-free and 2
for disability) and 1 absorbing state (3), you must enter 8
initial values, a12, b12, a13, b13, a21, b21, a23, b23. You can
start with zeros as in this example, but if you have a more
precise set (for example from an earlier run) you can enter it
and it will speed up the convergence.<br>
Each of the four lines starts with indices &quot;ij&quot;: <b>ij
aij bij</b> </p>

<blockquote>
    <pre># Guess values of aij and bij in log (pij/pii) = aij + bij * age
12 -14.155633  0.110794 
13  -7.925360  0.032091 
21  -1.890135 -0.029473 
23  -6.234642  0.022315 </pre>
</blockquote>

<p>or, to simplify (in most cases it converges but there is no
guarantee!): </p>

<blockquote>
    <pre>12 0.0 0.0
13 0.0 0.0
21 0.0 0.0
23 0.0 0.0</pre>
</blockquote>

<p>In order to speed up the convergence you can make a first run
with a large stepm, i.e. stepm=12 or 24, and then decrease the stepm
until stepm=1 month. If newstepm is the new shorter stepm and the
old stepm is a multiple of it (stepm = n * newstepm), then the
following approximation holds: </p>

<pre>aij(newstepm) = aij(stepm) - ln(n)
</pre>

<p>and </p>

<pre>bij(newstepm) = bij(stepm) .</pre>

<p>For example, if you already ran with a 6-month step and
got:<br>
</p>

<pre># Parameters
12 -13.390179  0.126133 
13  -7.493460  0.048069 
21   0.575975 -0.041322 
23  -4.748678  0.030626 
</pre>

<p>If you now want to get the monthly estimates, you can guess
the aij by subtracting ln(6) = 1.7918<br>
and running<br>
</p>

<pre>12 -15.18193847  0.126133 
13 -9.285219469  0.048069
21 -1.215784469 -0.041322
23 -6.540437469  0.030626
</pre>

<p>and get<br>
</p>

<pre>12 -15.029768 0.124347 
13 -8.472981 0.036599 
21 -1.472527 -0.038394 
23 -6.553602 0.029856 
</pre>

<p>which is closer to the results. The approximation is probably useful
only for very small intervals and we don't have enough experience to
know whether it speeds up the convergence or not.</p>
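<p>The shift can be checked on the first parameter: -13.390179 - ln(6) =
-13.390179 - 1.791759 = -15.181938, which is the value entered above. The
shifts -ln(n) for other common step changes are:</p>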

<pre>         -ln(12)= -2.484
 -ln(6/1)=-ln(6)= -1.791
 -ln(3/1)=-ln(3)= -1.0986
-ln(12/6)=-ln(2)= -0.693
</pre>

In versions 0.9 and higher you can still get valuable results even if
your stepm parameter is bigger than a month. The idea is to run with a
bigger stepm in order to get quicker convergence at the price of a
small bias. Once you know which model you want to fit, you can set
stepm=1 and wait hours or days for convergence!

To get unbiased results even with large stepm we introduce the idea of
pseudo likelihood by interpolating two exact likelihoods. Let us
detail this:
<p>
If the interval of <em>d</em> months between two waves is not a
multiple of 'stepm', but lies between <em>(n-1) stepm</em> and
<em>n stepm</em>, then both exact likelihoods are computed (the
contribution to the likelihood at <em>n stepm</em> requires one more matrix
product) (let us remember that we are modelling the probability
of being observed in a particular state after <em>d</em> months, given the
state observed at month 0). The distance (<em>bh</em> in
the program) from the month of interview to the rounded date of <em>n
stepm</em> is computed. It can be negative (the interview occurs before
<em>n stepm</em>) or positive if the interview occurs after <em>n
stepm</em> (and before <em>(n+1) stepm</em>).
<br>
Then the final contribution to the total likelihood is a weighted
average of these two exact likelihoods at <em>n stepm</em> (out) and
at <em>(n-1) stepm</em> (savm). We did not want to compute the third
likelihood at <em>(n+1) stepm</em> because it is too costly in time, so
we use an extrapolation if <em>bh</em> is positive.  <br> The formula of
inter/extrapolation varies according to the value of the parameter mle:
<pre>
mle=1	  lli= log((1.+bbh)*out[s1][s2]- bbh*savm[s1][s2]); /* linear interpolation */

mle=2	lli= (savm[s1][s2]>(double)1.e-8 ? \
          log((1.+bbh)*out[s1][s2]- bbh*(savm[s1][s2])): \
          log((1.+bbh)*out[s1][s2])); /* linear interpolation */
mle=3	lli= (savm[s1][s2]>1.e-8 ? \
          (1.+bbh)*log(out[s1][s2])- bbh*log(savm[s1][s2]): \
          log((1.+bbh)*out[s1][s2])); /* exponential inter-extrapolation */

mle=4   lli=log(out[s[mw[mi][i]][i]][s[mw[mi+1][i]][i]]); /* No interpolation  */
        no need to save previous likelihood into memory.
</pre>
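<p>As a purely illustrative reading of the mle=1 formula (assuming here that
<em>bbh</em> is the signed distance from the interview date to <em>n stepm</em>,
expressed in units of stepm, so that it lies between -1 and 0 when the
interview falls before <em>n stepm</em>; values rounded):</p>

<pre>stepm=12, delay between the two waves d=26 months
(n-1)*stepm = 24 &lt; 26 &lt;= 36 = n*stepm,   bbh = (26-36)/12 = -0.83
lli = log( (1+bbh)*out - bbh*savm ) = log( 0.17*out + 0.83*savm )</pre>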
<p>
If the death occurs between the first and second pass, for example
more precisely between <em>n stepm</em> and <em>(n+1) stepm</em>, the
contribution of this person to the likelihood is simply the difference
between the probability of dying before <em>n stepm</em> and the
probability of dying before <em>(n+1) stepm</em>. There was a bug in
version 0.8: death was treated as any other state, i.e. as if it
were an observed death at second pass. This was not precise but
correct; however, when information on the precise month of death came
(death occurring prior to the second pass) we did not change the likelihood
accordingly. Thanks to Chris Jackson for correcting us. In earlier
versions (fortunately before first publication) the total mortality
was overestimated (people were dying too early) by about 10%. Versions
0.95 and higher are correct.

<p> Our suggested choice is mle=1. If stepm=1 there is no difference
between the various mle options (methods of interpolation). If stepm is
big, like 12, 24 or 48, and mle=4 (no interpolation), the bias may be
very large if the mean duration between two waves is not a
multiple of stepm. See the appendix of our main publication concerning
the sine curve of biases.
 

<h4><font color="#FF0000">Guess values for computing variances</font></h4>

<p>These values are output by the maximisation of the likelihood (<a
href="#mle">mle</a>=1). They can be used as input for a
second run in order to get the various output data files (health
expectancies, period prevalence etc.) and figures without rerunning
the long maximisation phase (mle=0). </p>

<p>These 'scales' are small values needed for the computation of
numerical derivatives. These derivatives are used to compute the
Hessian matrix of the parameters, that is the inverse of the
covariance matrix. They are often used for estimating variances and
confidence intervals. Each line consists of the indices &quot;ij&quot;
followed by the initial scales (zero to simplify) associated with aij
and bij. </p>
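<p>As a hedged illustration of what such a scale is used for (this is not
IMaCh's own routine; the function name and argument layout are invented), a
central finite difference of the log-likelihood with respect to one parameter
could be written as:</p>

<pre>
#include &lt;stddef.h>

/* Central-difference approximation of d(logL)/d(theta[k]) using a small scale h */
double dloglik(double (*loglik)(const double *theta, size_t n),
               double *theta, size_t n, size_t k, double h)
{
  double saved = theta[k], fplus, fminus;
  theta[k] = saved + h;  fplus  = loglik(theta, n);
  theta[k] = saved - h;  fminus = loglik(theta, n);
  theta[k] = saved;                       /* restore the parameter */
  return (fplus - fminus) / (2.0 * h);
}
</pre>

<p>Second derivatives obtained in the same way give the Hessian matrix, whose
inverse is the covariance matrix used below for variances and confidence
intervals.</p>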

<ul>
    <li>If mle=1 you can enter zeros:</li>
    <li><blockquote>
            <pre># Scales (for hessian or gradient estimation)
12 0. 0. 
13 0. 0. 
21 0. 0. 
23 0. 0. </pre>
        </blockquote>
    </li>
    <li>If mle=0 (no maximisation of likelihood) you must enter the scales
        (usually obtained from an earlier run).</li>
</ul>

<h4><font color="#FF0000">Covariance matrix of parameters</font></h4>

<p>The covariance matrix is output if <a href="#mle">mle</a>=1. But it can
also be used as an input to get the various output data files (health
expectancies, period prevalence etc.) and figures without
rerunning the maximisation phase (mle=0). <br>
Each line starts with indices &quot;ijk&quot; followed by the
covariances between aij and bij:<br>
</p>

<pre>
   121 Var(a12) 
   122 Cov(b12,a12)  Var(b12) 
          ...
   232 Cov(b23,a12)  Cov(b23,b12) ... Var (b23) </pre>

<ul>
    <li>If mle=1 you can enter zeros. </li>
    <li><pre># Covariance matrix
121 0.
122 0. 0.
131 0. 0. 0. 
132 0. 0. 0. 0. 
211 0. 0. 0. 0. 0. 
212 0. 0. 0. 0. 0. 0. 
231 0. 0. 0. 0. 0. 0. 0. 
232 0. 0. 0. 0. 0. 0. 0. 0.</pre>
    </li>
    <li>If mle=0 you must enter a covariance matrix (usually
        obtained from an earlier run). </li>
</ul>

<h4><font color="#FF0000">Age range for calculation of stationary
prevalences and health expectancies</font></h4>

<pre>agemin=70 agemax=100 bage=50 fage=100</pre>

<p>
Once we have obtained the estimated parameters, the program is able
to calculate period prevalences, transition probabilities
and life expectancies at any age. The choice of age range is useful
for extrapolation. In this example, the age of people interviewed varies
from 69 to 102 and the model is estimated using their exact ages. But
if you are interested in the age-specific period prevalence you can
start the simulation at an exact age like 70 and stop at 100. Then the
program will draw at least two curves describing the forecasted
prevalences of two cohorts, one for healthy people at age 70 and the second
for disabled people at the same initial age. According to the
mixing property (ergodicity) and because of recovery, both prevalences
will tend to be identical at later ages. Thus if you want to compute
the prevalence at age 70, you should enter a lower agemin value.
</p>

<p>
Setting bage=50 (begin age) and fage=100 (final age) lets
the program compute life expectancy from age 'bage' to age
'fage'. Because we use a model, we can compute life
expectancy on a wider age range than the age range of the data.
But the model can be rather wrong on much larger intervals.
The program is limited to an upper age of about 120!
</p>

<ul>
    <li><b>agemin=</b> Minimum age for calculation of the
        period prevalence </li>
    <li><b>agemax=</b> Maximum age for calculation of the
        period prevalence </li>
    <li><b>bage=</b> Minimum age for calculation of the health
        expectancies </li>
    <li><b>fage=</b> Maximum age for calculation of the health
        expectancies </li>
</ul>

<h4><a name="Computing"><font color="#FF0000">Computing</font></a><font
color="#FF0000"> the cross-sectional prevalence</font></h4>

<pre>begin-prev-date=1/1/1984 end-prev-date=1/6/1988 estepm=1</pre>

<p>
The statements 'begin-prev-date' and 'end-prev-date' allow you to
select the period over which the observed prevalences
in each state are calculated. In this example, the prevalences are calculated from
survey data collected between 1 January 1984 and 1 June 1988. 
</p>

<ul>
    <li><strong>begin-prev-date= </strong>Starting date
        (day/month/year)</li>
    <li><strong>end-prev-date= </strong>Final date
        (day/month/year)</li>
    <li><strong>estepm= </strong>Unit (in months). We compute the
        life expectancy from trapezoids spaced every estepm
        months (see the sketch after this list). This is mainly to
        measure the difference between
        two models: for example if stepm=24 months, pijx are given
        only every 2 years and by summing them we are calculating
        an estimate of the life expectancy assuming a linear
        progression in between, and thus overestimating or
        underestimating it according to the curvature of the
        survival function. If, for the same data, we estimate the
        model with stepm=1 month, we can keep estepm at 24 months
        to compare the new estimate of life expectancy with the
        same linear hypothesis. A more precise result, taking
        the curvature into account more precisely, will be obtained if
        estepm is as small as stepm.</li>
</ul>
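<p>Written as a formula, the trapezoidal sum described in the last item above
is, as a sketch (with <em>hPijx</em> the probability of being in state <em>j</em>
at age <em>x+h</em> given state <em>i</em> at age <em>x</em>, and <em>h</em>
running over multiples of estepm):</p>

<pre>e_ij(x) ~ (estepm/12) * sum over h=0, estepm, 2*estepm, ...
          of [ hPijx + (h+estepm)Pijx ] / 2</pre>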

<h4><font color="#FF0000">Population- or status-based health
expectancies</font></h4>

<pre>pop_based=0</pre>

<p>The program computes status-based health expectancies, i.e. health
expectancies which depend on the initial health state.  If you are
healthy, your healthy life expectancy (e11) is higher than if you were
disabled (e21, with e11 &gt; e21).<br> To compute a healthy life
expectancy 'independent' of the initial status we have to weight e11
and e21 according to the probability of being in each state at the initial
age, which corresponds to the proportions of people in each health
state (cross-sectional prevalences).<p> 

We could also compute e12 and e22 and get e.2 by weighting them
according to the observed cross-sectional prevalences at the initial age.
<p> In a similar way we could compute the total life expectancy by
summing e.1 and e.2.
<br>
The main difference between 'population based' and 'implied' or
'period' indices lies in the weights used. 'Usually', cross-sectional
prevalences of disability are higher than period prevalences,
particularly at old ages. This is true if the country is improving its
health system by teaching people how to prevent disability and by
promoting better screening, for example of people needing cataract
surgery, and for many unknown reasons that this program may help to
discover. Then the proportion of disabled people at age 90 will be
lower than the currently observed proportion.
<p>
Thus a better Health Expectancy and even a better Life Expectancy
value is given by forecasting not only the current lower mortality at
all ages but also a lower incidence of disability and higher recovery.
<br> Using the period prevalences as weights instead of the
cross-sectional prevalences, we are computing indices which are more
specific to the current situation and therefore more useful for
predicting improvements or regressions in the future and for comparing
different policies in various countries.

<ul>
    <li><strong>popbased= 0 </strong>Health expectancies are computed
    at each age from period prevalences 'expected' at this initial
    age.</li> 
    <li><strong>popbased= 1 </strong>Health expectancies are
    computed at each age from the cross-sectional 'observed' prevalence at
    this initial age. As the whole population is not observed at the
    same exact date, we define a short period where the observed
    prevalence can be computed.<br>

 We simply sum all people surveyed between these two exact dates
 who belong to a particular age group (single year) at the date of
 interview and who are in a particular health state. Then it is easy to
get the proportion of people of a particular health status among all
people of the same age group.<br>

If the two dates are far apart and cover two waves or more, people
interviewed twice or more are counted twice or more. The program
also takes into account the selection of individuals interviewed between
firstpass and lastpass (we do not know whether this can be useful).
</li>
</ul>

<h4><font color="#FF0000">Prevalence forecasting (Experimental)</font></h4>

<pre>starting-proj-date=1/1/1989 final-proj-date=1/1/1992 mov_average=0 </pre>

<p>Prevalence and population projections are only available if
the interpolation unit is a month, i.e. stepm=1, and if there are
no covariates. The programme estimates the prevalence in each
state at a precise date expressed in day/month/year. The
programme computes one forecasted prevalence a year from a
starting date (1 January 1989 in this example) to a final date
(1 January 1992). The statement mov_average allows computing
smoothed forecasted prevalences with a five-age moving average
centered at the mid-age of the five-age period. <br>
</p>

<h4><font color="#FF0000">Population forecasting (Experimental)</font></h4>

<ul>
    <li><strong>starting-proj-date</strong>= starting date
        (day/month/year) of forecasting</li>
    <li><strong>final-proj-date= </strong>final date
        (day/month/year) of forecasting</li>
    <li><strong>mov_average</strong>= smoothing with a five-age
        moving average centered at the mid-age of the five-age
        period. The command<strong> mov_average</strong> takes
        value 1 if the prevalences are smoothed and 0 otherwise.</li>
</ul>


<ul type="disc">
    <li><b>popforecast=
        0 </b>Option for population forecasting. If
        popforecast=1, the programme does the forecasting<b>.</b></li>
    <li><b>popfile=
        </b>name of the population file</li>
    <li><b>popfiledate=</b>
        date of the population population</li>
    <li><b>last-popfiledate</b>=
        date of the last population projection&nbsp;</li>
</ul>

<hr>

<h2><a name="running"></a><font color="#00006A">Running Imach
with this example</font></h2>

<p>We assume that you already typed your <a href="biaspar.imach">1st_example
parameter file</a> as explained <a href="#biaspar">above</a>. 

To run the program under Windows you can either:
</p>

<ul>
    <li>click on the imach.exe icon and either:
      <ul>
         <li>enter the name of the
        parameter file, for example <tt>
C:\home\myname\lsoa\biaspar.imach</tt></li>
    <li>or locate the biaspar.imach icon in your folder, such as
    <tt>C:\home\myname\lsoa</tt>, 
    and drag it, with your mouse, onto the already open imach window; </li>
  </ul>
 </li>
 <li>or, with version 0.97b, if you ran setup at installation, Windows is
 supposed to understand the &quot;.imach&quot; extension and you can
 right click the biaspar.imach icon and either edit the parameter file
 with WordPad (better than Notepad) or execute it with
 IMaCh. </li>
</ul>

<p>The time to converge depends on the step unit used (1
month is more precise but more CPU consuming), on the number of cases,
and on the number of variables (covariates).
</p>

<p>
The program outputs many files. Most of them are files which will be
plotted for better understanding.
</p>

<p>
Running the program under Linux is mostly the same, and it is no more
difficult to run it on a Macintosh.
</p>
<hr>

<h2><a name="output"><font color="#00006A">Output of the program
and graphs</font> </a></h2>

<p>Once the optimization is finished (once the convergence is
reached), many tables and graphics are produced.<p>
The IMaCh program will create a subdirectory with the same name as your
parameter file (here biaspar) where all the tables and figures will be
stored.<br>

Important files like the log file and the output parameter file (which
contains the estimates of the maximisation) are stored at the main
level, not in this subdirectory. Files with extension .log or .txt can
be edited with a standard editor like WordPad or Notepad, or even
viewed with a browser like Internet Explorer or Mozilla.

<p> The main html file is also named with the same name <a
href="biaspar.htm">biaspar.htm</a>. You can click on it while holding
the shift key in order to open it in another window (Windows).
<p>
 Our grapher is gnuplot, an interactive plotting program (GPL) which
 can also work in batch mode. A gnuplot reference manual is available <a
 href="http://www.gnuplot.info/">here</a>. <br> When the run is
 finished, the window stays open and waits for a character from the
 user, for example <tt>q</tt> to quit. <br> These
 characters are:<br>
</p>
<ul>
    <li>'e' for opening the main result html file <a
    href="biaspar.htm"><strong>biaspar.htm</strong></a> file to edit
    the output files and graphs. </li> 
    <li>'g' to graph again</li>
    <li>'c' to start the program again from the beginning.</li>
    <li>'q' for exiting.</li>
</ul>

The main gnuplot file is named <tt>biaspar.gp</tt> and can be edited (right
click) and run again.
<p>Gnuplot is easy and you can use it to make more complex
graphs. Just click on gnuplot and type plot sin(x) to see how easy it
is.


<h5><font size="4"><strong>Results files </strong></font><br>
<br>
<font color="#EC5E5E" size="3"><strong>- </strong></font><a
name="cross-sectional prevalence in each state"><font color="#EC5E5E"
size="3"><strong>cross-sectional prevalence in each state</strong></font></a><font
color="#EC5E5E" size="3"><strong> (and at first pass)</strong></font><b>:
</b><a href="biaspar/prbiaspar.txt"><b>biaspar/prbiaspar.txt</b></a><br>
</h5>

<p>The first line is the title and displays each field of the
file. The first column corresponds to age. Fields 2 and 6 are the
proportions of individuals in states 1 and 2 respectively as
observed at the first exam. The other fields are the numbers of
people in states 1, 2 or more. The number of columns increases if
the number of states is higher than 2.<br>
The header of the file is </p>

<pre># Age Prev(1) N(1) N Age Prev(2) N(2) N
70 1.00000 631 631 70 0.00000 0 631
71 0.99681 625 627 71 0.00319 2 627 
72 0.97125 1115 1148 72 0.02875 33 1148 </pre>

<p>It means that at age 70 (between 70 and 71), the prevalence in state 1 is 1.000
and in state 2 is 0.000. At age 71 the number of individuals in
state 1 is 625 and in state 2 is 2, hence the total number of
people aged 71 is 625+2=627. <br>
</p>

<h5><font color="#EC5E5E" size="3"><b>- Estimated parameters and
covariance matrix</b></font><b>: </b><a href="rbiaspar.txt"><b>rbiaspar.imach</b></a></h5>

<p>This file contains all the maximisation results: </p>

<pre> -2 log likelihood= 21660.918613445392
 Estimated parameters: a12 = -12.290174 b12 = 0.092161 
                       a13 = -9.155590  b13 = 0.046627 
                       a21 = -2.629849  b21 = -0.022030 
                       a23 = -7.958519  b23 = 0.042614  
 Covariance matrix: Var(a12) = 1.47453e-001
                    Var(b12) = 2.18676e-005
                    Var(a13) = 2.09715e-001
                    Var(b13) = 3.28937e-005  
                    Var(a21) = 9.19832e-001
                    Var(b21) = 1.29229e-004
                    Var(a23) = 4.48405e-001
                    Var(b23) = 5.85631e-005 
 </pre>

<p>By substitution of these parameters in the regression model,
we obtain the elementary transition probabilities:</p>

<p><img src="biaspar/pebiaspar11.png" width="400" height="300"></p>

<h5><font color="#EC5E5E" size="3"><b>- Transition probabilities</b></font><b>:
</b><a href="biaspar/pijrbiaspar.txt"><b>biaspar/pijrbiaspar.txt</b></a></h5>

<p>Here are the transition probabilities Pij(x, x+nh). The second
column is the starting age x (from age 95 to 65), the third is age
(x+nh) and the others are the transition probabilities p11, p12, p13,
p21, p22, p23. The first column indicates the value of the covariate
(without any other variable than age it is equal to 1). For example, line 5 of the file
is: </p>

<pre>1 100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513 </pre>

<p>and this means: </p>

<pre>p11(100,106)=0.02655
p12(100,106)=0.17622
p13(100,106)=0.79722
p21(100,106)=0.01809
p22(100,106)=0.13678
p23(100,106)=0.84513 </pre>
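<p>As a quick check, for each starting state the three probabilities sum to
one (up to rounding):</p>

<pre>p11+p12+p13 = 0.02655 + 0.17622 + 0.79722 = 0.99999
p21+p22+p23 = 0.01809 + 0.13678 + 0.84513 = 1.00000</pre>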

<h5><font color="#EC5E5E" size="3"><b>- </b></font><a
name="Period prevalence in each state"><font color="#EC5E5E"
size="3"><b>Period prevalence in each state</b></font></a><b>:
</b><a href="biaspar/plrbiaspar.txt"><b>biaspar/plrbiaspar.txt</b></a></h5>

<pre>#Prevalence
#Age 1-1 2-2

#************ 
70 0.90134 0.09866
71 0.89177 0.10823 
72 0.88139 0.11861 
73 0.87015 0.12985 </pre>

<p>At age 70 the period prevalence is 0.90134 in state 1 and 0.09866
in state 2. This period prevalence differs from the cross-sectional
prevalence. Here is the point. The cross-sectional prevalence at age
70 results from the incidence of disability, incidence of recovery and
mortality which occurred in the past of the cohort.  Period prevalence
results from a simulation with current incidences of disability,
recovery and mortality estimated from this cross-longitudinal
survey. It is a good prediction of the prevalence in the
future if &quot;nothing changes in the future&quot;. This is exactly
what demographers do with a period life table. Life expectancy is the
expected mean survival time if current mortality rates (age-specific incidences
of mortality) &quot;remain constant&quot; in the future. </p>

<h5><font color="#EC5E5E" size="3"><b>- Standard deviation of
period prevalence</b></font><b>: </b><a
href="biaspar/vplrbiaspar.txt"><b>biaspar/vplrbiaspar.txt</b></a></h5>

<p>The period prevalence has to be compared with the cross-sectional
prevalence. But both are statistical estimates and therefore
have confidence intervals.
<br>For the cross-sectional prevalence we generally need information on
the design of the surveys. It is usually not enough to consider the
number of people surveyed at a particular age and to estimate a
Bernoulli confidence interval based on the prevalence at that
age. But you can do it to have an idea of the randomness. At least you
can get a visual appreciation of the randomness by looking at the
fluctuation over ages.

<p> For the period prevalence it is possible to estimate the
confidence interval from the Hessian matrix (see the publication for
details). We are supposing that the design of the survey will only
alter the weight of each individual. IMaCh scales the weights of
the individual-waves contributing to the likelihood by making the sum of
the weights equal to the number of contributing individual-waves: a
weighted survey doesn't increase or decrease the size of the survey,
it only gives more weight to some individuals and thus less to
others.

<h5><font color="#EC5E5E" size="3">-cross-sectional and period
prevalence in state (2=disable) with confidence interval</font>:<b>
</b><a href="biaspar/vbiaspar21.htm"><b>biaspar/vbiaspar21.png</b></a></h5>

<p>This graph exhibits the period prevalence in state (2) with its
confidence interval in red. The green curve is the observed prevalence
(or proportion of individuals in state (2)).  Without discussing the
results (it is not the purpose here), we observe that the green curve
is rather below the period prevalence. If the data were not biased by
the non-inclusion of people living in institutions, we would have
concluded that the prevalence of disability will increase in the
future (see the main publication if you are interested in real data
and results, which are the opposite).</p>

<p><img src="biaspar/vbiaspar21.png" width="400" height="300"></p>

<h5><font color="#EC5E5E" size="3"><b>-Convergence to the
period prevalence of disability</b></font><b>: </b><a
href="biaspar/pbiaspar11.png"><b>biaspar/pbiaspar11.png</b></a><br>
<img src="biaspar/pbiaspar11.png" width="400" height="300"> </h5>

<p>This graph plots the conditional transition probabilities from
an initial state (1=healthy in red at the bottom, or 2=disabled in
green on top) at age <em>x </em>to the final state 2=disabled<em> </em>at
age <em>x+h. </em>Conditional means conditional on being alive
at age <em>x+h</em>, which is <i>hP12x</i> + <em>hP22x</em>. The
curves <i>hP12x/(hP12x</i> + <em>hP22x) </em>and <i>hP22x/(hP12x</i>
+ <em>hP22x) </em>converge with <em>h </em>to the <em>period
prevalence of disability</em>. In order to get the period
prevalence at age 70 we should start the process at an earlier
age, i.e. 50. If the disability state is defined by severe
disability criteria with only a small chance of recovery, then the
incidence of recovery is low and the time to convergence is
probably longer, but we don't have experience of this yet.</p>

<h5><font color="#EC5E5E" size="3"><b>- Life expectancies by age
and initial health status with standard deviation</b></font><b>: </b><a
href="biaspar/erbiaspar.txt"><b>biaspar/erbiaspar.txt</b></a></h5>

<pre># Health expectancies 
# Age 1-1 (SE) 1-2 (SE) 2-1 (SE) 2-2 (SE)
 70   11.0180 (0.1277)    3.1950 (0.3635)    4.6500 (0.0871)    4.4807 (0.2187)
 71   10.4786 (0.1184)    3.2093 (0.3212)    4.3384 (0.0875)    4.4820 (0.2076)
 72    9.9551 (0.1103)    3.2236 (0.2827)    4.0426 (0.0885)    4.4827 (0.1966)
 73    9.4476 (0.1035)    3.2379 (0.2478)    3.7621 (0.0899)    4.4825 (0.1858)
 74    8.9564 (0.0980)    3.2522 (0.2165)    3.4966 (0.0920)    4.4815 (0.1754)
 75    8.4815 (0.0937)    3.2665 (0.1887)    3.2457 (0.0946)    4.4798 (0.1656)
 76    8.0230 (0.0905)    3.2806 (0.1645)    3.0090 (0.0979)    4.4772 (0.1565)
 77    7.5810 (0.0884)    3.2946 (0.1438)    2.7860 (0.1017)    4.4738 (0.1484)
 78    7.1554 (0.0871)    3.3084 (0.1264)    2.5763 (0.1062)    4.4696 (0.1416)
 79    6.7464 (0.0867)    3.3220 (0.1124)    2.3794 (0.1112)    4.4646 (0.1364)
 80    6.3538 (0.0868)    3.3354 (0.1014)    2.1949 (0.1168)    4.4587 (0.1331)
 81    5.9775 (0.0873)    3.3484 (0.0933)    2.0222 (0.1230)    4.4520 (0.1320)
</pre>

<pre>For example  70  11.0180 (0.1277) 3.1950 (0.3635) 4.6500 (0.0871)  4.4807 (0.2187)
means
e11=11.0180 e12=3.1950 e21=4.6500 e22=4.4807 </pre>

<pre><img src="biaspar/expbiaspar21.png" width="400" height="300"><img
src="biaspar/expbiaspar11.png" width="400" height="300"></pre>

<p>For example, the life expectancy of a healthy individual at age 70
is 11.0 years in the healthy state and 3.2 in the disability state
(a total of 14.2 years). If he were disabled at age 70, his life expectancy
would be shorter, 4.65 years in the healthy state and 4.5 in the
disability state (=9.15 years). The total life expectancy is a
weighted mean of both, 14.2 and 9.15. The weight is the proportion
of people disabled at age 70. In order to get a period index
(i.e. based only on incidences) we use the <a
href="#Period prevalence in each state">stable or
period prevalence</a> at age 70 (i.e. computed from
incidences at earlier ages) instead of the <a
href="#cross-sectional prevalence in each state">cross-sectional prevalence</a>
(observed for example at first medical exam) (<a href="#Health expectancies">see
below</a>).</p>

<h5><font color="#EC5E5E" size="3"><b>- Variances of life
expectancies by age and initial health status</b></font><b>: </b><a
href="biaspar/vrbiaspar.txt"><b>biaspar/vrbiaspar.txt</b></a></h5>

<p>For example, the covariances of life expectancies Cov(ei,ej)
at age 50 are (line 3) </p>

<pre>   Cov(e1,e1)=0.4776  Cov(e1,e2)=0.0488=Cov(e2,e1)  Cov(e2,e2)=0.0424</pre>

<h5><font color="#EC5E5E" size="3"><b>-Variances of one-step
probabilities </b></font><b>: </b><a href="biaspar/probrbiaspar.txt"><b>biaspar/probrbiaspar.txt</b></a></h5>

<p>For example, at age 65</p>

<pre>   p11=9.960e-001 standard deviation of p11=2.359e-004</pre>

<h5><font color="#EC5E5E" size="3"><b>- </b></font><a
name="Health expectancies"><font color="#EC5E5E" size="3"><b>Health
expectancies</b></font></a><font color="#EC5E5E" size="3"><b>
with standard errors in parentheses</b></font><b>: </b><a
href="biaspar/trbiaspar.txt"><font face="Courier New"><b>biaspar/trbiaspar.txt</b></font></a></h5>

<pre>#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) </pre>

<pre>70 13.26 (0.22) 9.95 (0.20) 3.30 (0.14) </pre>

<p>Thus, at age 70 the total life expectancy, e..=13.26 years is
the weighted mean of e1.=13.46 and e2.=11.35 by the period
prevalences at age 70 which are 0.90134 in state 1 and 0.09866 in
state 2 respectively (the sum is equal to one). e.1=9.95 is the
Disability-free life expectancy at age 70 (it is again a weighted
mean of e11 and e21). e.2=3.30 is also the life expectancy at age
70 to be spent in the disability state.</p>
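<p>The decomposition can be checked numerically (small differences are due to
rounding of the displayed values):</p>

<pre>e..       = 0.90134*13.46 + 0.09866*11.35 = 12.13 + 1.12 = 13.25 ~ 13.26
e.1 + e.2 = 9.95 + 3.30 = 13.25 ~ e..</pre>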

<h5><font color="#EC5E5E" size="3"><b>-Total life expectancy by
age and health expectancies in states (1=healthy) and (2=disable)</b></font><b>:
</b><a href="biaspar/ebiaspar1.png"><b>biaspar/ebiaspar1.png</b></a></h5>

<p>This figure represents the health expectancies and the total
life expectancy with a confidence interval (dashed line). </p>

<pre>        <img src="biaspar/ebiaspar1.png" width="400" height="300"></pre>

<p>Standard deviations (obtained from the information matrix of
the model) of these quantities are very useful.
Cross-longitudinal surveys are costly and do not involve huge
samples, generally a few thousands; therefore it is very
important to have an idea of the standard deviation of our
estimates. It has been a big challenge to compute the Health
Expectancy standard deviations. Don't be confused: life expectancy
is, as any expected value, the mean of a distribution; but here
we are not computing the standard deviation of the distribution,
but the standard deviation of the estimate of the mean.</p>

<p>Our health expectancies estimates vary according to the sample
size (and the standard deviations give confidence intervals of
the estimates) but also according to the model fitted. Let us
explain this in more detail.</p>

<p>Choosing a model means at least two kinds of choices. First, we
have to decide the number of disability states. Second, we have to
design, within the logit model family, the model itself: variables,
covariates, confounding factors etc. to be included.</p>

<p>The more disability states we have, the better our demographic
description of the disability process, but the smaller the number of
transitions between each state and the higher the noise in the
measurement. We do not have enough experience with the various
models to summarize the advantages and disadvantages, but it is
important to say that even if we had huge and unbiased samples,
the total life expectancy computed from a cross-longitudinal
survey varies with the number of states. If we define only two
states, alive or dead, we find the usual life expectancy where it
is assumed that at each age, people are at the same risk of dying.
If we differentiate the alive state into healthy and
disabled, and as the mortality from the disability state is higher
than the mortality from the healthy state, we are introducing
heterogeneity in the risk of dying. The total mortality at each
age is the mean of the mortality in each state weighted by the
prevalence in each state. Therefore if the proportion of people
at each age and in each state is different from the period
equilibrium, there is no reason to find the same total mortality
at a particular age. Life expectancy, even if it is a very useful
tool, rests on a very strong hypothesis of homogeneity of the
population. Our main purpose is not to measure differential
mortality but to measure the expected time in a healthy or
disability state in order to maximise the former and minimize the
latter. But the differential in mortality complicates the
measurement.</p>

<p>Incidences of disability or recovery are not affected by the number
of states if these states are independent. But incidence estimates
do depend on the specification of the model. The more covariates we
add in the logit model the better the model is, but some covariates are
not well measured and some are confounding factors, as in any
statistical model. The procedure to &quot;fit the best model&quot; is
similar to logistic regression, which itself is similar to regression
analysis. We have not gone that far yet because we also have a severe
limitation, which is the speed of convergence. On a Pentium III,
500 MHz, even the simplest model, estimated by month on 8,000 people,
may take 4 hours to converge.  Also, the IMaCh program is not a
statistical package, and does not allow sophisticated design
variables. If you need sophisticated design variables you have to build them
yourself and add them as ordinary variables. IMaCh allows up to 8
variables. The current version of this program allows only
simple variables like age+sex or age+sex+age*sex and will never be
fully general. But what is to be remembered is that incidences, or
probabilities of change from one state to another, are affected by the
variables specified in the model.</p>

<p>Also, the age range of the people interviewed is linked to 
the age range of the life expectancies which can be estimated by
extrapolation. If your sample ranges from age 70 to 95, you can
clearly estimate a life expectancy at age 70 and trust your
confidence interval because it is mostly based on your sample size,
but if you want to estimate the life expectancy at age 50, you
should rely on the design of your model. Fitting a logistic model on an age
range of 70 to 95 and estimating probabilities of transition outside
this age range, say at age 50, is very dangerous. At least you
should remember that the confidence intervals given by the
standard deviations of the health expectancies rest on the
strong assumption that your model is the 'true model', which is
probably not the case outside the age range of your sample.</p>

<h5><font color="#EC5E5E" size="3"><b>- Copy of the parameter
file</b></font><b>: </b><a href="orbiaspar.txt"><b>orbiaspar.txt</b></a></h5>

<p>This copy of the parameter file can be useful to re-run the
program while saving the old output files. </p>

<h5><font color="#EC5E5E" size="3"><b>- Prevalence forecasting</b></font><b>:
</b><a href="biaspar/frbiaspar.txt"><b>biaspar/frbiaspar.txt</b></a></h5>

<p>

First,
we have estimated the observed prevalence between 1/1/1984 and
1/6/1988 (June, European syntax of dates). The mean date of all interviews (weighted average of the
interviews performed between 1/1/1984 and 1/6/1988) is estimated
to be 13/9/1985, as written at the top of the file. Then we
forecast the probability of being in each state. </p>

<p>For example, on 1/1/1989: </p>

<pre class="MsoNormal"># StartingAge FinalAge P.1 P.2 P.3
# Forecasting at date 1/1/1989
  73 0.807 0.078 0.115</pre>

<p>Since the minimum age is 70 on 13/9/1985, the youngest forecasted
age is 73. This means that a person aged 70 on 13/9/1985 has a
probability of being in state 1 of 0.807 at age 73 on 1/1/1989.
Similarly, the probability of being in state 2 is 0.078 and the
probability of being dead is 0.115. Hence, on 1/1/1989, the prevalence of
disability at age 73 is estimated to be 0.078/(0.807+0.078) = 0.088.</p>
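<p>This prevalence is simply the forecast probability of state 2 divided
by the probability of still being alive, as a one-line check on the
numbers above shows:</p>

<pre># Checking the forecast prevalence of disability at age 73 on 1/1/1989
p1, p2, p3 = 0.807, 0.078, 0.115        # state 1, state 2, death
prevalence_disability = p2 / (p1 + p2)  # among those still alive
print(round(prevalence_disability, 3))  # 0.088</pre>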

<h5><font color="#EC5E5E" size="3"><b>- Population forecasting</b></font><b>:
</b><a href="biaspar/poprbiaspar.txt"><b>biaspar/poprbiaspar.txt</b></a></h5>

<pre># Age P.1 P.2 P.3 [Population]
# Forecasting at date 1/1/1989 
75 572685.22 83798.08 
74 621296.51 79767.99 
73 645857.70 69320.60 </pre>

<pre># Forecasting at date 1/1/1990 
76 442986.68 92721.14 120775.48
75 487781.02 91367.97 121915.51
74 512892.07 85003.47 117282.76 </pre>

<p>From the population file, we estimate the number of people in
each state. At age 73 on 1/1/1989, 645857 persons are in state 1 and 69320
are in state 2. One year later, at age 74, 512892 are in state 1,
85003 are in state 2 and 117282 have died before 1/1/1990.</p>
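<p>A quick consistency check on these figures: the people counted at
age 73 on 1/1/1989 are exactly redistributed, one year later, between
state 1, state 2 and death.</p>

<pre># Consistency check on the population forecast above
pop_1989 = [645857.70, 69320.60]              # age 73 on 1/1/1989: states 1 and 2
pop_1990 = [512892.07, 85003.47, 117282.76]   # age 74 on 1/1/1990: states 1, 2 and dead
print(round(sum(pop_1989), 2), round(sum(pop_1990), 2))   # both 715178.3</pre>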

<hr>

<h2><a name="example"></a><font color="#00006A">Trying an example</font></h2>

<p>Since you know how to run the program, it is time to test it
on your own computer. Try it, for example, on a parameter file named <a
href="imachpar.imach">imachpar.imach</a>, which is a copy
of <font size="2" face="Courier New">mypar.imach</font> included
in the <font size="2" face="Courier New">mytry</font> subdirectory of imach.
Edit it and change the name of the data file to the path of <font size="2"
face="Courier New">mydata.txt</font> if you don't want to
copy it into the same directory. The file <font face="Courier New">mydata.txt</font>
is a smaller file of 3,000 people, but still with 4 waves. </p>

<p>Right-click on the .imach file and a window will pop up with the
string '<strong>Enter the parameter file name:</strong>'</p>

<table border="1">
    <tr>
        <td width="100%"><strong>IMACH, Version 0.97b</strong><p><strong>Enter
        the parameter file name: imachpar.imach</strong></p>
        </td>
    </tr>
</table>

<p>Most of the data or image files generated will include the
'imachpar' string in their name. The running time is about 2-3
minutes on a Pentium III. If the execution worked correctly, the
output files are created in the current directory and should be
the same as the mypar files initially included in the directory <font
size="2" face="Courier New">mytry</font>.</p>

<ul>
    <li><pre><u>Output on the screen</u> The output screen looks like <a
href="biaspar.log">biaspar.log</a>
#
title=MLE datafile=mydata.txt lastobs=3000 firstpass=1 lastpass=3
ftol=1.000000e-008 stepm=24 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0</pre>
    </li>
    <li><pre>Total number of individuals= 2965, Agemin = 70.00, Agemax= 100.92

Warning, no any valid information for:126 line=126
Warning, no any valid information for:2307 line=2307
Delay (in months) between two waves Min=21 Max=51 Mean=24.495826
<font face="Times New Roman">These lines give some warnings on the data file and also some raw statistics on frequencies of transitions.</font>
Age 70 1.=230 loss[1]=3.5% 2.=16 loss[2]=12.5% 1.=222 prev[1]=94.1% 2.=14
 prev[2]=5.9% 1-1=8 11=200 12=7 13=15 2-1=2 21=6 22=7 23=1
Age 102 1.=0 loss[1]=NaNQ% 2.=0 loss[2]=NaNQ% 1.=0 prev[1]=NaNQ% 2.=0 </pre>
    </li>
</ul>
<p>The log includes warnings or errors which are very important for
you. Be careful with such warnings, because your results may be biased
if, for example, some people agreed to be interviewed at the first pass
but never afterwards, or if you don't have the exact month of death. In
such cases IMaCh does not take any initiative, it only warns you. It is
up to you to decide what to do with these people; excluding them is
usually the wrong decision. It is better, for example, to set the month
of death to the mid-point of the interval between the last two waves.</p>

<p>If your survey suffers from severe attrition, you should analyse the
characteristics of the people lost to follow-up and, for example, give
more weight to respondents with the same characteristics.</p>

<p>By default, IMaCh warns about and excludes these problematic
records, but you have to interpret such results with care.</p>
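<p>For instance, if a death is only known to have occurred between the
last wave at which the person was seen alive and the wave reporting the
death, the mid-interval convention mentioned above can be computed in a
few lines; the dates below are invented:</p>

<pre># Hypothetical sketch: coding the month of death at the mid-interval
from datetime import date

last_seen_alive = date(1986, 3, 1)   # invented date of the last wave seen alive
death_reported  = date(1988, 3, 1)   # invented date of the wave reporting the death
midpoint = last_seen_alive + (death_reported - last_seen_alive) / 2
print(midpoint.month, midpoint.year)  # 3 1987 -> code the death as month 3 of 1987</pre>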

<p>&nbsp;</p>

<ul>
    <li>Maximisation with the Powell algorithm. 8 directions are
        given, corresponding to the 8 parameters. Reaching
        convergence can take rather long.<br>
        <font size="1" face="Courier New"><br>
        Powell iter=1 -2*LL=11531.405658264877 1 0.000000000000 2
        0.000000000000 3<br>
        0.000000000000 4 0.000000000000 5 0.000000000000 6
        0.000000000000 7 <br>
        0.000000000000 8 0.000000000000<br>
        1..........2.................3..........4.................5.........<br>
        6................7........8...............<br>
        Powell iter=23 -2*LL=6744.954108371555 1 -12.967632334283
        <br>
        2 0.135136681033 3 -7.402109728262 4 0.067844593326 <br>
        5 -0.673601538129 6 -0.006615504377 7 -5.051341616718 <br>
        8 0.051272038506<br>
        1..............2...........3..............4...........<br>
        5..........6................7...........8.........<br>
        #Number of iterations = 23, -2 Log likelihood =
        6744.954042573691<br>
        # Parameters<br>
        12 -12.966061 0.135117 <br>
        13 -7.401109 0.067831 <br>
        21 -0.672648 -0.006627 <br>
        23 -5.051297 0.051271 </font><br>
        </li>
    <li><pre><font size="2">Calculation of the hessian matrix. Wait...
12345678.12.13.14.15.16.17.18.23.24.25.26.27.28.34.35.36.37.38.45.46.47.48.56.57.58.67.68.78

Inverting the hessian to get the covariance matrix. Wait...

#Hessian matrix#
3.344e+002 2.708e+004 -4.586e+001 -3.806e+003 -1.577e+000 -1.313e+002 3.914e-001 3.166e+001 
2.708e+004 2.204e+006 -3.805e+003 -3.174e+005 -1.303e+002 -1.091e+004 2.967e+001 2.399e+003 
-4.586e+001 -3.805e+003 4.044e+002 3.197e+004 2.431e-002 1.995e+000 1.783e-001 1.486e+001 
-3.806e+003 -3.174e+005 3.197e+004 2.541e+006 2.436e+000 2.051e+002 1.483e+001 1.244e+003 
-1.577e+000 -1.303e+002 2.431e-002 2.436e+000 1.093e+002 8.979e+003 -3.402e+001 -2.843e+003 
-1.313e+002 -1.091e+004 1.995e+000 2.051e+002 8.979e+003 7.420e+005 -2.842e+003 -2.388e+005 
3.914e-001 2.967e+001 1.783e-001 1.483e+001 -3.402e+001 -2.842e+003 1.494e+002 1.251e+004 
3.166e+001 2.399e+003 1.486e+001 1.244e+003 -2.843e+003 -2.388e+005 1.251e+004 1.053e+006 
# Scales
12 1.00000e-004 1.00000e-006
13 1.00000e-004 1.00000e-006
21 1.00000e-003 1.00000e-005
23 1.00000e-004 1.00000e-005
# Covariance
  1 5.90661e-001
  2 -7.26732e-003 8.98810e-005
  3 8.80177e-002 -1.12706e-003 5.15824e-001
  4 -1.13082e-003 1.45267e-005 -6.50070e-003 8.23270e-005
  5 9.31265e-003 -1.16106e-004 6.00210e-004 -8.04151e-006 1.75753e+000
  6 -1.15664e-004 1.44850e-006 -7.79995e-006 1.04770e-007 -2.12929e-002 2.59422e-004
  7 1.35103e-003 -1.75392e-005 -6.38237e-004 7.85424e-006 4.02601e-001 -4.86776e-003 1.32682e+000
  8 -1.82421e-005 2.35811e-007 7.75503e-006 -9.58687e-008 -4.86589e-003 5.91641e-005 -1.57767e-002 1.88622e-004
# agemin agemax for lifexpectancy, bage fage (if mle==0 ie no data nor Max likelihood).


agemin=70 agemax=100 bage=50 fage=100
Computing prevalence limit: result on file 'plrmypar.txt' 
Computing pij: result on file 'pijrmypar.txt' 
Computing Health Expectancies: result on file 'ermypar.txt' 
Computing Variance-covariance of DFLEs: file 'vrmypar.txt' 
Computing Total LEs with variances: file 'trmypar.txt' 
Computing Variance-covariance of Prevalence limit: file 'vplrmypar.txt' 
End of Imach
</font></pre>
    </li>
</ul>
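<p>The covariance matrix printed above is obtained, as the log says, by
inverting the Hessian. As a generic reminder of what that step does
(the 2x2 matrix below is only the upper-left block of the Hessian shown
above, so inverting it will not reproduce the full 8x8 covariance
matrix, and IMaCh's own scaling conventions are not reproduced here):</p>

<pre># Generic sketch (Python/NumPy): covariance of ML estimates from a Hessian
import numpy as np

# Upper-left 2x2 block of the Hessian printed above, used only as an example
H = np.array([[3.344e+002, 2.708e+004],
              [2.708e+004, 2.204e+006]])
cov = np.linalg.inv(H)        # inverse of the information matrix
se  = np.sqrt(np.diag(cov))   # standard errors of the corresponding parameters
print(cov)
print(se)</pre>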

<p><font size="3">Once the running is finished, the program
requires a character:</font></p>

<table border="1">
    <tr>
        <td width="100%"><strong>Type e to edit output files, g
        to graph again, c to start again, and q for exiting:</strong></td>
    </tr>
</table>

<p>In order to give you an idea of the time needed to reach convergence,
IMaCh gives an estimate of the running time if convergence required 10,
20 or 30 iterations. This can be useful.</p>

<p><font size="3">First you should enter <strong>e </strong>to
edit the master file mypar.htm. </font></p>

<ul>
    <li><u>Outputs files</u> <br>
        <br>
        - Copy of the parameter file: <a href="ormypar.txt">ormypar.txt</a><br>
        - Gnuplot file name: <a href="mypar.gp.txt">mypar.gp.txt</a><br>
        - Cross-sectional prevalence in each state: <a
        href="prmypar.txt">prmypar.txt</a> <br>
        - Period prevalence in each state: <a
        href="plrmypar.txt">plrmypar.txt</a> <br>
        - Transition probabilities: <a href="pijrmypar.txt">pijrmypar.txt</a><br>
        - Life expectancies by age and initial health status
        (estepm=24 months): <a href="ermypar.txt">ermypar.txt</a>
        <br>
        - Parameter file with estimated parameters and the
        covariance matrix: <a href="rmypar.txt">rmypar.txt</a> <br>
        - Variance of one-step probabilities: <a
        href="probrmypar.txt">probrmypar.txt</a> <br>
        - Variances of life expectancies by age and initial
        health status (estepm=24 months): <a href="vrmypar.txt">vrmypar.txt</a><br>
        - Health expectancies with their variances: <a
        href="trmypar.txt">trmypar.txt</a> <br>
        - Standard deviation of period prevalences: <a
        href="vplrmypar.txt">vplrmypar.txt</a> <br>
        No population forecast: popforecast = 0 (instead of 1) or
        stepm = 24 (instead of 1) or model=. (instead of .)<br>
        <br>
        </li>
    <li><u>Graphs</u> <br>
        <br>
        -<a href="../mytry/pemypar1.gif">One-step transition
        probabilities</a><br>
        -<a href="../mytry/pmypar11.gif">Convergence to the
        period prevalence</a><br>
        -<a href="..\mytry\vmypar11.gif">Cross-sectional and period
        prevalence in state (1) with the confident interval</a> <br>
        -<a href="..\mytry\vmypar21.gif">Cross-sectional and period
        prevalence in state (2) with the confident interval</a> <br>
        -<a href="..\mytry\expmypar11.gif">Health life
        expectancies by age and initial health state (1)</a> <br>
        -<a href="..\mytry\expmypar21.gif">Health life
        expectancies by age and initial health state (2)</a> <br>
        -<a href="..\mytry\emypar1.gif">Total life expectancy by
        age and health expectancies in states (1) and (2).</a> </li>
</ul>

<p>This software has been partly funded by <a
href="http://euroreves.ined.fr">Euro-REVES</a>, a concerted action
of the European Union. Since 2003 it has also been partly funded by the
French Institute on Longevity. It will be copyrighted in the same way as a
GNU software product, i.e. the program and software can be distributed
freely for non-commercial use. Sources are not widely distributed
today because some parts of the code are copyrighted by Numerical
Recipes in C. You can get our GPL code by asking us, with a simple
justification (name, email, institute): <a
href="mailto:brouard@ined.fr">mailto:brouard@ined.fr</a> and <a
href="mailto:lievre@ined.fr">mailto:lievre@ined.fr</a>.</p>

<p>The latest version (0.97b, June 2004) can be accessed at <a
href="http://euroreves.ined.fr/imach">http://euroreves.ined.fr/imach</a><br>
</p>
</body>
</html>
