Wed Jun 16 21:35:20 2004 UTC (20 years ago) by brouard

<!-- $Id: imach.htm,v 1.2 2004/06/16 21:35:20 brouard Exp $ -->
<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<title>Computing Health Expectancies using IMaCh</title>
</head>

<body bgcolor="#FFFFFF">

<hr size="3" color="#EC5E5E">

<h1 align="center">Computing Health
Expectancies using IMaCh</h1>

<h1 align="center">(a Maximum
Likelihood Computer Program using Interpolation of Markov Chains)</h1>

&nbsp;

<a href="http://www.ined.fr/"><img
src="logo-ined.gif" border="0" width="151" height="76"></a><img
src="euroreves2.gif" width="151" height="75">

<h3 align="center"><a href="http://www.ined.fr/">INED</a> and <a
href="http://euroreves.ined.fr">EUROREVES</a></h3>

Version
0.97, June 2004

<hr size="3" color="#EC5E5E">

Authors of the
program: <a href="http://sauvy.ined.fr/brouard">Nicolas Brouard</a>, senior researcher at the <a
href="http://www.ined.fr">Institut
National d'Etudes Démographiques</a> (INED, Paris) in the "Mortality,
Health and Epidemiology" Research Unit

and Agnès
Lièvre


<h4>Contribution to the mathematics: C. R.
Heathcote (Australian
National University, Canberra).</h4>

<h4>Contact: Agnès Lièvre (<a
href="mailto:lievre@ined.fr">lievre@ined.fr</a>) </h4>

<hr>

<ul>
<li><a href="#intro">Introduction</a> </li>
<li><a href="#data">On what kind of data can it be used?</a></li>
<li><a href="#datafile">The data file</a> </li>
<li><a href="#biaspar">The parameter file</a> </li>
<li><a href="#running">Running Imach</a> </li>
<li><a href="#output">Output files and graphs</a> </li>
<li><a href="#example">Example</a> </li>
</ul>

<hr>

<h2><a name="intro">Introduction</a></h2>

This program computes Healthy Life Expectancies from cross-longitudinal
data using the methodology pioneered by Laditka and Wolf (1).
Within the family of Health Expectancies (HE), Disability-free
life expectancy (DFLE) is probably the most important index to
monitor. In low mortality countries, there is a fear that when
mortality declines, the increase in DFLE is not proportionate to
the increase in total Life expectancy. This case is called the Expansion
of morbidity. Most of the data collected today, in
particular by the international <a href="http://www.reves.org">REVES</a>
network on Health expectancy, and most HE indices based on these
data, are cross-sectional. This means that the information
collected comes from a single cross-sectional survey: people of
various ages (but mostly old people) are surveyed on their health
status at a single date. The proportion of people disabled at each
age can then be measured at that date. This age-specific
prevalence curve is then used to distinguish, within the
stationary population (which, by definition, is the life table
estimated from the vital statistics on mortality at the same
date), the disabled population from the disability-free
population.
Life expectancy (LE) (or total population divided by
the yearly number of births or deaths of this stationary
population) is then decomposed into DFLE and DLE. This method of
computing HE is usually called the Sullivan method (from the name
of the author who first described it).

Age-specific proportions of people disabled (prevalence of
disability) depend on the historical flows of entering
disability and recovering, from the past until today. The age-specific
forces (or incidence rates) of entering disability or
recovering good health, estimated over a recent period of time
(as for period forces of mortality), reflect current conditions and
can therefore be used at each age to forecast the future of this
cohort if nothing changes in the future, i.e. to forecast the
prevalence of disability of each cohort. Our finding (2) is that the period
prevalence of disability (computed from period incidences) is lower
than the cross-sectional prevalence. For example, if a country is
improving its prosthesis technology, the incidence of recovering
the ability to walk will be higher at each (old) age, but the
prevalence of disability will only slightly reflect this improvement because
the prevalence is mostly affected by the history of the cohort and not
by recent period effects. To measure the period improvement we have to
simulate the future of a cohort of new-borns entering or leaving
the disability state at each age, or dying, according to the incidence
rates measured today on different cohorts. The proportion of people
disabled at each age in this simulated cohort will be much lower than
the proportions observed at each age in a cross-sectional survey.
This
new prevalence curve introduced in a life table will give a more
realistic HE level than the Sullivan method, which mostly measures the
history of health conditions in this country.

Therefore, the main question is how to measure incidence rates
from cross-longitudinal surveys. This is the goal of the IMaCh
program. From your data and using IMaCh you can estimate period
HE and not only Sullivan's HE. The standard errors of the HE
are also computed.

A cross-longitudinal survey consists of a first survey
("cross") where individuals of different ages are
interviewed on their health status or degree of disability. At
least a second wave of interviews ("longitudinal")
should measure each individual's new health status. Health
expectancies are computed from the transitions observed between
waves and are computed for each degree of severity of disability
(number of life states). The more degrees you consider, the more time is
necessary to reach the Maximum Likelihood of the parameters
involved in the model. Considering only two states of disability
(disabled and healthy) is generally enough, but the computer
program also works with more health statuses.

The simplest model is the multinomial logistic model where pij
is the probability to be observed in state j at the second
wave conditional on being observed in state i at the first
wave. Therefore a simple model is: log(pij/pii)= aij +
bij*age+ cij*sex, where 'age' is age and 'sex'
is a covariate. The advantage claimed by this computer program
is that if the delay between waves is not identical for
each individual, or if some individuals missed an interview, the
information is not rounded or lost, but taken into account using
an interpolation or extrapolation.
hPijx is the
probability to be observed in state j at age x+h
conditional on the observed state i at age x. The
delay 'h' can be split into an exact number (nh*stepm)
of unobserved intermediate states. This elementary transition (by
month, quarter, semester or year) is modeled as a
multinomial logistic. The hPx matrix is simply the matrix
product of nh*stepm elementary matrices and the
contribution of each individual to the likelihood is simply hPijx.



The program presented in this manual is a quite general
program named IMaCh (for Interpolated
MArkov CHain), designed to
analyse transition data from longitudinal surveys. The first step
is the estimation of the parameters of a model of transition probabilities
between an initial status and a final status. From there, the
computer program produces some indicators such as observed and
stationary prevalence, life expectancies and their variances, and
graphs. Our transition model consists of absorbing and
non-absorbing states with the possibility of return across the
non-absorbing states. The main advantage of this package,
compared to other programs for the analysis of transition data
(for example: Proc Catmod of SAS®), is that the whole
individual information is used even if an interview is missing, a
status or a date is unknown, or the delay between waves is
not identical for each individual. The program can be executed
according to parameters: selection of a sub-sample, number of
absorbing and non-absorbing states, number of waves taken into
account (the user inputs the first and the last interview), a
tolerance level for the maximization function, the periodicity of
the transitions (we can compute annual, quarterly or monthly
transitions), covariates in the model. It works on Windows or on
Unix.


<hr>

(1) Laditka, Sarah B.
and Wolf, Douglas A. (1998), "New
Methods for Analyzing Active Life Expectancy". Journal of
Aging and Health. Vol 10, No. 2.
(2) <a href="http://taylorandfrancis.metapress.com/app/home/contribution.asp?wasp=1f99bwtvmk5yrb7hlhw3&referrer=parent&backto=issue,1,2;journal,2,5;linkingpublicationresults,1:300265,1"
>Lièvre A., Brouard N. and Heathcote Ch. (2003) Estimating Health Expectancies
from Cross-longitudinal surveys. Mathematical Population Studies.- 10(4), pp. 211-248</a>

<hr>

<h2><a name="data">On what kind of data can
it be used?</a></h2>

The minimum data required for a transition model is the
recording of a set of individuals interviewed at a first date and
interviewed again at least one more time. From the
observations of an individual, we obtain a follow-up over time of
the occurrence of a specific event. In this documentation, the
event is related to health status at older ages, but the program
can be applied to many longitudinal studies in different
contexts. To build the data file described in the next section,
you must have the month and year of each interview and the
corresponding health status. In order to get age, the date of
birth (month and year) is also required (a missing value is allowed for
the month). The date of death (month and year) is important
information, also required if the individual is dead. Shorter
steps (i.e. a month) will more closely take into account the
survival time after the last interview.

<hr>

<h2><a name="datafile">The data file</a></h2>

In this example, 8,000 people have been interviewed in a
cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990). Some
people missed 1, 2 or 3 interviews. Health statuses are healthy (1)
and disabled (2). The survey is not a real one. It is a simulation of
the American Longitudinal Survey on Aging.
The disability state is
assigned if the individual failed one of four ADLs (Activities of daily
living, like bathing, eating, walking). Therefore, even if the
individuals interviewed in the sample are virtual, the information
brought by this sample is close to the situation of the United
States. Sex is not recorded in this sample. The LSOA survey is biased
in the sense that people living in an institution were not surveyed at
first pass in 1984. Thus the prevalence of disability in 1984 is
biased downwards at old ages. But when people left their household for
an institution, they were surveyed in their institution in 1986,
1988 or 1990. Thus incidences are not biased. But cross-sectional
prevalences of disability at old ages are thus artificially increasing
in 1986, 1988 and 1990 because of a higher weight of
institutionalized people in the sample. Our article shows the
opposite: the period prevalence is lower at old ages than the
adjusted cross-sectional prevalence, proving important current progress
against disability.

Each line of the data set (named <a href="data1.txt">data1.txt</a>
in this first example) is an individual record. Fields are separated
by blanks:

<ul>
<li>Index number: positive number (field 1) </li>
<li>First covariate: positive number (field 2) </li>
<li>Second covariate: positive number (field 3) </li>
<li><a name="Weight">Weight</a>: positive number
(field 4). In most surveys individuals are weighted
according to the stratification of the sample.</li>
<li>Date of birth: coded as mm/yyyy. Missing dates are
coded as 99/9999 (field 5) </li>
<li>Date of death: coded as mm/yyyy. Missing dates are
coded as 99/9999 (field 6) </li>
<li>Date of first interview: coded as mm/yyyy. Missing
dates are coded as 99/9999 (field 7) </li>
<li>Status at first interview: positive number.
Missing values are coded -1. (field 8) </li>
<li>Date of second interview: coded as mm/yyyy.
Missing dates are coded as 99/9999 (field 9) </li>
<li>Status at second interview: positive
number. Missing values are coded -1. (field 10) </li>
<li>Date of third interview: coded as mm/yyyy. Missing
dates are coded as 99/9999 (field 11) </li>
<li>Status at third interview: positive
number. Missing values are coded -1. (field 12) </li>
<li>Date of fourth interview: coded as mm/yyyy.
Missing dates are coded as 99/9999 (field 13) </li>
<li>Status at fourth interview: positive
number. Missing values are coded -1. (field 14) </li>
<li>etc</li>
</ul>

&nbsp;

If your longitudinal survey does not include information about
weights or covariates, you must fill the column with a number
(e.g. 1) because a missing field is not allowed.

<hr>

<h2>Your first example parameter file<a
href="http://euroreves.ined.fr/imach"></a><a name="uio"></a></h2>

<h2><a name="biaspar"></a>#Imach version 0.97b, June 2004,
INED-EUROREVES </h2>

This first line was a comment. Comment lines start with a '#'.

<h4>First uncommented line</h4>

<pre>title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4</pre>

<ul>
<li>title= 1st_example is the title of the run. </li>
<li>datafile= data1.txt is the name of the data set.
Our example is a six-year follow-up survey. It consists
of a baseline followed by 3 reinterviews. </li>
<li>lastobs= 8600 the program is able to run on a
subsample where the last observation number is lastobs.
It can be set to a number larger than the real number of
observations (e.g. 100000). In this example, maximisation
will be done on the first 8600 records.
</li>
<li>firstpass=1 , lastpass=4 In case of more
than two interviews in the survey, the program can be run
on selected transition periods. firstpass=1 means the
first interview included in the calculation is the
baseline survey. lastpass=4 means that the information
brought by the 4th interview is taken into account.</li>
</ul>

&nbsp;

<h4><a name="biaspar-2">Second uncommented
line</a></h4>

<pre>ftol=1.e-08 stepm=1 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0</pre>

<ul>
<li>ftol=1e-8 Convergence tolerance on the function
value in the maximisation of the likelihood. Choosing a
suitable value for ftol is difficult. 1e-8 is a suitable
value for a 32-bit computer.</li>
<li>stepm=1 Time unit in months for interpolation.
Examples:<ul>
<li>If stepm=1, the unit is a month </li>
<li>If stepm=4, the unit is four months</li>
<li>If stepm=12, the unit is a year </li>
<li>If stepm=24, the unit is two years</li>
<li>... </li>
</ul>
</li>
<li>ncovcol=2 Number of covariate columns included in the
datafile before the column of the date of birth. You can have
covariates that will not necessarily be used during the
run. It is not the number of covariates that will be
specified by the model. The 'model' syntax describes the
covariates to be taken into account during the run. </li>
<li>nlstate=2 Number of non-absorbing (alive) states.
Here we have two alive states: disability-free is coded 1
and disability is coded 2. </li>
<li>ndeath=1 Number of absorbing states. The absorbing
state death is coded 3. </li>
<li>maxwav=4 Number of waves in the datafile.</li>
<li><a name="mle">mle</a>=1 Option for the
Maximum Likelihood Estimation.
<ul>
<li>If mle=1 the program does the maximisation and
the calculation of health expectancies </li>
<li>If mle=0 the program only does the calculation of
the health expectancies and other indices and graphs
but without the maximisation. </li>
<li>There are also other possible values:
<ul>
<li>If mle=-1 you get a template which can be useful if
your model is complex with many covariates.</li>
<li>If mle=-3 IMaCh computes the mortality but without
any health status (May 2004)</li>
<li>If mle=2 the IMaCh
likelihood corresponds to a linear interpolation</li>
<li>If mle=3 the IMaCh likelihood corresponds to an exponential
inter-extrapolation</li>
<li>If mle=4 the IMaCh likelihood
corresponds to no inter-extrapolation, thus biasing
the results. </li>
<li>If mle=5 the IMaCh likelihood
corresponds to no inter-extrapolation, and to the version before the
correction of Jackson's bug (avoid this).</li>
</ul>
</li>
</ul>
</li>
<li>weight=0 Possibility to add weights. <ul>
<li>If weight=0 no weights are included </li>
<li>If weight=1 the maximisation integrates the
weights which are in field <a href="#Weight">4</a></li>
</ul>
</li>
</ul>

<h4>Covariates</h4>

Intercept and age are systematically included in the model.
Additional covariates can be included with the command:

<pre>model=list of covariates</pre>

<ul>
<li>if model=.
then no covariates are
included</li>
<li>if model=V1 the model includes the first
covariate (field 2)</li>
<li>if model=V2 the model includes the
second covariate (field 3)</li>
<li>if model=V1+V2 the model includes the
first and the second covariate (fields 2 and 3)</li>
<li>if model=V1*V2 the model includes the
product of the first and the second covariate (fields 2
and 3)</li>
<li>if model=V1+V1*age the model includes
the first covariate and its product with age</li>
</ul>

In this example, we have two covariates in the data file
(fields 2 and 3). The number of covariates included in the data
file between the id and the date of birth is ncovcol=2 (it was
named ncov in versions prior to 0.8). If you have 3 covariates in
the datafile (fields 2, 3 and 4), you will set ncovcol=3. Then
you can run the programme with a new parametrisation taking into
account the third covariate. For example, model=V1+V3 estimates
a model with the first and third covariates. More complicated
models can be used, but they will take more time to converge. With
a simple model (no covariates), the programme estimates 8
parameters. Adding covariates increases the number of parameters:
12 for model=V1, 16 for model=V1+V1*age
and 20 for model=V1+V2+V3.

<h4>Guess values for optimization </h4>

You must write the initial guess values of the parameters for
optimization. The number of parameters, N, depends on the
number of absorbing states and non-absorbing states and on the
number of covariates.
N is given by the formula N=(nlstate +
ndeath-1)*nlstate*ncovmodel .
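The formula for N can be checked with a few lines of Python. This is only an illustrative sketch, not part of IMaCh: the function name n_parameters is hypothetical, and ncovmodel is assumed here to count the intercept and age plus one for each additional term of the model, an assumption chosen because it reproduces the parameter counts quoted above (8, 12, 16 and 20).

```python
def n_parameters(nlstate, ndeath, n_model_terms):
    """Return N = (nlstate + ndeath - 1) * nlstate * ncovmodel.

    Assumption (not taken from the IMaCh source): ncovmodel counts the
    intercept and age, plus one for each extra term in the 'model' line.
    """
    ncovmodel = 2 + n_model_terms
    return (nlstate + ndeath - 1) * nlstate * ncovmodel


# Two alive states and one death state, as in the first example.
counts = {
    "model=.": n_parameters(2, 1, 0),          # intercept and age only
    "model=V1": n_parameters(2, 1, 1),         # one covariate
    "model=V1+V1*age": n_parameters(2, 1, 2),  # covariate and its product with age
    "model=V1+V2+V3": n_parameters(2, 1, 3),   # three covariates
}

for model, n in counts.items():
    print(model, n)
```

Each count is also the number of guess values to supply: one line per transition ij, with ncovmodel values on each line.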

Thus in the simple case with 2 covariates (the model is log
(pij/pii) = aij + bij * age where intercept and age are the two
covariates), 2 health states (1 for disability-free and 2
for disability) and 1 absorbing state (3), you must enter 8
initial values: a12, b12, a13, b13, a21, b21, a23, b23. You can
start with zeros as in this example, but if you have a more
precise set (for example from an earlier run) you can enter it
and it will speed up the convergence.
Each of the four lines starts with indices "ij": ij
aij bij

<blockquote>
<pre># Guess values of aij and bij in log (pij/pii) = aij + bij * age
12 -14.155633 0.110794
13 -7.925360 0.032091
21 -1.890135 -0.029473
23 -6.234642 0.022315 </pre>
</blockquote>

or, to simplify (in most cases it converges but there is no
guarantee!):

<blockquote>
<pre>12 0.0 0.0
13 0.0 0.0
21 0.0 0.0
23 0.0 0.0</pre>
</blockquote>

In order to speed up the convergence you can make a first run
with a large stepm, e.g. stepm=12 or 24, and then decrease the stepm
until stepm=1 month. If the earlier, larger step is a multiple n of the
new, shorter one, then, writing stepm for the shorter step and
n . stepm for the larger one, the following approximation holds:

<pre>aij(stepm) = aij(n . stepm) - ln(n)
</pre>

and

<pre>bij(stepm) = bij(n .
stepm) .</pre>

For example if you already ran for a 6 months interval and
got:


<pre># Parameters
12 -13.390179 0.126133
13 -7.493460 0.048069
21 0.575975 -0.041322
23 -4.748678 0.030626
</pre>

If you now want to get the monthly estimates, you can guess
the aij by subtracting ln(6) = 1.7917
and running


<pre>12 -15.18193847 0.126133
13 -9.285219469 0.048069
21 -1.215784469 -0.041322
23 -6.540437469 0.030626
</pre>

and get


<pre>12 -15.029768 0.124347
13 -8.472981 0.036599
21 -1.472527 -0.038394
23 -6.553602 0.029856
</pre>

which is closer to the results. The approximation is probably useful
only for very small intervals and we don't have enough experience to
know whether it will speed up the convergence or not.

<pre> -ln(12)= -2.484
 -ln(6/1)=-ln(6)= -1.791
 -ln(3/1)=-ln(3)= -1.0986
-ln(12/6)=-ln(2)= -0.693
</pre>

In version 0.9 and higher you can still have valuable results even if
your stepm parameter is bigger than a month. The idea is to run with
a bigger stepm in order to have a quicker convergence at the price of a
small bias. Once you know which model you want to fit, you can put
stepm=1 and wait hours or days to get the convergence!

To get unbiased results even with large stepm we introduce the idea of
pseudo likelihood by interpolating two exact likelihoods. Let us
detail this:

If the interval of d months between two waves is not a
multiple of 'stepm', but lies between (n-1) stepm and
n stepm, then both exact likelihoods are computed (the
contribution to the likelihood at n stepm requires one matrix
product more) (let us remember that we are modelling the probability
to be observed in a particular state after d months being
observed at a particular state at 0).
The distance (bh in
the program) from the month of interview to the rounded date of n
stepm is computed. It can be negative (interview occurs before
n stepm) or positive if the interview occurs after n
stepm (and before (n+1)stepm).

Then the final contribution to the total likelihood is a weighted
average of these two exact likelihoods at n stepm (out) and
at (n-1)stepm (savm). We did not want to compute the third
likelihood at (n+1)stepm because it is too costly in time, so
we used an extrapolation if bh is positive. The formula of
inter/extrapolation may vary according to the value of parameter mle:
<pre>
mle=1 lli= log((1.+bbh)*out[s1][s2]- bbh*savm[s1][s2]); /* linear interpolation */

mle=2 lli= (savm[s1][s2]>(double)1.e-8 ? \
 log((1.+bbh)*out[s1][s2]- bbh*(savm[s1][s2])): \
 log((1.+bbh)*out[s1][s2])); /* linear interpolation */
mle=3 lli= (savm[s1][s2]>1.e-8 ? \
 (1.+bbh)*log(out[s1][s2])- bbh*log(savm[s1][s2]): \
 log((1.+bbh)*out[s1][s2])); /* exponential inter-extrapolation */

mle=4 lli=log(out[s[mw[mi][i]][i]][s[mw[mi+1][i]][i]]); /* No interpolation */
 no need to save previous likelihood into memory.
</pre>

If the death occurs between first and second pass, and for example
more precisely between n stepm and (n+1)stepm, the
contribution of this person to the likelihood is simply the difference
between the probability of dying before n stepm and the
probability of dying before (n+1)stepm. There was a bug in
version 0.8 and death was treated as any other state, i.e. as if it
was an observed death at second pass. This was not precise but
correct, but when information on the precise month of death came
(death occurring prior to second pass) we did not change the likelihood
accordingly. Thanks to Chris Jackson for correcting us.
In earlier
versions (fortunately before first publication) the total mortality
was overestimated (people were dying too early) by about 10%. Versions
0.95 and higher are correct.

Our suggested choice is mle=1 . If stepm=1 there is no difference
between the various mle options (methods of interpolation). If stepm is
big, like 12 or 24 or 48, and mle=4 (no interpolation), the bias may be
very important if the mean duration between two waves is not a
multiple of stepm. See the appendix in our main publication concerning
the sine curve of biases.


<h4>Guess values for computing variances</h4>

These values are output by the maximisation of the likelihood <a
href="#mle">mle</a>=1. These values can be used as an input of a
second run in order to get the various output data files (Health
expectancies, period prevalence etc.) and figures without rerunning
the long maximisation phase (mle=0).

These 'scales' are small values needed for the computing of
numerical derivatives. These derivatives are used to compute the
hessian matrix of the parameters, that is the inverse of the
covariance matrix. They are often used for estimating variances and
confidence intervals. Each line consists of indices "ij"
followed by the initial scales (zero to simplify) associated with aij
and bij.

<ul>
<li>If mle=1 you can enter zeros:</li>
<li><blockquote>
<pre># Scales (for hessian or gradient estimation)
12 0. 0.
13 0. 0.
21 0. 0.
23 0. 0. </pre>
</blockquote>
</li>
<li>If mle=0 (no maximisation of Likelihood) you must enter a covariance matrix (usually
obtained from an earlier run).</li>
</ul>

<h4>Covariance matrix of parameters</h4>

The covariance matrix is output if <a href="#mle">mle</a>=1.
But it can
also be used as an input to get the various output data files (Health
expectancies, period prevalence etc.) and figures without
rerunning the maximisation phase (mle=0).
Each line starts with indices "ijk" followed by the
covariances between aij and bij:


<pre>
 121 Var(a12)
 122 Cov(b12,a12) Var(b12)
 ...
 232 Cov(b23,a12) Cov(b23,b12) ... Var (b23) </pre>

<ul>
<li>If mle=1 you can enter zeros. </li>
<li><pre># Covariance matrix
121 0.
122 0. 0.
131 0. 0. 0.
132 0. 0. 0. 0.
211 0. 0. 0. 0. 0.
212 0. 0. 0. 0. 0. 0.
231 0. 0. 0. 0. 0. 0. 0.
232 0. 0. 0. 0. 0. 0. 0. 0.</pre>
</li>
<li>If mle=0 you must enter a covariance matrix (usually
obtained from an earlier run). </li>
</ul>

<h4>Age range for calculation of stationary
prevalences and health expectancies</h4>

<pre>agemin=70 agemax=100 bage=50 fage=100</pre>


Once the parameters have been estimated, the program is able
to calculate period prevalence, transition probabilities
and life expectancies at any age. The choice of age range is useful
for extrapolation. In this example, the age of people interviewed varies
from 69 to 102 and the model is estimated using their exact ages. But
if you are interested in the age-specific period prevalence you can
start the simulation at an exact age like 70 and stop at 100. Then the
program will draw at least two curves describing the forecasted
prevalences of two cohorts, one for healthy people at age 70 and the second
for disabled people at the same initial age. According to the
mixing property (ergodicity) and because of recovery, both prevalences
will tend to be identical at later ages. Thus if you want to compute
the prevalence at age 70, you should enter a lower agemin value.
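The convergence of the two prevalence curves can be illustrated with a toy Markov chain. The sketch below is not IMaCh code: the matrix P holds invented monthly transition probabilities that are constant over age, whereas IMaCh estimates age-dependent probabilities, and the function names are hypothetical. It follows one cohort starting healthy and one starting disabled, and prints the share of disabled people among the survivors of each cohort.

```python
# Toy illustration (not IMaCh code): 0 = healthy, 1 = disabled, 2 = dead.
# The probabilities are invented and constant over age.
P = [
    [0.90, 0.07, 0.03],
    [0.10, 0.80, 0.10],
    [0.00, 0.00, 1.00],
]


def step(dist, matrix):
    """Advance a distribution over states by one elementary transition."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def disabled_share(start_state, n_steps, matrix=P):
    """Share of disabled among survivors, n_steps after starting in start_state."""
    dist = [0.0, 0.0, 0.0]
    dist[start_state] = 1.0
    for _ in range(n_steps):
        dist = step(dist, matrix)
    alive = dist[0] + dist[1]
    return dist[1] / alive


for n_steps in (1, 10, 40):
    from_healthy = disabled_share(0, n_steps)
    from_disabled = disabled_share(1, n_steps)
    print(n_steps, round(from_healthy, 4), round(from_disabled, 4))
```

In this toy chain the two shares start far apart and become nearly equal after a few dozen steps, which is why a simulation started below the age of interest no longer depends on the initial state at that age.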


Setting bage=50 (begin age) and fage=100 (final age) lets
the program compute life expectancy from age 'bage' to age
'fage'. As we use a model, it can be interesting to compute life
expectancy on a wider age range than the age range of the data.
But the model can be rather wrong on much larger intervals.
The program is limited to an upper age of around 120!

<ul>
<li>agemin= Minimum age for calculation of the
period prevalence </li>
<li>agemax= Maximum age for calculation of the
period prevalence </li>
<li>bage= Minimum age for calculation of the health
expectancies </li>
<li>fage= Maximum age for calculation of the health
expectancies </li>
</ul>

<h4><a name="Computing">Computing</a> the cross-sectional prevalence</h4>

<pre>begin-prev-date=1/1/1984 end-prev-date=1/6/1988 estepm=1</pre>


The statements 'begin-prev-date' and 'end-prev-date' select
the period over which the observed prevalences
in each state are calculated. In this example, the prevalences are calculated on
survey data collected between 1 January 1984 and 1 June 1988.


<ul>
<li>begin-prev-date= Starting date
(day/month/year)</li>
<li>end-prev-date= Final date
(day/month/year)</li>
<li>estepm= Unit (in months). We compute the
life expectancy from trapezoids spaced every estepm
months. This is mainly to measure the difference between
two models: for example if stepm=24 months pijx are given
only every 2 years and by summing them we are calculating
an estimate of the Life Expectancy assuming a linear
progression in between and thus overestimating or
underestimating according to the curvature of the
survival function.
If, for the same date, we estimate the
model with stepm=1 month, we can keep estepm at 24 months
to compare the new estimate of Life expectancy with the
same linear hypothesis. A more precise result, taking
into account a more precise curvature, will be obtained if
estepm is as small as stepm.</li>
</ul>

<h4>Population- or status-based health
expectancies</h4>

<pre>pop_based=0</pre>

The program computes status-based health expectancies, i.e. health
expectancies which depend on the initial health state. If you are
healthy, your healthy life expectancy (e11) is higher than if you were
disabled (e21, with e11 > e21). To compute a healthy life
expectancy 'independent' of the initial status we have to weight e11
and e21 according to the probability to be in each state at the initial
age, which corresponds to the proportions of people in each health
state (cross-sectional prevalences).

We could also compute e12 and e22 and get e.2 by weighting them
according to the observed cross-sectional prevalences at initial age.
In a similar way we could compute the total life expectancy by
summing e.1 and e.2 .

The main difference between 'population based' and 'implied' or
'period' consists in the weights used. 'Usually', cross-sectional
prevalences of disability are higher than period prevalences,
particularly at old ages. This is true if the country is improving its
health system by teaching people how to prevent disability, by
promoting better screening, for example of people needing cataract
surgery, and for many unknown reasons that this program may help to
discover. Then the proportion of disabled people at age 90 will be
lower than the current observed proportion.
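The weighting described above can be sketched in Python. All figures below are invented for illustration and are not IMaCh output; the function name weighted_expectancy and the variable names are hypothetical. The same status-based expectancies are combined once with cross-sectional ('observed') prevalences and once with period ('expected') prevalences.

```python
def weighted_expectancy(e_by_start_state, weights):
    """Average expectancies over initial states, weighted by prevalence."""
    total = sum(weights)
    return sum(e * w for e, w in zip(e_by_start_state, weights)) / total


# Hypothetical status-based expectancies (in years) at some initial age:
# eij = expected time spent in state j when starting in state i.
e11, e21 = 10.0, 6.0  # years healthy, starting healthy / starting disabled
e12, e22 = 2.0, 4.0   # years disabled, starting healthy / starting disabled

# Hypothetical prevalences (healthy, disabled) at the initial age.
cross_sectional = [0.7, 0.3]  # observed weights, the popbased=1 case
period = [0.8, 0.2]           # model-implied weights, the popbased=0 case

e_dot1_observed = weighted_expectancy([e11, e21], cross_sectional)
e_dot2_observed = weighted_expectancy([e12, e22], cross_sectional)
e_dot1_period = weighted_expectancy([e11, e21], period)

print("e.1 with observed weights:", e_dot1_observed)
print("e.1 with period weights:", e_dot1_period)
print("total LE with observed weights:", e_dot1_observed + e_dot2_observed)
```

With these invented numbers the period weights put more mass on the healthy state, so e.1 is larger than with the observed weights; only the weights differ between the two results.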

Thus a better Health Expectancy and even a better Life Expectancy
value is given by forecasting not only the current lower mortality at
all ages but also a lower incidence of disability and higher recovery.
Using the period prevalences as weights instead of the
cross-sectional prevalences, we are computing indices which are more
specific to the current situation and therefore more useful to
predict improvements or regressions in the future and to compare
different policies in various countries.

<ul>
    <li>pop_based= 0 Health expectancies are computed
    at each age from period prevalences 'expected' at this initial
    age.</li>
    <li>pop_based= 1 Health expectancies are
    computed at each age from cross-sectional 'observed' prevalence at
    this initial age. As the whole population is not observed at the
    same exact date, we define a short period where the observed
    prevalence can be computed.

    We simply sum all people surveyed within these two exact dates
    who belong to a particular age group (single year) at the date of
    interview and are in a particular health state. Then it is easy to
    get the proportion of people of a particular health status among all
    people of the same age group.

    If both dates are far apart and cover two waves or more, people
    interviewed twice or more are counted twice or more. The program
    also takes into account the selection of individuals interviewed between
    firstpass and lastpass (we don't know if it can be useful).
    </li>
</ul>

<h4>Prevalence forecasting (Experimental)</h4>

<pre>starting-proj-date=1/1/1989 final-proj-date=1/1/1992 mov_average=0 </pre>

Prevalence and population projections are only available if
the interpolation unit is a month, i.e. stepm=1, and if there are
no covariates.
The programme estimates the prevalence in each
state at a precise date expressed in day/month/year. The
programme computes one forecasted prevalence per year from a
starting date (1 January 1989 in this example) to a final date
(1 January 1992). The mov_average statement allows smoothed
forecasted prevalences to be computed with a five-age moving average
centered at the mid-age of the five-age period.

<ul>
    <li>starting-proj-date= starting date
        (day/month/year) of forecasting</li>
    <li>final-proj-date= final date
        (day/month/year) of forecasting</li>
    <li>mov_average= smoothing with a five-age
        moving average centered at the mid-age of the five-age
        period. The command mov_average takes
        value 1 if the prevalences are smoothed and 0 otherwise.</li>
</ul>

<h4>Population forecasting (Experimental)</h4>

<ul type="disc">
    <li>popforecast=
        0 Option for population forecasting. If
        popforecast=1, the programme does the forecasting.</li>
    <li>popfile=
        name of the population file</li>
    <li>popfiledate=
        date of the population</li>
    <li>last-popfiledate=
        date of the last population projection </li>
</ul>

<hr>

<h2><a name="running"></a>Running Imach
with this example</h2>

We assume that you have already typed your <a href="biaspar.imach">1st_example
parameter file</a> as explained <a href="#biaspar">above</a>.

To run the program under Windows you should either:

<ul>
    <li>click on the imach.exe icon and either:
      <ul>
         <li>enter the name of the
        parameter file which is for example <tt>
C:\home\myname\lsoa\biaspar.imach</tt></li>
    <li>or locate the biaspar.imach icon in your folder such as
    <tt>C:\home\myname\lsoa</tt>
    and drag it, with your mouse, onto the already open imach window.
</li>
  </ul>

 <li>With version (0.97b) if you ran setup at installation, Windows is
 supposed to understand the ".imach" extension and you can
 right click the biaspar.imach icon and either edit the parameter file
 with wordpad (better than notepad) or execute it with
 IMaCh. </li>
</ul>

The time to converge depends on the step unit that you used (1
month is more precise but more cpu consuming), on the number of cases,
and on the number of variables (covariates).

The program outputs many files. Most of them are files which will be
plotted for better understanding.

Running under Linux is mostly the same.

Nor is it more difficult to run it on a Macintosh.
<hr>

<h2><a name="output">Output of the program
and graphs </a></h2>

Once the optimization is finished (once convergence is
reached), many tables and graphics are produced.
The IMaCh program will create a subdirectory of the same name as your
parameter file (here biaspar) where all the tables and figures will be
stored.

Important files like the log file and the output parameter file (which
contains the estimates of the maximisation) are stored at the main
level, not in this subdirectory. Files with extensions .log and .txt can
be edited with a standard editor like wordpad or notepad, or can even be
viewed with a browser like Internet Explorer or Mozilla.

The main html file also has the same name, <a
href="biaspar.htm">biaspar.htm</a>. You can click on it while holding
your shift key in order to open it in another window (Windows).

Our grapher is Gnuplot, a freely distributed interactive plotting program which
can also work in batch. A gnuplot reference manual is available <a
href="http://www.gnuplot.info/">here</a>.
When the run is
finished, and so that the window doesn't disappear, the user
should enter a character like <tt>q</tt> for quitting. These
characters are:

<ul>
    <li>'e' to open the main result html file <a
    href="biaspar.htm">biaspar.htm</a> and edit
        the output files and graphs. </li>
    <li>'g' to graph again</li>
    <li>'c' to start the program again from the beginning.</li>
    <li>'q' for exiting.</li>
</ul>

The main gnuplot file is named <tt>biaspar.gp</tt> and can be edited (right
click) and run again.
Gnuplot is easy and you can use it to make more complex
graphs. Just click on gnuplot and type plot sin(x) to see how easy it
is.

<h5>Results files

- <a
name="cross-sectional prevalence in each state">cross-sectional prevalence in each state</a> (and at first pass):
<a href="biaspar/prbiaspar.txt">biaspar/prbiaspar.txt</a>
</h5>

The first line is the title and displays each field of the
file. The first column corresponds to age. Fields 2 and 6 are the
proportions of individuals in states 1 and 2 respectively as
observed at first exam. Other fields are the numbers of
people in states 1, 2 or more. The number of columns increases if
the number of states is higher than 2.
The header of the file is

<pre># Age Prev(1) N(1) N Age Prev(2) N(2) N
70 1.00000 631 631 70 0.00000 0 631
71 0.99681 625 627 71 0.00319 2 627
72 0.97125 1115 1148 72 0.02875 33 1148 </pre>

It means that at age 70 (between 70 and 71), the prevalence in state 1 is 1.000
and in state 2 is 0.000. At age 71 the number of individuals in
state 1 is 625 and in state 2 is 2, hence the total number of
people aged 71 is 625+2=627.
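The prevalence columns of this file are simply the counts divided by the total observed at each age. The following minimal Python sketch recomputes them from the counts quoted above; the helper name prevalences is hypothetical and is not part of IMaCh.

```python
# Counts copied from the prbiaspar.txt extract above: age -> (N(1), N(2)).
counts = {70: (631, 0), 71: (625, 2), 72: (1115, 33)}


def prevalences(n1, n2):
    """Return (Prev(1), Prev(2)) rounded to 5 decimals, as in the file.

    Hypothetical helper written for this illustration only.
    """
    total = n1 + n2
    return round(n1 / total, 5), round(n2 / total, 5)


for age, (n1, n2) in sorted(counts.items()):
    p1, p2 = prevalences(n1, n2)
    # Same column order as the file: Age Prev(1) N(1) N Age Prev(2) N(2) N
    print(age, p1, n1, n1 + n2, age, p2, n2, n1 + n2)
```

Each printed row can be compared with the corresponding row of the extract.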

<h5>- Estimated parameters and
covariance matrix: <a href="rbiaspar.txt">rbiaspar.txt</a></h5>

This file contains all the maximisation results:

<pre> -2 log likelihood= 21660.918613445392
 Estimated parameters: a12 = -12.290174 b12 = 0.092161
                       a13 = -9.155590 b13 = 0.046627
                       a21 = -2.629849 b21 = -0.022030
                       a23 = -7.958519 b23 = 0.042614
 Covariance matrix: Var(a12) = 1.47453e-001
                    Var(b12) = 2.18676e-005
                    Var(a13) = 2.09715e-001
                    Var(b13) = 3.28937e-005
                    Var(a21) = 9.19832e-001
                    Var(b21) = 1.29229e-004
                    Var(a23) = 4.48405e-001
                    Var(b23) = 5.85631e-005
 </pre>

By substitution of these parameters in the regression model,
we obtain the elementary transition probabilities:

<img src="biaspar/pebiaspar11.png" width="400" height="300">

<h5>- Transition probabilities:
<a href="biaspar/pijrbiaspar.txt">biaspar/pijrbiaspar.txt</a></h5>

Here are the transition probabilities Pij(x, x+nh). The second
column is the starting age x (from age 95 to 65), the third is age
(x+nh) and the others are the transition probabilities p11, p12, p13,
p21, p22, p23.
The first column indicates the value of the covariate
(without any other variable than age it is equal to 1). For example, line 5 of the file
is:

<pre>1 100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513 </pre>

and this means:

<pre>p11(100,106)=0.02655
p12(100,106)=0.17622
p13(100,106)=0.79722
p21(100,106)=0.01809
p22(100,106)=0.13678
p23(100,106)=0.84513 </pre>

<h5>- <a
name="Period prevalence in each state">Period prevalence in each state</a>:
<a href="biaspar/plrbiaspar.txt">biaspar/plrbiaspar.txt</a></h5>

<pre>#Prevalence
#Age 1-1 2-2

#************
70 0.90134 0.09866
71 0.89177 0.10823
72 0.88139 0.11861
73 0.87015 0.12985 </pre>

At age 70 the period prevalence is 0.90134 in state 1 and 0.09866
in state 2. This period prevalence differs from the cross-sectional
prevalence, and this is the point. The cross-sectional prevalence at age
70 results from the incidence of disability, incidence of recovery and
mortality which occurred in the past of the cohort. Period prevalence
results from a simulation with current incidences of disability,
recovery and mortality estimated from this cross-longitudinal
survey. It is a good prediction of the prevalence in the
future if "nothing changes in the future". This is exactly
what demographers do with a period life table. Life expectancy is the
expected mean survival time if current mortality rates (age-specific incidences
of mortality) "remain constant" in the future.

<h5>- Standard deviation of
period prevalence: <a
href="biaspar/vplrbiaspar.txt">biaspar/vplrbiaspar.txt</a></h5>

The period prevalence has to be compared with the cross-sectional
prevalence. But both are statistical estimates and therefore
have confidence intervals.
For the cross-sectional prevalence we generally need information on
the design of the surveys. It is usually not enough to consider the
number of people surveyed at a particular age and to estimate a
Bernoulli confidence interval based on the prevalence at that
age. But you can do it to get an idea of the randomness. At least you
can get a visual appreciation of the randomness by looking at the
fluctuation over ages.

For the period prevalence it is possible to estimate the
confidence interval from the Hessian matrix (see the publication for
details). We are supposing that the design of the survey will only
alter the weight of each individual. IMaCh scales the weights of
individuals-waves contributing to the likelihood by making the sum of
the weights equal to the sum of individuals-waves contributing: a
weighted survey doesn't increase or decrease the size of the survey,
it only gives more weight to some individuals and thus less to the
others.

<h5>-cross-sectional and period
prevalence in state (2=disabled) with confidence interval:
<a href="biaspar/vbiaspar21.htm">biaspar/vbiaspar21.png</a></h5>

This graph exhibits the period prevalence in state (2) with the
confidence interval in red. The green curve is the observed prevalence
(or proportion of individuals in state (2)). Without discussing the
results (it is not the purpose here), we observe that the green curve
is rather below the period prevalence. If the data were not biased by
the non-inclusion of people living in institutions we would have
concluded that the prevalence of disability will increase in the
future (see the main publication if you are interested in real data
and results which are opposite).

<img src="biaspar/vbiaspar21.png" width="400" height="300">

<h5>-Convergence to the
period prevalence of disability: <a
href="biaspar/pbiaspar11.png">biaspar/pbiaspar11.png</a>
<img src="biaspar/pbiaspar11.png" width="400" height="300"> </h5>

This graph plots the conditional transition probabilities from
an initial state (1=healthy in red at the bottom, or 2=disabled in
green on top) at age x to the final state 2=disabled at
age x+h. Conditional means conditional on being alive
at age x+h, the probability of which is hP11x + hP12x when starting
healthy and hP21x + hP22x when starting disabled. The
curves hP12x/(hP11x + hP12x) and hP22x/(hP21x
+ hP22x) converge with h to the period
prevalence of disability. In order to get the period
prevalence at age 70 we should start the process at an earlier
age, e.g. 50. If the disability state is defined by severe
disability criteria with only a small chance of recovery, then the
incidence of recovery is low and the time to convergence is
probably longer. But we do not have experience of this yet.
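The convergence shown in this graph can be reproduced on a toy example. The sketch below is not IMaCh code: it assumes a hypothetical transition matrix P that does not depend on age (IMaCh estimates age-dependent probabilities), and it computes, for each starting state, the probability of being disabled after h steps given survival, i.e. hP12/(hP11 + hP12) and hP22/(hP21 + hP22).

```python
# Hypothetical, age-independent one-step transition matrix (an assumption:
# IMaCh itself estimates age-dependent probabilities from the data).
# States: index 0 = healthy, 1 = disabled, 2 = dead (absorbing).
P = [
    [0.90, 0.06, 0.04],
    [0.20, 0.65, 0.15],
    [0.00, 0.00, 1.00],
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def conditional_disability(h):
    """Probability of being disabled after h steps, given survival,
    for a person initially healthy and a person initially disabled."""
    ph = P
    for _ in range(h - 1):
        ph = matmul(ph, P)
    from_healthy = ph[0][1] / (ph[0][0] + ph[0][1])
    from_disabled = ph[1][1] / (ph[1][0] + ph[1][1])
    return from_healthy, from_disabled


for h in (1, 5, 20, 60):
    print(h, conditional_disability(h))
```

With this toy matrix the two curves start far apart and become indistinguishable as h grows, which is the behaviour the graph displays. The lower the recovery probability (here 0.20), the slower the convergence.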

<h5>- Life expectancies by age
and initial health status with standard deviation: <a
href="biaspar/erbiaspar.txt">biaspar/erbiaspar.txt</a></h5>

<pre># Health expectancies
# Age 1-1 (SE) 1-2 (SE) 2-1 (SE) 2-2 (SE)
 70 11.0180 (0.1277) 3.1950 (0.3635) 4.6500 (0.0871) 4.4807 (0.2187)
 71 10.4786 (0.1184) 3.2093 (0.3212) 4.3384 (0.0875) 4.4820 (0.2076)
 72 9.9551 (0.1103) 3.2236 (0.2827) 4.0426 (0.0885) 4.4827 (0.1966)
 73 9.4476 (0.1035) 3.2379 (0.2478) 3.7621 (0.0899) 4.4825 (0.1858)
 74 8.9564 (0.0980) 3.2522 (0.2165) 3.4966 (0.0920) 4.4815 (0.1754)
 75 8.4815 (0.0937) 3.2665 (0.1887) 3.2457 (0.0946) 4.4798 (0.1656)
 76 8.0230 (0.0905) 3.2806 (0.1645) 3.0090 (0.0979) 4.4772 (0.1565)
 77 7.5810 (0.0884) 3.2946 (0.1438) 2.7860 (0.1017) 4.4738 (0.1484)
 78 7.1554 (0.0871) 3.3084 (0.1264) 2.5763 (0.1062) 4.4696 (0.1416)
 79 6.7464 (0.0867) 3.3220 (0.1124) 2.3794 (0.1112) 4.4646 (0.1364)
 80 6.3538 (0.0868) 3.3354 (0.1014) 2.1949 (0.1168) 4.4587 (0.1331)
 81 5.9775 (0.0873) 3.3484 (0.0933) 2.0222 (0.1230) 4.4520 (0.1320)
</pre>

<pre>For example 70 11.0180 (0.1277) 3.1950 (0.3635) 4.6500 (0.0871) 4.4807 (0.2187)
means
e11=11.0180 e12=3.1950 e21=4.6500 e22=4.4807 </pre>

<pre><img src="biaspar/expbiaspar21.png" width="400" height="300"><img
src="biaspar/expbiaspar11.png" width="400" height="300"></pre>

For example, the life expectancy of a healthy individual at age 70
is 11.0 years in the healthy state and 3.2 in the disability state
(total of 14.2 years). If he was disabled at age 70, his life expectancy
will be shorter, 4.65 years in the healthy state and 4.5 in the
disability state (=9.15 years). The total life expectancy is a
weighted mean of both, 14.2 and 9.15. The weight is the proportion
of people disabled at age 70. In order to get a period index
(i.e.
based only on incidences) we use the <a
href="#Period prevalence in each state">stable or
period prevalence</a> at age 70 (i.e. computed from
incidences at earlier ages) instead of the <a
href="#cross-sectional prevalence in each state">cross-sectional prevalence</a>
(observed for example at first medical exam) (<a href="#Health expectancies">see
below</a>).

<h5>- Variances of life
expectancies by age and initial health status: <a
href="biaspar/vrbiaspar.txt">biaspar/vrbiaspar.txt</a></h5>

For example, the covariances of life expectancies Cov(ei,ej)
at age 50 are (line 3)

<pre> Cov(e1,e1)=0.4776 Cov(e1,e2)=0.0488=Cov(e2,e1) Cov(e2,e2)=0.0424</pre>

<h5>-Variances of one-step
probabilities: <a href="biaspar/probrbiaspar.txt">biaspar/probrbiaspar.txt</a></h5>

For example, at age 65

<pre> p11=9.960e-001 standard deviation of p11=2.359e-004</pre>

<h5>- <a
name="Health expectancies">Health
expectancies</a>
with standard errors in parentheses: <a
href="biaspar/trbiaspar.txt">biaspar/trbiaspar.txt</a></h5>

<pre>#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) </pre>

<pre>70 13.26 (0.22) 9.95 (0.20) 3.30 (0.14) </pre>

Thus, at age 70 the total life expectancy, e..=13.26 years, is
the mean of e1.=13.46 and e2.=11.35 weighted by the period
prevalences at age 70, which are 0.90134 in state 1 and 0.09866 in
state 2 respectively (the sum is equal to one). e.1=9.95 is the
disability-free life expectancy at age 70 (it is again a weighted
mean of e11 and e21). e.2=3.30 is the life expectancy at age
70 to be spent in the disability state.
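The arithmetic of this weighted mean can be checked directly. The following Python sketch uses only the figures quoted in the paragraph above; the function weighted_mean is a hypothetical helper written for this illustration, not an IMaCh routine.

```python
# Values quoted in the paragraph above (age 70).
period_prev = {1: 0.90134, 2: 0.09866}      # period prevalences
e_by_initial_state = {1: 13.46, 2: 11.35}   # e1. and e2.


def weighted_mean(values, weights):
    """Mean of values[k] weighted by weights[k]; weights must sum to one."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * values[k] for k in values)


# Total life expectancy e.. as the prevalence-weighted mean of e1. and e2.
total_le = weighted_mean(e_by_initial_state, period_prev)
print(round(total_le, 2))
```

The result agrees with the published e..=13.26 to within the rounding of the two-decimal inputs.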

<h5>-Total life expectancy by
age and health expectancies in states (1=healthy) and (2=disabled):
<a href="biaspar/ebiaspar1.png">biaspar/ebiaspar1.png</a></h5>

This figure represents the health expectancies and the total
life expectancy with a confidence interval (dashed line).

<pre> <img src="biaspar/ebiaspar1.png" width="400" height="300"></pre>

Standard deviations (obtained from the information matrix of
the model) of these quantities are very useful.
Cross-longitudinal surveys are costly and do not involve huge
samples, generally a few thousand people; therefore it is very
important to have an idea of the standard deviation of our
estimates. It has been a big challenge to compute the Health
Expectancy standard deviations. Do not be confused: life expectancy
is, like any expected value, the mean of a distribution; but here
we are not computing the standard deviation of the distribution,
but the standard deviation of the estimate of the mean.

Our health expectancy estimates vary according to the sample
size (and the standard deviations give confidence intervals of
the estimates) but also according to the model fitted. Let us
explain this in more detail.

Choosing a model means at least two kinds of choices. First, we
have to decide the number of disability states. Second, we have to
design, within the logit model family, the model itself: variables,
covariables, confounding factors etc. to be included.

The more disability states we have, the better our demographic
description of the disability process, but the smaller the number of
transitions between each pair of states and the higher the noise in the
measurement.
We do not have enough experience with the various
models to summarize the advantages and disadvantages, but it is
important to say that even if we had huge and unbiased samples,
the total life expectancy computed from a cross-longitudinal
survey varies with the number of states. If we define only two
states, alive or dead, we find the usual life expectancy where it
is assumed that at each age, people are at the same risk of dying.
If we differentiate the alive state into healthy and
disabled, and as the mortality from the disability state is higher
than the mortality from the healthy state, we are introducing
heterogeneity in the risk of dying. The total mortality at each
age is the mean of the mortality in each state weighted by the
prevalence in each state. Therefore if the proportion of people
at each age and in each state is different from the period
equilibrium, there is no reason to find the same total mortality
at a particular age. Life expectancy, even if it is a very useful
tool, rests on a very strong hypothesis of homogeneity of the
population. Our main purpose is not to measure differential
mortality but to measure the expected time in a healthy or
disability state in order to maximise the former and minimize the
latter. But the differential in mortality complicates the
measurement.

Incidences of disability or recovery are not affected by the number
of states if these states are independent. But incidence estimates
depend on the specification of the model. The more covariates we
add to the logit model, the better the model, but some covariates are
not well measured, and some are confounding factors, as in any
statistical model. The procedure to "fit the best model" is
similar to logistic regression which itself is similar to regression
analysis.
We have not yet gone that far because we also have a severe
limitation which is the speed of the convergence. On a Pentium III,
500 MHz, even the simplest model, estimated by month on 8,000 people,
may take 4 hours to converge. Also, the IMaCh program is not a
statistical package, and does not allow sophisticated design
variables. If you need sophisticated design variables you have to build them
yourself and add them as ordinary variables. IMaCh allows up to 8
variables. The current version of this program only allows adding
simple variables like age+sex or age+sex+ age*sex but will never be
general enough. What should be remembered is that the incidences, or
probabilities of change from one state to another, are affected by the
variables specified in the model.

Also, the age range of the people interviewed is linked to
the age range of the life expectancy which can be estimated by
extrapolation. If your sample ranges from age 70 to 95, you can
clearly estimate a life expectancy at age 70 and trust your
confidence interval because it is mostly based on your sample size,
but if you want to estimate the life expectancy at age 50, you
have to rely on the design of your model. Fitting a logistic model on an age
range of 70 to 95 and estimating probabilities of transition outside
this age range, say at age 50, is very dangerous. At least you
should remember that the confidence interval given by the
standard deviation of the health expectancies is valid only under the
strong assumption that your model is the 'true model', which is
probably not the case outside the age range of your sample.

<h5>- Copy of the parameter
file: <a href="orbiaspar.txt">orbiaspar.txt</a></h5>

This copy of the parameter file can be useful to re-run the
program while saving the old output files.

<h5>- Prevalence forecasting:
<a href="biaspar/frbiaspar.txt">biaspar/frbiaspar.txt</a></h5>

First,
we have estimated the observed prevalence between 1/1/1984 and
1/6/1988 (June, European syntax of dates). The mean date of all interviews (weighted average of the
interviews performed between 1/1/1984 and 1/6/1988) is estimated
to be 13/9/1985, as written at the top of the file. Then we
forecast the probability of being in each state.

For example on 1/1/1989:

<pre class="MsoNormal"># StartingAge FinalAge P.1 P.2 P.3
# Forecasting at date 1/1/1989
 73 0.807 0.078 0.115</pre>

Since the minimum age is 70 on 13/9/1985, the youngest forecasted
age is 73. This means that a person aged 70 on 13/9/1985 has a
probability of 0.807 of being in state 1 at age 73 on 1/1/1989.
Similarly, the probability of being in state 2 is 0.078 and the
probability of having died is 0.115. Then, on 1/1/1989, the prevalence of
disability at age 73 is estimated to be 0.088.

<h5>- Population forecasting:
<a href="biaspar/poprbiaspar.txt">biaspar/poprbiaspar.txt</a></h5>

<pre># Age P.1 P.2 P.3 [Population]
# Forecasting at date 1/1/1989
75 572685.22 83798.08
74 621296.51 79767.99
73 645857.70 69320.60 </pre>

<pre># Forecasting at date 1/1/1990
76 442986.68 92721.14 120775.48
75 487781.02 91367.97 121915.51
74 512892.07 85003.47 117282.76 </pre>

From the population file, we estimate the number of people in
each state. At age 73, 645857 persons are in state 1 and 69320
are in state 2. One year later, 512892 are still in state 1,
85003 are in state 2 and 117282 died before 1/1/1990.
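Two figures in the forecasting examples above can be checked by hand. The Python sketch below uses only numbers quoted in the two extracts; it assumes that the prevalence of disability is the probability of state 2 among survivors, and that the cohort aged 73 on 1/1/1989 is the cohort listed at age 74 one year later.

```python
# Forecast copied from the frbiaspar.txt extract above (age 73 on 1/1/1989).
p1, p2, p3 = 0.807, 0.078, 0.115   # healthy, disabled, dead

# Prevalence of disability among survivors at age 73.
prevalence_disabled = p2 / (p1 + p2)
print(round(prevalence_disabled, 3))

# Same cohort in the poprbiaspar.txt extract: aged 73 on 1/1/1989
# (states 1 and 2) and aged 74 one year later (states 1, 2 and deaths).
pop_1989 = (645857.70, 69320.60)
pop_1990 = (512892.07, 85003.47, 117282.76)
print(round(sum(pop_1989), 2), round(sum(pop_1990), 2))
```

The first value reproduces the 0.088 quoted in the text. The two totals coincide, showing that the one-year forecast only redistributes the cohort between the two live states and death.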

<hr>

<h2><a name="example"></a>Trying an example</h2>

Now that you know how to run the program, it is time to test it
on your own computer. Try for example a parameter file named <a
href="imachpar.imach">imachpar.imach</a> which is a copy
of mypar.imach included
in the subdirectory of imach, mytry.
Edit it and change the name of the data file to mydata.txt if you don't want to
copy it to the same directory. The file mydata.txt
is a smaller file of 3,000 people but still with 4 waves.

Right click on the .imach file and a window will pop up with the
string 'Enter the parameter file name:'

<table border="1">
    <tr>
        <td width="100%">IMACH, Version 0.97b
        Enter the parameter file name: imachpar.imach
        </td>
    </tr>
</table>

Most of the data files or image files generated will use the
'imachpar' string in their names. The running time is about 2-3
minutes on a Pentium III. If the execution worked correctly, the
output files are created in the current directory, and should be
the same as the mypar files initially included in the directory mytry.

<ul>
    <li><pre>Output on the screen. The output screen looks like <a
href="biaspar.log">biaspar.log</a>
#
title=MLE datafile=mydata.txt lastobs=3000 firstpass=1 lastpass=3
ftol=1.000000e-008 stepm=24 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0</pre>
    </li>
    <li><pre>Total number of individuals= 2965, Agemin = 70.00, Agemax= 100.92

Warning, no any valid information for:126 line=126
Warning, no any valid information for:2307 line=2307
Delay (in months) between two waves Min=21 Max=51 Mean=24.495826
These lines give some warnings on the data file and also some raw statistics on frequencies of transitions.
Age 70 1.=230 loss[1]=3.5% 2.=16 loss[2]=12.5% 1.=222 prev[1]=94.1% 2.=14
 prev[2]=5.9% 1-1=8 11=200 12=7 13=15 2-1=2 21=6 22=7 23=1
Age 102 1.=0 loss[1]=NaNQ% 2.=0 loss[2]=NaNQ% 1.=0 prev[1]=NaNQ% 2.=0 </pre>
    </li>
</ul>
It includes some warnings or errors which are very important for
you. Be careful with such warnings because your results may be biased
if, for example, you have people who agreed to be interviewed at
first pass but never after, or if you don't have the exact month of
death. In such cases IMaCh doesn't take any initiative, it only
warns you. It is up to you to decide what to do with these
people. Excluding them is usually a wrong decision. It is better to
decide that the month of death is at the mid-interval between the last
two waves, for example.

If your survey suffers from severe attrition, you have to analyse the
characteristics of the lost people and, for example, give more weight to
people with the same characteristics.

By default, IMaCh warns and excludes these problematic people, but you
have to be careful with such results.

&nbsp;

<ul>
    <li>Maximisation with the Powell algorithm. 8 directions are
        given corresponding to the 8 parameters. It can take
        rather long to reach convergence.

Powell iter=1 -2*LL=11531.405658264877 1 0.000000000000 2
0.000000000000 3
0.000000000000 4 0.000000000000 5 0.000000000000 6
0.000000000000 7
0.000000000000 8 0.000000000000
1..........2.................3..........4.................5.........
6................7........8...............
Powell iter=23 -2*LL=6744.954108371555 1 -12.967632334283

2 0.135136681033 3 -7.402109728262 4 0.067844593326
5 -0.673601538129 6 -0.006615504377 7 -5.051341616718
8 0.051272038506
1..............2...........3..............4...........
5..........6................7...........8.........
#Number of iterations = 23, -2 Log likelihood =
6744.954042573691
# Parameters
12 -12.966061 0.135117
13 -7.401109 0.067831
21 -0.672648 -0.006627
23 -5.051297 0.051271
    </li>
    <li><pre>Calculation of the hessian matrix. Wait...
12345678.12.13.14.15.16.17.18.23.24.25.26.27.28.34.35.36.37.38.45.46.47.48.56.57.58.67.68.78

Inverting the hessian to get the covariance matrix. Wait...

#Hessian matrix#
3.344e+002 2.708e+004 -4.586e+001 -3.806e+003 -1.577e+000 -1.313e+002 3.914e-001 3.166e+001
2.708e+004 2.204e+006 -3.805e+003 -3.174e+005 -1.303e+002 -1.091e+004 2.967e+001 2.399e+003
-4.586e+001 -3.805e+003 4.044e+002 3.197e+004 2.431e-002 1.995e+000 1.783e-001 1.486e+001
-3.806e+003 -3.174e+005 3.197e+004 2.541e+006 2.436e+000 2.051e+002 1.483e+001 1.244e+003
-1.577e+000 -1.303e+002 2.431e-002 2.436e+000 1.093e+002 8.979e+003 -3.402e+001 -2.843e+003
-1.313e+002 -1.091e+004 1.995e+000 2.051e+002 8.979e+003 7.420e+005 -2.842e+003 -2.388e+005
3.914e-001 2.967e+001 1.783e-001 1.483e+001 -3.402e+001 -2.842e+003 1.494e+002 1.251e+004
3.166e+001 2.399e+003 1.486e+001 1.244e+003 -2.843e+003 -2.388e+005 1.251e+004 1.053e+006
# Scales
12 1.00000e-004 1.00000e-006
13 1.00000e-004 1.00000e-006
21 1.00000e-003 1.00000e-005
23 1.00000e-004 1.00000e-005
# Covariance
 1 5.90661e-001
 2 -7.26732e-003 8.98810e-005
 3 8.80177e-002 -1.12706e-003 5.15824e-001
 4 -1.13082e-003 1.45267e-005 -6.50070e-003 8.23270e-005
 5 9.31265e-003 -1.16106e-004 6.00210e-004 -8.04151e-006 1.75753e+000
 6 -1.15664e-004 1.44850e-006 -7.79995e-006 1.04770e-007 -2.12929e-002 2.59422e-004
 7 1.35103e-003 -1.75392e-005 -6.38237e-004 7.85424e-006 4.02601e-001 -4.86776e-003 1.32682e+000
 8 -1.82421e-005 2.35811e-007 7.75503e-006 -9.58687e-008 -4.86589e-003
5.91641e-005 -1.57767e-002 1.88622e-004
# agemin agemax for lifexpectancy, bage fage (if mle==0 ie no data nor Max likelihood).


agemin=70 agemax=100 bage=50 fage=100
Computing prevalence limit: result on file 'plrmypar.txt'
Computing pij: result on file 'pijrmypar.txt'
Computing Health Expectancies: result on file 'ermypar.txt'
Computing Variance-covariance of DFLEs: file 'vrmypar.txt'
Computing Total LEs with variances: file 'trmypar.txt'
Computing Variance-covariance of Prevalence limit: file 'vplrmypar.txt'
End of Imach
</pre>
    </li>
</ul>

Once the run is finished, the program
waits for a character:

<table border="1">
    <tr>
        <td width="100%">Type e to edit output files, g
        to graph again, c to start again, and q for exiting:</td>
    </tr>
</table>

In order to give an idea of the time needed to reach convergence,
IMaCh gives an estimate of the time required if convergence needs 10, 20 or 30
iterations. It might be useful.

First you should enter e to
edit the master file mypar.htm.

<ul>
<li>Output files

- Copy of the parameter file: <a href="ormypar.txt">ormypar.txt</a>
- Gnuplot file name: <a href="mypar.gp.txt">mypar.gp.txt</a>
- Cross-sectional prevalence in each state: <a
href="prmypar.txt">prmypar.txt</a>
- Period prevalence in each state: <a
href="plrmypar.txt">plrmypar.txt</a>
- Transition probabilities: <a href="pijrmypar.txt">pijrmypar.txt</a>
- Life expectancies by age and initial health status
(estepm=24 months): <a href="ermypar.txt">ermypar.txt</a>

- Parameter file with estimated parameters and the
covariance matrix: <a href="rmypar.txt">rmypar.txt</a>
- Variance of one-step probabilities: <a
href="probrmypar.txt">probrmypar.txt</a>
- Variances of life expectancies by age and initial
health status (estepm=24 months): <a href="vrmypar.txt">vrmypar.txt</a>
- Health expectancies with their variances: <a
href="trmypar.txt">trmypar.txt</a>
- Standard deviation of period prevalences: <a
href="vplrmypar.txt">vplrmypar.txt</a>
No population forecast: popforecast = 0 (instead of 1) or
stepm = 24 (instead of 1) or model=. (instead of .)

</li>
<li>Graphs

-<a href="../mytry/pemypar1.gif">One-step transition
probabilities</a>
-<a href="../mytry/pmypar11.gif">Convergence to the
period prevalence</a>
-<a href="../mytry/vmypar11.gif">Cross-sectional and period
prevalence in state (1) with the confidence interval</a>
-<a href="../mytry/vmypar21.gif">Cross-sectional and period
prevalence in state (2) with the confidence interval</a>
-<a href="../mytry/expmypar11.gif">Health life
expectancies by age and initial health state (1)</a>
-<a href="../mytry/expmypar21.gif">Health life
expectancies by age and initial health state (2)</a>
-<a href="../mytry/emypar1.gif">Total life expectancy by
age and health expectancies in states (1) and (2).</a> </li>
</ul>

This software has been partly funded by <a
href="http://euroreves.ined.fr">Euro-REVES</a>, a concerted
action of the European Union. It will be copyrighted
identically to a GNU software product, i.e. the program and its
sources can be distributed freely for non-commercial use. The
sources are not widely distributed today. You can obtain them by
sending us a short request (name, email, institute) at <a
href="mailto:brouard@ined.fr">brouard@ined.fr</a> and <a
href="mailto:lievre@ined.fr">lievre@ined.fr</a>.

The latest version (0.97b of June 2004) is available at <a
href="http://euroreves.ined.fr/imach">http://euroreves.ined.fr/imach</a>

</body>
</html>