Analysis of the Natural History of Dementia Using Longitudinal Grade of Membership Models

Part of the book series: The Springer Series on Demographic Methods and Population Analysis (PSDE, volume 40)

Abstract

We present a longitudinal form of the Grade of Membership (GoM) model for time-varying covariates, provide a self-contained description of its estimation, and illustrate its application with a substantively meaningful analysis of the progression of dementia among National Long Term Care Survey (NLTCS) respondents. The chapter has two goals—one methodological and the other substantive. Methodologically, we present the Kuhn-Tucker conditions for convergence of the maximum likelihood estimator and show how the associated estimates can be obtained using a new constrained form of the Newton-Raphson iteration algorithm that preserves the summation constraints at each update; we also present and discuss known results regarding the consistency and asymptotic normality of the longitudinal GoM model and offer a conjecture regarding how these results might be extended to the less restrictive cross-sectional form of the GoM model. Substantively, the natural history of dementia is modeled as a complex irreversible multidimensional process governed by a latent three-dimensional bounded state-space process. Individual dementia cases are initially widely dispersed in the latent state space. Over time, they move to state-space locations associated with severe cognitive and physical impairment and dramatically increased need for care. The application to the NLTCS data has been independently validated using Alzheimer’s disease data from the two cohorts of the Predictors Study.

Notes

  1.

    Woodbury et al. (1978) implemented a set of Newton-Raphson procedures for the cross-sectional GoM model but found it necessary to change the equation for \( p_{ijl} \) (without a time index) to the form:

    $$ p_{ijl} = \frac{\mathbf{g}_i^{\prime}\boldsymbol{\lambda}_{m_{jl}}}{\sum_{l^{\prime}} \mathbf{g}_i^{\prime}\boldsymbol{\lambda}_{m_{jl^{\prime}}}} $$

    which allowed the g- and λ-parameters to be estimated without Lagrange side conditions by removing the summation constraints on the λ-parameters.

    Although this modification facilitated the estimation of g- and λ-parameters, it constituted a fundamental change in the parametric specification of the model in which the λ-parameters were no longer interpretable as probabilities. An alternative approach is to modify the Newton-Raphson procedure to be consistent with all of the convexity constraints.

  2.

    Bradley and Gart’s (1962) proof of uniqueness of the consistent estimator in their Theorem 2 (iii) follows that of Chanda’s (1954) Theorem 2 and, hence, suffers from the deficiency noted by Tarone and Gruenhage (1975) who provided a corrected proof in their Theorem 2′. This means that Bradley and Gart’s (1962) result on uniqueness can be proven following Tarone and Gruenhage’s (1975) Theorem 2′, but not Chanda’s (1954) Theorem 2. Our appeal to Bradley and Gart’s (1962) Theorem 2 assumes that Tarone and Gruenhage’s (1975) correction has been made, even though Bradley and Gart did not actually do it.

  3.

    N* was denoted as N** and the corresponding BIC measure as BIC2 in Stallard et al. (2010).
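
The difference between the two parameterizations discussed in Note 1 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a single variable with two response levels and K = 2 pure types, indexes the λ-parameters as `lam[k][l]` (a simplification of the \( m_{jl} \) indexing used above), and uses hypothetical function names and parameter values.

```python
# Hypothetical illustration of Note 1 (all names and values are assumptions).

def bilinear(g_i, lam):
    """Return g_i' lambda for each response level l of one variable."""
    n_levels = len(lam[0])
    return [sum(g * row[l] for g, row in zip(g_i, lam)) for l in range(n_levels)]

def normalized(g_i, lam):
    """Normalized form: divide g_i' lambda by its sum over levels l'."""
    raw = bilinear(g_i, lam)
    total = sum(raw)
    return [r / total for r in raw]

g_i = [0.25, 0.75]                    # GoM scores: nonnegative, sum to 1

# Constrained lambdas: each row sums to 1, so g_i' lambda is already a
# probability distribution and no normalization is needed.
lam_constrained = [[0.5, 0.5], [0.25, 0.75]]
p_constrained = bilinear(g_i, lam_constrained)

# Unconstrained lambdas: rows need not sum to 1. The normalized form still
# yields probabilities, but the lambdas themselves are no longer probabilities.
lam_free = [[2.0, 2.0], [0.25, 0.75]]
p_free = normalized(g_i, lam_free)

print(p_constrained)   # [0.3125, 0.6875]
print(p_free)          # sums to 1 although the first lambda row sums to 4
```

In the constrained case the summation constraints on the λ-parameters carry the probability interpretation; in the normalized case that interpretation is lost, which is the trade-off the note describes.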

References

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), Second international symposium on information theory (pp. 267–281). Budapest: Akadémiai Kiadó.

  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC–19(6), 716–723.

  • Alzheimer’s Association. (2016). 2016 Alzheimer’s disease facts and figures. Chicago: Alzheimer’s Association.

  • Berkman, L., Singer, B., & Manton, K. G. (1989). Black/white differences in health status and mortality among the elderly. Demography, 26(4), 661–678.

  • Billingsley, P. (1986). Probability and measure (2nd ed.). New York: Wiley.

  • Birch, M. W. (1964). A new proof of the Pearson-Fisher theorem. Annals of Mathematical Statistics, 35(2), 817–824.

  • Bradley, R. A., & Gart, J. J. (1962). The asymptotic properties of ML estimators when sampling from associated populations. Biometrika, 49(1-2), 205–214.

  • Chanda, K. C. (1954). A note on the consistency and maxima of the roots of likelihood equations. Biometrika, 41(1/2), 56–61.

  • Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

  • Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–220.

  • Dooneief, G., Marder, K., Tang, M. X., & Stern, Y. (1996). The clinical dementia rating scale: Community-based validation of ‘profound’ and ‘terminal’ stages. Neurology, 46(6), 1746–1749.

  • Eisdorfer, C., Cohen, D., Paveza, G. J., Ashford, J. W., Luchins, D. J., Gorelick, P. B., Hirschman, R. S., Freels, S. A., Levy, P. S., Semla, T. P., & Shaw, H. A. (1992). An empirical evaluation of the global deterioration scale for staging Alzheimer’s disease. American Journal of Psychiatry, 149(2), 190–194.

  • Erosheva, E. A. (2002). Grade of Membership and Latent Structure models with application to disability survey data. Ph.D. dissertation, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA. http://www.stat.cmu.edu/~fienberg/NLTCS_Models/Erosheva-thesis-2002.pdf

  • Feller, W. (1971). An introduction to probability theory and its applications (2nd ed., Vol. II). New York: Wiley.

  • Fillenbaum, G. G., & Woodbury, M. A. (1998). Typology of Alzheimer’s disease: Findings from CERAD data. Aging and Mental Health, 2(2), 105–127.

  • Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189–198.

  • Freedman, D. A. (2006). On the so-called “Huber sandwich estimator” and “robust standard errors”. The American Statistician, 60(4), 299–302.

  • Freedman, V. A., Martin, L. G., & Schoeni, R. F. (2002). Recent trends in disability and functioning among older adults in the United States: A systematic review. Journal of the American Medical Association, 288(24), 3137–3146.

  • Gaenssler, P., & Wellner, J. A. (1981). Glivenko–Cantelli theorems. In S. Kotz, N. L. Johnson, & C. B. Read (Eds.), Encyclopedia of statistical sciences (Vol. 3). New York: Wiley.

  • Green, C. (2007). Modelling disease progression in Alzheimer’s disease: A review of modelling methods used for cost-effectiveness analysis. PharmacoEconomics, 25(9), 735–750.

  • Green, C., Shearer, J., Ritchie, C. W., & Zajicek, J. P. (2011). Model-based economic evaluation in Alzheimer’s disease: A review of the methods available to model Alzheimer’s disease progression. Value in Health, 14(5), 621–630.

  • Grossberg, G. T., & Desai, A. K. (2003). Management of Alzheimer’s disease. Journal of Gerontology: Medical Sciences, 58A(4), M331–M353.

  • Haberman, S. J. (1995). Book review of “Statistical Applications Using Fuzzy Sets”. Journal of the American Statistical Association, 90(431), 1131–1133.

  • Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In L. M. Le Cam & J. Neyman (Eds.), Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 221–233). Berkeley: University of California Press.

  • Hughes, C. P., Berg, L., Danziger, W. L., Coben, L. A., & Martin, R. L. (1982). A new clinical scale for the staging of dementia. British Journal of Psychiatry, 140(6), 566–572.

  • Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.

  • Katz, S., & Akpom, C. A. (1976). A measure of primary sociobiological functions. International Journal of Health Services, 6(3), 493–508.

  • Kinosian, B., Stallard, E., Lee, J., Woodbury, M. A., Zbrozek, A., & Glick, H. A. (2000). Predicting 10-year care requirements for older people with suspected Alzheimer’s disease. Journal of the American Geriatrics Society, 48(6), 631–638.

  • Kinosian, B., Stallard, E., Manton, K. G., Straley, D. L., Zbrozek, A., & Glick, H. A. (2004). The expected outcomes and costs of U.S. patients with incident suspected Alzheimer’s Disease (AD) over 15 years. In Abstract of poster session at the 9th international conference on Alzheimer’s disease and related disorders. Alzheimer’s Association Conference, Philadelphia, July 17–22.

  • Kovtun, M., Akushevich, I., Manton, K. G., & Tolley, H. D. (2007). Linear latent structure analysis: Mixture distribution models with linear constraints. Statistical Methodology, 4(1), 90–110.

  • Kovtun, M., Akushevich, I., & Yashin, A. I. (2014). On identifiability of mixtures of independent distribution laws. ESAIM: Probability and Statistics, PS 18, 207–232.

  • Kramer, M. (1980). The rising pandemic of mental disorders and associated chronic diseases and disabilities. Acta Psychiatrica Scandinavica, 62(Suppl. 285), 382–397.

  • Kuhn, H. W., & Tucker, A. W. (1951). Nonlinear programming. In J. Neyman (Ed.), Proceedings of the second Berkeley symposium on mathematical statistics and probability (pp. 481–492). Berkeley: University of California Press.

  • Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86.

  • Lanza, S. T., Collins, L. M., Lemmon, D. R., & Schafer, J. L. (2007). PROC LCA: A SAS procedure for latent class analysis. Structural Equation Modeling, 14(4), 671–694.

  • Lawton, M. P., & Brody, E. M. (1969). Assessment of older people: Self-maintaining and instrumental activities of daily living. The Gerontologist, 9(3), 179–186.

  • Lee, J., Kinosian, B., Stallard, E., Woodbury, M., Berzon, R., Zbrozek, A., & Glick, H. (1998). A comparison of the Mini-Mental State Exam and the Short Portable Mental Status Questionnaire in Alzheimer’s disease (abstract). Journal of the American Geriatrics Society, 46(9), S97.

  • Mak, T. K. (1982). Estimation in the presence of incidental parameters. Canadian Journal of Statistics, 10(2), 121–132.

  • Manton, K. G., & Gu, X. (2001). Changes in the prevalence of chronic disability in the United States black and nonblack population above age 65 from 1982 to 1999. Proceedings of the National Academy of Sciences, 98(11), 6354–6359.

  • Manton, K. G., Stallard, E., & Woodbury, M. A. (1991). A multivariate event history model based upon fuzzy states: Estimation from longitudinal surveys with informative nonresponse. Journal of Official Statistics, 7(3), 261–293.

  • Manton, K. G., Stallard, E., & Singer, B. (1992). Projecting the future size and health status of the US elderly population. International Journal of Forecasting, 8(3), 433–458.

  • Manton, K. G., Woodbury, M. A., & Tolley, H. D. (1994). Statistical applications using fuzzy sets. New York: Wiley.

  • Manton, K. G., Corder, L. S., & Stallard, E. (1997). Chronic disability trends in elderly United States populations: 1982–1994. Proceedings of the National Academy of Sciences, 94(6), 2593–2598.

  • McCulloch, R. E. (1988). Information and the likelihood function in exponential families. The American Statistician, 42(1), 73–75.

  • Nagi, S. Z. (1976). An epidemiology of disability among adults in the United States. Milbank Memorial Fund Quarterly Health and Society, 54(4), 439–467.

  • Orchard, R., & Woodbury, M. A. (1971). A missing information principle: Theory and applications. In L. M. Le Cam, J. Neyman, & E. L. Scott (Eds.), Proceedings of the sixth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 697–715). Berkeley: University of California Press.

  • Pfeiffer, E. (1975). A short portable mental status questionnaire for the assessment of organic brain deficit in elderly patients. Journal of the American Geriatrics Society, 23(10), 433–441.

  • Portrait, F., Lindeboom, M., & Deeg, D. (2001). Life expectancies in specific health states: Results from a joint model of health status and mortality of older persons. Demography, 38(4), 525–536.

  • Pressley, J. C., Trott, C., Tang, M., Durkin, M., & Stern, Y. (2003). Dementia in community-dwelling elderly patients: A comparison of survey data, medicare claims, cognitive screening, reported symptoms, and activity limitations. Journal of Clinical Epidemiology, 56(9), 896–905.

  • Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163.

  • Rao, C. R. (1958). Maximum likelihood estimation for the multinomial distribution with infinite number of cells. Sankhyā: The Indian Journal of Statistics, 20(3/4), 211–218.

  • Razlighi, Q. R., Stallard, E., Brandt, J., Blacker, D., Albert, M., Scarmeas, N., Kinosian, B., Yashin, A. I., & Stern, Y. (2014). A new algorithm for predicting time to disease endpoints in Alzheimer’s disease patients. Journal of Alzheimer’s Disease, 38(3), 661–668.

  • Reisberg, B., Ferris, S. H., de Leon, M. J., & Crook, T. (1982). The global deterioration scale for assessment of primary degenerative dementia. American Journal of Psychiatry, 139(9), 1136–1139.

  • Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.

  • Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.

  • Seplaki, C. L., Goldman, N., Weinstein, M., & Lin, Y. (2006). Measurement of cumulative physiological dysregulation in an older population. Demography, 43(1), 165–183.

  • Stallard, E. (2007). Trajectories of morbidity, disability, and mortality among the U.S. elderly population: Evidence from the 1984–1999 NLTCS. North American Actuarial Journal, 11(3), 16–53.

  • Stallard, E., Kinosian, B., Zbrozek, A. S., Yashin, A. I., Glick, H. A., & Stern, Y. (2010). Estimation and validation of a multi-attribute model of Alzheimer’s disease progression. Medical Decision Making, 30(6), 625–638.

  • Stern, Y., Albert, M., Brandt, J., Jacobs, D. M., Tang, M. X., Marder, K., Bell, K., Sano, M., Devanand, D. P., Bylsma, F., & Lafleche, G. (1994). Utility of extrapyramidal signs and psychosis as predictors of cognitive and functional decline, nursing home admission, and death in Alzheimer’s disease: Prospective analyses from the predictors study. Neurology, 44(12), 2300–2307.

  • Stern, Y., Liu, X., Albert, M., Brandt, J., Jacobs, D. M., Del Castillo-Castaneda, C., Marder, K., Bell, K., Sano, M., Bylsma, F., Lafleche, G., & Tsai, W. Y. (1996). Application of a growth curve approach to modeling the progression of Alzheimer’s disease. Journal of Gerontology: Medical Sciences, 51A(4), M179–M184.

  • Stern, Y., Tang, M. X., Albert, M. S., Brandt, J., Jacobs, D. M., Bell, K., Marder, K., Sano, M., Devanand, D., Albert, S. M., Bylsma, F., & Tsai, W. Y. (1997). Predicting time to nursing home care and death in individuals with Alzheimer’s disease. JAMA, 277(10), 806–812.

  • Tarone, R. E., & Gruenhage, G. (1975). A note on the uniqueness of roots of the likelihood equations for vector-valued parameters. Journal of the American Statistical Association, 70(352), 903–904.

  • Taylor, D. H., Fillenbaum, G. G., & Ezell, M. E. (2002). The accuracy of Medicare claims data in identifying Alzheimer’s disease. Journal of Clinical Epidemiology, 55(9), 929–937.

  • Taylor, D. H., Sloan, F. A., & Doraiswamy, P. M. (2004). Marked increase in Alzheimer’s disease identified in Medicare claims records between 1991 and 1999. Journal of Gerontology: Medical Sciences, 59A(7), M762–M766.

  • Tolley, H. D., & Manton, K. G. (1992). Large sample properties of estimates of a discrete Grade of Membership model. Annals of the Institute of Statistical Mathematics, 44(1), 85–95.

  • Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16(3), 439–454.

  • Wachter, K. W. (1999). Grade of Membership models in low dimensions. Statistical Papers, 40(4), 439–457.

  • Wald, A. (1948). Estimation of a parameter when the number of unknown parameters increases indefinitely with the number of observations. The Annals of Mathematical Statistics, 19(2), 220–227.

  • Wellner, J. A. (1981). A Glivenko-Cantelli theorem for empirical measures of independent but non-identically distributed random variables. Stochastic Processes and Their Applications, 11(3), 309–312.

  • Wieland, D., Kinosian, B., Stallard, E., & Boland, R. (2013). Does Medicaid pay more to a program of all-inclusive care for the elderly (PACE) than for fee-for-service long-term care? Journal of Gerontology: Medical Sciences, 68(1), M47–M55.

  • Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics, 9(1), 60–62.

  • Woodbury, M. A., & Clive, J. (1974). Clinical pure types as a fuzzy partition. Journal of Cybernetics, 4(3), 111–121.

  • Woodbury, M. A., Clive, J., & Garson, A. (1978). Mathematical typology: A Grade of Membership technique for obtaining disease definition. Computers and Biomedical Research, 11(3), 277–298.

  • Woodbury, M. A., Corder, L. S., & Manton, K. G. (1993). Change over time: Observational state, missing data, and repeated measures in the Grade of Membership model. In Proceedings of the survey research methods section, American Statistical Association (Vol. II, pp. 888–891). Alexandria: American Statistical Association.

Acknowledgements

Support for the research presented in this chapter was provided by the National Institute on Aging, through grant numbers P01-AG017937, P01-AG043352, U01-AG007198, U01-AG023712, R01-AG007370, and R01-AG046860; and by the Department of Veterans Affairs’ Geriatric and Extended Care Data Analysis Center, through an IPA contract. David L. Straley provided programming support.

Author information

Corresponding author

Correspondence to Eric Stallard.

Appendix

1.1 Synthesis of Known Results Regarding the Consistency of the General (Cross-Sectional) Empirical GoM Model

Haberman’s (1995) challenge was to identify conditions under which the conditional GoM likelihood estimator is or is not consistent; he cautioned that this would be “very difficult” to do (Haberman 1995, p. 1132). Wachter (1999) responded to this challenge by recasting the conditional GoM model in a framework based on concepts of dimensionality reduction; he commented that the theorems needed to respond directly to Haberman’s consistency challenge were not “readily available” so that a direct response would take one “quickly into uncharted territory” (Wachter 1999, p. 441).

We agree with Haberman about the challenge being very difficult but we disagree with Wachter that the territory is almost completely uncharted. Indeed, our review of the statistical literature indicates that the existing theorems are highly informative. They allow unambiguous determination of consistency/inconsistency for most forms of GoM and they provide substantial insight into the issues to be resolved in such determinations. We emphasize that, for cases where the maximum likelihood estimator is inconsistent, the existing theorems (e.g., Huber 1967, Theorem 1) indicate that the resulting maximum likelihood estimates are likely to be optimal (or approximately so) in the sense that they are the closest possible to the true parameter values using the Kullback-Leibler information criterion (i.e., relative entropy, or divergence) as the distance measure (Kullback and Leibler 1951). As explained below, we conjecture that a generalization of Huber’s (1967) Theorem 1 could apply to a form of GoM in which the empirical GoM-score mixing distribution is used to create an empirical marginal likelihood with the same set of λ- and g-parameters as in conditional GoM. If proven true, this would move the empirical cross-sectional GoM model into the mainstream statistical literature, thereby extending the range of applications far beyond the dimensionality-reduction applications considered by Wachter (1999).

We summarize the relevant literature, existing theorems, and implications for consistency in the form of 15 observations. Our goal is to document the findings in one accessible location and to stimulate further work on proving our conjecture:

  1.

    Wald (1948, Theorem 2.1) provided necessary and sufficient conditions for the existence of a uniformly consistent estimator of a parameter (like λ; but scalar in Wald’s case) in the presence of a set of vectors of incidental parameters (like the g-parameters). Tolley and Manton’s (1992) proof of consistency for the marginal GoM likelihood for fixed J implies that conditional GoM meets Wald’s existence conditions. Thus, marginal GoM was shown to be consistent; the consistency of conditional GoM was not addressed.

  2.

    Tolley and Manton’s (1992) marginal GoM likelihood is difficult to use in practice because it requires one to specify the form of the mixing distribution and this is generally unknown. Indeed, one major reason for performing a GoM analysis is to discover the form of this mixing distribution.

  3.

    Kovtun et al. (2007) commented that an empirical estimator of the mixing distribution of the g-parameters can be formed directly from the estimates of the GoM scores with each individual providing a unit contribution to the histogram of the mixing distribution; in this case, they claimed that the empirical distribution converges to the true mixing distribution as J, along with N, goes to infinity. However, they did not provide a proof of convergence for this case. We refer to the marginal GoM likelihood using the empirical estimator in place of the true mixing distribution as the empirical marginal GoM likelihood.

  4.

    Mak (1982, Theorem 2.1) implies that conditional and empirical marginal GoM likelihoods with fixed J and increasing N will yield estimators that converge to points in the λ-parameter space that generally differ from the true λ-parameter values; hence the associated estimators are not consistent.

  5.

    Mak (1982, Theorem 2.1) also implies that conditional and empirical marginal GoM likelihoods with increasing J but fixed N will yield estimators that converge to points in the g-parameter space that generally differ from the true g-parameter values. Thus, the associated estimators are also not consistent for this case.

  6.

    It follows that N and J must both go to infinity for consistency to be established for the conditional and empirical marginal GoM estimators; in this case, if the empirical mixing distribution converges to the true mixing distribution, then the empirical marginal GoM likelihood estimator will yield consistent estimates of the λ- and g-parameters, without prior specification of the form of the mixing distribution.

  7.

    Substantial insight into the empirical marginal GoM model can be gained by letting J go to infinity first, and then considering the behavior of the model as N goes to infinity. Letting J go to infinity means that each observed data vector, \( x_i \), follows a multinomial distribution with an uncountably infinite number of cells representing all combinations of response outcomes for a countably infinite number of variables (see Feller 1971, p. 123, Theorem 1). Hence, under the general GoM model, each cell, c, will have a probability \( \pi_{ic} = \lim_{J\to\infty} \prod_{j=1}^{J} \sum_{k=1}^{K} g_{ik}\lambda_{kjl_{jc}} \) for each individual i and a marginal probability \( \pi_c^0 = E\left(\pi_{ic}\right) \) in the population, where the expectation is taken with respect to the GoM-score distribution. Thus, as J goes to infinity the set \( \{\pi_c^0\} \) defines a multinomial distribution with a countably infinite number of λ- and g-parameters and an uncountably infinite number of cells (i.e., “points”). The presence of an uncountably infinite number of cells introduces several technical problems in defining countably additive probability measures for this distribution, but standard solutions are well-known (e.g., see Billingsley 1986). Three properties of this distribution are relevant: (1) the observed frequency distribution provides a consistent unrestricted (i.e., nonparametric) maximum likelihood estimator of \( \{\pi_c^0\} \) (by a generalization of the Glivenko-Cantelli Theorem; see Wellner (1981, Theorem 1) and, for discussion, Gaenssler and Wellner (1981)); (2) the entropy of \( \{\pi_c^0\} \) becomes infinite (because variable-specific entropies are additive over j, and J becomes infinite) (Cover and Thomas 1991, Theorem 2.6.6); and (3) the distribution \( \{\pi_c^0\} \) becomes continuous (because, at the limit, no cell c carries positive probability mass; see Feller 1971, pp. 137–138).

  8.

    The conditional GoM likelihood is an “empirical estimator” in the sense that the GoM scores are directly represented via the g-parameters without consideration of a mixing distribution, and more importantly, without prior specification of the form of the mixing distribution. It can be shown that the empirical marginal GoM likelihood for fixed N and increasing J converges to a form proportional to the conditional GoM likelihood. Hence, the estimates under the conditional GoM likelihood will converge to a limit point as J goes to infinity that is the same as that of the estimates under the empirical marginal GoM likelihood: if the empirical marginal GoM likelihood estimator is consistent for infinite J, then the conditional GoM likelihood estimator will be likewise consistent. This convergence property implies that the conditional GoM likelihood estimator will provide a good approximation to the empirical marginal GoM likelihood estimator for large J.

  9.

    Rao (1958, Assumption A1) provided a sufficient condition for the uniform consistency of the restricted (i.e., parametric) maximum likelihood estimator for the infinite multinomial distribution as N goes to infinity: the entropy of the distribution \( \{\pi_c^0\} \) must be finite. Rao (1958) emphasized that while this condition is not necessary for consistency, it is sufficient. Unfortunately, as noted in Observation 7, this condition does not hold for GoM. Nonetheless, given that the unrestricted maximum likelihood estimator is known to be consistent for the infinite multinomial distribution as N goes to infinity, it at least seems plausible that the restricted maximum likelihood estimator may also be consistent.

  10.

    To complete our synthesis, we refer again to Mak (1982, Theorem 2.1), from which it follows that the restricted maximum likelihood estimator for GoM will converge to some point in the λ-,g-parameter space. What point is that? Mak’s (1982) Theorem 2.1 does not provide an answer, but it is highly likely that Huber’s (1967) Theorem 1 does. If it does, then the estimator will converge to the unique point in the λ-,g-parameter space that minimizes the relative entropy (i.e., Kullback-Leibler divergence) between the restricted and unrestricted models.

  11.

    See McCulloch (1988) and Freedman (2006) for non-technical discussion of this result. Note that the use of relative entropies resolves the “problem” in Observations 7 and 9 that the entropy of \( \{\pi_c^0\} \) becomes infinite for both the restricted and unrestricted models. If the restricted GoM model is true (i.e., is the correct model), then the proof of a generalized form of Huber’s (1967) Theorem 1 will need to identify the conditions under which the parameter estimates for the restricted model converge to the true values as N and J go to infinity.

  12.

    These must be the same values as obtained for the unrestricted model.

  13.

    As written, Huber’s (1967) assumptions for his Theorem 1 require that a sequence of maximum likelihood estimators can be formed for each N as N goes to infinity: any sequence will do. No consideration, however, was given to forming a second asymptotic sequence for J as J goes to infinity. Such consideration would clearly require some generalization of Huber’s assumptions, which is what is needed to prove our conjecture. This would only work if, in fact, such sequences exist. Thus, we need to consider how at least one sequence of maximum likelihood estimators could be formed for combinations of N and J such that both N and J could go to infinity.

  14.

    Before doing so, we need to note two points. First, Wald (1948, Theorem 3.1) provided an additional condition for consistency which implies that the total amount of Fisher information in the empirical marginal GoM model must go to infinity as N and J go to infinity. Second, Kovtun et al. (2014, Theorem 5.4) provided additional conditions that restrict the set of admissible variables to those that yield identifiable mixture distributions with a property that they term “∞-stability.” Intuitively, this restricts the set of admissible variables to some well-defined measurement domain, which should not be a serious restriction for most substantive applications. We assume that these conditions can be met, in theory, by selecting cases and variables such that the associated Hessian matrices (i.e., the “observed” Fisher information matrices) converge to block diagonal form with unbounded positive-definite diagonal blocks as N and J go to infinity. Hence, we assume that a sequence of maximum likelihood estimators that satisfy the requirements of Observation 13 can be formed using the algorithm in Sect. 17.2.8, which is justified by the convergence property in Observation 8, by letting N and J increase in fixed ratios with variables selected so that the diagonal terms of the Hessian matrices are unbounded. Convergence to block diagonal form as N and J increase follows from the structure of Eqs. (17.12) and (17.13): the only nonzero terms outside the diagonal blocks correspond to the cross-derivatives of the λ- and g-parameters and these contain only one additive term, independent of N and J. Hence, the relative sizes of these cross-derivatives will tend to zero as N and J go to infinity. Convergence to positive-definite diagonal blocks will satisfy Kuhn-Tucker Condition 5; the inverse Hessian matrices will be used in Eq. (17.36). Akaike (1973, pp. 269–270) showed the close connections between the Hessian matrix, the Fisher information matrix, and the relative entropy measures used to develop the AIC and Kullback-Leibler statistics.

  15.

    The method described in Observation 14 allows one to construct a sequence of joint λ- and g-estimators for N and J that permit the conditions of Huber’s (1967) Theorem 1 to be extended from one to two sequences. If needed, one can ensure by setting N = J that the terms of the paired sequences use just a single index, say N, which would most closely match the existing form of Huber’s assumptions. It remains to provide a precise specification of conditions for the asymptotic convergence of these joint sequences and to rigorously determine the changes needed in each step of Huber’s proof. Given that the unrestricted maximum likelihood estimator is known to be consistent (Wellner 1981), we expect that it will be possible to specify such conditions; this expectation forms the basis of our conjecture. An essential part of Huber’s proof is the assumption that the restricted maximum likelihood estimates are unique; Mak (1982, Theorem 2.1) provides conditions that justify this assumption for the GoM model and these could be incorporated into the generalized Huber theorem. Then, if the GoM model is true, the uniqueness of the limit point would ensure that the restricted and unrestricted maximum likelihood estimators would both tend to the same limit as N and J go to infinity, in which case consistency would be proven. The Kullback-Leibler divergence would converge to zero under these same conditions. For the case where the GoM model is true but one of N or J is not infinite, it would follow from the original Huber argument that the Kullback-Leibler divergence would be minimized, confirming our conjecture in its entirety.
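
The scaling argument in Observation 14 can be illustrated numerically. The following is a minimal sketch, not the chapter's algorithm: it assumes the standard GoM response probability p_ijl = Σ_k g_ik λ_kjl with K = 2 profiles and binary items, ignores the summation constraints, and uses the expected rather than the observed information so that no responses need to be simulated. The function name `scaled_cross_term` and all parameter values are hypothetical. Under these assumptions the curvature for one g-parameter is a sum over J items, the curvature for one λ-parameter is a sum over N cases, and the cross-derivative between them is a single bounded term.

```python
import numpy as np

def scaled_cross_term(n_cases, n_items, seed=0):
    """Relative size of one g-by-lambda cross-derivative in a toy GoM model.

    Assumes p_ijl = sum_k g_ik * lam_kjl with K = 2 profiles and binary
    items; uses expected information and ignores the summation constraints.
    """
    rng = np.random.default_rng(seed)
    # Membership scores g_ik: each row is nonnegative and sums to 1.
    g = rng.dirichlet(np.ones(2), size=n_cases)           # shape (N, 2)
    g[0] = [0.5, 0.5]                                     # fix the focal case
    # lam1[k, j] = probability of response 1 on item j for profile k.
    lam1 = rng.uniform(0.1, 0.9, size=(2, n_items))       # shape (2, J)
    lam1[:, 0] = [0.3, 0.7]                               # fix the focal item
    lam = np.stack([1.0 - lam1, lam1], axis=-1)           # shape (2, J, 2)

    # Response probabilities needed for the focal case and the focal item.
    p_case0 = np.einsum('k,kjl->jl', g[0], lam)           # shape (J, 2)
    p_item0 = g @ lam[:, 0, 1]                            # shape (N,)

    # Curvature for g[0, 0]: a sum over all J items (and both responses).
    h_gg = np.sum(lam[0] ** 2 / p_case0)
    # Curvature for lam[0, 0, 1]: a sum over all N cases.
    h_ll = np.sum(g[:, 0] ** 2 / p_item0)
    # Cross-derivative between g[0, 0] and lam[0, 0, 1]: one bounded term.
    cross = 1.0 - g[0, 0] * lam[0, 0, 1] / p_item0[0]

    return cross / np.sqrt(h_gg * h_ll)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, scaled_cross_term(n, n))
```

With N = J, the printed ratio shrinks roughly in proportion to 1/N in this toy setting, which is the sense in which the off-diagonal coupling becomes negligible relative to the diagonal blocks.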

The above synthesis clarifies Manton et al.’s (1994, p. 24) statement that the parameters obtained using the conditional GoM likelihood estimator “asymptotically maximize” the marginal GoM likelihood. Further work is needed to generalize Huber’s (1967) Theorem 1 to a form directly applicable to the empirical marginal GoM estimator and to determine for practical applications how large (or small) J needs to be for the approximation in Observation 8 to apply to conditional GoM for given sizes of N and configurations of the empirical mixing distribution. Chapter 18 (Sect. 18.4.3) discusses alternative approaches based on linear latent structure (LLS) analysis to establishing conditions for consistent and asymptotically normal estimators of the λ- and g-parameters for the conditional GoM likelihood—suggesting that the consistency issue can best be completely resolved by several modes of attack. Kovtun et al. (2007) showed that J-values in the range 250–1000 were sufficient to obtain good estimates of the empirical mixing distributions for the generalized GoM scores used in the LLS model for N = 10,000. This suggests that J-values substantially below this range may have acceptable performance characteristics when considered in the context of the associated Kullback-Leibler divergences.
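
The link between maximum likelihood and the Kullback-Leibler divergence that underlies the discussion above can be sketched with a deliberately simple stand-in. The example below is an illustration under stated assumptions, not the GoM estimator: it uses a single Bernoulli parameter, a hypothetical true value of 0.3, and two hypothetical candidate grids, one that contains the true value and one that excludes it. It shows that the restricted maximum likelihood estimate selects the candidate with the smallest divergence from the true distribution, and that this divergence is essentially zero when the model family contains the truth.

```python
import numpy as np

def kl_bernoulli(p_true, p_model):
    """KL(p_true || p_model) for two Bernoulli distributions."""
    return (p_true * np.log(p_true / p_model)
            + (1.0 - p_true) * np.log((1.0 - p_true) / (1.0 - p_model)))

def mle_on_grid(sample, grid):
    """Maximum likelihood estimate restricted to the values in grid."""
    mean = sample.mean()
    # Average Bernoulli log-likelihood at each candidate value.
    loglik = mean * np.log(grid) + (1.0 - mean) * np.log(1.0 - grid)
    return grid[np.argmax(loglik)]

rng = np.random.default_rng(0)
p_true = 0.3                                   # hypothetical true parameter
sample = rng.uniform(size=20000) < p_true      # simulated binary responses

well_specified = np.linspace(0.05, 0.95, 19)   # candidates include 0.3
misspecified = np.linspace(0.50, 0.95, 10)     # candidates exclude 0.3

mle_well = mle_on_grid(sample, well_specified)
mle_mis = mle_on_grid(sample, misspecified)
kl_min_mis = misspecified[np.argmin(kl_bernoulli(p_true, misspecified))]

print("well specified:", mle_well, kl_bernoulli(p_true, mle_well))
print("misspecified:  ", mle_mis, kl_bernoulli(p_true, mle_mis))
```

In the misspecified case the estimate cannot reach the true value, but it still coincides with the candidate that minimizes the divergence, which mirrors the role the Huber argument plays in the text.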

Copyright information

© 2016 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Stallard, E., Sloan, F.A. (2016). Analysis of the Natural History of Dementia Using Longitudinal Grade of Membership Models. In: Biodemography of Aging. The Springer Series on Demographic Methods and Population Analysis, vol 40. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7587-8_17

  • DOI: https://doi.org/10.1007/978-94-017-7587-8_17

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-017-7585-4

  • Online ISBN: 978-94-017-7587-8

  • eBook Packages: Social Sciences (R0)
