Published in: Prevention Science, Issue 3/2015

1 April 2015

Assessing the Generalizability of Randomized Trial Results to Target Populations

Authors: Elizabeth A. Stuart, Catherine P. Bradshaw, Philip J. Leaf

Abstract

Recent years have seen increasing interest in, and attention to, evidence-based practices, where the “evidence” generally comes from well-conducted randomized trials. However, although such trials yield accurate estimates of the intervention's effect for the participants in the trial (known as “internal validity”), they do not always yield relevant information about its effects in a particular target population (known as “external validity”). This gap may arise because no target population was specified when the trial was designed, because it was difficult to recruit a sample representative of a prespecified target population, or because interest lies in a target population somewhat different from the one the trial directly targeted. This paper first provides an overview of existing design and analysis methods for assessing and enhancing the ability of a randomized trial to estimate treatment effects in a target population. It then presents a case study using one particular method, which weights the subjects in a randomized trial so that they match the target population on a set of observed characteristics. The case study uses data from a randomized trial of school-wide positive behavioral interventions and supports (PBIS); our interest is in generalizing the results to the state of Maryland. For PBIS, the estimated effects in the target population after weighting were similar to those observed in the randomized trial. The paper illustrates that statistical methods can be used to assess and enhance the external validity of randomized trials, making the results more applicable to policy and clinical questions. However, many research questions remain open; future research should focus on treatment effect heterogeneity and on further developing these methods for enhancing external validity.
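The weighting approach summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation, and it rests on three assumptions:

* School characteristics are categorical.
* The population data include the trial schools themselves.
* Each trial unit's probability of being in the trial is estimated within cells of identical characteristics. This is the nonparametric, cell-based analogue of a logistic propensity model. Continuous covariates would call for a fitted model instead.

Each trial unit is weighted by the inverse of that probability. The population effect is then the weighted difference in mean outcomes between treated and control units. The function names and toy data are hypothetical.

```python
from collections import Counter


def participation_weights(trial_x, pop_x):
    """Weight each trial unit by 1 / P(in trial | covariate profile).

    trial_x, pop_x: lists of hashable covariate profiles (e.g. tuples of
    categorical school characteristics). Assumes pop_x includes the trial
    units, so P(in trial | x) = (# trial units with x) / (# pop units with x).
    """
    n_trial = Counter(trial_x)
    n_pop = Counter(pop_x)
    # Positivity check: a population profile with no trial units cannot be
    # represented by any reweighting of the trial.
    missing = [x for x in n_pop if n_trial[x] == 0]
    if missing:
        raise ValueError(f"profiles absent from the trial: {missing}")
    weights = []
    for x in trial_x:
        if n_pop[x] < n_trial[x]:
            raise ValueError(f"population must contain the trial units: {x}")
        p = n_trial[x] / n_pop[x]  # estimated participation probability
        weights.append(1.0 / p)
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def population_effect(trial_x, treated, outcome, pop_x):
    """Weighted mean outcome of treated units minus that of control units."""
    w = participation_weights(trial_x, pop_x)
    t_y = [y for y, z in zip(outcome, treated) if z]
    t_w = [wi for wi, z in zip(w, treated) if z]
    c_y = [y for y, z in zip(outcome, treated) if not z]
    c_w = [wi for wi, z in zip(w, treated) if not z]
    return weighted_mean(t_y, t_w) - weighted_mean(c_y, c_w)


# Hypothetical toy data: the trial over-represents urban schools.
trial_x = ["urban", "urban", "rural", "rural"]
treated = [1, 0, 1, 0]
outcome = [10.0, 6.0, 5.0, 4.0]
pop_x = ["urban"] * 2 + ["rural"] * 8  # includes the 4 trial schools

print(population_effect(trial_x, treated, outcome, trial_x))  # 2.5 (unweighted)
print(round(population_effect(trial_x, treated, outcome, pop_x), 2))  # 1.6
```

Passing the trial itself as the population gives every unit a weight of 1 and recovers the unweighted estimate. That makes the trial-versus-population comparison explicit. A real group-randomized analysis would also need standard errors that account for both the weights and the clustering, which this sketch omits.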
Researchers should think carefully about the external validity of randomized trials and be cautious about extrapolating results to a specific population unless they are confident that the trial sample is similar to that population.
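One simple way to judge whether a trial sample resembles the target population is to compare each observed characteristic in standard deviation units: the trial mean minus the population mean, divided by the population standard deviation. This is sketched here as an assumption, not as the paper's procedure. The 0.25 cutoff is an illustrative rule of thumb. The covariate names (`pct_frl`, `enrollment`) and their values are hypothetical.

```python
from statistics import mean, pstdev


def standardized_difference(trial_vals, pop_vals):
    """(trial mean - population mean) / population standard deviation."""
    sd = pstdev(pop_vals)
    if sd == 0:
        raise ValueError("covariate is constant in the population")
    return (mean(trial_vals) - mean(pop_vals)) / sd


def flag_dissimilar(trial, pop, threshold=0.25):
    """Return {covariate: difference} for covariates whose absolute
    standardized difference exceeds the (illustrative) threshold.

    trial and pop map covariate names to lists of unit-level values.
    """
    flagged = {}
    for name, pop_vals in pop.items():
        d = standardized_difference(trial[name], pop_vals)
        if abs(d) > threshold:
            flagged[name] = d
    return flagged


# Hypothetical school-level covariates (not data from the PBIS trial).
trial = {"pct_frl": [40, 50, 60], "enrollment": [500, 600]}
pop = {"pct_frl": [20, 30, 40, 50, 60], "enrollment": [500, 600, 500, 600]}
print(sorted(flag_dissimilar(trial, pop)))  # ['pct_frl']
```

A large difference on a characteristic that plausibly modifies the treatment effect is a signal to reweight the trial or to temper claims about the target population.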
Metadata
Title
Assessing the Generalizability of Randomized Trial Results to Target Populations
Authors
Elizabeth A. Stuart
Catherine P. Bradshaw
Philip J. Leaf
Publication date
1 April 2015
Publisher
Springer US
Published in
Prevention Science / Issue 3/2015
Print ISSN: 1389-4986
Electronic ISSN: 1573-6695
DOI
https://doi.org/10.1007/s11121-014-0513-z
