Original Article
Standardization of depression measurement: a common metric was developed for 11 self-report depression measures

https://doi.org/10.1016/j.jclinepi.2013.04.019

Abstract

Objectives

To provide a standardized metric for the assessment of depression severity to enable comparability among results of established depression measures.

Study Design and Setting

A common metric for 11 depression questionnaires was developed applying item response theory (IRT) methods. Data of 33,844 adults were used for secondary analysis including routine assessments of 23,817 in- and outpatients with mental and/or medical conditions (46% with depressive disorders) and a general population sample of 10,027 randomly selected participants from three representative German household surveys.

Results

A standardized metric for depression severity was defined by 143 items, and scores were normed to a general population mean of 50 (standard deviation = 10) for easy interpretability. It covers the entire range of depression severity assessed by established instruments. The metric allows comparisons among included measures. Large differences were found in their measurement precision and range, providing a rationale for instrument selection. Published scale-specific threshold scores of depression severity showed remarkable consistencies across different questionnaires.
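
The norming step described above can be illustrated with a short sketch. It assumes that latent scores are linearly rescaled so that a reference (general population) sample has a mean of 50 and a standard deviation of 10; the function name `to_t_scores` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def to_t_scores(thetas, ref_mean, ref_sd):
    """Rescale latent scores so the reference population has mean 50 and SD 10."""
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    return [50.0 + 10.0 * (theta - ref_mean) / ref_sd for theta in thetas]


# Hypothetical latent scores for a reference sample (invented for illustration).
reference = [-1.2, -0.4, 0.0, 0.3, 1.3]
mu, sigma = mean(reference), pstdev(reference)

# Normed scores of the reference sample itself: mean 50, SD 10 by construction.
t_ref = to_t_scores(reference, mu, sigma)

# A respondent one reference SD above the reference mean lands at about 60.
t_one_sd_above = to_t_scores([mu + sigma], mu, sigma)[0]
```

On such a scale, a score of 60 reads directly as one standard deviation above the general population mean, whichever questionnaire produced the underlying responses.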

Conclusion

An IRT-based instrument-independent metric for depression severity enables direct comparisons among established measures. The "common ruler" simplifies the interpretation of depression assessment by identifying key thresholds for clinical and epidemiologic decision making and facilitates integrative psychometric research across studies, including meta-analysis.

Introduction

What is new?

Key findings

  1. In this study, a common metric was developed for the first time for a large number of established depression measures using item response theory methods.

What this adds to what was known?

  1. To date, the variety of different scales for the assessment of certain patient-reported outcomes (PROs) seriously impairs research and communication among clinicians. Thus, standardization of PRO measurement is urgently needed.

  2. The new standardized metric for depression severity provides easy comparability of scores, measurement range, and precision among the different scales.

What is the implication and what should change now?

  1. The results offer a conjoint definition and understanding of the latent depression construct as defined by the items from a variety of established depression questionnaires.

  2. The outlined standardization approach calibrating different depression measures to a common latent metric can be applied to the assessment of other PROs as well.

Depressive disorders are severe and widespread diseases, imposing a significant burden on the individual and on society [1], [2]. Reliable tools for depression measurement are essential for case recognition [3], [4], [5], treatment monitoring [6], [7], and clinical research in general [8], [9], [10], [11], [12], [13], [14]. Today, many carefully developed and well-established self-report instruments for the assessment of depressive symptoms exist. However, scores from these instruments are not directly comparable. The heterogeneity of scale-specific metrics seriously impairs comparability across study results and complicates communication among researchers and clinicians. Pooling study results from different depression measures in quantitative reviews or meta-analyses is difficult and may even lead to biased results [15], [16]. To avoid this bias, some meta-analyses limit the selection of studies to those that use the same instrument(s) [6], [7]. However, such restrictions lead to a significant loss of information.

It is recognized that results for biomedical parameters need to be comparable across laboratory methods and facilities [17], and in our opinion, this is equally important for the measurement of patient-reported outcomes (PROs) [18], [19].

This issue has been identified earlier [15], [16], but only the recent increases in computational power have enabled the introduction of new psychometric methods in this field of health care [20], [21], [22], [23], [24], [25]. The most frequently discussed solution [26], [27], [28] for achieving a standardized metric for PROs is offered by item response theory (IRT) [29], [30], [31], [32], [33]. Items of different established depression questionnaires can be included in one "item bank" to provide one common metric [34], [35], [36]. Some depression item banks have already been developed [37], [38], [39], [40], [41], [42], but to our knowledge, no study so far has attempted to establish a comprehensive metric to achieve comparability for a larger number of existing depression measures.
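
How an item bank can place scores from different questionnaires on one metric is sketched below. The sketch assumes a two-parameter logistic (2PL) model with binary items, which is a simplification chosen for brevity and is not necessarily the model used in this study (depression questionnaires typically have graded response options). All item parameters and both scale names are invented for illustration.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing a binary item at latent severity theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_sum_score(theta, items):
    """Expected sum score of an instrument: the sum of its item probabilities."""
    return sum(p_endorse(theta, a, b) for a, b in items)


def theta_for_sum_score(score, items, lo=-6.0, hi=6.0, tol=1e-6):
    """Find theta whose expected sum score equals `score`, by bisection.

    Relies on the expected score increasing with theta (all a > 0) and on the
    solution lying inside the search interval [lo, hi].
    """
    if not 0.0 < score < len(items):
        raise ValueError("score must lie strictly between 0 and the item count")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_sum_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical (discrimination a, location b) pairs, calibrated on ONE latent scale.
scale_a = [(1.5, -0.5), (1.2, 0.0), (2.0, 0.5)]
scale_b = [(1.0, 0.5), (1.8, 1.0), (1.4, 1.5), (1.1, 2.0)]

# A sum score of 2 on scale A maps to a latent severity theta ...
theta = theta_for_sum_score(2.0, scale_a)
# ... which in turn implies an expected sum score on scale B.
equivalent_b = expected_sum_score(theta, scale_b)
```

Because both sets of items are expressed on the same latent scale, a score on one instrument can be translated into a latent severity and from there into the expected score on another instrument. Operational linking of summed scores involves further steps that this sketch leaves out.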

In this study, we aim to provide such a metric for some of the most established depression measures. This metric should allow the comparison of results from different instruments on one common "ruler," much like using different thermometers to measure temperature on one meaningfully anchored scale.

Sample

The study is based on secondary data analysis. The total study sample contains data from seven clinical and three general population samples. The clinical samples were consecutively drawn within clinical routines or cross-sectional studies [37], [43], [44], including in- and outpatients with mental and/or medical conditions being treated in 7 hospitals and 12 family practices across Germany. Clinical diagnoses according to ICD-10 criteria were given by health-care providers. The representative

Sample

The total sample included 33,844 respondents, comprising a clinical subsample of 23,817 in- and outpatients with mental and/or medical conditions, and a representative sample of 10,027 persons (Table 1). Almost half of the clinical sample was diagnosed with some type of depressive disorder by their treating clinicians, and 28% received the diagnosis of a major depressive disorder. One-fifth of the clinical sample did not have any mental disorder but had some medical condition.

Discussion

This study identifies the common construct measured by 11 established depression questionnaires and offers a standardized metric for the assessment of depression severity by jointly calibrating these instruments to a common latent metric. Results of the study are based on the broadest available sample in the field of depression measurement. The complete item bank is accessible on the journal's Web site.

Conclusion

This study identified the common latent construct measured by several established depression questionnaires, and it provided a standardized metric for the assessment of this construct by jointly calibrating these instruments to a common latent metric. This metric allows the comparison of results of the different depression measures and provides an easy interpretation of its score values. We believe a more standardized assessment of PROs is essential to introduce the patient's perspective into

Acknowledgments

The authors thank Dr. Alexandra Murray (Department of Psychosomatic Medicine and Psychotherapy, University Medical Center Hamburg-Eppendorf, Germany) and Dr. Maria Orlando Edelen (RAND Corporation, Boston, MA, USA) for careful proofreading of the manuscript. The authors acknowledge Alice Bodnar (communication design, Berlin, Germany) for the design of Fig. 2. Ethical approval was not required as decided by the Ethics Committee of the Medical Registering Authority Hamburg, Germany.

References (121)

  • F. Kendel et al.

    Screening for depression: Rasch analysis of the dimensional structure of the PHQ-9 and the HADS-D

    J Affect Disord

    (2010)
  • R.C. Kessler et al.

    The epidemiology of major depressive disorder: results from the National Comorbidity Survey Replication (NCS-R)

    JAMA

    (2003)
  • T.B. Üstün et al.

    Global burden of depressive disorders in the year 2000

    Br J Psychiatry

    (2004)
  • J.H. Lichtman et al.

    Depression and coronary heart disease: recommendations for screening, referral, and treatment: a science advisory from the American Heart Association Prevention Committee of the Council on Cardiovascular Nursing, Council on Clinical Cardiology, Council on Epidemiology and Prevention, and Interdisciplinary Council on Quality of Care and Outcomes Research: endorsed by the American Psychiatric Association

    Circulation

    (2008)
  • Depression: core interventions in the management of depression in primary and secondary care

    (2004)
  • Screening for depression: recommendations and rationale

    Ann Intern Med

    (2002)
  • E. Driessen et al.

    Does pretreatment severity moderate the efficacy of psychological treatment of adult outpatient depression? A meta-analysis

    J Consult Clin Psychol

    (2010)
  • J.C. Fournier et al.

    Antidepressant drug effects and depression severity: a patient-level meta-analysis

    JAMA

    (2010)
  • J. Barth et al.

    Depression as a risk factor for mortality in patients with coronary heart disease: a meta-analysis

    Psychosom Med

    (2004)
  • L.E. Egede et al.

    Depression and all-cause and coronary heart disease mortality among adults with and without diabetes

    Diabetes Care

    (2005)
  • S.H. Golden et al.

    Examining a bidirectional association between depressive symptoms and diabetes

    JAMA

    (2008)
  • J.R. Satin et al.

    Depression as a predictor of disease progression and mortality in cancer patients: a meta-analysis

    Cancer

    (2009)
  • M.A. Whooley et al.

    Depressive symptoms, health behaviors, and risk of cardiovascular events in patients with coronary heart disease

    JAMA

    (2008)
  • L.R. Wulsin et al.

    Do depressive symptoms increase the risk for the onset of coronary disease? A systematic quantitative review

    Psychosom Med

    (2003)
  • K.W. Davidson et al.

    Assessment and treatment of depression in patients with cardiovascular disease: National Heart, Lung, and Blood Institute Working Group Report

    Psychosom Med

    (2006)
  • M.A. Puhan et al.

    Combining scores from different patient reported outcome measures in meta-analyses: when is it justified?

    Health Qual Life Outcomes

    (2006)
  • D.L. Patrick et al.

    Chapter 17: patient-reported outcomes

  • U.S. Food and Drug Administration. Guidance for industry. Patient-reported outcome measures: use in medical product...
  • H.F. Fischer et al.

    How to compare scores from different depression scales: equating the Patient Health Questionnaire (PHQ) and the ICD-10-Symptom Rating (ISR) using Item Response Theory

    Int J Methods Psychiatr Res

    (2011)
  • L.E. Gibbons et al.

    Migrating from a legacy fixed-format measure to CAT administration: calibrating the PHQ-9 to the PROMIS depression measures

    Qual Life Res

    (2011)
  • T.M. Olino et al.

    Measuring depression using item response theory: an examination of three measures of depressive symptomatology

    Int J Methods Psychiatr Res

    (2012)
  • M. Orlando et al.

    Summed-score linking using item response theory: application to depression measurement

    Psychol Assess

    (2000)
  • M.H. Trivedi et al.

    The Inventory of Depressive Symptomatology, Clinician Rating (IDS-C) and Self-Report (IDS-SR), and the Quick Inventory of Depressive Symptomatology, Clinician Rating (QIDS-C) and Self-Report (QIDS-SR) in public sector patients with mood disorders: a psychometric evaluation

    Psychol Med

    (2004)
  • D. Cella et al.

    The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years

    Med Care

    (2007)
  • B.B. Reeve et al.

    Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS)

    Med Care

    (2007)
  • B.B. Reeve

    Item response theory modeling in health outcomes measurement

    Expert Rev Pharmacoecon Outcomes Res

    (2003)
  • G. Guilera et al.

    Item response theory test equating in health sciences education

    Adv Health Sci Educ Theory Pract

    (2008)
  • J.B. Bjorner et al.

    Computerized adaptive testing and item banking

  • J.S. Lai et al.

    Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale

    Qual Life Res

    (2003)
  • R.D. Hays et al.

    Item response theory and health outcomes measurement in the 21st century

    Med Care

    (2000)
  • J.B. Bjorner et al.

    Using item response theory to calibrate the Headache Impact Test (HIT) to the metric of traditional headache scales

    Qual Life Res

    (2003)
  • N.J. Dorans

    Linking scores from multiple health outcome instruments

    Qual Life Res

    (2007)
  • H. Fliege et al.

    Development of a computer-adaptive test for depression (D-CAT)

    Qual Life Res

    (2005)
  • T. Forkmann et al.

    Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis

    Rehabil Psychol

    (2009)
  • R.D. Gibbons et al.

    Using computerized adaptive testing to reduce the burden of mental health assessment

    Psychiatr Serv

    (2008)
  • A.J. Mitchell et al.

    Redefining diagnostic symptoms of depression using Rasch analysis: testing an item bank suitable for DSM-V and computer adaptive testing

    Aust N Z J Psychiatry

    (2011)
  • P.A. Pilkonis et al.

    Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS(r)): depression, anxiety, and anger

    Assessment

    (2011)
  • A.B. Smith et al.

    The initial development of an item bank to assess and screen for psychological distress in cancer patients

    Psychooncology

    (2007)
  • B. Löwe et al.

    Diagnosing ICD-10 depressive episodes: superior criterion validity of the Patient Health Questionnaire

    Psychother Psychosom

    (2004)
  • A.T. Beck et al.

    Beck depression inventory. Manual

    (1993)

    Conflict of interest: All authors have no financial conflicts of interest to declare.

    Funding: This research did not receive external funding. The Charité-Universitätsmedizin Berlin, five hospitals of the Schön Kliniken Company (Bad Arolsen, Bad Bramstedt, Hamburg Eilbek, Roseneck, and Starnberger See), the Heidelberg University Hospital, 12 family practices in Heidelberg, and the Leipzig University supported the study by providing the anonymous data for secondary analysis.
