Standardization of depression measurement: a common metric was developed for 11 self-report depression measures

doi:10.1016/j.jclinepi.2013.04.019

Journal of Clinical Epidemiology

Volume 67, Issue 1, January 2014, Pages 73-86

https://doi.org/10.1016/j.jclinepi.2013.04.019

Abstract

Objectives

To provide a standardized metric for assessing depression severity that makes the results of established depression measures comparable.

Study Design and Setting

A common metric for 11 depression questionnaires was developed using item response theory (IRT) methods. Data from 33,844 adults were used for secondary analysis, including routine assessments of 23,817 in- and outpatients with mental and/or medical conditions (46% with depressive disorders) and a general population sample of 10,027 randomly selected participants from three representative German household surveys.

Results

A standardized metric for depression severity was defined by 143 items, and scores were normed to a general population mean of 50 (standard deviation = 10) for easy interpretability. The metric covers the entire range of depression severity assessed by the established instruments and allows comparisons among the included measures. Large differences were found in the instruments' measurement precision and range, providing a rationale for instrument selection. Published scale-specific threshold scores of depression severity were remarkably consistent across the different questionnaires.
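The norming step amounts to a linear rescaling of the latent trait. The sketch below assumes that person estimates (theta) are already available on the common IRT scale and that the general population sample serves as the reference group; the function name `t_score_transform` and the optional survey weights are illustrative assumptions, not the authors' code.

```python
import math


def t_score_transform(reference_thetas, weights=None):
    """Build a theta -> T-score mapping normed to a reference sample (mean 50, SD 10).

    reference_thetas: latent trait estimates of the reference (general population) sample.
    weights: optional survey weights (hypothetical; equal weights are used if omitted).
    """
    if weights is None:
        weights = [1.0] * len(reference_thetas)
    total = sum(weights)
    # Weighted mean and population standard deviation of the reference sample
    mu = sum(w * t for w, t in zip(weights, reference_thetas)) / total
    var = sum(w * (t - mu) ** 2 for w, t in zip(weights, reference_thetas)) / total
    sigma = math.sqrt(var)

    def to_t(theta):
        # Linear rescaling: the reference mean maps to 50, one reference SD to 10 points
        return 50.0 + 10.0 * (theta - mu) / sigma

    return to_t


# Usage with made-up reference estimates
to_t = t_score_transform([-1.2, -0.4, 0.0, 0.3, 1.3])
print(round(to_t(0.0), 1), round(to_t(1.0), 1))
```

The population SD is used here; a sample SD (dividing by n - 1) would be an equally defensible choice and makes practically no difference for a reference sample of 10,027 people.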

Conclusion

An IRT-based, instrument-independent metric for depression severity enables direct comparisons among established measures. The "common ruler" simplifies the interpretation of depression assessment by identifying key thresholds for clinical and epidemiologic decision making, and it facilitates integrative psychometric research across studies, including meta-analyses.
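One way to place a published questionnaire cut-off on a common metric is to find the latent trait value at which the instrument's expected sum score equals the cut-off and then convert that value to the T metric. The sketch below assumes a generalized partial credit model (GPCM) for every item and inverts the test characteristic curve by bisection; this is an illustrative technique, not necessarily the procedure the authors used, and the function names, item parameters, and reference mean and SD are all hypothetical.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m | theta) under the generalized partial credit model."""
    logits = [0.0]  # category 0 has an empty sum
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # shift for numerical stability
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]


def expected_sum_score(theta, items):
    """Test characteristic curve: expected raw sum score at theta."""
    return sum(
        sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))
        for a, steps in items
    )


def theta_for_cutoff(cutoff, items, lo=-6.0, hi=6.0, tol=1e-8):
    """Invert the monotone increasing test characteristic curve by bisection."""
    max_score = sum(len(steps) for _, steps in items)
    if not 0 < cutoff < max_score:
        raise ValueError("cutoff must lie strictly between 0 and the maximum sum score")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_sum_score(mid, items) < cutoff:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical 3-item instrument scored 0-3 per item; reference mean 0 and SD 1 assumed
items = [(1.8, [-0.5, 0.4, 1.2]), (1.4, [0.0, 0.8, 1.6]), (2.0, [-0.2, 0.6, 1.4])]
theta_cut = theta_for_cutoff(4, items)
t_cut = 50.0 + 10.0 * (theta_cut - 0.0) / 1.0
print(round(theta_cut, 2), round(t_cut, 1))
```

Cut-offs whose solution lies outside the search interval are clamped to its bounds, so the interval should be widened for very extreme thresholds.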

Introduction

What is new?

Key findings

• For the first time, a common metric was developed for a large number of established depression measures using item response theory methods.

What this adds to what was known?

• To date, the multitude of scales for assessing a given patient-reported outcome (PRO) seriously impairs both research and communication among clinicians. Standardization of PRO measurement is therefore urgently needed.
• The new standardized metric for depression severity makes scores, measurement range, and precision directly comparable across the different scales.
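Differences in measurement precision and range can be illustrated with the conditional standard error of measurement, SE(theta) = 1 / sqrt(I(theta)), where I(theta) is the test information. The sketch assumes GPCM items, for which item information equals a^2 times the conditional variance of the item score. The two instruments, their parameters, the function names, and the SE cut-off of 0.32 (roughly a reliability of 0.90 when the trait variance is 1) are hypothetical choices for illustration.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m | theta) under the generalized partial credit model."""
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]


def item_information(theta, a, steps):
    """GPCM item information: a^2 times the conditional variance of the item score."""
    probs = gpcm_probs(theta, a, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return a * a * var


def standard_error(theta, items):
    """Conditional standard error of measurement: 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, steps) for a, steps in items)
    return 1.0 / math.sqrt(info)


def precise_range(items, se_max=0.32, lo=-4.0, hi=4.0, n_points=161):
    """Lowest and highest grid theta with SE <= se_max, or None if never reached."""
    grid = [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
    ok = [t for t in grid if standard_error(t, items) <= se_max]
    return (min(ok), max(ok)) if ok else None


# Two hypothetical instruments: a 2-item screener and a 9-item questionnaire
short_form = [(2.0, [0.5, 1.2, 1.9]), (1.8, [0.3, 1.0, 1.8])]
long_form = [(2.0, [-1.0 + 0.25 * j, 0.25 * j, 1.0 + 0.25 * j]) for j in range(9)]
for name, instrument in [("short", short_form), ("long", long_form)]:
    print(name, round(standard_error(0.5, instrument), 3), precise_range(instrument))
```

Plotting SE(theta) for each instrument over the common metric shows both where it measures precisely and how wide that region is.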

What is the implication and what should change now?

• The results offer a joint definition and understanding of the latent depression construct, as defined by the items of a variety of established depression questionnaires.
• The standardization approach outlined here, which calibrates different depression measures to a common latent metric, can be applied to the assessment of other PROs as well.

Depressive disorders are severe and widespread diseases that impose a significant burden on individuals and society [1], [2]. Reliable tools for depression measurement are essential for case recognition [3], [4], [5], treatment monitoring [6], [7], and clinical research in general [8], [9], [10], [11], [12], [13], [14]. Today, a plethora of carefully developed and well-established self-report instruments for assessing depressive symptoms exists. However, the scores of these instruments are not directly comparable. The heterogeneity of scale-specific metrics seriously impairs the comparability of study results and complicates communication among researchers and clinicians. Pooling study results obtained with different depression measures in quantitative reviews or meta-analyses is difficult and may even lead to biased results [15], [16]. To avoid this bias, some meta-analyses restrict study selection to studies that used the same instrument(s) [6], [7]. However, such restrictions lead to a substantial loss of information.
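A crosswalk that maps every raw sum score of an instrument to a common metric is one way to make results from different questionnaires poolable. The sketch below uses the Lord-Wingersky recursion to compute summed-score likelihoods and then an expected a posteriori (EAP) estimate for each possible sum score, in the spirit of the summed-score linking work listed in the references (Orlando et al.). It assumes GPCM items, a standard normal prior, hypothetical item parameters, and hypothetical function names; it is not the authors' code.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m | theta) under the generalized partial credit model."""
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]


def sum_score_distribution(theta, items):
    """Lord-Wingersky recursion: P(sum score = s | theta) for s = 0..max."""
    dist = [1.0]  # before any item is added, the sum score is 0 with probability 1
    for a, steps in items:
        probs = gpcm_probs(theta, a, steps)
        new = [0.0] * (len(dist) + len(probs) - 1)
        for s, ps in enumerate(dist):
            for k, pk in enumerate(probs):
                new[s + k] += ps * pk
        dist = new
    return dist


def eap_crosswalk(items, n_points=81, lo=-4.0, hi=4.0):
    """EAP theta for every possible sum score, assuming a standard normal prior."""
    grid = [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
    prior = [math.exp(-0.5 * t * t) for t in grid]
    likelihoods = [sum_score_distribution(t, items) for t in grid]
    table = []
    for s in range(len(likelihoods[0])):
        post = [pr * lk[s] for pr, lk in zip(prior, likelihoods)]
        norm = sum(post)
        table.append(sum(t * w for t, w in zip(grid, post)) / norm)
    return table


# Hypothetical 4-item instrument (items scored 0-3); theta assumed normed so that T = 50 + 10 * theta
items = [(1.7, [-0.3, 0.6, 1.5]), (1.2, [0.0, 0.9, 1.8]),
         (2.1, [0.2, 1.0, 1.7]), (1.5, [-0.5, 0.4, 1.3])]
for score, theta in enumerate(eap_crosswalk(items)):
    print(score, round(50.0 + 10.0 * theta, 1))
```

Once each instrument has such a table, study-level means reported on different questionnaires can be expressed on the same metric before pooling.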

It is recognized that results for biomedical parameters need to be comparable across laboratory methods and facilities [17], and in our opinion, this is equally important for the measurement of patient-reported outcomes (PROs) [18], [19].

This issue has been identified before [15], [16], but only recent increases in computational power have made it possible to introduce new psychometric methods into this field of health care [20], [21], [22], [23], [24], [25]. The most frequently discussed solution [26], [27], [28] for achieving a standardized metric for PROs is offered by item response theory (IRT) [29], [30], [31], [32], [33]. Items from different established depression questionnaires can be combined in one "item bank" that provides a common metric [34], [35], [36]. Several depression item banks have already been developed [37], [38], [39], [40], [41], [42], but to our knowledge, no study so far has attempted to establish a comprehensive metric that makes a larger number of existing depression measures comparable.
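The item bank idea can be illustrated with a small sketch: once items from several questionnaires are calibrated jointly, any subset of them yields a latent trait estimate on the same scale. The code assumes a generalized partial credit model (GPCM) and expected a posteriori (EAP) scoring with a standard normal prior; the item identifiers, parameters, and function names are invented for illustration and do not come from the study.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m | theta) under the generalized partial credit model."""
    logits = [0.0]  # category 0 has an empty sum
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # shift for numerical stability
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]


def eap_score(responses, bank, n_points=81, lo=-4.0, hi=4.0):
    """EAP estimate and posterior SD of theta from any subset of calibrated items.

    responses: dict item_id -> observed category; bank: dict item_id -> (a, steps).
    """
    grid = [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
    post = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for item_id, k in responses.items():
            a, steps = bank[item_id]
            w *= gpcm_probs(t, a, steps)[k]
        post.append(w)
    norm = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / norm
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / norm
    return mean, math.sqrt(var)


# Hypothetical jointly calibrated bank: "A1"/"A2" from one questionnaire, "B1"/"B2" from another
bank = {
    "A1": (1.9, [-0.4, 0.5, 1.4]),
    "A2": (1.3, [0.1, 1.1]),
    "B1": (2.2, [-0.2, 0.7, 1.6]),
    "B2": (1.6, [0.3, 1.2, 2.0]),
}
print(eap_score({"A1": 2, "A2": 1}, bank))  # scored with questionnaire A items only
print(eap_score({"B1": 2, "B2": 1}, bank))  # scored with questionnaire B items only
```

Because both estimates refer to the same calibrated theta scale, they can be compared directly, which is the property a common metric is meant to provide.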

In this study, we aim to provide such a metric for some of the most established depression measures. This metric should allow results from different instruments to be compared on one common "ruler," much as different thermometers all report temperature on the same anchored scale.


Sample

The study is based on a secondary analysis of existing data. The total study sample comprises data from seven clinical and three general population samples. The clinical samples were drawn consecutively within clinical routines or cross-sectional studies [37], [43], [44] and include in- and outpatients with mental and/or medical conditions treated in 7 hospitals and 12 family practices across Germany. Clinical diagnoses according to ICD-10 criteria were made by the treating health-care providers. The representative

Sample

The total sample included 33,844 respondents: a clinical subsample of 23,817 in- and outpatients with mental and/or medical conditions and a representative sample of 10,027 persons (Table 1). Almost half of the clinical sample had been diagnosed with some type of depressive disorder by their treating clinicians, and 28% had a diagnosis of major depressive disorder. One-fifth of the clinical sample had no mental disorder but did have a medical condition.

Discussion

This study identifies the common construct measured by 11 established depression questionnaires and offers a standardized metric for assessing depression severity by jointly calibrating these instruments to a common latent metric. The results are based on the broadest sample available in the field of depression measurement. The complete item bank is accessible on the journal's Web site.

Conclusion

This study identified the common latent construct measured by several established depression questionnaires and provided a standardized metric for assessing this construct by jointly calibrating these instruments to a common latent metric. This metric allows the results of the different depression measures to be compared and makes their score values easy to interpret. We believe a more standardized assessment of PROs is essential to introduce the patient's perspective into

Acknowledgments

The authors thank Dr. Alexandra Murray (Department of Psychosomatic Medicine and Psychotherapy, University Medical Center Hamburg-Eppendorf, Germany) and Dr. Maria Orlando Edelen (RAND Corporation, Boston, MA, USA) for careful proofreading of the manuscript. The authors acknowledge Alice Bodnar (communication design, Berlin, Germany) for the design of Fig. 2. Ethical approval was not required as decided by the Ethics Committee of the Medical Registering Authority Hamburg, Germany.

References (121)

A.A. Onitilo et al.

Effect of depression on all-cause mortality in adults with cancer and differential effects by cancer site

Gen Hosp Psychiatry

(2006)

M. Panteghini et al.

Standardization in laboratory medicine: new challenges

Clin Chim Acta

(2005)

A.J. Rush et al.

The 16-Item Quick Inventory of Depressive Symptomatology (QIDS), clinician rating (QIDS-C), and self-report (QIDS-SR): a psychometric evaluation in patients with chronic major depression

Biol Psychiatry

(2003)

D. Cella et al.

Initial adult health item banks and first wave testing of the Patient-Reported Outcomes Measurement Information System (PROMIS™) Network: 2005-2008

J Clin Epidemiol

(2010)

W.H. Chen et al.

Linking pain items from two studies onto a common scale using item response theory

J Pain Symptom Manage

(2009)

B. Löwe et al.

Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians' diagnoses

J Affect Disord

(2004)

B. Löwe et al.

A 4-item measure of depression and anxiety: validation and standardization of the Patient Health Questionnaire-4 (PHQ-4) in the general population

J Affect Disord

(2010)

B. Löwe et al.

Measuring depression outcome with a brief self-report instrument: sensitivity to change of the Patient Health Questionnaire (PHQ-9)

J Affect Disord

(2004)

B. Löwe et al.

Detecting and monitoring depression with a two-item questionnaire (PHQ-2)

J Psychosom Res

(2005)

M. Rose et al.

Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS)

J Clin Epidemiol

(2008)

F. Kendel et al.

Screening for depression: Rasch analysis of the dimensional structure of the PHQ-9 and the HADS-D

J Affect Disord

(2010)

R.C. Kessler et al.

The epidemiology of major depressive disorder: results from the National Comorbidity Survey Replication (NCS-R)

JAMA

(2003)

T.B. Üstün et al.

Global burden of depressive disorders in the year 2000

Br J Psychiatry

(2004)

J.H. Lichtman et al.

Depression and coronary heart disease: recommendations for screening, referral, and treatment: a science advisory from the American Heart Association Prevention Committee of the Council on Cardiovascular Nursing, Council on Clinical Cardiology, Council on Epidemiology and Prevention, and Interdisciplinary Council on Quality of Care and Outcomes Research: endorsed by the American Psychiatric Association

Circulation

(2008)

Depression: core interventions in the management of depression in primary and secondary care

(2004)

Screening for depression: recommendations and rationale

Ann Intern Med

(2002)

E. Driessen et al.

Does pretreatment severity moderate the efficacy of psychological treatment of adult outpatient depression? A meta-analysis

J Consult Clin Psychol

(2010)

J.C. Fournier et al.

Antidepressant drug effects and depression severity: a patient-level meta-analysis

JAMA

(2010)

J. Barth et al.

Depression as a risk factor for mortality in patients with coronary heart disease: a meta-analysis

Psychosom Med

(2004)

L.E. Egede et al.

Depression and all-cause and coronary heart disease mortality among adults with and without diabetes

Diabetes Care

(2005)

S.H. Golden et al.

Examining a bidirectional association between depressive symptoms and diabetes

JAMA

(2008)

J.R. Satin et al.

Depression as a predictor of disease progression and mortality in cancer patients: a meta-analysis

Cancer

(2009)

M.A. Whooley et al.

Depressive symptoms, health behaviors, and risk of cardiovascular events in patients with coronary heart disease

JAMA

(2008)

L.R. Wulsin et al.

Do depressive symptoms increase the risk for the onset of coronary disease? A systematic quantitative review

Psychosom Med

(2003)

K.W. Davidson et al.

Assessment and treatment of depression in patients with cardiovascular disease: National Heart, Lung, and Blood Institute Working Group Report

Psychosom Med

(2006)

M.A. Puhan et al.

Combining scores from different patient reported outcome measures in meta-analyses: when is it justified?

Health Qual Life Outcomes

(2006)

D.L. Patrick et al.

Chapter 17: patient-reported outcomes

U.S. Food and Drug Administration. Guidance for industry. Patient-reported outcome measures: use in medical product...

H.F. Fischer et al.

How to compare scores from different depression scales: equating the Patient Health Questionnaire (PHQ) and the ICD-10-Symptom Rating (ISR) using Item Response Theory

Int J Methods Psychiatr Res

(2011)

L.E. Gibbons et al.

Migrating from a legacy fixed-format measure to CAT administration: calibrating the PHQ-9 to the PROMIS depression measures

Qual Life Res

(2011)

T.M. Olino et al.

Measuring depression using item response theory: an examination of three measures of depressive symptomatology

Int J Methods Psychiatr Res

(2012)

M. Orlando et al.

Summed-score linking using item response theory: application to depression measurement

Psychol Assess

(2000)

M.H. Trivedi et al.

The Inventory of Depressive Symptomatology, Clinician Rating (IDS-C) and Self-Report (IDS-SR), and the Quick Inventory of Depressive Symptomatology, Clinician Rating (QIDS-C) and Self-Report (QIDS-SR) in public sector patients with mood disorders: a psychometric evaluation

Psychol Med

(2004)

D. Cella et al.

The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years

Med Care

(2007)

B.B. Reeve et al.

Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS)

Med Care

(2007)

B.B. Reeve

Item response theory modeling in health outcomes measurement

Expert Rev Pharmacoecon Outcomes Res

(2003)

G. Guilera et al.

Item response theory test equating in health sciences education

Adv Health Sci Educ Theory Pract

(2008)

J.B. Bjorner et al.

Computerized adaptive testing and item banking

J.S. Lai et al.

Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale

Qual Life Res

(2003)

R.D. Hays et al.

Item response theory and health outcomes measurement in the 21st century

Med Care

(2000)

J.B. Bjorner et al.

Using item response theory to calibrate the Headache Impact Test (HIT) to the metric of traditional headache scales

Qual Life Res

(2003)

N.J. Dorans

Linking scores from multiple health outcome instruments

Qual Life Res

(2007)

H. Fliege et al.

Development of a computer-adaptive test for depression (D-CAT)

Qual Life Res

(2005)

T. Forkmann et al.

Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis

Rehabil Psychol

(2009)

R.D. Gibbons et al.

Using computerized adaptive testing to reduce the burden of mental health assessment

Psychiatr Serv

(2008)

A.J. Mitchell et al.

Redefining diagnostic symptoms of depression using Rasch analysis: testing an item bank suitable for DSM-V and computer adaptive testing

Aust N Z J Psychiatry

(2011)

P.A. Pilkonis et al.

Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger

Assessment

(2011)

A.B. Smith et al.

The initial development of an item bank to assess and screen for psychological distress in cancer patients

Psychooncology

(2007)

B. Löwe et al.

Diagnosing ICD-10 depressive episodes: superior criterion validity of the Patient Health Questionnaire

Psychother Psychosom

(2004)

A.T. Beck et al.

Beck depression inventory. Manual

(1993)

Conflict of interest: None of the authors has a financial conflict of interest to declare.

Funding: This research did not receive external funding. The Charité-Universitätsmedizin Berlin, five hospitals of the Schön Kliniken Company (Bad Arolsen, Bad Bramstedt, Hamburg Eilbek, Roseneck, and Starnberger See), the Heidelberg University Hospital, 12 family practices in Heidelberg, and the Leipzig University supported the study by providing the anonymized data for secondary analysis.
