What is new?
Key findings
• In this study, a common metric was developed for the first time for a large number of established depression measures using item response theory methods.
What this adds to what was known?
• To date, the variety of different scales for the assessment of certain patient-reported outcomes (PROs) seriously impairs research and communication among clinicians. Thus, standardization of PRO measurement is urgently needed.
• The new standardized metric for depression severity provides easy comparability of scores, measurement range, and precision among the different scales.
What is the implication and what should change now?
• The results offer a conjoint definition and understanding of the latent depression construct as defined by the items from a variety of established depression questionnaires.
• The outlined standardization approach calibrating different depression measures to a common latent metric can be applied to the assessment of other PROs as well.
Depressive disorders are severe and widespread diseases that impose a significant burden on the individual and on society [1], [2]. Reliable tools for depression measurement are essential for case recognition [3], [4], [5], treatment monitoring [6], [7], and clinical research in general [8], [9], [10], [11], [12], [13], [14]. Today, a plethora of carefully developed and well-established self-report instruments for the assessment of depressive symptoms exists. However, scores from these instruments are not directly comparable. The heterogeneity of scale-specific metrics seriously impairs the comparability of study results and complicates communication among researchers and clinicians. Pooling study results from different depression measures in quantitative reviews or meta-analyses is difficult and may even lead to biased results [15], [16]. To avoid this bias, some meta-analyses limit the selection of studies to those that use the same instrument(s) [6], [7]. However, such restrictions lead to a significant loss of information.
It is recognized that results for biomedical parameters need to be comparable across laboratory methods and facilities [17], and in our opinion, this is equally important for the measurement of patient-reported outcomes (PROs) [18], [19].
This issue has been identified earlier [15], [16], but only recent increases in computational power have enabled the introduction of new psychometric methods in this field of health care [20], [21], [22], [23], [24], [25]. The most frequently discussed solution [26], [27], [28] for achieving a standardized metric for PROs is offered by item response theory (IRT) [29], [30], [31], [32], [33]. Items from different established depression questionnaires can be included in one "item bank" to provide one common metric [34], [35], [36]. Some depression item banks have already been developed [37], [38], [39], [40], [41], [42], but to our knowledge, no study so far has attempted to establish a comprehensive metric to achieve comparability for a larger number of existing depression measures.
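The idea of a common metric can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) model for yes/no items, which is a simplification chosen for brevity and not necessarily the model used in this study. The two scales, their item parameters, and all function names are hypothetical. Once items from both scales are calibrated on the same latent severity axis (theta), a score on one scale can be translated to theta and from there to the expected score on the other scale.

```python
import math

# Hypothetical 2PL item parameters (discrimination a, difficulty b) for two
# made-up questionnaires; none of these values come from the study.
SCALE_A = [(1.8, -0.5), (1.2, 0.0), (2.0, 0.8)]
SCALE_B = [(1.5, -1.0), (1.0, 0.3), (1.7, 1.2), (2.2, 1.8)]


def item_probability(theta, a, b):
    """Probability of endorsing a 2PL item at latent severity theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_score(theta, items):
    """Test characteristic curve: expected sum score at theta."""
    return sum(item_probability(theta, a, b) for a, b in items)


def theta_for_score(score, items, lo=-6.0, hi=6.0, tol=1e-6):
    """Invert the (monotone) test characteristic curve by bisection."""
    if not expected_score(lo, items) < score < expected_score(hi, items):
        raise ValueError("score lies outside the range covered on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def crosswalk(score_a):
    """Map a scale A score to theta and to the expected scale B score."""
    theta = theta_for_score(score_a, SCALE_A)
    return theta, expected_score(theta, SCALE_B)


if __name__ == "__main__":
    theta, score_b = crosswalk(2.0)
    print(f"theta = {theta:.2f}, equivalent scale B score = {score_b:.2f}")
```

In practice, established depression questionnaires use multi-category items, so a polytomous IRT model and parameters estimated from respondent data would take the place of the invented values above.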
In this study, we aim to provide such a metric for some of the most established depression measures. This metric should allow results from different instruments to be compared on one common "ruler," much as different thermometers measure temperature on a single, meaningfully anchored scale.