Major depressive disorder in Spanish primary care
The vast majority of mental disorders in Spain are diagnosed in primary care (PC), which serves as a gateway to treatment and to the entire public health system [
1]. In this context, emotional disorders are often misdiagnosed, with rates of up to 78% for depression, 71% for generalized anxiety disorder (GAD), and 86% for panic disorder [
2]. Moreover, even among patients who are correctly diagnosed, only 35.8% of those with depression and 30.7% of those with any anxiety disorder receive adequate treatment [
3] (i.e., most patients receive primarily pharmacological treatment, which is not recommended in clinical practice guidelines [
4]). These mental disorders impose a substantial economic and societal burden on European countries, including Spain [
5,
6].
Major depressive disorder (MDD) is highly prevalent in Spanish PC centres, with 9.6% of attendees suffering from this disorder each year [
7], although this figure is lower than the mean prevalence rate (19%) in European countries [
8]. Nevertheless, due to the absence of systematic screening tests, general practitioners (GPs) recognize only about 60% of MDD cases [
3], partly because this condition is frequently comorbid with other physical, somatic, and/or psychological problems such as anxiety disorders or alcohol abuse [
9]. International guideline recommendations for managing depression (such as those issued by NICE) make clear that better assessment methods, for both screening and diagnosis, are needed to improve the identification of MDD so that affected individuals can be referred to the appropriate therapeutic intervention [
10]. For this reason, screening tests are very helpful to obtain a quick, initial identification of a possible case of MDD; however, such tools are not sufficiently reliable to be used as the sole detection instrument [
10,
11]. Thus, clinical interviews are required as a second step to confirm diagnoses. The use of these screening tools followed by clinical interviews should increase the efficiency of PC centres and improve overall public health outcomes for MDD.
One screening test that could be used in PC centres to identify MDD is the PHQ-9 [
12]. This self-report instrument is derived from the Primary Care Evaluation of Mental Disorders (PRIME-MD), which was originally developed to identify five mental disorders: depression, anxiety, alcohol abuse, somatoform disorder, and eating disorder. A systematic review of 16 studies that were carried out to identify depression [
13] concluded that although there are many valid tools, the PHQ-9 is equal or superior to other instruments. Given that the operating characteristics of these instruments are similar, selection of the optimal tool to identify MDD should depend on its feasibility, administration and scoring times, and the capability of the instrument to serve additional purposes, such as monitoring depression severity or response to therapy. Indeed, several meta-analyses have recommended the PHQ-9 for identifying depression in the PC setting because it can be administered easily, quickly, and in a wide range of clinical contexts [
14,
15]. For instance, Gilbody et al. [
14] analysed 17 validation studies (>5000 participants), concluding that the PHQ-9 has good psychometric properties (sensitivity 0.80, specificity 0.92) using either the ≥10 cut-off score or the “diagnostic algorithm” method. Manea, Gilbody and McMillan [
15] analysed a total of 18 studies (7180 patients, 927 with MDD confirmed by diagnostic interviews), concluding that the PHQ-9 shows acceptable psychometric properties for MDD. In that study, using the widely recommended cut-off score of 10, sensitivity was 0.85 and specificity 0.89, with no substantial differences in pooled sensitivity and specificity for cut-off scores ranging from 8 to 11.
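To illustrate why a positive screen still calls for a confirmatory clinical interview, the pooled figures reported by Manea et al. can be converted into approximate predictive values. The sketch below is only a back-of-the-envelope calculation: it assumes that the pooled sensitivity (0.85) and specificity (0.89) apply uniformly to a population with the pooled sample's MDD prevalence (927 of 7180). The function name `predictive_values` is hypothetical and is not taken from any of the cited studies.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a screening test via Bayes' rule.

    All three inputs are proportions strictly between 0 and 1.
    """
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 < value < 1.0:
            raise ValueError("inputs must be proportions between 0 and 1")

    # Expected share of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)  # P(MDD | positive screen)
    npv = true_neg / (true_neg + false_neg)  # P(no MDD | negative screen)
    return ppv, npv


# Assumption: pooled estimates at cut-off 10 applied to the pooled prevalence.
prevalence = 927 / 7180
ppv, npv = predictive_values(0.85, 0.89, prevalence)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions, a little over half of positive screens would correspond to confirmed MDD, whereas a negative screen would be wrong in only a small minority of cases. This pattern is consistent with using the PHQ-9 as a first step followed by a clinical interview.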
The PHQ-9 items closely follow the nine criteria specified in the DSM-IV diagnostic manual (the core criteria for MDD have not changed in the DSM-5). Patients use Likert scales to rate the presence of symptoms in the prior two weeks. Depending on frequency (“not at all”, “several days”, “more than half of the days”, and “almost every day”), the nine items are scored from 0 to 3 points (total severity scores range from 0 to 27 points). Total scores of 10–14 points, 15–19 points, and 20–27 points indicate, respectively, moderate, moderately severe, and severe levels of depressive symptoms. When the PHQ-9 is used as a screening test, the most widely recommended cut-off value is 10, as previous research has demonstrated that this cut-off value provides the best combination of sensitivity (0.88) and specificity (0.88) [
12]. The PHQ-9 has also been proposed for use as a diagnostic tool, using a specific coding algorithm based on the DSM-IV criteria for MDD. Under this algorithm, MDD is diagnosed if at least one of the first two symptoms (items) is rated 2 (“more than half of the days”) or 3 (“almost every day”) and four of the remaining items are also rated 2 or 3 (with the exception of item 9 [suicide], for which a rating of 1 is sufficient). However, the general consensus is that the PHQ-9 can be used as a screening test but not as a diagnostic test [
12–
15].
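The two scoring approaches described above can be sketched in code. This is a minimal illustration under stated assumptions, not an official scoring implementation: all function names are hypothetical, the label "below moderate" is a placeholder because the text gives no severity labels for totals under 10, and "four of the remaining items" is read as requiring at least five symptoms in total, one of which is item 1 or item 2.

```python
from typing import Sequence

SCREENING_CUTOFF = 10  # most widely recommended cut-off according to the text


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine item ratings (each 0-3) into a 0-27 severity score."""
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected nine item ratings, each between 0 and 3")
    return sum(items)


def severity_band(total: int) -> str:
    """Map a total score to the severity bands given in the text."""
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    return "below moderate"  # placeholder: no label is given in the text


def screens_positive(items: Sequence[int], cutoff: int = SCREENING_CUTOFF) -> bool:
    """Screening use: positive when the total reaches the chosen cut-off."""
    return phq9_total(items) >= cutoff


def algorithm_positive(items: Sequence[int]) -> bool:
    """'Diagnostic algorithm' use, under the assumptions in the lead-in."""
    phq9_total(items)  # reuse the input validation
    # Items 1-8 count when rated 2 or 3; item 9 counts when rated 1 or more.
    present = [score >= 2 for score in items[:8]]
    present.append(items[8] >= 1)
    core_symptom = present[0] or present[1]
    return core_symptom and sum(present) >= 5


example = [0, 0, 3, 3, 3, 3, 0, 0, 1]
print(phq9_total(example), severity_band(phq9_total(example)))
print(screens_positive(example), algorithm_positive(example))
```

The example response pattern reaches the screening cut-off but does not satisfy the algorithm, because neither of the first two items is rated 2 or 3. This shows how the two methods can disagree on the same patient.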
The construct validity of the PHQ-9 has been demonstrated in PC patients in many countries, including Spain [
16], Brazil [
17], China [
18], East Africa [
19], the Netherlands [
20], South Africa [
21], the US [
22] and others. These studies indicate that the PHQ-9 has high convergent validity with other depression measures. However, questions have been raised about which cut-off score yields the most accurate screening results. For example, a meta-analysis [
12] suggested that the PHQ-9 presented good screening properties with both the ≥10 cut-off and the “diagnostic algorithm” method, but that the cut-off point may be increased to ≥11 or ≥12 to obtain optimum specificity in some community-based studies. In a recent review, Kroenke et al. [
23] argued against inflexible adherence to a single cut-off score; rather, those authors argued that the cut-off should be adjusted to the target population. Manea et al. [
15] found no significant differences in sensitivity or specificity between a cut-off score of 10 and other cut-off scores (ranging from 8 to 11), but suggested that a cut-off of 11 may represent the best trade-off between sensitivity and specificity. Although the optimal cut-off point is controversial and may depend on the target population, the PHQ-9 shows reasonably good sensitivity and specificity when used as a screening tool, regardless of the precise cut-off point. By contrast, studies assessing the validity of the “diagnostic algorithm” have yielded more ambiguous results. A recent meta-analysis of 27 validation studies of the PHQ-9 algorithm scoring method in various settings concluded that, in most cases, sensitivity was low but specificity was good [
24]. Similarly, Mitchell et al. [
25] conducted a meta-analysis of 26 publications reporting on 40 individual studies (n = 26,902 patients), finding that the best estimates of sensitivity and specificity for the PHQ-9 algorithm were 0.57 and 0.93, respectively. In short, the PHQ-9 can be used as a screening test with different cut-off scores, but the psychometric properties of the “diagnostic algorithm” are weaker, chiefly because of its low sensitivity.
Few studies have evaluated the Spanish version of the PHQ-9. The first study by Diez-Quevedo et al. [
26] was conducted to validate the Spanish version of the whole PHQ (including the 9 items for depression) in an inpatient setting, finding that the 9-item depression module yielded satisfactory sensitivity (0.84) and excellent specificity (0.92) for MDD compared to the gold standard at that time (i.e., the Structured Clinical Interview for DSM-III-R). However, the profile of patients in PC centres is likely to differ substantially from that of patients treated in a psychiatric inpatient setting. A Spanish version of the PHQ-9 has also been evaluated for use in PC centres in Honduras, with all of the linguistic and cultural differences implied by that setting [
27]. However, only one study has focused on a Spanish version of the PHQ-9 for Spain [
16]. In that study, although the sample was obtained from Spanish PC centres, the PHQ-9 was administered by telephone, and thus the reported internal consistency of the PHQ-9 applies only to telephone administration. Consequently, little is known about how the PHQ-9 performs in Spanish PC centres, and the optimal cut-off score for this setting has yet to be established.