Background
Depression is frequently observed in cancer patients. Meta-analyses reported a 25 % prevalence of all types of depression among cancer patients [1] and a 32 % prevalence of mental health conditions in general [2]. Depression may negatively affect treatment outcomes [3] and can be associated with elevated mortality in cancer patients [4].
Oncologists often fail to detect depression in their patients [5, 6]. Therefore, it is important to use standardized and easily applicable tools to detect depression. Several screening instruments have proved effective for that purpose. The questionnaires most often used to measure depression in cancer patients are the Hospital Anxiety and Depression Scale (HADS) [7], the Beck Depression Inventory (BDI) [8], and the Center for Epidemiologic Studies Depression Scale (CES-D) [9]; for a recent review, cf. [10]. A further, freely available and more recently developed questionnaire is the Patient Health Questionnaire (PHQ-9) [11]. Its validity has been demonstrated in several studies [12–15]. Normative scores are available [16], and two studies supply tools for converting scores between the PHQ-9 and other depression scales [17, 18]. The PHQ-9 is generally used in its original one-dimensional form (sum score of the 9 items), but several psychometric studies with multiple disease groups challenged the one-dimensional solution [19, 20] and showed that two-dimensional solutions fitted better [21–25]. The assignments of the items to these two factors were not totally identical in these studies, but all of them obtained one factor concentrating on emotional and cognitive aspects (depressed mood, feeling worthless, and thoughts of death), and the other on somatic aspects (sleep problems, loss of energy, and appetite problems). The first central aim of this study was to test whether such a two-dimensional solution could also be found in a sample of cancer patients. In particular, we test the specific two-dimensional model that performed best in three [22, 23, 25] of the five [21–25] studies. In addition, the psychometric properties of the items in terms of item-test correlations and the correlations with other scales on emotional and somatic factors were to be examined.
Furthermore, there are age and gender differences influencing the PHQ-9 scores in the general population [16]. These differences should be taken into account when comparing patients with different cancer locations. It has often been documented that depression is relatively high in breast cancer patients [26] and low in prostate cancer patients [27]. However, to what degree is this difference due to different age and gender distributions? Unbiased comparisons between cancer groups can be made by calculating expected mean scores from the general population using linear regression analyses, cf. [28], and by considering the differences between the patients’ group means and these expected mean scores derived from the general population. The second objective of this study was to perform such regression analyses and compare the mean depression levels for multiple cancer types with and without correction for age and gender effects.
In summary, the aims of this paper were
- to test psychometric properties and the factorial structure of the PHQ-9,
- to calculate a regression analysis for the assessment of expected mean scores that help evaluate the PHQ-9 mean values of different cancer entities, and
- to calculate unbiased estimates of the depression burden for several cancer diagnoses.
Results
Mean scores of the PHQ-9 items
The mean sum score of the PHQ-9 in the patients’ sample was 5.26 (Table 2). According to the cut-off criteria for mild (5–9), moderate (10–14), and severe (≥15) depression, the frequencies were 49.8 % (no), 35.1 % (mild), 11.3 % (moderate), and 3.7 % (severe) depression for the patients. In the general population, the percentages for no, mild, moderate, and severe depression were 72.2 %, 21.2 %, 5.1 %, and 1.5 %, respectively.
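The cut-off criteria above can be written as a small lookup. The following is a minimal sketch, assuming the standard PHQ-9 sum-score range of 0 to 27 (nine items scored 0 to 3); the function name `phq9_severity` is hypothetical and not part of the study.

```python
def phq9_severity(sum_score: int) -> str:
    """Map a PHQ-9 sum score to the four categories used in the text.

    Cut-offs follow the text: mild 5-9, moderate 10-14, severe >= 15.
    The valid range 0-27 is an assumption based on standard PHQ-9 scoring.
    """
    if not 0 <= sum_score <= 27:
        raise ValueError("PHQ-9 sum score must lie between 0 and 27")
    if sum_score >= 15:
        return "severe"
    if sum_score >= 10:
        return "moderate"
    if sum_score >= 5:
        return "mild"
    return "no"


if __name__ == "__main__":
    for score in (4, 5, 12, 15):
        print(score, phq9_severity(score))
```

Applied to each respondent's sum score, a tally of the returned labels would give category frequencies of the kind reported above.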
On the item level, the mean scores (Table 2) of the cancer sample ranged from 0.15 (suicidal ideation) to 1.07 (sleep problems). All items showed higher mean scores in the cancer group compared with the general population. The greatest differences between both samples, expressed in terms of effect sizes (d > 0.40), were found for items 3 (sleep problems), 4 (loss of energy), 7 (concentration problems), and 8 (psychomotor agitation/retardation).
Reliability and factorial analyses
The reliability coefficient (Cronbach’s alpha) of the PHQ-9 for the patients’ sample was 0.84. The highest part-whole-corrected correlations between item and sum score (r_it) were obtained for items 4 (loss of energy), 2 (feeling depressed), 7 (concentration problems), and 1 (loss of interest) (Table 3). All items contributed positively to the reliability of the scale. The contribution of the last item (suicidal ideation) was lowest, but positive nevertheless.
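For readers who want to reproduce these two statistics on their own data, the sketch below shows one common way to compute Cronbach's alpha and the part-whole-corrected item-test correlation (the correlation of an item with the sum of the remaining items). It is an illustration only: the function names and the toy responses are invented here and are not the study data or the authors' software.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent."""
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def corrected_item_total(rows, j):
    """Correlate item j with the sum of all other items (part-whole correction)."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


if __name__ == "__main__":
    # Invented toy responses (5 respondents, 3 items scored 0-3).
    toy = [[0, 0, 1], [1, 1, 1], [2, 1, 2], [3, 3, 2], [1, 2, 3]]
    print(round(cronbach_alpha(toy), 2))
    print([round(corrected_item_total(toy, j), 2) for j in range(3)])
```

The correction matters for short scales: without removing the item from the total, each item would correlate with itself and the coefficient would be inflated.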
Table 3
Factor loadings and item-test correlations for the cancer patients
1 | Loss of interest | .54 | .47 | .60 | .82 |
2 | Feeling depressed | .49 | .66 | .68 | .82 |
3 | Sleep problems | .75 | .03 | .51 | .84 |
4 | Loss of energy | .78 | .27 | .70 | .81 |
5 | Appetite problems | .58 | .27 | .52 | .83 |
6 | Self-blame | .33 | .65 | .54 | .83 |
7 | Concentration problems | .65 | .32 | .61 | .82 |
8 | Agitation/retardation | .63 | .21 | .52 | .83 |
9 | Suicidal ideation | .03 | .85 | .43 | .84 |
Results of the two-factorial principal components analysis (PCA) for the patients’ sample are also given in Table 3. The theoretically assumed structure (items 1, 2, 6, and 9 in one factor and items 3, 4, 5, 7, and 8 in the other) was realized with one exception: item 1 (loss of interest) loaded slightly higher on the somatic factor (.54) than on the emotional-cognitive factor (.47). CFA results for the total scale and for the two-factorial structure are given in Table 4, indicating a better fit for the latter model.
Table 4
CFA results for the cancer patients
1-factorial model | 494.1 | 32271.2 | 0.041 | 0.094 | 0.915 | 0.887 |
2-factorial model | 302.0 | 32087.4 | 0.034 | 0.074 | 0.950 | 0.931 |
Relationship between PHQ-9 scores and other scales
The PHQ-9 items were correlated with several scales of other questionnaires (Table 5). In the left part of Table 5, the scales focus on the affective and mental component, while in the right part the scales also include physical aspects.
Table 5
Correlations between item scores of the PHQ-9 and scale scores of other instruments in the cancer patients’ sample
1 | Loss of interest | -.51 | -.37 | .45 | .43 | -.53 | -.43 | .49 |
2 | Feeling depressed | -.62 | -.39 | .60 | .54 | -.51 | -.40 | .49 |
3 | Sleep problems | -.44 | -.33 | .40 | .41 | -.38 | -.34 | .46 |
4 | Loss of energy | -.61 | -.48 | .52 | .52 | -.58 | -.51 | .70 |
5 | Appetite problems | -.42 | -.32 | .39 | .37 | -.39 | -.34 | .43 |
6 | Self-blame | -.49 | -.36 | .49 | .40 | -.32 | -.22 | .34 |
7 | Concentration problems | -.53 | -.72 | .48 | .44 | -.42 | -.36 | .49 |
8 | Agitation/retardation | -.46 | -.48 | .47 | .36 | -.35 | -.35 | .42 |
9 | Suicidal ideation | -.37 | -.30 | .47 | .32 | -.29 | -.24 | .27 |
| Sum score | -.74 | -.62 | .70 | .63 | -.63 | -.54 | .69 |
There is a clear correspondence between item 7 (concentration problems) and Cognitive functioning (r = −0.72). Item 4 (loss of energy) is highly correlated with all scales, including affective scales and those with physical aspects. Among the four items (1, 2, 6, 9) assigned to Factor 2 (emotional and cognitive aspect), all correlations with the Emotional functioning scale are higher than those with the EORTC fatigue scale. On the other hand, among the five items (3, 4, 5, 7, 8) of Factor 1 (somatic aspect), only three items (3, 4, and 5) showed higher correlations with fatigue, compared with the correlations to Emotional functioning.
Tumor-specific analyses
Table 6 shows PHQ-9 mean scores for all tumor sites with subsample sizes of 25 and above, arranged according to the PHQ-9 mean score. There are great differences among the subsamples concerning age and sex distribution. Since the PHQ-9 mean scores depend on age and sex in the general population, a fair comparison among the tumor sites requires the consideration of these age and sex differences.
Table 6
PHQ-9 scores, broken down by cancer site
Thyroid gland | 29 | 8.2 | 72 | 50.6 | 3.0 | 5.2 |
Other Non-Hodgkin lymphoma | 31 | 7.6 | 45 | 57.2 | 3.1 | 4.5 |
Ovary | 41 | 7.6 | 100 | 63.2 | 3.5 | 4.1 |
Oesophagus | 30 | 6.9 | 13 | 64.0 | 3.3 | 3.6 |
Breast | 346 | 6.3 | 99 | 57.0 | 3.3 | 3.0 |
Non-follicular lymphoma | 30 | 6.1 | 47 | 61.4 | 3.3 | 2.8 |
Pancreas | 31 | 5.9 | 32 | 65.2 | 3.4 | 2.5 |
Stomach | 53 | 5.8 | 34 | 66.9 | 3.4 | 2.4 |
Hodgkin lymphoma | 64 | 5.6 | 56 | 37.1 | 2.4 | 3.2 |
Rectum | 90 | 5.3 | 38 | 65.8 | 3.4 | 1.9 |
Kidney | 119 | 5.2 | 43 | 65.9 | 3.4 | 1.8 |
Colon | 196 | 5.0 | 54 | 70.4 | 3.6 | 1.4 |
Testis | 35 | 4.9 | 0 | 35.9 | 2.2 | 2.7 |
Bladder | 91 | 4.6 | 22 | 69.5 | 3.5 | 1.1 |
Prostate | 640 | 4.0 | 0 | 66.9 | 3.3 | 0.7 |
Total | 2,059 | 5.26 | 41 | 62.3 | 3.3 | 2.0 |
The linear regression analysis of the general population’s sample yielded the following regression equation:
$$ \mathrm{PHQ} = 0.0367 \cdot \mathrm{age} + 0.310 \cdot \mathrm{sex} + 0.884. $$
Sex is coded 0 for males and 1 for females. For example, the expected PHQ-9 score of a 60-year-old woman is 0.0367 * 60 + 0.310 * 1 + 0.884 = 3.396. For the whole sample of the general population (41 % women; mean age: 62.2 years), the calculation is as follows: PHQ-9 (expected) = 0.0367 * 62.2 + 0.310 * 0.41 + 0.884 = 3.294. The column “Expected PHQ-9 Mean” in Table 6 shows these expected values for samples with the age and gender distribution of the cancer groups. These expected means provide the basis for comparing the depression burden of the different cancer patient groups.
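The regression equation can be applied to an individual (sex coded 0 or 1) or to a group (sex replaced by the proportion of women, age by the mean age). The sketch below reproduces the two worked examples from the text; the function name `expected_phq9` is hypothetical, while the coefficients are those reported above.

```python
# Coefficients of the regression equation reported in the text.
B_AGE = 0.0367
B_SEX = 0.310
INTERCEPT = 0.884


def expected_phq9(age: float, sex: float) -> float:
    """Expected PHQ-9 score from the general-population regression.

    age: age in years (or the mean age of a group)
    sex: 0 = male, 1 = female (or the proportion of women in a group)
    """
    return B_AGE * age + B_SEX * sex + INTERCEPT


if __name__ == "__main__":
    # 60-year-old woman, as in the text
    print(round(expected_phq9(60, 1), 3))
    # Group with 41 % women and a mean age of 62.2 years, as in the text
    print(round(expected_phq9(62.2, 0.41), 3))
```

Because the model is linear, inserting group means gives the same result as averaging the individual expected scores of the group members.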
All groups of patients show higher mean values than the (matched) controls, with differences ranging from 0.7 (prostate) to 5.2 (thyroid gland). The sequence of the cancer sites according to the PHQ-9 mean scores is similar to the sequence according to the age- and gender-corrected mean values (right part of Table 6). Patients with testis cancer have a mean PHQ-9 score of 4.9, which is the third lowest mean value in Table 6. Taking into account that these patients are male and relatively young, the difference between the actual score and the expected one (diff = 2.7) indicates a higher level of distress in this group. A similar phenomenon can be observed for patients with Hodgkin lymphoma.
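The shift in rank order described here can be illustrated with a few rows of Table 6. The sketch below is an illustration under stated assumptions: it reads the Table 6 columns as observed PHQ-9 mean, percentage of women, and mean age (the column headers are not reproduced in this text), uses only a subset of sites, and the helper name `excess` is hypothetical.

```python
# Regression coefficients reported in the text (general population sample).
B_AGE, B_SEX, INTERCEPT = 0.0367, 0.310, 0.884

# (site, observed PHQ-9 mean, proportion of women, mean age); subset of Table 6.
SITES = [
    ("Thyroid gland", 8.2, 0.72, 50.6),
    ("Breast", 6.3, 0.99, 57.0),
    ("Hodgkin lymphoma", 5.6, 0.56, 37.1),
    ("Colon", 5.0, 0.54, 70.4),
    ("Testis", 4.9, 0.00, 35.9),
    ("Bladder", 4.6, 0.22, 69.5),
    ("Prostate", 4.0, 0.00, 66.9),
]


def excess(observed_mean, share_women, mean_age):
    """Observed group mean minus the mean expected from age and sex."""
    expected = B_AGE * mean_age + B_SEX * share_women + INTERCEPT
    return observed_mean - expected


# Rank sites by raw mean and by age- and sex-corrected difference.
by_raw = [s[0] for s in sorted(SITES, key=lambda s: s[1], reverse=True)]
by_excess = [s[0] for s in sorted(SITES, key=lambda s: excess(*s[1:]), reverse=True)]

if __name__ == "__main__":
    print("raw:      ", by_raw)
    print("corrected:", by_excess)
    for name, *values in SITES:
        print(f"{name}: diff = {excess(*values):.1f}")
```

Within this subset, the two young groups (Hodgkin lymphoma and testis) each move up one position once the expected score is subtracted, which matches the pattern described in the text.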
Discussion
The first aim of this study was to test the factorial structure of the PHQ-9 administered to cancer patients. The results of the factorial analyses demonstrate that a two-dimensional model according to [23] performed better than the one-dimensional model. With one exception (item 1; loss of interest), the hypothetically assumed structure emerged in the PCA. It is interesting to note that the two items that were selected for the PHQ-2 (loss of interest and feeling depressed) reached good part-whole corrected item-test correlations (0.60 and 0.68), and that they had positive loadings in both factors in the PCA. Together with the results of other studies reported in the literature, we can conclude that the PHQ-9 comprises two aspects, an affective-cognitive component (feeling depressed, self-blame, and suicidal ideation) and a somatic component (sleep problems, loss of energy, and appetite problems), but that the assignment of the remaining three items (loss of interest, concentration problems, and agitation/retardation) to the scales according to the factorial analyses is less clear. The reliability coefficient of the total scale (Cronbach’s alpha) was good (alpha = 0.84), and all items contributed to this scale. This is similar to the results of other studies [38, 39]. Insufficient CFA fit indices for the total sum scale are also found in other depression questionnaires (e.g., [40]). As long as there is no other structure of the questionnaire that can be reliably replicated in several studies, we believe that it is best to maintain the sum score.
Sleep problems (item 3) and loss of energy (item 4) were the symptoms that differed most between the cancer patients and the general population, followed by concentration problems (item 7) and agitation/retardation (item 8). As such, “classical” depression features like feeling depressed and loss of interest were not reported to be key burdens of cancer patients half a year after rehabilitation. Item 8 contains two contradictory aspects of psychomotor activity: agitation and retardation. This item fitted most poorly in the Forkmann et al. study [19] and was therefore excluded there. Clinicians report that patients have difficulties answering this item because of its seemingly contradictory nature. In the PCA, the item was associated with factor 1 in the patients’ sample. It cannot be clearly interpreted. Item 9 (suicidal ideation) showed very low mean scores, and the item-total correlations were lowest, though both coefficients were greater than 0.40. The contribution to the sum score of the PHQ-9 is small. However, physicians may obtain relevant information when this item is not totally denied [41]. Taking these properties of the PHQ-9 together, it is an advantage of this short instrument that it can nevertheless be used for different purposes: (a) general screening for depression, (b) focusing on two aspects of depression according to the two factors, and (c) considering single items such as suicidal ideation.
The mean score differences between cancer patients and the general population were most pronounced for the items indicating sleep problems and loss of energy. These items belong to Factor 1 and indicate general health problems. All nine items are heightened in the cancer patients’ sample, but it is worth noticing that the health-related components are most strongly affected.
The comparison between the cancer types confirmed high degrees of depression in patients with thyroid cancer [42] and low degrees in those with prostate cancer [27]. While breast cancer patients also show high mean levels of mental distress [43], in this study breast cancer was in the upper range, but not at the top. Moreover, PHQ-9 mean scores were presented for several other, rarer types of cancer that have not been extensively examined in psycho-oncological research. In addition to the raw PHQ-9 mean scores for the different cancer types, we also calculated the differences between these mean scores and the expected mean scores, based on the age and gender distribution. There were no great differences between raw scores and corrected scores in the sequences (Table 6). However, for cancer types with large proportions of males and young patients (testis, Hodgkin lymphoma), the burden of cancer is underestimated when only simple mean scores are considered. Regression analyses such as those performed here can also be calculated for other questionnaires in order to provide a basis for unbiased comparisons among subgroups of patients.
Some limitations of this study should be mentioned. We examined patients half a year after discharge from a rehabilitation clinic. Patients with a very bad prognosis may be underrepresented or overrepresented in the sample. Though we believe that patients in a good health state are more compliant in filling in the questionnaire, resulting in a slight underestimation of the depression burden in the sample, we have no information on the non-participants. A further limitation is the limited information on the health status of the respondents. In addition, participants of a rehabilitation program are not totally representative of all cancer patients. We only calculated CFAs for the one-dimensional model and one two-dimensional model. It would be possible to refine the CFA models and to arrive at better fit indices if several modifications were made, such as considering sub-dimensions, allowing correlated error terms, or removing items. However, special modifications, adapted to each data set, would not lead to generalizable results. Some patient groups in our study had small sample sizes; their depression mean scores should be considered with caution. Finally, the PHQ-9 is an economical screening instrument, which, however, is not a sufficient substitute for a clinical diagnosis of depression. Nevertheless, it can help provide aggregated information on the burden of special disease groups such as cancer patients.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
AH and AM designed the study. TS provided the data of the cancer patients. EB and RK provided the data of the general population. TF and SS performed the statistical analysis. AM managed the literature search. AH wrote the first draft and the final version of the manuscript. All authors contributed to and have approved the final manuscript.