Background
Depression is a common and serious mood disorder characterized by persistent feelings of sadness and hopelessness, loss of interest in previously enjoyed activities [1], and emotion regulation difficulties [2]. Commonly reported comorbid conditions include chronic somatic illnesses such as inflammatory bowel disease [3], diabetes [4], cardiovascular disease [5], and psoriasis [6], as well as psychiatric illnesses, including substance abuse [7], anxiety [8], and eating disorders (ED) [9]. ED are characterized by restricted or dysregulated food intake, distorted body image, and preoccupation with food, weight, and shape [1]. While general population prevalence estimates of depression vary from 17 to 31% [10], prevalence estimates of depression among individuals with ED have been reported to be as high as 75% [11]. However, estimates vary considerably depending on methodological approaches [12]. Depressive disorders are among the leading causes of disease burden worldwide, and are the second leading cause of years lived with disability [13]. Detection and treatment of depression are thus a public health priority.
Structured or semi-structured diagnostic interviews are designed to accurately determine psychiatric diagnoses, but require significant time and resources to conduct. In contrast, self-report assessment tools demand fewer resources to adopt and are easier to administer and score. Brief self-report questionnaires are an efficient way to identify individuals who score above a predetermined cut-off and may be in need of further clinical attention. Screening measures may also be appropriate as stage one in epidemiological studies, prior to stage-two diagnostic interviews. The high rates of chronicity and disability associated with depression [14] underscore the benefit of early screening and detection. A range of self-report assessment tools have been used to measure symptoms and severity of depression, including the Hospital Anxiety and Depression Scale (HAD) [15], the Beck Depression Inventory (BDI) [16], and the Patient Health Questionnaire-9 (PHQ-9) [17]. A recent systematic review investigated the sensitivity and specificity of instruments used to grade severity of depression, and found that, of twenty reviewed instruments, the PHQ-9 was one of only three measures fulfilling the minimum criteria for sensitivity and specificity, with a reported sensitivity of 88% and specificity of 78% for the cut-off score of ≥10 [18]. This cut-off was established by the developers using an independent structured mental health professional interview as the criterion standard [17].
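To make the screening logic concrete, the following is a minimal Python sketch, not the procedure used in the cited studies. It assumes the standard PHQ-9 scoring of nine items rated 0 to 3 (a detail not stated in this section), applies the ≥10 cut-off, and computes sensitivity and specificity against a criterion-standard diagnosis. The function names and any data passed to them are hypothetical.

```python
from typing import Sequence, Tuple

PHQ9_CUTOFF = 10  # screening cut-off (total score >= 10) cited in the text


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine item responses (assumed scoring: each item 0-3)."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(items)


def sensitivity_specificity(
    totals: Sequence[int],
    diagnosed: Sequence[bool],
    cutoff: int = PHQ9_CUTOFF,
) -> Tuple[float, float]:
    """Compare screen results with a criterion-standard diagnosis.

    totals:    PHQ-9 total scores, one per respondent
    diagnosed: True if the diagnostic interview indicated depression
    """
    tp = fn = tn = fp = 0
    for total, has_depression in zip(totals, diagnosed):
        screen_positive = total >= cutoff
        if has_depression and screen_positive:
            tp += 1  # correctly flagged case
        elif has_depression:
            fn += 1  # missed case
        elif screen_positive:
            fp += 1  # flagged non-case
        else:
            tn += 1  # correctly cleared non-case
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)
    return tp / (tp + fn), tn / (tn + fp)
```

In a two-stage design, `totals` would come from the stage-one questionnaire and `diagnosed` from the stage-two interview. Raising or lowering `cutoff` trades sensitivity against specificity.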
The PHQ-9 consists of nine items that measure depression symptoms and severity [19, 20]. Mixed findings have been reported with regard to factor structure. Whereas some studies have supported the originally established one-factor solution [21], other studies suggest a two-factor solution, with one cognitive/affective and one somatic factor [22]. A previous Norwegian study of adolescents [23] supported a one-factor structure in a confirmatory factor analysis (CFA), but this has not yet been confirmed among Norwegian adults. As the PHQ-9 is extensively used in both clinical and research settings for psychiatric assessment, proper validation of different versions is important to ensure that the same construct is measured. Despite its widespread use in both clinical and research settings in Norway, only one prior study has investigated the psychometric properties of the PHQ-9 [23]. That study adapted the PHQ-9 to adolescents by shortening the time reference from fourteen to seven days. This underscores the need to confirm the psychometric properties of the Norwegian version of the PHQ-9 among adults to allow for comparisons across international studies. A recent study using the PHQ-9 among college women who screened positive for an ED reported moderate depression across different ethnic groups, indicating that comorbid ED and depression can present in various ethnic groups [24]. Including patient samples (e.g., ED) in this effort will aid in determining whether the psychometric properties of the PHQ-9 extend beyond healthy individuals. Considering the high comorbidity between ED and depression, validation of the Norwegian PHQ-9 is needed for both clinical ED samples and controls [9]. Also, many symptoms of depression overlap with those of ED (e.g., weight loss, appetite changes), and it is therefore important to specifically investigate psychometric properties of the PHQ-9 in currently ill ED samples. In addition to ED psychopathology, depression is associated with emotion regulation difficulties and anxiety [25–27]. In their systematic review, Sloan et al. [28] found evidence for emotion regulation as a transdiagnostic treatment construct across various psychopathologies, including anxiety, depression, and eating disorders. Specifically, Fowler et al. [29] reported good construct validity of the Difficulties in Emotion Regulation Scale (DERS) based on moderate correlations with depression and anxiety.
We investigated the psychometric properties of the Norwegian version of the PHQ-9 in adults with and without a lifetime ED diagnosis. Specifically, we investigated internal consistency and convergent validity, attempted to confirm a one-factor structure, and presented normative data. Convergent validity was explored by examining correlations with other theoretically related constructs, e.g., anxiety, ED psychopathology, and emotion regulation. We hypothesized that the Norwegian PHQ-9 would exhibit acceptable psychometric properties across ED diagnostic status.
Discussion
The overarching aim of this study was to investigate the psychometric properties of the Norwegian version of the PHQ-9 in a female adult sample with and without a lifetime history of ED. The results suggest that the psychometric properties are generally good, with excellent internal consistency and good convergent validity across diagnostic status. CFA revealed that a one-factor model of the PHQ-9 was the solution with the best fit to the data, though the fit was mediocre. No evidence of differential item functioning (DIF) was found between those with and without ED. The results indicate that the level of depression measured with the PHQ-9 can be compared between such groups.
Psychometric properties
The internal consistency of the Norwegian version of the PHQ-9 was excellent among adult females, with Cronbach’s alphas ranging from .86 to .92 across the different ED groups. This is similar to results from the previous Norwegian adolescent study (Cronbach’s alpha of .86 for the total sample and .88 for girls only) [23], as well as other studies among adult (male and female) samples [17, 45], all reporting Cronbach’s alphas between .79 and .89. Thus, reported internal consistency was similar across gender in these studies. However, it should be noted that this does not necessarily mean that the results of the current study can be generalized to males. Furthermore, the PHQ-9 mean score in our study was positively and strongly associated with ED psychopathology, emotion regulation difficulties, and anxiety. These constructs are theoretically related to depression, thereby indicating convergent validity [25, 26, 28]. While scores on the PHQ-9 showed moderate to large correlations with scores on ED psychopathology and emotion regulation, associations between scores on depression and anxiety were large. Although the meaningfulness of separating the constructs of anxiety and depression can be debated, here it supports satisfactory convergent validity of the Norwegian translation of the PHQ-9.
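For readers unfamiliar with the statistic, internal consistency as summarized by Cronbach’s alpha can be sketched as follows. This is an illustrative Python implementation of the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), and not the analysis code of the present study. The function name and the input layout (one list of item scores per respondent) are assumptions.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    responses: one inner sequence per respondent, each holding the same
    number of item scores (nine for the PHQ-9).
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("all respondents must answer the same items")

    # Sample variance of each item (column) across respondents
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

Values closer to 1 indicate that the items vary together. Group-specific alphas, such as those reported per ED group, would be obtained by calling the function separately on each group's responses.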
A one-factor model of the PHQ-9 was the solution with the best fit to the data, even though the fit was mediocre. The one-factor model exhibited strong factor loadings (0.63–0.86) and high internal consistency (.86–.89). This is consistent with some of the existing literature [21], including a Norwegian study of adolescents [23]; however, other studies have reported a two-factor structure [22] comprising a somatic and a cognitive/affective factor. These contradictory factor structure findings may reflect sample differences, although analyses of measurement invariance indicate that the PHQ-9 is a reliable and valid measure across demographic groups [21, 44]. Furthermore, it has been argued that since the factors in the two-factor structure are highly correlated (.86), it is of limited value to distinguish them [44]. Additionally, the PHQ-9 is a brief, nine-item measure designed to screen for depression efficiently, which suggests that using the total score to indicate depression severity may be beneficial when the measure is used clinically and in research.
Normative data
Normative data were also presented, demonstrating higher PHQ-9 mean scores in the ED groups compared to those with no lifetime ED. Though elevated scores may be expected among individuals with a previous ED, it is worth noting that the PHQ-9 mean score of the individuals with no ED history in our study is considerably higher than that reported in representative nationwide population-based samples from other countries, such as Germany [46], the USA [22], and South Korea [45]. Mean scores for males and females in these studies typically range from 2.5 to 4.5, though females tend to score somewhat higher than males. A lifetime history of ED was the only exclusion criterion for our group of individuals with no history of ED. It is therefore possible that individuals in this group have other mental health problems. Supporting this, 18.6% of the individuals in the comparison group reported currently receiving mental health treatment, although this does not necessarily signal the presence of a psychiatric disorder. This could indicate that our sample of people with no ED history is not as healthy as those in larger population studies that strive for “super-healthy” controls. Notably, the Norwegian adolescent study reported norms similar to those of the current Norwegian adult study, with a mean score of 6.89 (SD 5.13) in adolescent females [23]. It can therefore not be ruled out that differences in normative data across studies reflect cultural differences in symptomatology or reporting; however, this cannot be concluded based on our data.
Other studies of clinical samples of individuals with ED have reported PHQ-9 mean scores falling in the same range as the past and current ED groups in the present study [47, 48]. For example, Hayes et al. [47] reported normative data in a study of adolescents and adults (93% females) with ED receiving treatment in a partial hospitalization and intensive outpatient program. The baseline PHQ-9 mean score was 12.79 (SD 6.91), dropping to 8.12 (SD 6.91) post-treatment. Furthermore, Rose et al. [48] reported PHQ-9 mean scores in adults (44 females and 3 males) with ED in primary care before and after CBT for ED (mean number of sessions 17). The baseline mean was 13.5 (SD 5.48), and the post-treatment mean was 7.42 (SD 6.38). Based on these studies, baseline PHQ-9 mean scores of ED clinical samples appear to resemble those of the two clinical groups in the present study (mean scores of 10 in the previous ED group and 16 in the current ED group), whereas post-treatment scores in the clinical studies fall closer to those of the group with no lifetime ED in the present study. These studies consisted of predominantly female samples, but Hayes et al. reported that gender was not a significant moderator for any of the outcome measures.
Furthermore, with regard to prevalence, a total of 53.4% of individuals with a previous ED, and 86.4% of individuals with a current ED, scored above the PHQ-9 depression screening cut-off score of 10 in the present study. As expected, the proportion of individuals with no lifetime ED history scoring above the cut-off was considerably lower, at 26.1%. However, this proportion is still noticeably higher than in the German population data [46], in which 5.6% of all participants scored above the cut-off of 10. As noted above, it cannot be determined whether these differences relate to cultural, selection, or other factors. Whereas the national German population study had a representative register-based sample, our study mainly utilized online recruitment. Such differing recruitment approaches may affect the samples obtained, thereby potentially biasing the results. It is possible that cut-off thresholds may need to be culturally adapted. To establish this, two-stage studies are needed.
Our findings suggest that the Norwegian version of the PHQ-9 is a reliable and valid measure that can be used to assess depression symptoms among female individuals with ED. This is important because depression symptoms often co-occur with ED, and monitoring such symptoms may be important for assessing treatment response. Moreover, our normative data showed that depression scores were elevated among recovered individuals who have a history of an ED. This has implications for the interpretation of PHQ-9 scores among such individuals. Because symptoms of depression overlap with those of ED (e.g., weight loss, appetite changes), future studies should evaluate whether the traditional PHQ-9 cut-off thresholds are equally valid for ED populations.
Although this study is strengthened by a large sample with and without ED, it is limited by the use of self-report data to ascertain lifetime ED diagnoses. Also, we cannot rule out that our online recruitment procedure may have affected the results. Differences across the different ED diagnoses were not addressed. Though the psychometric properties of the Norwegian translation of the PHQ-9 were found to be good among females, diagnostic interviews are required to determine diagnoses. Also, males were not included in the study. This limits the generalizability across genders and confidence in gender-specific norms. Finally, another measure of depression was not included, which would have strengthened evidence in favor of the construct validity of the PHQ-9.