Factor analysis of psychopathological rating scales can provide us with an estimate of the illness dimensions that underlie the respective rating scales. This naturalistic study included the three most used instruments for measuring depression in a large sample of inpatients, offering the opportunity for a comprehensive psychometric comparison.
HAMD-17
The psychometric properties of the HAMD have been repeatedly investigated [8]. Consistent with previous findings, the HAMD-17 demonstrated good internal consistency (Cronbach’s alpha = 0.85, omega = 0.86).
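For readers less familiar with the coefficient, Cronbach's alpha can be computed from the item variances and the variance of the total score. The following is a minimal sketch using only the Python standard library; the function name `cronbach_alpha` and the toy ratings are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of every single item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the total (sum) score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 respondents x 3 items (illustration only)
toy = [
    [2, 1, 2],
    [3, 2, 3],
    [1, 1, 0],
    [4, 3, 3],
    [0, 0, 1],
]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises when the items covary strongly relative to their individual variances, which is why a scale with several closely related items tends to show high values.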
Among the single-item correlations, the items "agitation" and "insight" showed only weak correlations with the HAMD-17 total score, suggesting that they share little variance with the remaining items and thus contribute little psychometric value. These two items have also consistently been described as having poor psychometric properties, including low discriminative ability, in previous investigations [27, 42, 43]. It has often been argued that the poor discriminative ability of the "agitation" and "insight" items might be due to less severely ill patient populations. However, in this severely depressed inpatient sample, we were able to replicate these findings [28]. Additionally, patients were rated as having some impairment of illness insight (HAMD item 17 > 0) in only 14% of all 3,690 visits, suggesting a low overall prevalence of this symptom.
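The item-level correlations discussed above can be illustrated with a corrected item-total (item-rest) correlation, in which each item is correlated with the sum of all other items. This is a sketch under assumptions: the correction for item overlap is a common variant and not necessarily the one used in this analysis, and the function names and toy ratings are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if an item is constant


def item_rest_correlations(rows):
    """Correlate every item with the sum of the remaining items."""
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]  # total score without item i
        result.append(pearson(item, rest))
    return result


# Hypothetical ratings: items 1 and 2 move together, item 3 is rarely endorsed
toy_items = [
    [2, 2, 0],
    [3, 3, 1],
    [1, 1, 0],
    [4, 3, 0],
    [0, 1, 1],
]
r_values = item_rest_correlations(toy_items)
print([round(r, 2) for r in r_values])
```

In this toy example the rarely endorsed third item correlates negatively with the rest of the scale, which mirrors the general pattern that items with little variance or little shared content contribute little to the total score.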
Thus, the four-factor solution suggested by Shafer’s meta-analysis was largely confirmed [8]. It suggests a “depression” factor with core depressive symptoms, a “sleep” factor, an “anxiety” factor, and a “somatic symptoms” factor. We additionally checked all 3- and 4-factor solutions cited by Bagby et al. (2004) [42] and found no factor model with a better fit [44-48]. However, the model proposed by Onega et al. (1997) [49] had almost the same factor structure, containing the same items, and also showed good model fit (CFI: 0.87, TLI: 0.85, RMSEA: 0.069).
The symptoms of the depression factor fit well with Parker's suggestion of classifying depression along psychomotor disturbances, which are described as the most specific feature of melancholic depression [50]. In the aforementioned review of the HAMD, which summarized results from 15 factor analyses of the HAMD-17, Bagby and coworkers (2004) also found good evidence for the presence of such a general “depression factor” [42].
The “anxiety” factor included all anxiety-related HAMD symptoms in addition to “agitation”. Bagby's review also suggested the presence of an “excitement factor” (including anxiety items along with agitation), which was found in 6 of the 15 reviewed samples [42]. In line with clinical experience, agitation in major depression is closely related to anxiety, as it might be its physiological manifestation. This notion is also supported by findings from Angst et al. (2008), who found that agitated depression was not significantly related to bipolarity but rather closely related to anxiety symptoms [51]. Anxiety may represent a separate dimension within depression [52, 53] and may be related to worse clinical outcomes [54-57]. On the other hand, anxiety symptoms are not very specific to depressive disorders, as they are among the most prevalent psychopathological symptoms in general [58].
MADRS
The CFA demonstrated a good model fit for all parameters (CFI: 0.975; TLI: 0.962; RMSEA: 0.072). Our findings align well with the results reported by Williamson (2006), who initially proposed this four-factor solution in individuals with bipolar I disorder [18]. The four-factor solution comprises factors related to sadness, neurovegetative symptoms, detachment, and negative thoughts. In 2013, Quilty and colleagues successfully replicated this four-factor solution and found a good model fit (CFI: 0.92; RMSEA: 0.06). They also demonstrated the invariance of this solution over time and gender [29].
Furthermore, the authors presented support for a hierarchical model where all four factors loaded onto a second overarching depression factor. Additionally, we examined the one-factor solution as proposed by Uher and colleagues but only observed a good model fit in two out of the three indices (CFI: 0.94; TLI: 0.92; RMSEA: 0.107) [
29].
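The fit indices reported in this section can disagree because they are derived differently from the same chi-square statistics. The sketch below assumes the standard maximum-likelihood formulas, which may differ from the estimator-specific variants produced by SEM software (some programs, for example, use N rather than N - 1 in the RMSEA). The function name `fit_indices` and all input values are hypothetical and are not results from this study.

```python
from math import sqrt


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """CFI, TLI and RMSEA from model and baseline (null) chi-square values."""
    # Non-centrality estimates, truncated at zero
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom
    # TLI compares chi-square/df ratios and is not bounded at 1
    # (undefined if the baseline ratio equals exactly 1)
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    # RMSEA expresses misfit per degree of freedom and per observation
    rmsea = sqrt(d_model / (df_model * (n - 1)))
    return {"cfi": cfi, "tli": tli, "rmsea": rmsea}


# Hypothetical values for illustration only
example = fit_indices(chi2_model=120.0, df_model=48,
                      chi2_base=1500.0, df_base=66, n=500)
print({name: round(value, 3) for name, value in example.items()})
```

Because CFI and TLI are relative to a baseline model whereas RMSEA depends on the model's degrees of freedom and the sample size, a model can plausibly look acceptable on the first two and still exceed common RMSEA cut-offs.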
The high correlation between single items and the MADRS total score highlights the scale's excellent reliability. Compared to the HAMD, the MADRS may be better suited for detecting or measuring treatment effects within a homogeneous sample, but it may have limitations in capturing different dimensions of illness.
BDI
The CFA of the BDI using the factor structure of Shafer et al. (2006) confirmed the 3-factor solution with good model fit indices (Table 5). The three factors could probably best be referred to as the “negative perception of oneself” factor, the “performance” factor, and the “somatic” factor [8].
This result should also be seen against the background of how the BDI was developed. Aaron T. Beck developed this self-rating instrument based on his depression theory of the “cognitive triad”. The core of this theory is the assumption that depression arises from negative thoughts about the self, the world, and the future. Consequently, he developed a questionnaire that includes 5 cognitive items covering negative self-perception (feeling like a failure, guilt, feeling of being punished, disappointment in oneself, self-blame).
In our CFA, all 5 items were indeed found to load on a single factor. From a methodological perspective, a single factor is more likely to emerge when an instrument contains similar items. However, it may be that negative self-perceptions play an important role, particularly within the subjective dimension of depression. Supporting this notion, suicidality had the highest loadings on this factor. In this context, suicidality may represent the most severe form of negative self-perception, where one feels so worthless that their life is not worth living. Beck himself also described a factor called "negative attitude towards self," which aligns well with the second factor found in our analysis [
59].
These factors are also in good accordance with the results from the German BDI validation study conducted by Hautzinger et al. (1991) in a sample of 477 primarily (89%) inpatients diagnosed with a depressive episode according to ICD-9 [
38]. Describing a 3-factor solution, the study proposed a "performance impairment factor" (including items such as work, tiredness, interest in people, sadness, making decisions, crying, and irritability), a "negative self-perception factor" (including items such as guilt, self-blame, feeling like a failure, feelings of being punished, future pessimism, and suicidal thoughts), and a "physical symptoms factor" (including items such as weight loss, sleep disturbances, and appetite loss) [
38].
Combined exploratory factor analysis of BDI, MADRS and HAMD
In line with our hypothesis, we found that a 3-factor solution was the best fitting and most interpretable. Only the BDI items related to sleep, appetite, and weight loss loaded together with similar items from the HAMD and MADRS, forming a separate “psychovegetative” factor. These symptoms are not specific to depression but are sensitive markers of depression within a correctly diagnosed depressed patient sample. All other self-rated items loaded onto one strong factor explaining 17% of the total variance of all scale items. This strongly supports the notion that self-ratings in major depression may represent a separate illness dimension. Considering that the HAMD and BDI have a 50% overlap in symptoms, the strict separation into two separate factors is remarkable. Uher et al. (2008) also performed an EFA with BDI, MADRS, and HAMD-17 items and found a 3-factor solution to be the most interpretable. They found one strong self-rating factor with almost all BDI items plus suicide (HAMD-17, MADRS) and guilt (HAMD-17), a mood and anxiety factor (MADRS and HAMD-17), and a neurovegetative factor with sleep and appetite items combined from all three scales [
28].
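The share of total variance attributed to a factor, such as the 17% reported above, is conventionally obtained from the squared loadings. The sketch below assumes standardized items and an orthogonal (or unrotated) solution, for which the sum of squared loadings divided by the number of items gives the proportion explained; with oblique rotations the shares are not strictly additive. The loadings and the function name `variance_explained` are hypothetical.

```python
def variance_explained(loadings):
    """Proportion of total item variance explained by each factor.

    loadings: one row per item, one column per factor (standardized items).
    """
    n_items = len(loadings)
    n_factors = len(loadings[0])
    return [
        sum(row[f] ** 2 for row in loadings) / n_items
        for f in range(n_factors)
    ]


# Hypothetical loading matrix: 4 items x 2 factors (illustration only)
toy_loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.6],
    [0.2, 0.5],
]
shares = variance_explained(toy_loadings)
print([round(s, 3) for s in shares])
```

A factor on which many items load substantially therefore accounts for a large share of the variance even when the individual loadings are moderate.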
The poor agreement between self-ratings and clinician ratings is also reflected in the modest correlations of the BDI with the HAMD and the MADRS (0.58 and 0.59, respectively). Apart from the differing item content of self- and observer-rated scales, several reasons for the discrepancy between self- and observer-rated scales in depression research have been described in the literature.
First, self-ratings are more prone to bias by depression severity. For instance, severely depressed patients tend to underestimate their symptomatology, whereas less severely depressed patients may overestimate their symptoms [60-62]. Second, some aspects of psychopathology, such as psychomotor retardation or hypochondriasis, cannot be adequately assessed by self-ratings because they are mainly apparent to an observer. Third, self-ratings are particularly vulnerable to fixed response biases in some patients, such as acquiescence bias, social desirability bias, or symptom exaggeration in the hope of receiving better care [63]. Fourth, the accurate completion of a self-rating depends on the patient's educational background and capacity for introspection [64].
However, clinician ratings are not without bias as they might be easily influenced by the clinician's expectations of the allocated treatment, which is especially true within naturalistic non-blinded conditions. Despite these limitations, self-rating might still represent a dimension of its own [
4].
In our data, this notion is highlighted by the fact that even core depressive items that are closely connected or almost identical in content, such as reported sadness (MADRS), depressed mood (HAMD), and sadness (BDI), load on one self-rated factor (BDI) and one observer-rated factor (HAMD and MADRS) (Table 6).
Strengths and limitations
Strengths of this analysis are the simultaneous application of the three depression scales most widely in use, the large sample size of inpatients including acutely suicidal patients and the independent funding by the German ministry for education and research.
However, there are also some fundamental and methodological limitations that must be carefully considered.
Firstly, although many severely depressed inpatients were included, findings from this sample may not be easily generalizable. Although baseline data were missing for only a small number of patients (n = 59), we did not have data for all three scales at all time points, which limits generalizability. Additionally, the German healthcare system allows for easier access to treatment and longer inpatient treatment durations than the systems of other countries. Further, older patients and adolescents are clearly underrepresented in our sample.
Secondly, all scales were assessed by the same clinician, which implies that one rating may have influenced the other. However, an independent rating would have required double the number of raters and increased the variance in ratings, leading to more "background noise".
Thirdly, several depression items were present on all three scales, suggesting some degree of redundancy. This overrepresentation of similar items could have hindered the emergence of more distinct and well-defined factors in the combined factor analysis of all three scales. On the other hand, this overlap allowed us to confirm the existence of a "self-rating dimension," since even very similar items loaded onto different factors.
Fourthly, the depression scales used did not include atypical depressive features such as overeating, oversleeping, or mood reactivity, which prevented exploration of an "atypical depression" factor. Atypical depression may be a distinct subtype of major depression associated with specific symptoms. In another study, we found that 15% of this sample met the criteria for atypical depression [
65].
Fifthly, we chose to use a "random week dataset", which precluded the observation of factor structures over time. However, our primary goal was to obtain a representative picture of the psychometric properties of the three scales. Focusing only on baseline ratings would have resulted in a dataset with less variability. Alternatively, if we had included discharge data, it would have biased our results towards a more treatment-resistant population. Galinowsky and colleagues (1995) reported a 2-factor solution at the beginning and a clear 1-factor solution for the MADRS at the end of antidepressant treatment, suggesting instability of the MADRS factors over time [26]. Factor instability has also been reported for the Inventory of Depressive Symptomatology (IDS) and its short forms, as well as the HAMD, by Fried et al. (2016), and for the BDI [66]. On the other hand, Quilty and colleagues (2013) found factor invariance over time for the MADRS [29], as several other researchers have also demonstrated for the CDS [67, 68]. We therefore additionally computed CFAs restricted to baseline and endpoint ratings for the combined factor analysis and found no substantially different results. Uher et al. (2008) also tested for invariance over time by performing a longitudinal CFA. In line with our results, the authors found invariance for factors one and three and only a minor deterioration in factor two (the self-report factor) [28]. However, this remains an important issue for further research.
Sixthly, we could have used more sophisticated statistical methods, such as hierarchical models to test whether the identified factors load on a single second-order factor, bifactor models to determine whether both a global factor and specific first-order factors are present, or multitrait-multimethod analyses to reflect both the dimensions and the rating perspectives simultaneously. However, since most of the cited research used methods similar to ours, our results are more readily comparable with the existing literature.
Future perspectives
Our analysis confirmed the multidimensionality of the HAMD-17, the BDI, and the MADRS. Additionally, we observed the emergence of a distinct subjective dimension represented by the BDI. What are the potential consequences and implications of these findings? Symptoms of major depression may consist of clusters that are associated with distinct neurochemical disturbances [12]. For example, suicide and aggressive behaviour may be related to hypoactivity of serotonin, while psychomotor retardation and anhedonia may be related to hypoactivity of norepinephrine and dopamine [12].
One reasonable application of such results is in neurobiological research. Instead of simply correlating the overall sum scores of depression scales with biological variables (e.g. serotonin binding capacities, fMRI), a more sophisticated approach could be used. Hypothesis-guided correlation of the respective depression dimension with an a priori assumed biological correlate could be a useful way to discover new neurobiological substrates. In addition, more detailed psychopathological analyses, such as those of the predictive power of specific symptoms, could use factors instead of forcing all variables of a rating scale into one statistical model (e.g., logistic regression), which would avoid the problem of multicollinearity. In this regard, the issue of factor invariance across clinically meaningful endpoints, such as responders versus non-responders or remitters versus non-remitters, represents another crucial aspect to consider. Future analyses could further investigate hierarchical models that explore the underlying factors contributing to the construct of depression. Since clear biological measures of depression are lacking, quantifying depression and treatment effects still relies on detailed psychopathology using instruments with proven psychometric properties. This goal is likely best reached with multiple complementary measures. This holds especially true as we are repeatedly reminded of the dimensionality of these disorders.