Background
The majority of systematic reviews and meta-analyses have concluded that cognitive behavior therapy (CBT) is an efficacious treatment for depression [1-3]. In various treatment guidelines, e.g., the NICE guideline [4], CBT is therefore recommended as the first-line treatment for depression.
Delivering CBT for depression in a group format is a cost-effective alternative to individual treatment [5, 6]. Group therapy may provide further advantages, as patients may benefit from group cohesion and normalization effects and may also be able to use the group as an arena for engaging in behavioral experiments, learning from others and functioning as co-therapists [7, 8]. On the other hand, group therapy is not acceptable to some patients, and there is less time allotted and less opportunity to tailor treatment to the individual patient [8].
A meta-analysis of 48 randomized controlled trials (RCTs) by McDermut, Miller, and Brown [
9] shows that different forms of group therapy effectively reduce depressive symptoms. The authors found an overall effect size of 1.03 and that CBT was somewhat more efficacious than psychodynamic group therapy. In a review of 34 studies on group therapy for depression, Oei and Dingle [
10] also examined measures of cognitions, behaviors and general health in addition to depression severity in their analyses. Based on 13 controlled studies, the authors found an average effect size of 1.11 in favor of group CBT. Analyses of 21 uncontrolled studies showed an average effect size of 1.30 for comparisons between pre-treatment and post-treatment scores. Oei and Dingle [
10] concluded that group CBT for depression is as effective as other bona fide treatments as defined by Wampold, Minami, Baskin, and Callen Tierney [
11]. With respect to group CBT provided in primary care or in the community, a meta-analysis of 14 randomized controlled trials by Huntley, Araya, and Salisbury [
12] showed a significant effect of group CBT over usual care at post-treatment and medium- to long-term follow-up; the standardized mean differences (SMDs) reported by the authors were -.55 and -.47, respectively. The authors further found that individually delivered CBT was superior to group CBT (SMD = .38) immediately after treatment, but not at follow-up. Similarly, the results of the meta-analysis conducted by Cuijpers, van Straten, and Warmerdam [
13] suggest that group CBT for depression might be slightly less effective than individual therapy in the short-term. A recent review by Okumura and Ichikura [
14] extended previous meta-analyses in several respects by comparing group CBT for depression with different levels of treatment intensity as described in stepped care models for depression [
4]. Their meta-analysis of 35 studies showed that group CBT was superior to non-active controls (SMD = -.68) and that there was a small but non-significant advantage of group CBT above middle-intensity interventions (SMD = .21).
Concerns have been raised as to whether the findings from research studies can be generalized to routine clinical practice. In this context, it is common to distinguish between the efficacy and the effectiveness of a treatment [
15,
16]. Efficacy refers to the results achieved in research trials, whereas effectiveness is understood as the therapy outcome in routine practice. The primary goal of research trials is to establish a causal relationship between a given treatment and an outcome (internal validity). In research trials, participants are often selected patients and are treated by trained therapists who follow treatment manuals strictly, receive regular supervision and whose treatment adherence is closely monitored [
15]. In contrast, routine clinical practice is characterized by unselected patients, high therapist caseloads, and flexible use of treatment protocols. It has been suggested that due to strict exclusion criteria, patients participating in clinical trials are not representative of patients typically seen in clinical practice in terms of severity and comorbidity, compromising the generalizability of RCTs (external validity) [
17-
20]. However, recent studies report only minor differences in clinical characteristics between patients participating in RCTs and patients seen in clinical practice, which may indicate more liberal inclusion criteria in more recent RCTs [
21-
23]. Due to practical and ethical reasons, randomization of patients to an active or non-active control condition is often not feasible in ordinary clinical settings, and some authors have argued that randomization is not representative of clinical practice [
24]. Finally, due to publication bias, the effects of treatment for depression found in research trials may be overestimated [
25]. Therefore, research on the effectiveness of treatment in routine clinical practice is needed. Although effectiveness studies do not typically have a control group and are therefore unable to establish causal relationships, they may provide valuable information about a given treatment.
Several studies have investigated the effectiveness of CBT for adult depression in routine practice (e.g., [26, 27]). Recently, Hans and Hiller [
24] conducted a meta-analysis of these studies. To define the clinical representativeness of a study, the authors suggested the following criteria based on the work of Shadish and colleagues [
28,
29]: (a) non-university setting; (b) referred patients; (c) professional therapists with regular caseloads; (d) flexible structure; (e) no monitoring of treatment implementation; and (f) no therapist training for study purposes. A total of 34 studies (1,880 patients) were included in the analyses. Hans and Hiller [
24] found an average pre-post effect size of 1.13 for treatment completers and 1.06 for intent-to-treat analysis in reducing depression severity. There were no significant differences between individual and group therapy in this regard. Effect sizes between 0.67 and 0.88 were found for secondary outcome measures (e.g., dysfunctional cognitions, anxiety). The mean dropout rate was 25% and was significantly higher in individual (on average 42%) than group CBT (17%). Hans and Hiller [
24] concluded that outpatient individual and group therapy for depression is effective in routine clinical practice. However, the authors characterized their findings as preliminary as the number of available studies was low and sample sizes were often small.
Thus, the purpose of the present study is to add to the knowledge base about the effectiveness of treatment for depression in routine clinical practice settings. In this study, we retrospectively evaluated the effectiveness of group CBT treatment administered in a specialized psychiatric outpatient clinic; the Beck Depression Inventory (BDI) [30, 31] was used to assess patients' depression severity before treatment, during treatment, after treatment, and at 3-month follow-up. In addition, the current study aimed to investigate the pattern of patient dropout from treatment and differences between patients who benefit from the treatment and those who do not respond to the intervention.
Results
Analysis of dropouts and representativeness of the outcome sample
As mentioned above, 25 (17.5%) of the 143 patients who started the treatment dropped out. For dropouts, the demographic characteristics of age and sex and the diagnosis were collected and available for analyses. The mean age of patients who dropped out was 38 years (SD = 11.4), and 18 (72%) were female. Patients who dropped out attended an average of 4.5 sessions (SD = 2.8). Age, sex, diagnosis, and pre-treatment scores of the BDI-II and BAI of treatment completers versus dropouts were compared. There was a tendency (p = .083) for dropouts to be younger than treatment completers, but no significant differences between completers and dropouts were found with respect to sex, diagnosis, and BDI-II and BAI scores at pre-treatment. Participants dropped out for a variety of reasons, including the need for inpatient or individual treatment (n = 12), symptom reduction (n = 3), disagreement with the therapist (n = 2), absence due to family problems (n = 2), sexual harassment of a group member (n = 1), pregnancy problems (n = 1), somatic illness (n = 1), meeting of an acquaintance in the group (n = 1), and unknown (n = 2). Fourteen of the dropout patients received an alternative treatment. Treatment completers attended an average of 12.1 sessions (SD = 1.7, range = 9-15). Approximately one quarter of the participants (27%) attended all sessions.
To examine the representativeness of the sample, treatment completers (n = 88) with and without BDI-II scores available at pre-treatment and post-treatment or follow-up were compared with respect to sex, age, diagnosis, and BDI-II and BAI scores at pre-treatment. No significant differences were found between treatment completers with and without BDI-II scores, indicating that the patients included in the following analyses are representative of all treatment completers.
Effect of treatment
The means, standard deviations, and percentages of missing data at the four time points are displayed in Table 2. Little's MCAR test was non-significant, χ2(72) = 66.56, p = .66, which is consistent with the data being missing completely at random and therefore with the missing-at-random assumption that is a prerequisite for multilevel modeling and multiple imputation [38, 43]. As shown in Table 2, except for the BDI-II at pre-treatment, there were missing data at every time point, ranging from 17% (BDI-II at mid-treatment) to 42% (BAI at post-treatment). Approximately two thirds of cases (64.3%) had missing data for at least one time point; in total, 24.7% of the outcome values were missing. The reasons for missing data could not be determined from the electronic record system.
Table 2
Descriptive statistics for the BDI-II and the BAI at pre-treatment, mid-treatment, post-treatment, and follow-up
Measure | Pre-treatment: M (SD, range) | Missing: n (%) | Mid-treatment: M (SD, range) | Missing: n (%) | Post-treatment: M (SD, range) | Missing: n (%) | Follow-up: M (SD, range) | Missing: n (%) |
BDI-II | 28.52 (10.42, 2-53) | 0 (0) | 23.03 (11.24, 0-51) | 15 (17) | 18.53 (11.09, 1-44) | 18 (20.5) | 18.26 (12.24, 0-53) | 19 (21.6) |
BAI | 19.07 (13.09, 0-58) | 21 (23.9) | 17.60 (11.73, 0-54) | 33 (37.5) | 14.12 (11.43, 0-45) | 37 (42) | 14.74 (12.23, 0-46) | 31 (35.2) |
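The missing-data percentages in Table 2 and the overall rate of 24.7% can be checked with simple arithmetic. The Python sketch below assumes that every percentage is computed against the 88 treatment completers and that the overall rate is taken over all eight measure-by-time-point cells per patient. Both assumptions are inferred from the reported numbers rather than stated in the text, and the function names are illustrative.

```python
# Missing-value counts per time point, copied from Table 2
# (pre-treatment, mid-treatment, post-treatment, follow-up).
N_PATIENTS = 88  # assumed denominator: treatment completers in the outcome sample

missing_counts = {
    "BDI-II": [0, 15, 18, 19],
    "BAI": [21, 33, 37, 31],
}


def missing_percentages(counts, n_patients):
    """Return the percentage of missing values at each time point."""
    return [round(100 * c / n_patients, 1) for c in counts]


def overall_missing_rate(counts_by_measure, n_patients):
    """Return the percentage of missing values over all measures and time points."""
    total_missing = sum(sum(counts) for counts in counts_by_measure.values())
    total_cells = sum(len(counts) for counts in counts_by_measure.values()) * n_patients
    return round(100 * total_missing / total_cells, 1)


if __name__ == "__main__":
    for measure, counts in missing_counts.items():
        print(measure, missing_percentages(counts, N_PATIENTS))
    print("overall:", overall_missing_rate(missing_counts, N_PATIENTS))
```

Under these assumptions the per-cell percentages agree with Table 2 after rounding, and 174 missing values out of 704 cells gives the 24.7% reported in the text.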
The average BDI-II scores decreased from 28.5 to 18.5 from pre-treatment to post-treatment and remained stable at follow-up (18.3). Mixed-level analysis showed a significant linear effect of time on depression, F(1, 272.98) = 66.26, p < .001. The linear effect of time on anxiety was also significant, F(1, 215.58) = 8.71, p < .01. There were no differences between treatment groups. Effect sizes for the differences between the pre-treatment scores and patients' scores at the three other time points are shown in Table 3. The table contains effect sizes based on available data in addition to effect size estimations using multiple imputation of missing data. Applying Cohen's [48] criteria (d = .2: small effect; d = .5: medium effect; d = .8: large effect), the effect sizes for depressive symptoms based on available data at post-treatment (d = .97) and follow-up (d = 1.10) indicate a large effect, and the effect sizes for anxiety indicate a moderate effect (d = .52 and d = .50, respectively). There were only minor differences between the effect size estimates based on available data and those based on multiply imputed data.
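Table 3
Measure | Mid-treatment: d (n), available data | Mid-treatment: d, multiple imputation | Post-treatment: d (n), available data | Post-treatment: d, multiple imputation | Follow-up: d (n), available data | Follow-up: d, multiple imputation |
BDI-II | 0.59 (73) | 0.53 | 0.97 (70) | 1.00 | 1.10 (69) | 1.07 |
BAI | 0.17 (54) | 0.16 | 0.52 (51) | 0.49 | 0.50 (54) | 0.43 |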
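For readers who want to see how a pre-post effect size of this kind is formed, the following Python sketch computes Cohen's d from summary statistics in two common ways. This section does not state which standardizer was used, so both variants are assumptions, and the function names are illustrative. The inputs are the BDI-II pre-treatment and post-treatment means and standard deviations from Table 2. Because those descriptives are based on all available scores at each time point, whereas the effect sizes above are based on patients with data at both time points, the results are close to the reported d = .97 but are not expected to match it exactly.

```python
import math


def cohens_d_pooled(mean_pre, sd_pre, mean_post, sd_post):
    """Pre-post d standardized by the root mean square of the two SDs (assumed variant)."""
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    return (mean_pre - mean_post) / pooled_sd


def cohens_d_pre_sd(mean_pre, sd_pre, mean_post):
    """Pre-post d standardized by the pre-treatment SD only (assumed variant)."""
    return (mean_pre - mean_post) / sd_pre


if __name__ == "__main__":
    # BDI-II summary statistics from Table 2.
    d_pooled = cohens_d_pooled(28.52, 10.42, 18.53, 11.09)
    d_pre = cohens_d_pre_sd(28.52, 10.42, 18.53)
    print(f"pooled-SD variant: {d_pooled:.2f}")
    print(f"pre-SD variant:    {d_pre:.2f}")
```

Both variants fall above Cohen's threshold of .8 for a large effect, which agrees with the interpretation given in the text.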
Treatment response
Application of the Jacobson and Truax [45] formula resulted in cut-off scores indicating a reliable change in symptom severity of 10 for the BDI-II and 10.88 for the BAI. Cut-off scores for the normal range of the BDI-II and BAI were 16.66 and 9.26, respectively. The value for the BDI-II is slightly higher than the cut-off scores for the BDI reported by Seggar et al. [47] and others [49], which typically range from 13 to 15. A probable explanation for the difference in the cut-off scores is that BDI-II scores are, in general, somewhat higher than BDI scores according to the adjustment table in the BDI-II manual. Patients scoring in the normal range of the BDI-II and BAI at pre-treatment were excluded from the analyses.
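The two kinds of cut-off scores mentioned above can be sketched in Python using the formulas commonly attributed to Jacobson and Truax. This is a minimal illustration under stated assumptions: the section does not report the reliability coefficient or the normative means and standard deviations that were used, nor which cut-off criterion was chosen (the weighted-midpoint criterion is shown here), so all input values below are hypothetical and do not reproduce the study's cut-offs.

```python
import math


def reliable_change_threshold(sd_pre, reliability, z=1.96):
    """Smallest raw-score change that exceeds measurement error at the given z level."""
    standard_error = sd_pre * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2 * standard_error ** 2)  # standard error of a difference score
    return z * s_diff


def clinical_cutoff(mean_clinical, sd_clinical, mean_normal, sd_normal):
    """Weighted midpoint between the clinical and the normal score distributions."""
    return (sd_normal * mean_clinical + sd_clinical * mean_normal) / (sd_normal + sd_clinical)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; NOT the values used in the study.
    rc = reliable_change_threshold(sd_pre=10.0, reliability=0.90)
    cutoff = clinical_cutoff(mean_clinical=30.0, sd_clinical=10.0,
                             mean_normal=10.0, sd_normal=10.0)
    print(f"reliable change threshold: {rc:.2f}")
    print(f"clinical cut-off: {cutoff:.2f}")
```

The sketch shows why the thresholds depend on the sample: a larger pre-treatment standard deviation or a lower reliability widens the band of change that is treated as measurement noise, and the cut-off shifts toward whichever distribution has the smaller spread.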
Available data on depression severity at post-treatment (n =61) showed that 2 patients (3.3%) had deteriorated, 32 (52.5%) remained unchanged, 9 (14.8%) had improved, and 18 (29.5%) had recovered after treatment. At follow-up (n =63), 1 patient (1.6%) had deteriorated, 26 (41.3%) remained unchanged, 11 (17.5%) had improved and 25 (39.7%) had recovered compared to treatment start. With respect to anxiety (n =39), 1 patient (2.6%) had deteriorated, 26 (66.7%) remained unchanged, 11 (28.2%) had improved, and 1 (2.6%) had recovered at post-treatment. At follow-up (n =43), 3 patients (7%) had deteriorated, 28 (65.1%) remained unchanged, 6 (14%) had improved, and 6 (14%) had recovered.
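The four outcome categories used above can be expressed as a small decision rule. The Python sketch below applies the BDI-II thresholds reported in this section (a reliable change of 10 points and a normal-range cut-off of 16.66). The handling of scores that fall exactly on a threshold is an assumption, the function name is illustrative, and the example patients are hypothetical rather than taken from the study data.

```python
from collections import Counter

BDI_II_RELIABLE_CHANGE = 10.0  # reported in this section
BDI_II_CUTOFF = 16.66          # reported in this section


def classify_change(pre, post, rc_threshold, cutoff):
    """Classify one patient's pre-to-post change into one of four outcome categories."""
    change = pre - post  # positive values mean fewer symptoms after treatment
    if change <= -rc_threshold:
        return "deteriorated"
    if change >= rc_threshold:
        # Reliable improvement: recovered only if the final score is in the normal range.
        return "recovered" if post < cutoff else "improved"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical (pre, post) BDI-II scores for illustration only.
    patients = [(30, 12), (35, 22), (25, 20), (20, 32)]
    labels = [classify_change(pre, post, BDI_II_RELIABLE_CHANGE, BDI_II_CUTOFF)
              for pre, post in patients]
    print(Counter(labels))
```

Counting the labels over all patients with available scores and dividing by the number of such patients yields proportions of the kind reported above.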
Predictors of treatment effects
To investigate the characteristics of treatment responders, patients who showed reliable improvement (including recovery) were compared to the group of patients who had either no significant positive change or had deteriorated at post-treatment. The two groups were compared on all available demographic and clinical characteristics (i.e., age, sex, partner status, education, working, first diagnosis, and number of diagnoses), pre-treatment scores on the BDI-II and BAI, and the number of sessions attended. There was a tendency for patients who benefited from the treatment to have higher scores on the BDI-II at pre-treatment compared to those who did not benefit (32.59 and 28.61, respectively, p = .098). However, there were no significant differences on the remaining variables examined between treatment responders and non-responders.
Discussion
The aim of the present study was to examine the effectiveness of group cognitive behavioral therapy for depression in a routine care setting and to explore predictors of treatment dropout and response. The routine care setting - a rural outpatient clinic - meets Hans and Hiller's [
24] criteria for clinical representativeness.
The results showed a significant reduction in depression and anxiety among patients who received group CBT. The observed treatment gains were maintained at 3-month follow-up. The effect sizes of group CBT for depression were large (d = .97 and d = 1.10 at post-treatment and follow-up, respectively) and were similar to the results reported in Hans and Hiller's [
24] meta-analysis of the effectiveness of outpatient CBT for depression (d =1.13), adding further support to their findings. In contrast, the effect of group CBT on the severity of anxiety symptoms was only moderate, suggesting that the treatment effect may be specific to depressive symptoms resulting in more positive outcomes for the targeted problem. In terms of clinical significance, the results showed that approximately 44% of the patients saw a significant improvement in depression severity at post-treatment, including approximately 30% who recovered. At follow-up, the proportion of patients who improved and recovered increased to 57% and 40%, respectively. Thus, a considerable number of patients benefited from the treatment. However, effect sizes in the present study are lower than the results reported from efficacy studies. For example, Teri and Lewinsohn [
50] and Neimeyer, Kazantzis, Kassler, Baker, and Fletcher [
51] found effect sizes of 1.93 and 1.19 for group CBT for depression, respectively. The lower effect sizes found in this study are in accordance with previous studies showing significantly lower effect sizes for treatment of depression in routine care settings compared to research trials [
24,
52]. On the other hand, the response rates at follow-up are comparable to those found in efficacy studies or effectiveness studies conducted in university settings. According to Keitner, Ryan and Solomon [
53], in efficacy studies, 50-58% of depressed patients respond to and 30-48% recover after psychotherapy. Peeters et al. [
54] found a remission rate of 37% after 26 weeks of individual cognitive behavioral therapy for depression using the BDI-II to assess outcomes. Unfortunately, the cut-off values on the BDI-II used to define response and remission vary between studies, making direct comparisons difficult. For example, Peeters et al. [
54] used a more conservative BDI-II cut-off score of 10 to distinguish between the normal and clinical range. In the current study, neither demographics nor pre-treatment scores on the BDI-II or BAI predicted treatment response. The finding that age and sex are unrelated to treatment outcomes has been reported previously [
55], but some studies have found that older age is associated with poorer outcomes [
55]. There was a tendency (
p < .10) for patients with higher BDI-II scores at pre-treatment to have greater treatment gains. This finding is in line with the findings of Schindler, Hiller and Witthöft [
56]. However, Organista, Munoz and Gonzalez [
57], Merrill et al. [
26], and Teri and Lewinsohn [
50] reported that lower initial BDI scores predicted greater improvement. Surprisingly, treatment response was not predicted by the length of treatment, suggesting that a time frame of 12 sessions may be sufficient.
The dropout rate (17.5%) for patients in the present study was somewhat lower than the rates found in both the Hans and Hiller [
24] meta-analysis (24.6%) and the Neimeyer et al. [
51] and Peeters et al. [
54] studies (23.9% and 28%, respectively); however, it was higher than in other investigations, e.g., the Teri and Lewinsohn [
50] study (8%). Age, sex, diagnosis, and BDI-II or BAI pre-treatment scores did not predict patient dropout. These results are consistent with previous findings [
58,
59]. As in the Arnow et al. study [
59], there was a statistically non-significant tendency for dropouts to be younger in age.
Unfortunately, in the current investigation, there were only a few variables available to examine as predictors of dropout and treatment response. Other factors that have previously shown predictive value for outcomes in the treatment for depression (e.g., chronicity of problems [
55], normal personality traits [
60], personality disorders [
61], intelligence [
55], or attachment style [
62]) should, if possible, be included in future effectiveness studies.
The results on the effectiveness of group CBT for depression, delivered in a specialized routine care setting mainly by psychiatric nurses, are encouraging. Nevertheless, too many patients drop out of treatment or do not benefit from it, and the treatment of these groups of patients needs to be improved. Because many clinicians overestimate the impact of their interventions [
63], monitoring treatment outcomes and providing feedback to therapists may increase the effectiveness of treatment [
64]. Systematic assessment of patients' suitability for this type of treatment may also contribute to higher response rates [
65]. Finally, a combination of traditional CBT techniques and newer approaches to CBT (e.g., mindfulness-based CBT [
66] or meta-cognitive therapy [
67]) may enhance treatment effects.
The strengths of the present study are that a follow-up was included, diagnoses were established using a structured diagnostic interview, and appropriate statistical methods were used. On the other hand, effectiveness research faces challenges and involves limitations that also apply to the present study [
27]. Because there was no control group, the observed effects cannot be attributed to the treatment with certainty and may instead be attributed to the passage of time or regression to the mean. No data were collected after patients dropped out of treatment; therefore, intent-to-treat analyses could not be performed. The retrospective design of the current study poses additional problems and may be subject to potential biases. Only information already contained in patient records could be used. There was a high number of missing data points, and the quality of the patient records varied greatly. The exact reasons why data were lost are unknown. The missing data could have been due to clinicians not delivering the instruments to the patients or their failure to record patient results in their electronic record; alternatively, the patients may not have returned the inventories. There is a possibility that therapists may have chosen to not give the inventories to non-responders, which would bias the results. However, the results of Little's MCAR test suggest that the data were missing at random. Some demographic characteristics (e.g., marital status) were difficult to collect. More importantly, the information on the patients' use of medication, which was usually prescribed by the patient's general practitioner, was often inadequate, especially in the first years of the study period. It was therefore impossible to control for use of medication in the analyses, and the possibility that the observed changes in patient outcomes are due to the start of or a change in medication cannot be ruled out. However, in our experience, medication is rarely started or changed during group treatment. As is common in clinical practice, patients were selected for group treatment. Unfortunately, there were no available data for patients who were not offered group treatment or who dropped out before the start of group treatment. Thus, any possible selection bias could not be estimated. 
To overcome the problems inherent to a retrospective approach, we recommend a prospective design for future studies examining the effectiveness of psychotherapy in ordinary clinical settings. Such future studies could, for example, be conducted in conjunction with routine outcome monitoring [
68]. Further, a shortcoming of the present study is that two different versions of the BDI were used. In addition, a follow-up period of three months is too short to draw conclusions about the long-term effect of the treatment. Finally, in this study, only symptom reduction was measured; however, gains in well-being and life functioning should also be part of treatment evaluation [
69] in future studies.
Acknowledgements
The authors thank the clinicians at the group treatment unit who provided the treatment and gave valuable comments on the project: Lesley Ann Smith, Ranveig Ersdal, Stein Feragen, Ørjan Svenøy, Wenche Nordnes, Sylva Krogh, and Trine Drage. The study was supported by research funds from the Helgeland Hospital Trust (to LA). The funding body had no influence on the study design; the collection, analysis, or interpretation of data; the writing of the manuscript; or the decision to submit the manuscript for publication.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
LA initiated the study, collected data, and helped to draft the manuscript. JCT performed the statistical analyses and drafted the manuscript. Both authors read and approved the final manuscript.