Key findings
The present study provides, to our knowledge, the first economic evaluation of an intervention using ESM in patients with major depression. The results suggest that ESM-I is more expensive, but also more clinically effective than both treatment as usual and pseudo-intervention.
In the cost-effectiveness analysis and cost-utility analysis, ESM-I was the optimal strategy when willingness to pay exceeded €3000 and €40,500, respectively. All sensitivity analyses except one gave results similar to the base-case analysis. The exception, the analysis unadjusted for baseline costs, yielded a lower willingness-to-pay threshold and a probability of cost-effectiveness at €50,000 of 58%. In addition, the CEAC showed that the probability of ESM-I being the most favourable treatment increased rapidly with willingness to pay.
Furthermore, although costs are below the threshold set for a QALY (€50,000), such a threshold could not be defined for the HDRS. Therefore, we can only tentatively conclude that ESM-I is cost-effective.
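To make the CEAC concrete, the following is a minimal sketch of how such a curve can be derived from patient-level data with a non-parametric bootstrap. It is not the analysis code used in this study: the function name `ceac`, the arm labels, and the data layout (one `(cost, effect)` pair per patient, with the effect expressed so that higher is better, e.g. QALYs or reduction in HDRS score) are illustrative assumptions.

```python
import random


def ceac(arms, thresholds, n_boot=1000, seed=0):
    """Estimate a cost-effectiveness acceptability curve by bootstrapping.

    arms: dict mapping an arm label to a list of (cost, effect) pairs,
          one pair per patient (hypothetical layout; higher effect = better).
    thresholds: willingness-to-pay values per unit of effect.
    Returns {threshold: {arm: share of bootstrap replicates in which that
    arm has the highest net monetary benefit}}.
    """
    rng = random.Random(seed)
    wins = {wtp: {arm: 0 for arm in arms} for wtp in thresholds}
    for _ in range(n_boot):
        means = {}
        for arm, patients in arms.items():
            # Resample patients with replacement, keeping each patient's
            # cost and effect together to preserve their correlation.
            sample = [rng.choice(patients) for _ in patients]
            mean_cost = sum(c for c, _ in sample) / len(sample)
            mean_effect = sum(e for _, e in sample) / len(sample)
            means[arm] = (mean_cost, mean_effect)
        for wtp in thresholds:
            # Net monetary benefit: monetary value of the effect minus cost.
            best = max(means, key=lambda a: wtp * means[a][1] - means[a][0])
            wins[wtp][best] += 1
    return {wtp: {arm: n / n_boot for arm, n in counts.items()}
            for wtp, counts in wins.items()}
```

Reading the returned shares at a given threshold (for example €50,000 per QALY) gives the probability of cost-effectiveness at that threshold for each arm.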
Cost-effectiveness of ESM-I in real life major depression treatment
The present trial shows that ESM-I consisting of protocolled feedback delivered by a researcher has the potential to be cost-effective. When ESM-I is implemented in real-life treatment, feedback can be delivered directly to the patient and professional caregiver. Feasibility and cost-effectiveness are hypothesized to increase when feedback provided by a third person (the researcher) is replaced with ESM-I feedback that forms an integral part of the treatment. ESM-I could then also be used to enrich psychological treatments such as cognitive behaviour therapy [52] with daily life contextual information and to bring that therapy out of the mental health care setting into daily life. Our six-week ESM intervention has been shown to be feasible in outpatients with major depression [17], but the feasibility of implementation in routine clinical practice is not yet established [13].
Web-based feedback systems for ESM-I applications are under development. If such a web-based system allows individuals to navigate through their own feedback, this may facilitate implementation of the current ESM intervention by promoting easy access to and flexible use of feedback for patients as well as professional caregivers. This should be backed up by appropriate resources for professional caregivers, including training, monitoring, and technical support [53]. In addition, withdrawal of the professional caregiver and patient disengagement may be important issues, requiring research to improve sustained use [54].
Effects of ESM-I on depressive symptoms
The ESM-I group showed lower HDRS scores at 32 weeks than the two control groups, suggesting that ESM-I reduced depressive symptoms. However, although the economic evaluation showed that ESM-I may be cost-effective, in the accompanying regression analyses (HDRS and QALYs; Table 3) the difference between the ESM-I group and the pseudo-intervention group was not statistically significant, while the difference between the ESM-I group and the control group was statistically imprecise at the conventional alpha level. The effect study accompanying the present economic evaluation [17] did show that allocation to ESM-I was associated with a statistically significant linear decrease in HDRS depressive symptoms over time that lasted throughout the study. This decrease was significantly stronger than in the control group, to a degree that can be considered clinically relevant (difference > 3 HDRS units; [46, 47]). The difference with the pseudo-intervention group was clinically relevant and borderline significant [17]. The regression analyses accompanying the cost-effectiveness results in the present paper used less data than the original analyses, which included all follow-up assessments. In addition, the original paper analysed subjects as randomized with available data, whereas the present paper imputed missing data (using last observation carried forward).
Methodological considerations
The present study was limited to patients aged between 18 and 65 years (mean age 48 years), and more than 90% of the sample was of Dutch origin. ESM-I is designed to obtain insights into everyday life and, therefore, we recruited outpatients who could engage in ESM self-monitoring in their home environment. Outpatients were included in the study if they scored above remission level (HDRS > 7) at study entry. This mild inclusion criterion, coupled with the time-intensive nature of the study protocol (multiple visits to the researcher on top of an intensive intervention consisting of 6 weeks of self-monitoring), may have led to recruiting mainly participants in a mild to moderate depressive state. However, this may be a rather accurate representation of the population of patients with major depression, the majority of whom experience mild to moderate symptoms, and using higher HDRS cut-offs would compromise the external validity of the trial [56, 57]. On the other hand, our sample was mostly recruited from specialised mental health care settings (approximately 20% were treated in primary care only), and participants had a diagnosis of major depression as well as current symptoms for which they were using antidepressants. Although the results may not be generalizable to all outpatients with major depression, they may be generalizable to outpatients with complex mental problems who are using antidepressants.
The present paper has several limitations. First, owing to the nature of the intervention, it was not possible to blind participants, and the use of envelopes could potentially have led to biased allocation. However, given that care providers were not involved in the randomization process and most envelopes were drawn from a distance, with one researcher drawing an envelope for another researcher, it is unlikely that the procedure was subverted. Owing to resource constraints, researchers conducting the post-intervention assessments were also not blind to treatment allocation. Thus, findings may reflect a placebo response. However, the effect study [17] showed that directly after the six-week intervention, the decrease in HDRS ratings was similar in the ESM-I group and the pseudo-intervention group, while in the pseudo-intervention group effects did not appear to persist during the full 32 weeks of the trial. It is often assumed that placebo effects in depression do not persist in the long run [58]. Although this belief has been falsified [58], the difference in persistence between the pseudo-intervention group and the ESM-I group suggests that our findings are unlikely to be completely attributable to a placebo effect. The improvement in the ESM-I group was persistent, steady and clinically relevant over the full 32 weeks, which makes a placebo effect even less likely.
Second, all three treatment arms were embedded in an extensive research protocol, including regular assessment of depressive symptoms and two five-day ESM assessments. Besides treatment effects, patients may have had non-specific benefits from self-monitoring. Therefore, what has been called treatment as usual in the present paper was, strictly speaking, not treatment as usual. ESM-I may be even more cost-effective when compared with true treatment as usual.
Third, we used the human capital method rather than the friction cost method to calculate work absence costs, because the PRODISQ absenteeism module only asked for the number of days absent during a period of 3 months, while the friction period was longer at the time of data collection (approximately 5 months) [37]. Therefore, the end of the friction period could not be identified.
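The difference between the two costing methods can be illustrated with a minimal sketch. The function names, the daily wage, and the 150-day friction period used in the usage note are hypothetical values chosen for illustration (the text only states that the friction period was approximately 5 months); this is not the costing procedure of the present study.

```python
def human_capital_cost(days_absent, daily_wage):
    """Human capital method: every absent day is valued at the wage."""
    return days_absent * daily_wage


def friction_cost(spell_lengths, daily_wage, friction_days):
    """Friction cost method: each absence spell is costed only up to the
    friction period, after which the worker is assumed to be replaced.

    spell_lengths: length in days of each separate absence spell. This is
    the information that a questionnaire reporting only the total number
    of absent days within a recall window cannot supply.
    """
    return sum(min(days, friction_days) for days in spell_lengths) * daily_wage
```

For a single 200-day spell at a hypothetical wage of 100 per day, `human_capital_cost(200, 100.0)` counts all 200 days, whereas `friction_cost([200], 100.0, 150)` counts only the first 150. When only the number of absent days within a 3-month window is known, spell lengths are unavailable and only the first function can be evaluated.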
Fourth, sampling uncertainty was estimated using the non-parametric bootstrapping approach. Another common approach for handling trial-based data would have been to estimate the mean total costs per treatment condition using a GLM that assumes a Gamma distribution for costs (i.e., to accommodate the skewness in the distribution of costs). This would also allow for regression-based adjustment of cost estimates through the inclusion of possible covariates in the GLM. It could therefore be considered a limitation that non-adjusted costs were reported.
Fifth, sample sizes for the present study were rather small. Results need to be replicated in studies with larger sample sizes. However, other economic evaluations have also been performed using small sample sizes. Sensitivity analyses and bootstrapping are required to account for sampling uncertainty and to prevent chance findings. As expected, costs were not normally distributed and, therefore, a condition for regression was not met (normal distribution of the residuals). We therefore performed non-parametric bootstrap resampling. However, baseline costs were also skewed and had outliers, and regressing total costs on baseline costs [49] resulted in a non-normal distribution of residuals, even after transformation to the natural logarithm. Several methods to deal with the problem of outliers have been advocated [49]. However, removing various percentages (2, 5, 10, 20, or 30%) of observations at the extremes still resulted in non-normally distributed residuals and in inconsistent regression coefficients of baseline costs (B = 0.84, 0.82, 0.79, 0.72, 0.66, respectively; base-case analyses: B = 0.86). Therefore, the best option to correct for baseline costs [49] was not possible with the present data, and it was most prudent to use the delta method to control for baseline costs rather than regression-based adjustment [49] (see also Methods).
Furthermore, we chose simple methods to deal with missing data because the number of missing values was limited. The proportion of missing values was not significantly associated with treatment allocation, nor with baseline and previously observed depression scores or baseline demographics. Last observation carried forward, next observation carried backward, and mean imputation have been shown to perform as well as multiple imputation [59].
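The three single-imputation methods named above can be sketched as follows. This is an illustrative sketch, not the code used in the study: the function names are hypothetical, and it assumes each patient's repeated measurements are held in a list ordered by assessment time, with `None` marking a missing value.

```python
def locf(values):
    """Last observation carried forward; leading gaps stay missing (None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def nocb(values):
    """Next observation carried backward; trailing gaps stay missing."""
    return locf(values[::-1])[::-1]


def mean_impute(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]
```

Note that `locf` cannot fill a gap at the start of a series and `nocb` cannot fill one at the end, which is one reason the methods may be used in combination.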
Finally, of all parameters that we varied in the sensitivity analyses, correction for baseline costs was the only factor that changed the willingness to pay, but the probability of cost-effectiveness at the a priori threshold of €50,000 remained similar to the base-case analysis. Correction for baseline costs is relatively new in economic evaluations, in contrast to epidemiology and statistics, where controlling for baseline differences is standard practice to obtain valid results [60]. The present results show that the impact of controlling for baseline may be considerable and suggest that, as in other fields of research, results without baseline correction may be invalid.