Background
Depression is a debilitating long-term health condition that is one of the leading causes of global disease burden [1, 2], and its management presents a major challenge to health care providers worldwide. As part of an emerging trend to utilise mobile devices in health care (mHealth) [3], ubiquitous mobile technologies such as short message service (SMS or text messaging) may offer a cheap and straightforward support tool to monitor outcomes in clinical care and self-management of depression and other chronic health conditions [4]. Text messaging has already been studied in the management of diabetes [5–7], asthma [8–10], lower back pain [11–13] and irritable bowel syndrome [14] for example, as well as in the support of long-term health behaviour change interventions such as weight loss [15, 16] and smoking cessation [17]. While the importance of validating health outcomes collected by text messaging has been recognised, few of the studies using SMS technology have implemented this [18, 19].
Within mental health, research has primarily focussed on utilising text messaging for the management of bipolar disorder and schizophrenia. Feasible symptom monitoring was demonstrated when gathering weekly responses of validated questionnaires for depression and mania from bipolar patients [20] and when collecting daily outcomes on several symptom dimensions from patients suffering from schizophrenia [21]. Furthermore, when employed as a low level intervention in schizophrenia, customised daily text prompts for different illness aspects improved outcomes in those areas [22], and weekly monitoring of early warning signs by patients and relatives improved rates of relapse and hospital readmission [23].
Until recently, only a small number of studies with few participants had looked specifically at the possibility of collecting depression outcomes by text message. A single item SMS subjective distress rating (scale 0 to 10) was used for daily mood monitoring in patients with anxiety or depression in a remote Australian community during and after treatment [24], and a daily SMS mood score (scale 1 to 9) was collected as an adjunct to cognitive behavioural therapy (CBT) for outpatients from different ethnic groups in the United States [25–27]. These studies found mood data collection by SMS feasible, acceptable, and predictive of PHQ-9 [28] depression scores. This has been further confirmed in a sub-study of the UK ACUDep trial [29], which collected weekly depression scores (scale 1 to 9) by text message from over 500 depressed adult participants during the first 3 months of trial follow-up [30]. The study demonstrated good response rates (94 % of patients responded to at least one text prompt, and patients replied to an average of 12.5 (SD = 3.45) of 15 texts), the depression rating correlated well with the PHQ-9 measure of depression (Kendall’s tau-b = 0.570), and SMS depression scores were sensitive to change in response to the trial treatments.
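As an illustration of the rank correlation quoted above, the following minimal sketch computes Kendall's tau-b, which corrects for the tied pairs that are common when a 1 to 9 rating is compared with a PHQ-9 total. The function name `kendall_tau_b` and the example scores are hypothetical and are not taken from the ACUDep data; the sketch shows the general statistic rather than the analysis code used in the trial.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of ordinal scores."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both measures: contributes to neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    # Number of pairs not tied on x, and number of pairs not tied on y.
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / sqrt(untied_x * untied_y)


# Hypothetical illustrative data, NOT taken from the ACUDep trial.
sms_scores = [2, 3, 3, 5, 6, 7, 8, 9]       # single-item rating, scale 1 to 9
phq9_totals = [4, 6, 9, 9, 14, 13, 20, 24]  # PHQ-9 total, range 0 to 27
tau_b = kendall_tau_b(sms_scores, phq9_totals)
print(f"Kendall's tau-b = {tau_b:.3f}")
```

A rank-based coefficient is a natural choice here because both measures are ordinal and the single-item score takes only nine distinct values, so many patient pairs are tied.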
Monitoring patient depression with such a simple, single SMS text score instead of the administration of lengthy questionnaires represents an attractive mode of data collection in view of compliance rates and patient burden. This is in line with other efforts to condense the measurement of depression into one or two items for the purpose of efficient patient screening and monitoring [31–34]. The choice between long and short form assessment tools will depend on the context and purpose of the evaluation, balancing ease of data collection with the need for robust clinical diagnoses [35]. It remains unknown whether a single SMS depression score, as used in the ACUDep trial, can be considered a valid measure of depression and could consequently be recommended for use in research and evaluation in clinical practice.
The present study therefore aimed to establish the validity of the ACUDep SMS depression score (termed R-SMS-DS [30]), by employing item response theory methodologies. If scores obtained for the R-SMS-DS and the PHQ-9 both measure the same latent depression variable, then this could be confirmed by including all individual items in a factor analysis. The PHQ-9 has variously been shown to be either uni-dimensional in primary care patients [36–39], or to divide into an affective and somatic dimension in certain patient populations [40–43]. It was of interest whether R-SMS-DS scores would align with either one of these dimensions if present in the ACUDep patient sample.
Depression prevalence, symptomatology and trajectories are known to differ between men and women [44–46] as well as over the course of life [45, 47]. Although the reasons for these disparities remain debated, they may be connected to differential use of health care systems [48] and important aspects of depression treatment [49]. It is therefore important that these demographic groups do not differ in the way they use the R-SMS-DS, and that score differences between individuals only reflect variations in their respective levels of depression [50]. Accordingly, the present study also aimed to assess any response bias for the R-SMS-DS with respect to age and gender. The absence or presence of such biases will provide evidence for the relative impact of these factors on the measurement of depression with the R-SMS-DS, before it can be considered to inform valid treatment decisions in clinical practice.
Results of this study were anticipated to inform recommendations for whether and how the increasing number of research studies using mHealth technologies for patient monitoring should incorporate these tools and their validation into their study designs.
Discussion
The present study set out to validate a single depression rating item submitted by SMS text message (R-SMS-DS) against data of the widely validated PHQ-9 concurrently collected by post, which were available for a depressed adult sub-population of the UK ACUDep trial. R-SMS-DS scores were found to correlate well with latent depression when included in a combined single-factor solution exploratory factor analysis with the individual PHQ-9 items. The most closely associated PHQ-9 items were the two core DSM-IV criteria of depressed mood and anhedonia as well as feeling bad about oneself. The correlations closely mirrored those observed for a single-item paper based depression severity rating when correlated with DSM-IV criteria in a population of psychiatric outpatients undergoing treatment for major depression [32]. With the exception of sleep and psychomotor disturbances, item correlations were larger when patients completed the two assessments closer in time. Taken together, these results suggest that the R-SMS-DS score did indeed measure depression as desired.
While the optimal one-factor model in this study lent further support to the uni-dimensionality of the PHQ-9, it was unsurprising to find that R-SMS-DS ratings aligned with the affective rather than somatic dimension of depression in the pre-specified two-factor analyses. This raises the possibility of complementing the R-SMS-DS with one or more physical symptom questions if monitoring of the somatic depression dimension is additionally desired. Sleep, fatigue and appetite were picked up as core somatic symptoms in line with all previous studies of a two-dimensional PHQ-9 structure. Interestingly, a model with these three symptoms alone forming the somatic dimension (found in selected previous research [40, 42, 61]) was supported both in patients who had valid PHQ-9 data and in patients with valid PHQ-9 and any R-SMS-DS data, whereas the most commonly observed two-factor structure [40, 41, 43, 62] with the additional two somatic items of concentration difficulties and psychomotor disturbance was only observed in the sub-set of patients whose PHQ-9 and R-SMS-DS responses were closer in time (within 6 days). The possible loading of anhedonia on the somatic dimension for these patients had previously only been recorded in one study of spinal cord injury patients at a single long-term follow-up point [40]. Patient characteristics in terms of demographics and baseline depression did not appear to differ for patients in this group, so it may be the result of differences in other patient characteristics, such as present comorbidities affecting the rating of somatic symptoms. Alternatively, the model factors may be less stable in this group because it was the smallest analysed sub-sample.
Consistent use of the R-SMS-DS was demonstrated across men and women. However, older patients were found to be less likely to endorse higher scores even when their degree of latent depression (as defined by the PHQ-9) was indicative of such an elevated level. This could be a result of a different understanding of the ‘feeling depressed’ terminology used in the text message, which has been discussed in the epidemiological literature of depression either as a shift towards a more somatically driven concept or as confounding with other somatic morbidities [63, 64]. Further reasons could be different attitudes towards communicating mental wellbeing by mobile technologies or a greater reluctance to potentially arouse cause for concern. Such age bias could affect the sensitivity of the R-SMS-DS score if used for depression screening; however, screening is unlikely to be its primary use. We envisage the R-SMS-DS as a monitoring tool for patients who have already undergone formal depression assessment. The direction of the age bias was opposite to that identified in a sample of UK primary care patients for the PHQ-9 items of low mood and anhedonia for patients aged 55 and over [65]. It remains possible that the observed bias in this study is a consequence of the relatively small total sample size or the small number of older patients in the sample. While we used age as a continuous predictor, the number of patients for whom the effect was identified based on marginal effect plots was rather low (n = 8 participants ≥ 65 years, 2.4 %). Moreover, the magnitude of the association between age and R-SMS-DS score (OR = 0.98) was only weak [66], and the effect size in terms of pseudo R² [67] was negligible. The stability of this bias remains to be confirmed in a larger patient sample including a qualitative assessment of possible reasons.
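To make the scale of the reported odds ratio concrete, the short sketch below shows how an odds ratio for a continuous predictor compounds multiplicatively over an age difference. It assumes, purely for illustration, that the OR of 0.98 refers to a one-year increase in age and that the effect is log-linear in age; neither assumption is stated in the text above, and the function name `cumulative_odds_ratio` is hypothetical.

```python
def cumulative_odds_ratio(or_per_year, age_gap_years):
    """Odds ratio implied for an age difference of `age_gap_years`.

    Assumes the effect is log-linear in age, so that the per-year
    odds ratio multiplies once for every additional year.
    """
    return or_per_year ** age_gap_years


# Value reported in the text; the per-year unit is an ASSUMPTION made here.
OR_PER_YEAR = 0.98

for gap in (1, 10, 30):
    implied = cumulative_odds_ratio(OR_PER_YEAR, gap)
    print(f"age gap of {gap:2d} years -> implied odds ratio {implied:.3f}")
```

This is arithmetic on the reported coefficient only and does not replace the marginal effect plots and pseudo R² on which the interpretation above is based.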
Overall, results of this study add further support to the validity of collecting depression severity outcomes by SMS, which had already been shown to be feasible and acceptable in adults with ongoing depression in primary care in the ACUDep trial [30]. To our knowledge, this is the first study aiming to validate an SMS self-report tool for depression using item-response theory methodologies, and results are strengthened by the use of a gold standard validated patient self-report depression instrument (PHQ-9) based on DSM-IV criteria for comparison. Despite the relatively small sample size of this study, patients agreeing to submit weekly text messages and who were included in the present analyses were representative of those taking part in the ACUDep trial (Table 1), who in turn were typical of adults in the UK with ongoing depression in primary care.
However, findings cannot be extrapolated to patients who are presenting with depression for the first time or who do not consult in primary care at all. A further limitation is the temporal difference between PHQ-9 and R-SMS-DS data completion: the two measures had not been designed to be collected concurrently, resulting in considerable between-patient variability in the time between completing the assessments. In addition, the reference time frame differed for the two measures (PHQ-9: over the last two weeks; R-SMS-DS: average over the last week); it is therefore not certain whether patients were in the same mental state when reporting those outcomes. Indeed, the positive findings of this study may only represent a conservative estimate of the level of association. However, the depression outcomes linked with one another in this study were patient reported only, and no independent assessment was carried out in order to confirm clinical validity. Moreover, only the association between R-SMS-DS and a single screening tool (PHQ-9) has been demonstrated so far, and further convergent validity needs to be shown in order to establish the R-SMS-DS as a valid estimate of latent depression. Capturing the full multi-faceted nature of depression will never be possible by a single item, and this is not the aim of the R-SMS-DS monitoring tool.
For future studies we suggest including at least one assessment that allows researchers to test the concurrent validity of their novel electronic or mHealth tools with a gold standard instrument collected at the same time, an approach that has not yet been widely adopted. The shortcomings of this study could be addressed by a more controlled, dedicated design, either as standalone work or embedded in larger investigations, with particular attention to the magnitude and context of any response bias. The successful use of tools from the framework of item response theory for the validation of SMS scores at a single time point might also be extended to investigate the longitudinal validity of the R-SMS-DS scores, which had been collected weekly over 3 months. Notwithstanding such further methodological work, we believe that findings from the present and a previous study [30] have provided sufficient evidence for the feasibility, acceptability and validity of the R-SMS-DS for monitoring depression in the ACUDep study population. Given these findings, we encourage investigators and clinicians to incorporate the R-SMS-DS as a free-to-use outcome measure in the study of depression management in different clinical populations. If verified against other validated depression measures and found acceptable in different clinical contexts, the R-SMS-DS could be considered for use in routine clinical practice.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
HMP was the chief investigator of the ACUDep trial, conceived the study design and helped to draft the manuscript. SJR was the trial manager for ACUDep. He conceived, developed and led the SMS text messaging sub-study. AK performed the statistical analysis and drafted the manuscript. JRB conceived the study methodology, oversaw the data analysis and helped to draft the manuscript. TJC conceived and advised on the study methodology. All authors contributed to the manuscript and read and approved its final version.