Before initiation of the project, a search in relevant databases (including PROSPERO) showed no prior or ongoing systematic review of this subject. This systematic review protocol has been reported according to the Preferred Reporting Items for Systematic Review and Meta-analysis Protocols (PRISMA-P) guidelines [78] (see Additional file 1). Accordingly, the protocol for this study was published in the International Prospective Register of Systematic Reviews database (PROSPERO) [79] on 13th February 2019 (PROSPERO CRD42019115705). Should any amendments to this protocol be necessary, they will be documented on the PROSPERO platform. The systematic review and network meta-analysis itself will be presented according to the PRISMA Extension Statement for Reporting of Systematic Reviews Incorporating Network Meta-analyses of Health Care Interventions [80].
Data items
For the calculation of relative treatment effects, group means, the corresponding standard deviations, and group sizes will be extracted as the primary data. If one of these values is missing, other statistical data that can be converted into means and standard deviations will be extracted. Conversions will be calculated according to published formulas (e.g., [98, 99]). If standard deviations cannot be calculated from the available study information, we will impute them using the standard deviations reported in the other included studies [100]. We will conduct sensitivity analyses excluding studies in which standard deviations had to be imputed. If the N is missing in the table of analysis, we will use the N of the descriptive statistics. If studies report medians and interquartile ranges, a normal distribution will be assumed, unless indicated otherwise, to convert these values to means and standard deviations [98]. If studies only report adjusted outcome values, these data will be extracted, but sensitivity analyses will be run without these studies to check for possible bias. We plan to extract the effect size provided by the study authors only if no other information is available for effect size calculation. If it is not possible to impute appropriate measures for the calculation of effect sizes and no effect sizes are reported, we will contact the authors.
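The conversions described above can be sketched in a few lines. This is an illustration only, written in Python rather than the R environment planned for the analysis. It assumes simple normal-theory approximations (mean equal to the median, SD equal to the interquartile range divided by about 1.35), which may differ from the formulas in [98, 99]. The function names are hypothetical, and imputing a missing SD as the mean of the SDs reported by other studies is one possible reading of the imputation step, not necessarily the rule in [100].

```python
import math
from statistics import NormalDist, mean

# Width of the interquartile range of a standard normal distribution (about 1.349).
IQR_TO_SD = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def mean_sd_from_median_iqr(median, q1, q3):
    """Approximate mean and SD from a median and quartiles, assuming normality."""
    return median, (q3 - q1) / IQR_TO_SD


def sd_from_se(se, n):
    """Recover a group SD from the standard error of the group mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, level=0.95):
    """Recover a group SD from a confidence interval around a group mean.

    Uses the normal quantile, so it is a large-sample approximation.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.sqrt(n) * (upper - lower) / (2 * z)


def impute_sd(reported_sds):
    """Impute a missing SD as the mean of SDs reported by other studies (assumed rule)."""
    return mean(reported_sds)
```

For example, `mean_sd_from_median_iqr(10, 8, 12)` returns the median as the mean and an SD of roughly 2.97. Studies handled this way would be flagged so that they can be dropped in the planned sensitivity analyses.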
Among others, the following information will be extracted from each study:
- Information on the study itself (e.g., title, publication date, authors)
- Methods (e.g., objective, design, number of participants included in the analysis)
- Risk of bias assessment (Cochrane revised risk of bias tool) [101]
- Setting (non-clinical vs. clinical, inpatient vs. outpatient)
- Participants (i.e., mean age, inclusion and exclusion criteria, severity of depression, diagnostic tool)
- Intervention (i.e., frequency, intensity, duration, type of exercise)
- Comparisons (comparator conditions)
- Outcomes (primary and secondary outcomes, adverse events)
- Results (mean and standard deviation of outcomes pre- and post-intervention as well as follow-up)
- Self-report vs. observer rating
- Duration of follow-up
Data synthesis
Data will be synthesized descriptively. A summary table of included studies will contain information on the authors, population characteristics (diagnostic criteria, baseline sleep quality, depression, age, and number of participants), interventions (exposure in each group), outcome measures used, and results (sleep quality, sleep duration). Network meta-analysis will be performed. Statistical (number of studies and heterogeneity of results), clinical (heterogeneous populations), and methodological (low quality of trials or follow-up duration) aspects will be considered to decide whether network meta-analysis is valid. If the network meta-analysis is deemed methodologically unsound, a pairwise meta-analysis will be considered. Should a pairwise meta-analysis also not be possible, studies will be summarized narratively.
The package netmeta [111] for the open-source software environment R [112] will be used to calculate network meta-analyses within a frequentist framework.
A network will be created including all available jointly randomizable treatments. We assume that any patient who meets all inclusion criteria could, in principle, be randomized to any of the interventions in the synthesis comparator set.
We will address the assumption of transitivity, which underlies network meta-analysis [113], by (1) assessing whether the included interventions are similar across studies using a different design, and (2) checking whether the distribution of potential moderators is balanced across comparisons [114]. We have defined depression severity, comorbidities, age, and gender a priori as potential effect modifiers and will qualitatively evaluate the comparability of these characteristics across comparisons.
We expect considerable diversity of outcome measures and will, therefore, calculate standardized mean differences (SMD) using Hedges' g with 95% confidence intervals [115]. The SMD is the mean difference between groups divided by the pooled standard deviation. This effect size measure allows comparison of effect sizes across similar measurements of a single outcome. The conventional and somewhat arbitrary classification of SMD proposed by Cohen (1988) [116] has been expanded to include very small (0.01), small (0.2), medium (0.5), large (0.8), very large (1.2), and huge (2.0) effect sizes [117]. Random-effects pairwise SMDs across studies will be calculated based on the available comparisons between treatments and comparators [118]. Inverse variance weighting will be used for pooling. In addition, indirect evidence will be estimated using the entire network of evidence. The random-effects model in netmeta accounts for dependencies between comparisons in the case of multi-arm trials [119]. For multi-arm trials, the pairwise command will be used to transform the dataset to the comparison level, which is required for conducting the network meta-analysis.
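As an illustration of the effect size calculation, the following Python sketch computes Hedges' g with its standard error and a 95% confidence interval, and expands a multi-arm study into all pairwise contrasts. It is not the netmeta implementation: the function names and the dictionary layout are hypothetical, the variance formula is the common large-sample approximation, and the sketch does not model the correlation between contrasts from the same multi-arm trial, which the network model has to account for.

```python
import math
from itertools import combinations


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return Hedges' g, its standard error, and a 95% confidence interval."""
    df = n1 + n2 - 2
    # Pooled standard deviation of the two groups.
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    # Small-sample correction factor that turns Cohen's d into Hedges' g.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample approximation of the variance of d.
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    se_g = j * math.sqrt(var_d)
    return g, se_g, (g - 1.96 * se_g, g + 1.96 * se_g)


def arm_contrasts(study, arms):
    """Expand arm-level data {treatment: (mean, sd, n)} into pairwise contrasts."""
    rows = []
    for (t1, a1), (t2, a2) in combinations(sorted(arms.items()), 2):
        g, se, _ = hedges_g(*a1, *a2)
        rows.append({"study": study, "treat1": t1, "treat2": t2, "g": g, "se": se})
    return rows
```

A three-arm trial passed to `arm_contrasts` yields three rows, one per pair of treatments, which mirrors the comparison-level layout that the network meta-analysis requires.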
The primary outcome will be the SMD of sleep quality assessed via self- or observer-reported measures. If more than one primary outcome is reported, the most frequently used scale will be included in the analysis to reduce between-study heterogeneity. If possible, we will assess the association between instruments and changes in sleep quality. Two separate analyses will be run: one for the outcome data at the end of treatment and one for the last available follow-up. Separate network meta-analyses will be conducted for secondary outcomes if possible. Results from network meta-analysis will be presented as summary SMDs for each possible pair of treatments. Whenever possible, measures of uncertainty will be reported in the form of the 95% confidence interval and 95% prediction interval.
To calculate statistical heterogeneity between studies on the pairwise level, the Q statistic will be used [89]. Furthermore, τ² will be analyzed to estimate the variance of the distribution of the true study effects [120]. I² will be evaluated to indicate the proportion of observed variance that can be attributed to between-study heterogeneity [121]. I² and the corresponding confidence interval can be interpreted as the percentage of overall variability that is due to variation of the true effects. An I² value of 0% to 40% might not be important, 30% to 60% may represent moderate heterogeneity, 50% to 90% may represent substantial heterogeneity, and 75% to 100% considerable heterogeneity [89]. In the network meta-analysis (NMA), we will assume a common estimate for the heterogeneity variance across the different comparisons.
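The pairwise heterogeneity statistics can be illustrated with a short Python sketch. It assumes the DerSimonian-Laird moment estimator for the between-study variance, which is one common choice and not necessarily the estimator that will be used in the analysis. The function name and the returned keys are hypothetical.

```python
import math


def random_effects_summary(effects, std_errors):
    """Compute Q, tau^2 (DerSimonian-Laird), I^2, and the random-effects pooled estimate."""
    k = len(effects)
    # Inverse-variance (common-effect) weights.
    w = [1 / se**2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the common-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # I^2 as a percentage, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return {"Q": q, "df": df, "tau2": tau2, "I2": i2, "pooled": pooled, "se": se_pooled}
```

With three studies reporting SMDs of 0.2, 0.5, and 0.8, each with a standard error of 0.1, the sketch gives Q = 18 on 2 degrees of freedom, τ² = 0.08, and I² of about 89%, which would fall in the range labeled considerable above.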
Local and global methods will be used to detect inconsistency [122]. The presence of inconsistency will be evaluated using the following approaches: (1) locally using the netsplit command (i.e., testing the difference between estimates derived from direct evidence and estimates derived from indirect evidence for statistical significance) and (2) globally using the decomp.design command (i.e., using the design-by-treatment interaction model). For this purpose, the total Q statistic (i.e., the measure of total heterogeneity/inconsistency in the network) will be decomposed into an inconsistency component (between designs) and a heterogeneity component (within designs). We will compare the magnitude of heterogeneity between consistency and inconsistency models to determine how much heterogeneity is explained by inconsistency. We will do this by testing for statistical significance the residual inconsistency that remains under the assumption of a full design-by-treatment interaction model.
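The idea behind the local inconsistency check can be sketched for the simplest case, a loop of three treatments. This Python illustration is not how netsplit is implemented: it assumes an indirect estimate of A versus C formed through a common comparator B, treats direct and indirect estimates as independent, and uses a two-sided z-test. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def indirect_estimate(d_ab, se_ab, d_cb, se_cb):
    """Indirect estimate of A vs. C through the common comparator B."""
    return d_ab - d_cb, math.sqrt(se_ab**2 + se_cb**2)


def inconsistency_test(direct, se_direct, indirect, se_indirect):
    """Test the difference between direct and indirect estimates (two-sided z-test)."""
    diff = direct - indirect
    # Assumes the direct and indirect estimates are independent.
    se_diff = math.sqrt(se_direct**2 + se_indirect**2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p
```

A small p-value would point to disagreement between direct and indirect evidence for that comparison. A non-significant result does not establish consistency, since such tests often have low power in sparse networks.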
In the case of statistical heterogeneity or inconsistency between results from individual studies, we will investigate the potential impact of the following trial-level effect modifiers: (1) year of publication, (2) study precision (i.e., sample size), (3) studies reporting non-adjusted vs. adjusted means, (4) studies with imputed standard deviations vs. studies which reported standard deviations. If the number of studies allows it, theoretically driven subgroup analyses will be done according to population (e.g., severity of depression), duration of intervention, duration of follow-up, outcome characteristics (i.e., self- vs. observer ratings, objective vs. subjective sleep duration), and methodological quality.