
Exercise for women receiving adjuvant therapy for breast cancer


Abstract

Background

A large body of clinical research on adjuvant cancer treatment has demonstrated improvements in breast cancer outcomes such as recurrence and mortality rates. However, adjuvant and neoadjuvant chemotherapy and radiotherapy affect quality of life through substantial short‐ and long‐term side effects. A number of studies have evaluated the effect of exercise interventions on these side effects. This is an updated version of the original Cochrane review published in 2006. The original review identified some benefits of physical activity on physical fitness and the resulting capacity for performing activities of daily life. It also identified a lack of evidence for other outcomes, providing clear justification for an updated review.

Objectives

To assess the effect of aerobic or resistance exercise interventions during adjuvant treatment for breast cancer on treatment‐related side effects such as physical deterioration, fatigue, diminished quality of life, depression, and cognitive dysfunction.

Search methods

We carried out an updated search in the Cochrane Breast Cancer Group Specialised Register (30 March 2015), the Cochrane Central Register of Controlled Trials (CENTRAL) (Issue 2, 2015), MEDLINE (1966 to 30 March 2015), and EMBASE (1966 to 30 March 2015). We did not update the original searches in CINAHL (1982 to 2004), SPORTDiscus (1975 to 2004), PsycINFO (1872 to 2003), SIGLE (1880 to 2004), and ProQuest Digital Dissertations (1861 to 2004). We searched the World Health Organization International Clinical Trials Registry Platform (WHO ICTRP) and ClinicalTrials.gov for ongoing trials on 30 March 2015. We screened references in relevant reviews and published clinical trials.

Selection criteria

We included randomised controlled trials that examined aerobic or resistance exercise or both in women undergoing adjuvant treatment for breast cancer. Published and unpublished trials were eligible.

Data collection and analysis

Two review authors independently performed data extraction, assessed trials, and graded the methodological quality using Cochrane's 'Risk of bias' tool. Any disagreements were resolved through discussion or by consulting the third review author. We entered data into Review Manager for analysis. For outcomes assessed with a variety of instruments, we used the standardised mean difference (SMD) as a summary statistic for meta‐analysis; for those assessed with the same instrument, we used the mean difference (MD).
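The two summary statistics named above can be illustrated with a minimal sketch. This is not the review's Review Manager computation: it assumes the SMD is calculated as Hedges' adjusted g (the difference in means divided by the pooled standard deviation, multiplied by a small‐sample correction). The function names and example numbers are hypothetical, and pooling across trials, as well as reversing the sign of scales on which higher scores mean a worse outcome, are not shown.

```python
import math


def mean_difference(mean_ex: float, mean_ctrl: float) -> float:
    """MD: raw difference in means, usable when all trials share one instrument."""
    return mean_ex - mean_ctrl


def hedges_g(mean_ex: float, sd_ex: float, n_ex: int,
             mean_ctrl: float, sd_ctrl: float, n_ctrl: int) -> float:
    """SMD as Hedges' adjusted g: MD in units of the pooled SD, bias-corrected."""
    pooled_sd = math.sqrt(
        ((n_ex - 1) * sd_ex ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / (n_ex + n_ctrl - 2)
    )
    d = (mean_ex - mean_ctrl) / pooled_sd
    correction = 1 - 3 / (4 * (n_ex + n_ctrl) - 9)  # small-sample correction factor
    return d * correction


# Hypothetical trial: exercise 38 (SD 10, n 40) vs control 35 (SD 10, n 40)
print(mean_difference(38.0, 35.0))                      # 3.0
print(round(hedges_g(38.0, 10.0, 40, 35.0, 10.0, 40), 4))
```

Because the SMD is unit‐free, it lets trials that used different questionnaires for the same construct be combined, at the cost of results that are harder to interpret than an MD on a familiar scale.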

Main results

For this 2015 update we included a total of 32 studies with 2626 randomised women, 8 studies from the original search and 24 studies from the updated search. We found evidence that physical exercise during adjuvant treatment for breast cancer probably improves physical fitness (SMD 0.42, 95% confidence interval (CI) 0.25 to 0.59; 15 studies; 1310 women; moderate‐quality evidence) and slightly reduces fatigue (SMD ‐0.28, 95% CI ‐0.41 to ‐0.16; 19 studies; 1698 women; moderate‐quality evidence). Exercise may lead to little or no improvement in health‐related quality of life (MD 1.10, 95% CI ‐5.28 to 7.48; 1 study; 68 women; low‐quality evidence), a slight improvement in cancer site‐specific quality of life (MD 4.24, 95% CI ‐1.81 to 10.29; 4 studies; 262 women; low‐quality evidence), and an improvement in cognitive function (MD ‐11.55, 95% CI ‐22.06 to ‐1.05; 2 studies; 213 women; low‐quality evidence). Exercise probably leads to little or no difference in cancer‐specific quality of life (SMD 0.12, 95% CI 0.00 to 0.25; 12 studies; 1012 women; moderate‐quality evidence) and little or no difference in depression (SMD ‐0.15, 95% CI ‐0.30 to 0.01; 5 studies; 674 women; moderate‐quality evidence). Evidence for other outcomes ranged from low to moderate quality. Seven trials reported a very small number of adverse events.

Authors' conclusions

Exercise during adjuvant treatment for breast cancer can be regarded as a supportive self care intervention that probably results in less fatigue, improved physical fitness, and little or no difference in cancer‐specific quality of life and depression. Exercise may also slightly improve cancer site‐specific quality of life and cognitive function, while it may result in little or no difference in health‐related quality of life. This review is based on trials with a considerable degree of clinical heterogeneity regarding adjuvant cancer treatments and exercise interventions. Due to the difficulty of blinding exercise trials, all included trials were at high risk for performance bias. Furthermore, the majority of trials were at high risk for detection bias, largely due to most outcomes being self reported.

The findings of the updated review have enabled us to draw a more precise conclusion: both aerobic and resistance exercise can be regarded as beneficial for individuals experiencing adjuvant therapy‐related side effects. Further research is required to determine the optimal type, intensity, and timing of an exercise intervention. Furthermore, long‐term evaluation is required due to possible long‐term side effects of adjuvant treatment.


Exercise for women receiving chemotherapy or radiation therapy or both (adjuvant therapy) for breast cancer

What is the issue?

In the past, women receiving cancer treatment were usually advised to rest and avoid physical activity. However, we now know that too much rest and too little physical activity can lead to muscle wasting. This reduces women's physical fitness and may limit their regular activities. Women also often have other side effects that can affect their daily lives, such as extreme tiredness (fatigue), depression, and reduced mental functioning, for example difficulty remembering things or staying focused.

Why does it matter?

The side effects of breast cancer treatment can interfere with daily activities and return to work. It is important to learn about ways to reduce these side effects.

We asked if physical exercise during chemotherapy or radiation therapy or both helped to reduce treatment side effects. Side effects studied included tiredness, depression, and reduced physical fitness and mental functioning. We also studied general effects such as health‐related, cancer‐specific, and cancer site‐specific quality of life. Questionnaires for cancer‐specific quality of life ask questions that are important for patients with cancer in general, for example about pain or nausea. Cancer site‐specific quality of life is measured with questionnaires that ask women with breast cancer about topics that are especially important to them, for example about breast symptoms or body image. We only included questionnaires that have been shown to be reliable.

We found 32 studies involving 2626 women. The included studies were published up to March 2015. Not all studies considered all of these potential side effects. Combining the results of these studies suggests that physical exercise probably improves physical fitness and slightly lessens fatigue. These studies also suggest that physical exercise probably results in little or no improvement in cancer‐specific quality of life and depression. Exercise may improve mental function and slightly improve cancer site‐specific quality of life, although the quality of the evidence was low for both of these outcomes. It may result in little or no improvement in health‐related quality of life; however, the quality of the evidence was also low for this outcome. The quality of evidence may have been low because many of the studies did not have enough participants to detect small differences, and because results may be biased when the people assessing the outcomes knew which group each participant was in.

Importantly, physical exercise did not harm most women. Very few women experienced discomfort or pain in their arms or legs.

What does this mean?

It appears that exercise during cancer treatment can help lessen fatigue and improve physical fitness. It probably results in little or no improvement in cancer‐specific quality of life and depression. It is unknown whether it helps with other side effects. At least nine studies that are ongoing or awaiting assessment will help to answer whether, and how much, exercise helps with these and other side effects.

Authors' conclusions

Implications for practice

Exercising while receiving adjuvant treatment for breast cancer is a feasible, supportive self care intervention. Based on current evidence, exercise probably slightly reduces fatigue and improves physical fitness. It likely leads to little or no difference in depression and cancer‐specific quality of life. Women with breast cancer may benefit from exercise during adjuvant cancer treatment through improved cognitive function and slightly improved cancer site‐specific quality of life. Exercise may lead to little or no improvement in health‐related quality of life. Muscular strength and physical activity are probably improved by exercising. Several further outcomes such as shoulder mobility showed slight improvements, and several such as self esteem showed little or no difference, but the quality of the evidence was low.

Exercise adherence during cancer treatment constitutes a challenge, and thus attempts to foster exercise participation might enhance effectiveness. For behaviour change to occur (in this instance, the adoption of regular exercise), it is essential that intervention programmes build on the underlying principles of theories about why people change their behaviours. Social cognitive theory appears to be a promising theoretical framework for promoting exercise behaviour in women with breast cancer (Pinto 2002; Rogers 2004; Rogers 2005). The key construct in social cognitive theory is self efficacy. Exercise self efficacy can be described either as the confidence to overcome barriers to exercise or as confidence in the ability to perform certain exercise tasks. Self efficacy has proven to be an important correlate of exercise among women with breast cancer. Exercise self efficacy among women with breast cancer during cancer treatment is reported to be lowest when women are nauseated, tired, not interested, lacking time, and lacking exercise enjoyment (Rogers 2006).

Future exercise interventions should target these exercise barriers. Exercise enjoyment, for example, may be addressed by drawing on recent trends in fitness, such as Pilates, Nordic walking, Tai Chi, step aerobics, and dancing, adequately adjusted to the needs and limitations of the target group. Group exercise or partner‐assisted exercise may also increase exercise enjoyment. Lack of time may be addressed by offering exercise classes in different locations, choosing venues that are accessible by public transport, and scheduling classes at various times of the day and evening.

Implications for research

At this stage there is still a lack of evidence for several relevant potential benefits of exercise as well as for harms. The increasing number of studies assessing the benefits and harms of exercise during adjuvant therapy is promising, but while study quality and reporting have certainly improved since the first studies assessing this question, the quality of the evidence is still low for many outcomes. This is due in part to the difficulty of blinding participants and supervising personnel in studies with exercise as an intervention. Other factors that diminish the quality of evidence, such as lack of outcome assessor blinding or incomplete reporting of methodology and data, could feasibly be improved.

As described above, the actual training stimulus may substantially deviate from the assigned exercise regimen. In efficacy trials, investigators need to ensure adherence to the intervention in order to determine whether exercise interventions work in this population. Exercise programmes should be designed to address exercise facilitators such as exercise enjoyment; this may be achieved by offering a variety of alternating exercise modes that ensure an adequate training stimulus. Including only sedentary participants may be a way to deal with contamination, making use of the observation that physical activity and exercise decline during cancer treatment (Irwin 2003). While efforts can and should be made to maximise adherence and minimise contamination, some imperfect adherence and contamination are to be expected even in efficacy studies, and they lower confidence in the results. In effectiveness trials, we recommend that both adherence and contamination be reported as outcome measures, because poor adherence can render an efficacious intervention ineffective. To date, effectiveness trials are rare, but they would be of additional value for this field of research.

Consensus of researchers on outcome measures for exercise studies involving women with breast cancer receiving adjuvant treatment is needed in order to facilitate interpretation and comparison of results across various interventions. The long‐term follow‐up of exercise interventions also requires attention because some side effects of adjuvant cancer treatment are long term, such as fatigue or deconditioning, and the effects of exercise themselves might have a long‐term component. Besides health‐related outcome measures, adherence and contamination as well as potential harms should be assessed and reported systematically. Reporting standards for harms should help to inform practitioners and the public on potential harms of exercise interventions during adjuvant cancer treatment (Ioannidis 2004).

Given recruitment difficulties and the resulting small sample sizes, multisite trials are advisable.

The seven ongoing studies and the two studies awaiting assessment identified during our search also have small sample sizes and assess a wide variety of outcomes with different outcome measures. The majority include interventions with aerobic exercise of moderate intensity and compare exercise with usual care, apart from one study awaiting assessment in which yoga is the control intervention (Lotzke 2016). One study is notable for its rather long intervention period of 12 months, a large estimated enrolment of 600 participants, and a planned follow‐up of 10 years, with relapse of breast cancer, breast cancer‐specific mortality, and overall mortality as secondary outcomes (NCT02240836).

Once the effectiveness of exercise ‐ even in widely varying frequency and intensity ‐ for women with breast cancer during adjuvant therapy has been established for different outcomes, the next step is to assess which frequency, intensity, and type of exercise (aerobic, resistance, combination) is most effective for which outcome. There are ongoing and published studies comparing different dose regimens with each other, which we could not include in our review because they had no usual care or non‐exercising control group. The number of studies comparing exercise with no exercise is still much higher, and feasibility studies are still underway or being published, although the feasibility of such exercise studies has certainly been established. The comparison of different exercise dosages for different outcomes might warrant several reviews of its own.

Summary of findings

Summary of findings for the main comparison. Exercise compared with control for women receiving adjuvant therapy for breast cancer


Population: women receiving adjuvant therapy (chemo‐ or radiotherapy or both) for breast cancer

Settings: supervised or home based

Intervention: aerobic or resistance exercise or a combination of both

Comparison: control intervention (usual care or intervention that was not exercise, such as stretching)

| Outcomes | Relative effects* (95% CI): exercise vs control | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|
| Physical fitness, assessed with 6‐ or 12‐minute walk test, peak oxygen uptake, and other scales (follow‐up: 18 weeks to 6 months) | The mean physical fitness in the intervention group was 0.42 standard deviations higher (0.25 to 0.59 higher) | 1310 (15 RCTs) | ⊕⊕⊕⊝ moderate¹ | SMD 0.42 (95% CI 0.25 to 0.59) |
| Fatigue, assessed with FACIT‐F scale, (revised) Piper Fatigue Scale, Multidimensional Fatigue Inventory, and other scales (follow‐up: 18 weeks to 6 months) | The mean fatigue in the intervention group was 0.28 standard deviations lower (0.41 lower to 0.16 lower) | 1698 (19 RCTs) | ⊕⊕⊕⊝ moderate² | SMD ‐0.28 (95% CI ‐0.41 to ‐0.16) |
| Cancer‐specific quality of life, assessed with FACT‐G, EORTC QLQ‐C30, and other scales (follow‐up: 12 weeks to 6 months) | The mean cancer‐specific quality of life in the intervention group was 0.12 standard deviations higher (0.00 to 0.25 higher) | 1012 (12 RCTs) | ⊕⊕⊕⊝ moderate³ | SMD 0.12 (95% CI 0.00 to 0.25) |
| Health‐related quality of life, assessed with EQ‐5D visual analogue scale (higher scores indicate higher quality of life; scores range from 0 to 100; MID: 7 points) (follow‐up: end of intervention) | The mean health‐related quality of life in the intervention group was 1.10 points higher (5.28 lower to 7.48 higher) | 68 (1 RCT) | ⊕⊕⊝⊝ low⁴,⁵ | MD 1.10 (95% CI ‐5.28 to 7.48) |
| Cancer site‐specific quality of life, assessed with FACT‐B (higher scores indicate better quality of life; scores range from 0 to 144; MID: 7 to 8 points) (follow‐up: end of intervention) | The mean cancer site‐specific quality of life in the intervention group was 4.24 points higher (1.81 lower to 10.29 higher) | 262 (4 RCTs) | ⊕⊕⊝⊝ low⁶,⁷ | MD 4.24 (95% CI ‐1.81 to 10.29) |
| Depression, assessed with BDI, CES‐D (follow‐up: 6 months) | The mean depression in the intervention group was 0.15 standard deviations lower (0.30 lower to 0.01 higher) | 674 (5 RCTs) | ⊕⊕⊕⊝ moderate⁸ | SMD ‐0.15 (95% CI ‐0.30 to 0.01) |
| Cognitive function, assessed with Trail Making Test (less time in seconds needed to complete the test indicates less cognitive dysfunction) (follow‐up: end of intervention) | The mean time needed to complete the test in the intervention group was 11.55 seconds less (22.06 seconds less to 1.05 seconds less) | 213 (2 RCTs) | ⊕⊕⊝⊝ low⁹,¹⁰ | MD ‐11.55 (95% CI ‐22.06 to ‐1.05) |
| Lymphoedema, assessed with volumetric arm measurements and bioimpedance spectroscopy (follow‐up: 8 weeks) | Assumed risk¹¹: 85 per 1000; corresponding risk: 60 per 1000 (30 to 123) | 436 (2 RCTs) | ⊕⊕⊝⊝ low¹²,¹³ | RR 0.71 (95% CI 0.35 to 1.45) |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
BDI: Beck Depression Inventory; CES‐D: Center for Epidemiological Studies‐Depression Scale; CI: confidence interval; FACIT‐F: Functional Assessment of Chronic Illness Therapy‐Fatigue Scale; FACT‐B: Functional Assessment of Cancer Therapy‐Breast; FACT‐G: Functional Assessment of Cancer Therapy‐General; MD: mean difference; MID: minimally important difference; RCT: randomised controlled trial; RR: risk ratio; SMD: standardised mean difference

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

¹ Lack of blinding, low adherence and high or unclear contamination, several randomisation and many allocation concealment procedures were unclear, therefore we downgraded by one level.
² Lack of blinding, low adherence and high or unclear amount of contamination, many allocation concealment procedures were unclear, therefore we downgraded by one level.
³ Lack of blinding, low adherence and high or unclear amount of contamination, and a high rate of incomplete outcome data, therefore we downgraded by one level.
⁴ Lack of blinding, low adherence and high amount of contamination, high rate of incomplete outcome data, and group similarity at baseline was at high risk, therefore we downgraded by one level.
⁵ Small number of participants and null effect and appreciable benefit included in the confidence interval for the mean difference: imprecision, therefore we further downgraded by one level.
⁶ Lack of blinding, low adherence, a high or unclear amount of contamination in three of four trials in the meta‐analysis, two of four allocation concealment procedures were unclear, therefore we downgraded by one level.
⁷ Small number of participants, wide confidence intervals for two of the four trials, and null effect and appreciable benefit included in the confidence interval for the mean difference: imprecision, therefore we further downgraded by one level.
⁸ Lack of blinding, low adherence and unclear or high contamination, two published studies could not contribute to the meta‐analysis, and in one of those there were no changes in the depression scores in any of the groups, therefore we downgraded by one level.
⁹ Lack of blinding, low and unclear adherence and unclear contamination, group similarity at baseline for one study was at high risk of bias, therefore we downgraded by one level.
¹⁰ Small number of participants: imprecision, therefore we further downgraded by one level.
¹¹ Assumed risk based on the mean control group risk in the included studies.
¹² Lack of blinding, low adherence and unclear or high contamination, one of two allocation procedures was unclear, group similarity at baseline was at high risk of bias for one study, therefore we downgraded by one level.
¹³ Small number of participants and null effect and appreciable harm and benefit included in the confidence interval for the risk ratio: imprecision, therefore we further downgraded by one level.
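Two of the calculations behind this table can be checked by hand. The minimal Python sketch below assumes, as the table legend describes, that the corresponding risk is the assumed control‐group risk multiplied by the risk ratio and by each of its confidence limits; with the lymphoedema inputs it reproduces 60 per 1000 (30 to 123). The second function is a simplified, hypothetical reading of the imprecision footnotes 5 and 7: it flags a confidence interval that includes no effect and also reaches the minimally important difference (MID) in the direction of benefit, assuming higher scores are better. Actual GRADE imprecision judgements weigh more than this single rule, and both function names are illustrative only.

```python
def corresponding_risk_per_1000(assumed_per_1000, rr, rr_low, rr_high):
    """Apply a risk ratio and its 95% CI to an assumed control-group risk per 1000."""
    return (
        round(assumed_per_1000 * rr),
        round(assumed_per_1000 * rr_low),
        round(assumed_per_1000 * rr_high),
    )


def ci_spans_null_and_mid(ci_low, ci_high, mid, null=0.0):
    """Simplified imprecision flag for outcomes where higher scores are better:
    True if the CI includes no effect AND reaches the MID in the benefit direction."""
    includes_null = ci_low <= null <= ci_high
    reaches_benefit = ci_high >= mid
    return includes_null and reaches_benefit


# Lymphoedema row: assumed risk 85 per 1000, RR 0.71 (95% CI 0.35 to 1.45)
print(corresponding_risk_per_1000(85, 0.71, 0.35, 1.45))  # (60, 30, 123)

# Health-related quality of life (EQ-5D VAS): MD 1.10 (-5.28 to 7.48), MID 7 points
print(ci_spans_null_and_mid(-5.28, 7.48, 7.0))  # True
```

For the FACT‐B result (MD 4.24, 95% CI ‐1.81 to 10.29; MID 7 to 8 points) the same check is also True at either MID, consistent with the further downgrading for imprecision in footnote 7.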

Background

Description of the condition

Breast cancer detection and management have undergone dramatic changes over the past three decades. Women are increasingly diagnosed with early‐stage disease, leaving them with treatment choices ranging from breast‐conserving options to mastectomy (Newman 2003). With the majority of breast cancers diagnosed at an early stage, treatment is focused on cure and the prevention of relapse due to micrometastatic disease. The mainstay of care is local therapy, consisting mostly of breast‐conserving surgery followed by radiotherapy. Adjuvant systemic therapy includes chemotherapy (cytotoxic agents) when there is an increased risk of systemic relapse, and hormonal therapy, antibody therapy (trastuzumab), or both, depending on the expression of hormone and HER2/neu receptors.

Despite these major advances in managing both early and locally advanced breast cancer, women still have to deal with severe side effects and psychological distress during adjuvant therapy, which substantially affect their quality of life. The side effects of adjuvant cancer treatment differ depending on the mode of treatment, that is, radiotherapy, chemotherapy, hormonal therapy, or antibody therapy.

Radiotherapy is frequently associated with short‐term side effects such as fatigue and skin reactions, and relatively rare long‐term side effects including lymphoedema, cardiac and pulmonary toxicities, and secondary malignancy (Brown 2015). Chemotherapy is associated with short‐term side effects such as nausea, emesis, stomatitis, alopecia, myelosuppression, thromboembolism, myalgias, neuropathy, and fatigue. Long‐term side effects of chemotherapy are premature menopause, weight gain, fatigue, cardiac dysfunction, and cognitive dysfunction (Partridge 2001). Furthermore, people receiving radiotherapy or chemotherapy report anxiety and depression prior to, during, and after therapy due to treatment side effects (Spiegel 1997). Adjuvant hormonal therapy produces symptoms secondary to oestrogen withdrawal, such as hot flushes, bone demineralisation, and psychosexual effects (Rutqvist 2004). A particular concern with antibody therapy in combination with anthracycline chemotherapy is cardiac toxicity (Rayson 2008).

Description of the intervention

Although research is producing increasingly hopeful insights into the causes and cures for cancer, efforts to manage the side effects of adjuvant therapy have not kept pace (Patrick 2003). Exercise interventions may be effective in managing some of these side effects, such as fatigue, depression, and cognitive dysfunction.

How the intervention might work

Evidence concerning the natural progression of physical activity suggests that women with breast cancer significantly decrease physical activity and exercise from prediagnosis to postdiagnosis (Irwin 2003). These decreases are associated with adjuvant cancer treatment: observed decreases in physical activity were greater among women who were treated with radiation and chemotherapy (50% decrease) than among women who underwent surgery only (24% decrease) or who were treated with surgery and radiation only (23% decrease) (Irwin 2003).

The National Comprehensive Cancer Network defines cancer‐related fatigue as a "persistent, subjective sense of tiredness related to cancer or cancer treatment that interferes with usual functioning" (NCCN 2004). Fatigue results in substantial physical, psychosocial, cognitive, and socioeconomic consequences (Holley 2000). During and after adjuvant chemotherapy and radiotherapy the prevalence of fatigue is high and fluctuating (de Jong 2002; Jereczek‐Fossa 2001). Fatigue is also associated with factors such as depression, impaired quality of sleep, or pain (de Jong 2002).

The rationale supporting exercise interventions for cancer‐related fatigue is based on the proposition that the combined effects of a toxic treatment and a decreased level of activity during treatment result in a reduction in the capacity for physical performance. Patients must in turn use greater effort and expend more energy to perform daily activities, which leads to fatigue (NCCN 2004). Physical exercise training programmes may increase functional capacity, leading to reduced effort and decreased fatigue.

Women treated for breast cancer frequently experience higher levels of emotional distress than the general population (Spiegel 1997). The rationale for considering exercise as an intervention to reduce distress in women receiving adjuvant therapy for breast cancer is based upon literature that has demonstrated ameliorating effects of exercise on these problems. Results of studies with non‐cancer populations indicate that aerobic exercise training has antidepressant and anxiolytic effects and protects against harmful consequences of stress (Salmon 2001). There is evidence that cognitive dysfunction may also occur in women receiving adjuvant chemotherapy for breast cancer (O'Shaughnessy 2003; Rugo 2003; Tchen 2003). A meta‐analytic study conducted to examine the hypothesis that aerobic fitness training enhances the cognitive vitality of healthy but sedentary older adults indicated that fitness training has robust benefits for cognition (Colcombe 2003).

Why it is important to do this review

Most previous research has focused on rehabilitation and health promotion in women who had completed cancer treatment. This review aims to evaluate the role of exercise in managing common side effects of adjuvant and neoadjuvant therapy for breast cancer. We conducted this review update to incorporate and analyse the increasing number of studies in women undergoing adjuvant treatment.

Objectives

To assess the effect of aerobic or resistance exercise interventions during adjuvant treatment for breast cancer on treatment‐related side effects such as physical deterioration, fatigue, diminished quality of life, depression, and cognitive dysfunction.

Methods

Criteria for considering studies for this review

Types of studies

We considered randomised controlled trials of exercise training during adjuvant (including neoadjuvant) treatment (radiotherapy, chemotherapy) for women with non‐metastatic breast cancer.

Types of participants

We included studies involving women who were diagnosed with breast cancer stages I, II, and III and who were undergoing adjuvant (including neoadjuvant) chemotherapy, radiotherapy, or a combination concurrently with an exercise intervention in the active group.

Types of interventions

We included studies that assessed the effects of all forms of repeatedly performed aerobic or resistance exercise or both with programme duration of at least six weeks. To be included in this review, the exercise intervention had to coincide with the adjuvant treatment regimen rather than follow it. We excluded studies where the exercise intervention was part of a complex intervention (for example complete decongestive lymphatic therapy). We also excluded trials with interventions restricted to local muscular endurance (for example training of shoulders, back, or legs only) instead of including all major muscle groups or restricted to stretching exercises.

We included trials making the following comparisons:

  • exercise versus no exercise;

  • exercise versus other interventions (e.g. psychosocial interventions).

Types of outcome measures

Primary outcomes

  1. physical fitness: objective tests measuring VO2 max or distance walked per time

  2. fatigue: using a validated questionnaire such as Functional Assessment of Chronic Illness Therapy‐Fatigue (FACIT‐F)

  3. quality of life (cancer‐specific quality of life, health‐related quality of life, cancer site‐specific quality of life): using a validated questionnaire such as Functional Assessment of Cancer Therapy‐General (FACT‐G), European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire‐Core 30 (EORTC QLQ‐C30), Functional Assessment of Cancer Therapy‐Breast (FACT‐B)

  4. depression: using a validated questionnaire such as Beck Depression Inventory (BDI)

  5. cognitive function: using a validated test such as the Trail Making Test

Secondary outcomes

  1. strength

  2. other psychological distress outcomes

  3. physical activity behaviour

  4. multidimensional outcomes (e.g. pain)

  5. harms

Search methods for identification of studies

Electronic searches

  • We searched the Cochrane Breast Cancer Group Specialised Register on 30 March 2015 (details of search strategies used by the group for the identification of studies and the procedure used to code references are outlined in the group's module at www.mrw.interscience.wiley.com/cochrane/clabout/articles/BREASTCA/frame.html). We extracted studies including the text words 'early', 'locally advanced', 'local recurrence', 'locoregional', 'exercise', and 'exercise therapy' on the Specialised Register for consideration.

  • Cochrane Central Register of Controlled Trials (CENTRAL) (via the Cochrane Library, Issue 2, 2015). See Appendix 1.

  • MEDLINE (via OvidSP) from 1966 until 30 March 2015. See Appendix 2.

  • EMBASE (via Embase.com) from 1966 until 30 March 2015. See Appendix 3.

  • The World Health Organization International Clinical Trials Registry Platform (WHO ICTRP) (apps.who.int/trialsearch/Default.aspx) for all prospectively registered and ongoing trials on 30 March 2015. See Appendix 4.

  • ClinicalTrials.gov (clinicaltrials.gov/ct2/home) until 30 March 2015. See Appendix 5.

We did not update the searches in the original review in the following databases:

  • CINAHL (1982 to 2004)

  • SPORTDiscus (1975 to 2004)

  • PsycINFO (1872 to 2003)

  • SIGLE (1880 to 2004)

  • ProQuest Digital Dissertations (1861 to 2004)

Searching other resources

References from published studies

We screened references in relevant reviews and in published clinical trials for further trials.

Other

We consulted six experts in the field of cancer and exercise to identify additional trials. We applied no language restrictions.

Data collection and analysis

Selection of studies

Two review authors (either ACF and MHM or ACF and MM) independently reviewed the titles and abstracts of reports identified by the search and selected those that potentially fulfilled the inclusion criteria of this review. We retrieved these potentially relevant reports for more detailed evaluation. Both review authors then independently made a final selection of studies to be included in the review. A report was excluded according to the first criterion that it did not fulfil. We resolved disagreements by consensus, or if necessary by consulting a third person (MM or MHM) to reach a final decision.

Data extraction and management

Two review authors (ACF, MHM) independently extracted data (including study characteristics, study results, and point estimates together with measures of variability for selected outcome variables). We reviewed all discrepancies and achieved consensus through discussion, if necessary consulting a third person (MM) to reach a final decision. Where we found more than one publication for a study, we extracted data from all available publications if applicable. When a design publication and a results publication were available, we considered the results publication to be the primary reference. In cases where a doctoral dissertation was available, we considered this to be the primary reference for the study. In other cases, we considered the publication with the most relevant reported information for the review, especially regarding results, to be the primary reference.

Assessment of risk of bias in included studies

In the original version of the review, we assessed the included studies for quality using van Tulder methodological quality criteria. In the updated review, we used the Cochrane 'Risk of bias' tool (Higgins 2011). Two review authors (either ACF and MM or ACF and MHM) assessed the risk of bias of all included studies. This included assessment of sequence generation, allocation concealment, masking or blinding (of participants, researchers/healthcare providers, and outcome assessors), methods of addressing incomplete outcome data, selective reporting of outcomes, and other possible sources of bias including attrition from the exercise intervention. We graded each risk of bias parameter as high risk, low risk, or unclear risk based on recommendations for judging risk of bias provided in Chapter 8 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). For attrition bias, we judged studies to be at high risk when more than 20% of data were missing for short‐term follow‐up and more than 30% for long‐term follow‐up, as these are commonly used thresholds. We resolved disagreement through consensus, if necessary consulting a third review author (MHM or MM) for a final decision.
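The numeric part of the attrition rule can be sketched as a small Python function. This is an illustration only, assuming that missing data are expressed as the proportion of randomised participants not analysed; the function name `attrition_risk` is hypothetical, and the review's actual judgements also weighed factors the function ignores, such as imbalanced drop‐out between arms.

```python
from typing import Optional


def attrition_risk(n_randomised: int, n_analysed: Optional[int], long_term: bool) -> str:
    """Classify attrition bias with the review's missing-data thresholds.

    Hypothetical helper: returns "high" when more than 20% of data are missing
    at short-term follow-up (more than 30% at long-term follow-up), "low"
    otherwise, and "unclear" when the amount of missing data is not reported.
    Other considerations, such as imbalanced drop-out between arms, are ignored.
    """
    if n_analysed is None:
        return "unclear"
    if n_randomised <= 0 or not 0 <= n_analysed <= n_randomised:
        raise ValueError("need n_randomised > 0 and 0 <= n_analysed <= n_randomised")
    missing_fraction = (n_randomised - n_analysed) / n_randomised
    threshold = 0.30 if long_term else 0.20
    return "high" if missing_fraction > threshold else "low"


print(attrition_risk(100, 75, long_term=False))  # high (25% > 20%)
print(attrition_risk(100, 75, long_term=True))   # low  (25% <= 30%)
```

For example, a hypothetical trial missing data for 25 of 100 randomised women would be rated high risk at short‐term follow‐up but not at long‐term follow‐up, because only the 20% threshold is exceeded.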

We used the last two rows in the 'Risk of bias' assessment tables to document the reporting and amount of adherence and contamination in the exercise and control groups. If participants allocated to the exercise group do not exercise (non‐adherence) while participants allocated to the control group do exercise (contamination), the intended contrast between the study groups is distorted: both groups then contain participants who exercise and participants who do not, in unknown proportions. Effects may be underestimated as a result.
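The direction of this bias can be illustrated with a deliberately simple dilution model, which is our assumption and not a method used in the review: if exercising shifts an outcome by a fixed amount in every woman who actually exercises and has no effect otherwise, the expected intention‐to‐treat difference shrinks in proportion to the difference between the share exercising in the exercise arm and the share exercising in the control arm. The function name and the numbers in the example are hypothetical.

```python
def diluted_effect(true_effect: float, adherence: float, contamination: float) -> float:
    """Expected intention-to-treat difference under a simple dilution model.

    Hypothetical model: exercising changes the outcome by `true_effect` in every
    woman who actually exercises and not at all in women who do not.
    `adherence` is the share exercising in the exercise arm and
    `contamination` is the share exercising in the control arm.
    """
    for p in (adherence, contamination):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    # Only the difference in the share of exercisers separates the two arms.
    return true_effect * (adherence - contamination)


print(round(diluted_effect(0.5, adherence=0.7, contamination=0.4), 3))  # 0.15
```

With a hypothetical true effect of 0.5, 70% adherence, and 40% contamination, the expected observed difference is 0.5 × (0.7 − 0.4) = 0.15, less than a third of the true effect.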

Some sources of bias are inherent to exercise studies: blinding of participants and of the personnel supervising the exercise is difficult or impossible, which places every study at high risk of performance bias. The judgements for those items therefore do not indicate poor study design as such, but express the unavoidable bias introduced by lack of blinding. Nevertheless, exercise intervention studies should be subjected to the same 'Risk of bias' assessment as other studies, for example drug intervention studies. Adherence and contamination pose a similar challenge. It is difficult to maximise exercise adherence, especially in participants with cancer, and some contamination and imperfect adherence are to be expected. Confidence in the results might therefore be lowered even if studies are well planned and reported.

When high risk of bias for a study was due to lack of blinding, contamination, and/or non‐adherence, we did not downgrade the quality of evidence for this alone. We only downgraded the evidence one level for risk of bias when other factors such as unclear allocation procedures or high attrition rates were present.

'Risk of bias' tables for each study are presented in the Characteristics of included studies table, and a summary of the risk of bias is presented in Figure 2.

Measures of treatment effect

Most outcomes in the included studies were reported as continuous data. As the first step, we extracted data on outcomes in the format in which they were reported. For selected outcomes we extracted group means for final values and change scores with the corresponding measures of variability such as standard deviations (SD) or confidence intervals (CI) and the number of participants on whom the outcome was assessed per group.

As a summary statistic for meta‐analysis of continuous outcomes, we used either the standardised mean difference (SMD) or the weighted mean difference (WMD). We chose the SMD in cases where different assessment instruments measuring the same construct were used across studies (for example for fatigue and physical fitness outcomes). We did not combine final values and change scores in meta‐analyses since the difference in standard deviation does not reflect "differences in measurement scale, but differences in the reliability of the measurements" (Deeks 2005).
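As a sketch of how an SMD is obtained from the group summaries extracted for each study, the Python function below computes Hedges' adjusted g (the mean difference divided by the pooled SD, multiplied by a small‐sample correction) and an approximate standard error. We assume this variant and the common large‐sample variance approximation; review software may use slightly different formulas. Because the SD sits in the denominator, mixing final‐value SDs with change‐score SDs would change the scale of g. Where higher scores mean a worse outcome on one instrument and a better outcome on another, the sign of g must be aligned across studies before pooling. The function name and inputs are hypothetical.

```python
import math


def hedges_g(mean_ex: float, sd_ex: float, n_ex: int,
             mean_ctl: float, sd_ctl: float, n_ctl: int) -> tuple[float, float]:
    """Hedges' adjusted g (exercise minus control) and an approximate SE."""
    n_total = n_ex + n_ctl
    # Pooled within-group SD; the SMD is expressed in units of this SD.
    sd_pooled = math.sqrt(((n_ex - 1) * sd_ex**2 + (n_ctl - 1) * sd_ctl**2) / (n_total - 2))
    d = (mean_ex - mean_ctl) / sd_pooled
    # Small-sample correction factor J = 1 - 3 / (4N - 9).
    g = d * (1 - 3 / (4 * n_total - 9))
    # Common large-sample approximation to the variance of g.
    var_g = n_total / (n_ex * n_ctl) + g**2 / (2 * n_total)
    return g, math.sqrt(var_g)


# Hypothetical fitness scores: exercise 55 (SD 10, n 20), control 50 (SD 10, n 20).
g, se = hedges_g(55, 10, 20, 50, 10, 20)
print(f"g = {g:.3f}, SE = {se:.3f}")
```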

We did not include data for outcomes assessed with subscales of questionnaires (for example the physical functioning or vitality subscales of the 36‐Item Short Form Health Survey (SF‐36), or the nausea item of the Symptom Checklist‐90 (SCL‐90)), because we aimed to assess each full construct and to include only validated questionnaires, which subscales often are not.

As a summary statistic for dichotomous outcomes we chose the risk ratio (RR). Lymphoedema was the only outcome that was analysed as a dichotomous outcome: two studies reported the number of participants with lymphoedema (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel). For those outcomes with data available from only one study, we calculated and presented a summary statistic for this particular study.
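For a single study, the risk ratio and its 95% CI can be computed on the log scale as sketched below. The Wald interval and the 0.5 continuity correction for zero cells are conventional choices that we assume here; the function name is hypothetical, and the event counts in the example are invented and are not data from Courneya 2007 or Hayes 2013.

```python
import math


def risk_ratio(events_ex: int, n_ex: int, events_ctl: int, n_ctl: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Risk ratio (exercise vs control) with a Wald CI computed on the log scale."""
    a, n1, c, n2 = float(events_ex), float(n_ex), float(events_ctl), float(n_ctl)
    if a == 0 or c == 0:
        # Zero-event cell: add 0.5 to all four cells of the 2x2 table
        # (events and non-events), so each group total grows by 1.
        a, c, n1, n2 = a + 0.5, c + 0.5, n1 + 1, n2 + 1
    rr = (a / n1) / (c / n2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 10 of 100 women with lymphoedema after exercise, 20 of 100 in control.
rr, lower, upper = risk_ratio(10, 100, 20, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval is symmetric on the log scale, the product of its limits equals the square of the risk ratio.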

Unit of analysis issues

Five studies were three‐arm trials (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Schwartz 2007: Schwartz 2007 AET and Schwartz 2007 RET; Segal 2001: Segal 2001 SD and Segal 2001 SU; van Waart 2014: van Waart 2014 high and van Waart 2014 low), and each contributed two exercise groups to the meta‐analysis of physical fitness. For all five studies, we incorporated both exercise arms into the meta‐analysis and compared each with half of the shared control group (that is, we halved the number of participants and events observed in the control group), so that no control participant was counted twice.
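A minimal Python sketch of the split, assuming integer shares with any remainder assigned to the first comparison (the review does not say how odd numbers were handled). The `Arm` type, the function name, and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One study arm (hypothetical structure for illustration)."""
    label: str
    n: int
    events: int = 0  # used only for dichotomous outcomes


def split_shared_control(control: Arm, exercise_arms: list[Arm]) -> list[tuple[Arm, Arm]]:
    """Pair each exercise arm with an equal share of a single shared control group.

    Splitting n (and events) means no control participant is counted twice.
    Any remainder from integer division goes to the earliest comparisons.
    """
    if not exercise_arms:
        raise ValueError("need at least one exercise arm")
    k = len(exercise_arms)
    pairs = []
    for i, arm in enumerate(exercise_arms):
        n_share = control.n // k + (1 if i < control.n % k else 0)
        events_share = control.events // k + (1 if i < control.events % k else 0)
        share = Arm(f"{control.label} ({i + 1}/{k})", n_share, events_share)
        pairs.append((arm, share))
    return pairs


control = Arm("Usual care", n=41, events=5)
arms = [Arm("Exercise A", n=40, events=3), Arm("Exercise B", n=42, events=4)]
for exercise, ctl in split_shared_control(control, arms):
    print(exercise.label, "vs", ctl.label, ctl.n, ctl.events)
```

Here the two comparisons receive 21 and 20 control women with 3 and 2 events. For a continuous outcome, the control mean and SD are reused unchanged in both comparisons; only the sample size shrinks.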

Dealing with missing data

Whenever possible, we tried to contact the investigators or sponsors of studies with missing data.

Assessment of heterogeneity


We evaluated inconsistency of results across studies using the I2 statistic, which describes the percentage of variability in the point estimates that is due to heterogeneity rather than sampling error (Higgins 2002). Following Higgins (Higgins 2003), we considered I2 values of 25% as indicating low heterogeneity, I2 values of 50% moderate heterogeneity, and I2 values of 75% large heterogeneity.
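I2 is derived from Cochran's Q, the inverse‐variance‐weighted sum of squared deviations of the study estimates from the fixed‐effect pooled estimate: I2 = max(0, (Q − df)/Q) × 100%, where df is the number of studies minus one. The sketch below implements this formula; the function name and the effect sizes and standard errors in the example are hypothetical.

```python
def i_squared(effects: list[float], ses: list[float]) -> float:
    """I^2 (%) from Cochran's Q using inverse-variance (fixed-effect) weights."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # no observed dispersion around the pooled estimate
    # Share of total variation attributed to heterogeneity; negative values truncated to 0.
    return max(0.0, (q - df) / q) * 100


# Hypothetical SMDs and standard errors from four studies.
print(round(i_squared([0.1, 0.3, 0.6, 0.9], [0.15, 0.2, 0.2, 0.25]), 1))
```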

We also visually inspected forest plots; where the CIs of individual study results showed little or no overlap, we took this as an indication of statistical heterogeneity.

Assessment of reporting biases

If we identified a sufficient number of studies (that is more than 10), we prepared funnel plots and visually examined them for signs of asymmetry to detect publication bias.

Data synthesis

We used the random‐effects model to obtain the average effect of exercise because, in addition to the presence of random error, differences between exercise studies during adjuvant cancer treatment can also result from real differences between study populations, adjuvant cancer treatment, and the training stimulus. The random‐effects model considers these additional sources of between‐study variability as well as within‐study variability.
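The review does not state which random‐effects estimator was used. As an assumption, the sketch below uses the DerSimonian–Laird method, a widely used moment estimator: it estimates the between‐study variance tau² from Cochran's Q and adds it to each study's within‐study variance before inverse‐variance weighting. The function name and the inputs in the example are hypothetical.

```python
import math


def random_effects_dl(effects: list[float], ses: list[float], z: float = 1.96):
    """DerSimonian-Laird random-effects pooled estimate.

    Returns (pooled effect, lower CI limit, upper CI limit, tau^2).
    """
    w = [1 / se**2 for se in ses]                      # within-study (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    w_re = [1 / (se**2 + tau2) for se in ses]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


# Hypothetical SMDs and standard errors from four studies.
est, lo, hi, tau2 = random_effects_dl([0.1, 0.3, 0.6, 0.9], [0.15, 0.2, 0.2, 0.25])
print(f"SMD {est:.2f} (95% CI {lo:.2f} to {hi:.2f}); tau^2 = {tau2:.3f}")
```

When tau² is zero the result coincides with the fixed‐effect estimate; as tau² grows, study weights become more equal and the CI widens, reflecting the additional between‐study variability described above.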

We used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to assess the quality of the evidence, grading the following main outcomes for quality: physical fitness, fatigue, cancer‐specific quality of life, health‐related quality of life, cancer site‐specific quality of life, depression, cognitive function, and lymphoedema. We used GRADEpro GDT software to develop the 'Summary of findings' table, and two review authors (either ACF and MM or ACF and MHM) graded the quality of the evidence for each outcome. We resolved disagreement by consensus, if necessary consulting a third review author (MHM or MM) for a final decision.

As blinding of participants and exercise supervising personnel is difficult or impossible, and as self reported outcomes inherently carry a high risk of detection bias, we rated those items at high risk of bias but did not downgrade the evidence for them unless there were additional high risks of bias (for example in sequence generation or allocation concealment).

Subgroup analysis and investigation of heterogeneity

We did not conduct subgroup analyses.

If possible, in future updates we will consider conducting subgroup analyses by adjuvant treatment received (chemo‐ and radiotherapy, or radiotherapy only) and by type of exercise intervention (aerobic or resistance exercise; self directed or supervised exercise).

Sensitivity analysis

Where important statistical inconsistency existed as measured by the I2 statistic, we conducted sensitivity analyses to assess the robustness of the review results by removing those studies that seemed to be estimating a different effect.
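One systematic way to look for studies that seem to estimate a different effect is a leave‐one‐out analysis, sketched below under stated assumptions: each study is removed in turn and the remaining studies are re‐pooled with a DerSimonian–Laird random‐effects model. This is a swapped‐in illustration; the review describes removing selected studies, not necessarily a leave‐one‐out procedure. Function names, labels, and numbers are hypothetical.

```python
def pool_dl(effects: list[float], ses: list[float]) -> tuple[float, float]:
    """DerSimonian-Laird pooled effect and I^2 (%) for two or more studies."""
    if len(effects) < 2:
        raise ValueError("need at least two studies")
    w = [1 / s**2 for s in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    w_re = [1 / (s**2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, i2


def leave_one_out(labels, effects, ses):
    """Re-pool the remaining studies after omitting each study in turn."""
    if len(labels) < 3:
        raise ValueError("need at least three studies")
    results = {}
    for i, label in enumerate(labels):
        keep = [j for j in range(len(labels)) if j != i]
        results[label] = pool_dl([effects[j] for j in keep], [ses[j] for j in keep])
    return results


# Hypothetical data: study C looks as if it estimates a different effect.
for omitted, (pooled, i2) in leave_one_out(["A", "B", "C"], [0.4, 0.4, 1.5], [0.1, 0.1, 0.1]).items():
    print(f"without {omitted}: pooled SMD {pooled:.2f}, I2 {i2:.0f}%")
```

A large drop in I2 when one study is omitted, as for study C here, flags that study for closer inspection rather than automatic exclusion.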

Results

Description of studies

Results of the search

See: Characteristics of included studies; Characteristics of excluded studies; Characteristics of studies awaiting classification.

In the original review published in 2006, we retrieved 32 full‐text references for more detailed evaluation after screening 1612 potentially relevant references. Of these 32 full‐text references, 22 publications were excluded, one trial was awaiting assessment (pending publication), and nine trials were included. We considered seven of the nine trials to be appropriate for inclusion in this updated review (Campbell 2005; Crowley 2003; Drouin 2002; MacVicar 1989; Mock 2004; Segal 2001: Segal 2001 SD and Segal 2001 SU; Winningham 1988), and excluded the remaining two because they were not randomised controlled trials (MacVicar 1986; Mock 1997). We also included in this updated review the trial that had been awaiting assessment in the original review (Battaglini 2004).

In this updated review, we retrieved a further 146 full‐text references following screening of 3297 titles and abstracts. From these 146 full‐text references we excluded 86 and identified 60 records as appropriate for inclusion in this review. Three records belonged to a study that had been included in the first version of this review, and 55 records related to 24 new studies. One study is awaiting assessment due to pending publication (Petrella 2012), and another study was due to be published after our analyses were finished (Lotzke 2016). For further details see Figure 1.


Figure 1. Study flow diagram.

Included studies

The eight studies from the original review and the 24 new studies amounted to a total of 32 studies (2626 participants) for inclusion in the review. Trial characteristics and outcomes are found in the Characteristics of included studies tables. We included only randomised controlled trials in this updated version of the review. Five of the included studies incorporated two separate exercise groups and are therefore entered twice for the purposes of statistical analysis (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Schwartz 2007: Schwartz 2007 AET and Schwartz 2007 RET; Segal 2001: Segal 2001 SD and Segal 2001 SU; van Waart 2014: van Waart 2014 high and van Waart 2014 low).

Characteristics of participants

Women received different regimens of adjuvant treatment across these 32 exercise intervention studies: either chemotherapy or radiotherapy in one trial (Mock 2004); chemotherapy, radiotherapy, or a combination of the two in another 10 trials (Battaglini 2004; Cadmus 2007; Caldwell 2009; Campbell 2005; Eakin 2012; Haines 2010; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Mutrie 2007; Perna 2010; Segal 2001: Segal 2001 SD and Segal 2001 SU); sequential chemo‐ and radiotherapy in one trial (Cornette 2013); chemotherapy only in nine trials (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Crowley 2003; Ingram 2010; MacVicar 1989; Moros 2010; Schmidt 2014; Visovsky 2014; Winningham 1988; Yang 2011); neoadjuvant chemotherapy in two trials (Hornsby 2014; Rao 2012); neoadjuvant and adjuvant chemotherapy in one trial (Gokal 2013); and radiotherapy only in three trials (Drouin 2002; Reis 2013; Steindorf 2014). Five trials included women who were scheduled for chemotherapy, some of whom also underwent radiation therapy (Dodd 2010; Husebo 2014; Schwartz 2007: Schwartz 2007 AET and Schwartz 2007 RET; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low).

Characteristics of the intervention

Mode of exercise differed across trials. Thirteen trials (Cadmus 2007; Dodd 2010; Drouin 2002; Gokal 2013; Hornsby 2014; Husebo 2014; MacVicar 1989; Mock 2004; Moros 2010; Reis 2013; Segal 2001: Segal 2001 SD and Segal 2001 SU; Winningham 1988; Yang 2011), and one of two intervention arms in two studies (Courneya 2007: Courneya 2007 AET; Schwartz 2007: Schwartz 2007 AET), tested aerobic exercise interventions: three studies used cycle ergometer interval training (Hornsby 2014; MacVicar 1989; Winningham 1988), and six studies offered walking programmes (Drouin 2002; Gokal 2013; Husebo 2014; Mock 2004; Segal 2001: Segal 2001 SD and Segal 2001 SU; Yang 2011). Other forms of aerobic exercise were Nia exercise (Reis 2013) and aerobic exercise chosen by the participants themselves, in two studies (Cadmus 2007; Dodd 2010) and in one intervention arm of a third (Schwartz 2007: Schwartz 2007 AET). Fifteen studies (Battaglini 2004; Caldwell 2009; Campbell 2005; Cornette 2013; Crowley 2003; Eakin 2012; Haines 2010; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Husebo 2014; Ingram 2010; Moros 2010; Mutrie 2007; Perna 2010; Rao 2012; Travier 2015), and one of two intervention arms in one study (van Waart 2014: van Waart 2014 high), applied a combined aerobic‐resistance programme. Two studies tested resistance exercise interventions (Schmidt 2014; Steindorf 2014), as did one intervention arm of two further studies (Courneya 2007: Courneya 2007 RET; Schwartz 2007: Schwartz 2007 RET). One study started with supervised resistance training at the hospital, followed by a combined self directed aerobic‐resistance programme (Cornette 2013). Exercise was implemented as a supervised group exercise programme in seven studies (Battaglini 2004; Campbell 2005; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; MacVicar 1989; Mutrie 2007; Travier 2015; Winningham 1988), and in one of two intervention arms in two studies (Segal 2001: Segal 2001 SU; van Waart 2014: van Waart 2014 high).

Four studies used a stretching intervention as a comparison arm (Drouin 2002; Haines 2010; MacVicar 1989; Winningham 1988), two studies used progressive muscle relaxation as the comparison arm (Schmidt 2014; Steindorf 2014), and the remaining 26 studies compared an exercise intervention with no intervention.

Exercise interventions lasted six to seven weeks for women undergoing radiation treatment in two trials (Drouin 2002; Mock 2004), 10 weeks for women undergoing chemotherapy in two trials (MacVicar 1989; Winningham 1988), and 12 to 13 weeks in 11 trials (Caldwell 2009; Campbell 2005; Crowley 2003; Gokal 2013; Hornsby 2014; Mutrie 2007; Reis 2013; Schmidt 2014; Steindorf 2014; Visovsky 2014; Yang 2011). In nine trials, the exercise intervention lasted 18 to 32 weeks (Battaglini 2004; Cadmus 2007; Cornette 2013; Eakin 2012; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Ingram 2010; Schwartz 2007: Schwartz 2007 AET and Schwartz 2007 RET; Segal 2001: Segal 2001 SD and Segal 2001 SU; Travier 2015). The longest intervention period was 52 weeks, in two trials (Dodd 2010; Haines 2010). Trials with shorter intervention periods (six to seven weeks) were those in which women received radiation treatment, which is of shorter duration than chemotherapy. In five trials (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Husebo 2014; Mock 2004; Moros 2010; van Waart 2014: van Waart 2014 high and van Waart 2014 low), the exercise intervention was designed to span the period from initiation to cessation of each woman's adjuvant therapy, so the length of the intervention period varied between women in the intervention arm (either six weeks with radiation treatment or three to six months with chemotherapy).

In 12 trials (Battaglini 2004; Campbell 2005; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Hornsby 2014; MacVicar 1989; Moros 2010; Mutrie 2007; Rao 2012; Schmidt 2014; Steindorf 2014; Travier 2015; Winningham 1988), and in one of the two intervention arms in three trials (Hayes 2013: Hayes 2013 FtF; Segal 2001: Segal 2001 SU; van Waart 2014: van Waart 2014 high), the exercise intervention was supervised. Women's exercise was self directed in 15 trials (Cadmus 2007; Caldwell 2009; Crowley 2003; Dodd 2010; Drouin 2002; Eakin 2012; Gokal 2013; Haines 2010; Husebo 2014; Ingram 2010; Mock 2004; Reis 2013; Schwartz 2007: Schwartz 2007 AET and Schwartz 2007 RET; Visovsky 2014; Yang 2011), and in the second intervention arm of Hayes 2013 (Hayes 2013 Tel), Segal 2001 (Segal 2001 SD), and van Waart 2014 (van Waart 2014 low). Two studies started with supervised sessions, which were followed by self directed sessions in the home (Cornette 2013; Perna 2010). Two trials applied supervised one‐on‐one sessions: home based in Rao 2012 and at the clinical institution in Hornsby 2014.

Characteristics of the outcome measures

The most frequently assessed outcomes were physical fitness and fatigue, with 22 studies measuring physical fitness (Battaglini 2004; Caldwell 2009; Campbell 2005; Cornette 2013; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Crowley 2003; Dodd 2010; Drouin 2002; Haines 2010; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Hornsby 2014; Husebo 2014; MacVicar 1989; Mock 2004; Mutrie 2007; Reis 2013; Schmidt 2014; Schwartz 2007: Schwartz 2007 AET and Schwartz 2007 RET; Segal 2001: Segal 2001 SD and Segal 2001 SU; Steindorf 2014; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low), and 21 studies measuring fatigue (Battaglini 2004; Caldwell 2009; Campbell 2005; Cornette 2013; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Crowley 2003; Dodd 2010; Drouin 2002; Eakin 2012; Gokal 2013; Haines 2010; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Hornsby 2014; Husebo 2014; Mock 2004; Mutrie 2007; Reis 2013; Schmidt 2014; Steindorf 2014; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low). Other outcomes assessed were quality of life, strength, depression, anxiety, cognitive function, self esteem, mood disturbances, physical activity level, gait and balance, subjective upper body function, neuropathy symptoms, chemotherapy completion, shoulder mobility, arm morbidity, nausea relief, sleep disturbances, endocrine symptoms, and adverse effects. For detailed information on outcome measures see the Characteristics of included studies table.

Other study characteristics

Small sample size was common among the included studies. Sixteen studies randomised fewer than 50 women. Ten of the 32 studies randomised more than 50 women per group (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Dodd 2010; Eakin 2012; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Mock 2004; Mutrie 2007; Segal 2001: Segal 2001 SD and Segal 2001 SU; Steindorf 2014; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low). Sample sizes ranged from 10 to 242 women. The median sample size was 50 women, interquartile range (IQR) 22 to 124. Sample size was reported to be based on power calculations in 14 studies, and 12 of the 14 studies reached the target sample size.

Excluded studies

In the majority of cases, we excluded studies because the exercise intervention took place after the adjuvant treatment period. Other reasons were that studies were not randomised controlled trials, exercise was part of a complex intervention or no exercise intervention was implemented, the majority of participants were not women with breast cancer, or the exercise intervention had a duration of less than six weeks. Furthermore, we excluded studies assessing yoga or qigong because we regard both as complex interventions. Some studies could not be characterised as controlled trials (they were study protocols or reviews). For a detailed description of the reasons for exclusion, see the Characteristics of excluded studies table. Note that this table contains not only clinical studies but also review articles that were part of our full‐text retrieval to confirm our decision to exclude studies when abstracts were ambiguous.

Risk of bias in included studies

We assessed the risk of bias for each included study and reported the judgements for the individual 'Risk of bias' domains in the 'Risk of bias' table. We have presented these in the 'Risk of bias' summary in Figure 2.


Figure 2. Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Allocation

Random sequence generation

Twenty trials were at low risk of selection bias because they reported adequate generation of the random sequence using a random component. One trial used a non‐random component to generate the sequence (Battaglini 2004) and was thus judged to be at high risk of selection bias. We considered 11 trials to have an unclear risk of selection bias, largely because the generation of the random sequence was not described.

Allocation concealment

Nine trials adequately concealed allocation so that participants and investigators could not foresee assignment to the study groups, and were thus judged to be at low risk of selection bias. Twenty‐three trials did not describe the method of allocation concealment, or did not describe it in enough detail to allow a definitive judgement, and were considered to have an unclear risk of selection bias.

Blinding

Blinding of participants and personnel

All trials included in this review were at high risk of performance bias because, owing to the nature of the intervention (exercise), it was not possible to blind participants and study personnel. Three studies mentioned a placebo group (Haines 2010; MacVicar 1989; Winningham 1988), in which women were instructed to do stretching in a setting similar to that of the exercise groups. However, because the difference between physical exercise and stretching is generally well known, we cannot assume that participants and personnel were unaware of whether they were in the exercise or the stretching group.

Blinding of outcome assessors (detection bias)

Eight studies reported blinding of outcome assessors (Crowley 2003; Dodd 2010; Haines 2010; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Hornsby 2014; Mutrie 2007; Perna 2010; Travier 2015). Blinding was performed for assessment of fitness outcomes, as well as for lymphoedema in one study (Hayes 2013), and for upper limb swelling and shoulder range of motion in another study (Haines 2010), with a low risk of bias for these outcomes, but not for the remaining self reported outcomes, for which risk of bias was high. Perna 2010 did not report the assessed fitness outcomes. In the remaining 24 studies, no information was given on blinding of outcome assessors, which we judged as lack of blinding and therefore high risk of bias for this item for all outcomes. In cases where no fitness outcomes or no self reported outcomes were measured in a trial, this appears as unclear risk of bias in the tables and as an empty cell in the 'Risk of bias' summary.

Incomplete outcome data

Twenty‐three of the 32 studies reported analysing data according to the intention‐to‐treat (ITT) principle. Twenty of these 23 studies had low drop‐out rates or less than 20% missing data and were thus judged to be at low risk of attrition bias. Of the remaining three studies, one reported imbalanced drop‐out rates between the exercise group and the control group (Mutrie 2007), with almost twice as many participants dropping out of the intervention group (19 of 101) as out of the control group (10 of 102); we therefore judged this study to be at high risk of attrition bias. The second study had more than 30% missing data (Cornette 2013); although the trial authors undertook an ITT analysis with imputation of missing data, we judged this study to be at high risk of attrition bias because of the amount of missing data. It remained unclear whether there had been missing data in the third study, leading to a judgement of unclear risk of attrition bias (Visovsky 2014).

Five studies did not report if data were analysed by intention to treat. Three of these studies had more than 20% dropouts or missing data and were thus judged to be at high risk of attrition bias (Caldwell 2009; Haines 2010; Moros 2010). Another study did not report if there were missing data (Battaglini 2004), and was thus judged to be at an unclear risk of attrition bias. The remaining study had a low drop‐out rate (4 of 44 women) and was judged to be at low risk of attrition bias (Yang 2011).

We judged one study to be at high risk of attrition bias (Reis 2013) because it only reported a per‐protocol analysis of participants who adhered to the exercise intervention (12 of 22 women). In one study (Perna 2010), the numbers randomised to each arm and the completion rates were unclear. The authors reported that they used regression modelling to impute missing values for the analyses. We thus judged risk of attrition bias for this study as unclear and extracted no data. One study reported no outcome data because it was closed early owing to changes in the chemotherapy protocol (Ingram 2010); we judged this study to be at unclear risk of attrition bias. MacVicar 1989 analysed 45 of 62 women; nine of the 17 excluded women were reclassified during participation to a stage of disease more advanced than stage II and were not analysed for that reason. We judged this study to be at high risk of attrition bias.

Selective reporting

Six studies were at a low risk of reporting bias, as the studies had been registered prospectively or study protocols had been published and the prospectively registered outcomes were in line with the published ones. We considered 10 studies to be at high risk of reporting bias, because reporting of assessed outcomes in the final paper differed from entries in trial registries or study protocols, and no explanation was given. We considered 16 studies to be at an unclear risk for reporting bias, as no study protocol or design paper was available, and no trial registration had taken place; the information was therefore insufficient to judge this item for those studies.

Other potential sources of bias

Nineteen studies were at low risk of selection bias owing to adequate group similarity at baseline, three studies were at unclear risk for selection bias, and 10 studies were at high risk for selection bias, because group similarity at baseline was inadequate.

Adherence and contamination

Different approaches were used among the included studies to measure adherence, that is the level of exercise participation achieved once the woman had agreed to undertake it. Fifteen studies reported exercise levels in non‐exercising control groups (contamination) (Cadmus 2007; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Crowley 2003; Dodd 2010; Eakin 2012; Gokal 2013; Haines 2010; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Hornsby 2014; Husebo 2014; Mock 2004; Perna 2010; Reis 2013; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low). A high percentage of women in the control groups (up to 70% in Dodd 2010) reported exercising regularly or being highly active. In a study with two exercise groups (Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel), the usual‐care group was, according to the survey used, more active than one of the exercise groups and as active as the other.

In six studies, women adhered adequately to the exercise intervention, and in four studies this was unclear. In the remaining 22 studies, adherence to the exercise intervention was so low that we judged it to cause a high risk of bias. The amount of contamination was low in two studies, high in 10 studies, and unclear in the remaining 20 studies.

Effects of interventions

See: Summary of findings for the main comparison Exercise compared with control for women receiving adjuvant therapy for breast cancer

Effectiveness of exercise programmes

Most trial authors reported study results as follow‐up values, which we pooled. When outcomes were assessed with different instruments (and were therefore pooled as SMDs), follow‐up values and change scores could not be pooled together. We performed meta‐analyses for physical fitness, fatigue, cancer‐specific quality of life, cancer site‐specific quality of life, depression, cognitive function, strength, subjective upper body function, arm morbidity, anxiety, mood disturbance, self esteem, physical activity, gait and balance, and lymphoedema.

Studies that were included in the meta‐analysis for physical fitness predominantly measured either performance, for example distance walked in a given time, or maximum oxygen uptake. Studies that were included in the meta‐analysis for fatigue predominantly applied the Functional Assessment of Chronic Illness Therapy‐Fatigue (FACIT‐F) scale. Studies that were included in the meta‐analysis for cancer‐specific quality of life used either the Functional Assessment of Cancer Therapy‐General (FACT‐G) or the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire‐Core 30 (EORTC QLQ‐C30). Studies that were included in the meta‐analysis for cancer site‐specific quality of life all used the Functional Assessment of Cancer Therapy‐Breast (FACT‐B) questionnaire. Studies that were included in the meta‐analysis for depression predominantly used the Center for Epidemiologic Studies Depression Scale (CES‐D). The two studies that were included in the meta‐analysis for cognitive function both used the Trail Making Test.

Primary outcomes

Physical fitness

Twelve studies applied tests of cardiorespiratory fitness (Battaglini 2004; Cornette 2013; Courneya 2007 (Courneya 2007 AET and Courneya 2007 RET); Crowley 2003; Dodd 2010; Drouin 2002; Hornsby 2014; MacVicar 1989; Schmidt 2014; Segal 2001 (Segal 2001 SD and Segal 2001 SU); Steindorf 2014; Travier 2015), eight studies assessed physical performance via timed walking distances (Caldwell 2009; Campbell 2005; Haines 2010; Husebo 2014; Mock 2004; Mutrie 2007; Reis 2013; Schwartz 2007 (Schwartz 2007 AET and Schwartz 2007 RET)), and two studies used other physical performance tests (Hayes 2013 (Hayes 2013 FtF and Hayes 2013 Tel); van Waart 2014 (van Waart 2014 high and van Waart 2014 low)). Hayes 2013 assessed heart rate at the end of test completion, and van Waart 2014 assessed endurance time in minutes.

Meta‐analysis was feasible for 15 of those 22 studies (1310 women) yielding 20 comparisons (Caldwell 2009; Campbell 2005; Cornette 2013; Courneya 2007 (Courneya 2007 AET and Courneya 2007 RET); Drouin 2002; Haines 2010; Hayes 2013 (Hayes 2013 FtF and Hayes 2013 Tel); Hornsby 2014; Husebo 2014; Mutrie 2007; Reis 2013; Schwartz 2007 (Schwartz 2007 AET and Schwartz 2007 RET); Segal 2001 (Segal 2001 SD and Segal 2001 SU); Travier 2015; van Waart 2014 (van Waart 2014 high and van Waart 2014 low)). The standardised mean difference (SMD) for the pooled data was 0.42 (95% confidence interval (CI) 0.25 to 0.59; I2 = 49%; Analysis 1.1; Figure 3), favouring the exercise group. The moderate heterogeneity may be explained by the wide range of exercise interventions and outcome assessment protocols in the different studies.
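
This section does not state which pooling model was used. The sketch below assumes inverse‐variance random‐effects pooling (the DerSimonian and Laird method), one common choice in Cochrane reviews, and shows how I2 follows from Cochran's Q as (Q − df)/Q. It illustrates the calculation; it does not reproduce Analysis 1.1.

```python
import math


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled estimate, lower 95% limit, upper 95% limit, I2 in %).
    """
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    # Between-study variance tau^2, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    # I2: share of total variation attributed to between-study heterogeneity
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2
```

When the studies agree (Q ≤ df), tau² and I2 drop to zero and the estimate coincides with the fixed‐effect result.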


Forest plot of comparison: 1 Exercise versus control, outcome: 1.1 Physical fitness.

We could not transform data from two studies for meta‐analysis requirements (Crowley 2003; MacVicar 1989). Both studies reported small but statistically significant improvements. Another study reported no group differences in the ITT analysis for the 12‐minute walk test but presented no data for the ITT analysis (Mock 2004). Four studies reported neither data nor descriptive results for cardiorespiratory fitness. One study provided only a comparison of means without standard deviations for cardiorespiratory fitness data (Battaglini 2004).

We rated the result of a statistically significant improvement in physical fitness as moderate‐quality evidence due to lack of blinding, low adherence, and high or unclear contamination in most of the studies and because many randomisation and allocation procedures were unclear. Furthermore, there was a considerable number of women (390 in 4 studies) for whom no data were reported. There was no indication of publication bias from examination of the funnel plot for this outcome. See summary of findings Table for the main comparison.

Fatigue

Meta‐analysis was possible for 19 studies (1698 women) yielding 22 comparisons (Battaglini 2004; Caldwell 2009; Campbell 2005; Cornette 2013; Courneya 2007 (Courneya 2007 AET and Courneya 2007 RET); Drouin 2002; Eakin 2012; Gokal 2013; Haines 2010; Hayes 2013 (Hayes 2013 FtF and Hayes 2013 Tel); Hornsby 2014; Husebo 2014; Mock 2004; Mutrie 2007; Reis 2013; Schmidt 2014; Steindorf 2014; Travier 2015; van Waart 2014 (van Waart 2014 high and van Waart 2014 low)). Several tools were used to measure fatigue: the FACIT‐F scale, the (revised) Piper Fatigue Scale, the Multidimensional Fatigue Inventory, the Schwartz Cancer Fatigue Scale, the Fatigue Assessment Questionnaire, and the Fatigue Quality List. The SMD between intervention and control was ‐0.28 (95% CI ‐0.41 to ‐0.16; I2 = 29%; Analysis 1.2; Figure 4), favouring the exercise group. Two studies did not report data for fatigue (Crowley 2003; Dodd 2010); both reported no statistically significant group differences. We assumed that their results would not have substantially influenced the pooled result because they involved only 22 and 119 women, respectively. We rated the result as moderate‐quality evidence due to lack of blinding, low adherence, and high or unclear contamination, and because the allocation concealment procedures in many studies were unclear. The funnel plot was asymmetrical when Battaglini 2004 was included; otherwise it gave no indication of publication bias, and we did not downgrade for publication bias. See summary of findings Table for the main comparison.


Forest plot of comparison: 1 Exercise versus control, outcome: 1.2 Fatigue.

Cancer‐specific quality of life

Sixteen studies examined effects of exercise on cancer‐specific quality of life (Cadmus 2007; Campbell 2005; Cornette 2013; Courneya 2007 (Courneya 2007 AET and Courneya 2007 RET); Dodd 2010; Haines 2010; Hornsby 2014; Moros 2010; Mutrie 2007; Reis 2013; Schmidt 2014; Segal 2001 (Segal 2001 SD and Segal 2001 SU); Steindorf 2014; Travier 2015; van Waart 2014 (van Waart 2014 high and van Waart 2014 low); Visovsky 2014). Cancer‐specific quality of life was measured with either the FACT‐G scale or the EORTC QLQ‐C30 questionnaire. Twelve studies (1012 women) reported final values, and one study reported change scores for this outcome (Campbell 2005). Meta‐analysis of the 12 studies reporting final values showed a SMD of 0.12 (95% CI 0.00 to 0.25; I2 = 0%; Analysis 1.3; Figure 5) (Cadmus 2007; Cornette 2013; Courneya 2007 (Courneya 2007 AET and Courneya 2007 RET); Haines 2010; Hornsby 2014; Moros 2010; Mutrie 2007; Reis 2013; Schmidt 2014; Steindorf 2014; Travier 2015; Visovsky 2014). The study reporting change scores found a statistically significant difference between groups in cancer‐specific quality of life, measured with the FACT‐G, favouring the exercise group (Campbell 2005).


Forest plot of comparison: 1 Exercise versus control, outcome: 1.3 Cancer‐specific quality of life.

Segal 2001 (Segal 2001 SD and Segal 2001 SU) reported no significant differences between groups for cancer‐specific quality of life measured with the FACT‐G, but did not report data. van Waart 2014 (van Waart 2014 high and van Waart 2014 low) assessed cancer‐specific quality of life with the EORTC QLQ‐C30 but reported neither a summary score nor a global health score, so we did not include its data in the meta‐analysis. Results for the subscales all favoured the exercise groups, some reaching statistical significance and some not. A publication related to another study reported assessing cancer‐specific quality of life with the Multidimensional Quality of Life Scale, Cancer version (MQOLS‐Ca) (Dodd 2010), but reported neither data nor descriptive results. Taken together, the reported results of studies that could not be included in the meta‐analysis appear consistent with the pooled result. Unreported results concerned fewer than 10% of all women. We rated the result as moderate‐quality evidence due to lack of blinding, low adherence, unclear or high contamination, and a high rate of incomplete outcome data. The funnel plot gave no indication of publication bias. See summary of findings Table for the main comparison.

Health‐related quality of life

Four studies examined generic health‐related quality of life, assessed with the MOS 36‐Item Short Form Health Survey (MOS SF‐36) (Cadmus 2007; Crowley 2003; Segal 2001 (Segal 2001 SD and Segal 2001 SU); Travier 2015). We did not perform a meta‐analysis because only subscale data were presented, with no data for the physical and mental health summary measures. The studies did not find statistically significant differences between groups. Another study assessed generic health‐related quality of life with the EQ‐5D VAS (score range 0 to 100) and did not find a statistically significant difference between groups: mean difference (MD) 1.10 (95% CI ‐5.28 to 7.48; Analysis 1.4) (Haines 2010). We rated this result as low‐quality evidence due to lack of blinding, low adherence and high contamination, a high rate of incomplete outcome data, and high risk of bias for group similarity at baseline. We downgraded further for imprecision because the number of participants was small and the confidence interval for the mean difference included both the null effect and an appreciable benefit. With only one study in the analysis, examination of the funnel plot for publication bias was not possible. We did not downgrade for publication bias. See summary of findings Table for the main comparison.
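
The quality ratings in this section follow the GRADE approach. Evidence from randomised trials starts at high certainty and is downgraded by one or two levels for each serious concern across five domains: risk of bias, inconsistency, indirectness, imprecision, and publication bias. The sketch below encodes only that counting rule. It treats the several risk‐of‐bias items listed (blinding, adherence, contamination) as a single domain downgraded once, which is our reading of the text rather than a rule stated in it. Judging whether a concern is serious remains a reviewer's decision that the code does not make.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = {"risk of bias", "inconsistency", "indirectness", "imprecision", "publication bias"}


def grade_rating(downgrades, start="high"):
    """Apply GRADE-style downgrading: each domain may lower certainty by 0, 1 or 2 levels."""
    for domain, levels in downgrades.items():
        if domain not in DOMAINS or levels not in (0, 1, 2):
            raise ValueError(f"invalid downgrade: {domain}={levels}")
    index = LEVELS.index(start) - sum(downgrades.values())
    return LEVELS[max(index, 0)]  # certainty cannot fall below "very low"


# Our reading of the EQ-5D result: serious risk of bias plus imprecision
print(grade_rating({"risk of bias": 1, "imprecision": 1}))
```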

Cancer site‐specific quality of life

Ten studies examined the effects of exercise on cancer site‐specific quality of life (Cadmus 2007; Campbell 2005; Eakin 2012; Haines 2010; Hayes 2013 (Hayes 2013 FtF and Hayes 2013 Tel); Hornsby 2014; Mutrie 2007; Schmidt 2014; Segal 2001 (Segal 2001 SD and Segal 2001 SU); Steindorf 2014). We could extract data for meta‐analysis from four studies (262 women) (Cadmus 2007; Campbell 2005; Hornsby 2014; Mutrie 2007), all of which used the FACT‐B questionnaire (score range 0 to 144): MD 4.24 (95% CI ‐1.81 to 10.29; I2 = 25%; Analysis 1.5). Three studies did not report a summary score of the EORTC QLQ‐BR23 questionnaire and were therefore not included in the meta‐analysis (Haines 2010; Schmidt 2014; Steindorf 2014). Segal 2001 (Segal 2001 SD and Segal 2001 SU; 123 women) reported no significant differences between groups for cancer site‐specific quality of life measured with the FACT‐B questionnaire, but did not report data. We rated the result for cancer site‐specific quality of life as low‐quality evidence due to lack of blinding, low adherence, an unclear or high amount of contamination in three of the four studies in the meta‐analysis, and unclear allocation concealment procedures in two of the four studies. Furthermore, the number of women included in the meta‐analysis was small (n = 262), and the confidence interval for the mean difference included both the null effect and an appreciable benefit, leading to further downgrading for imprecision. Four studies were too few to examine publication bias with a funnel plot, for which we required at least 10 studies. We did not downgrade for publication bias. See summary of findings Table for the main comparison.
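
The review examined funnel plots visually and only when at least 10 studies contributed to an analysis. Egger's regression test is a complementary technique that was not reported for this review; it quantifies funnel‐plot asymmetry by regressing each study's standardised effect (effect/SE) on its precision (1/SE). The sketch applies the same 10‐study threshold and returns the intercept with its t statistic. A p‐value would additionally need the t distribution (for example from SciPy), which is omitted here to keep the block dependency‐free.

```python
import math

MIN_STUDIES = 10  # threshold used in this review for funnel-plot examination


def egger_test(effects, std_errors):
    """Egger's regression of standardised effect (effect/SE) on precision (1/SE).

    Returns None when there are too few studies; otherwise
    (intercept, t statistic, degrees of freedom). An intercept far from zero
    suggests funnel-plot asymmetry (small-study effects).
    """
    k = len(effects)
    if k < MIN_STUDIES:
        return None
    x = [1 / se for se in std_errors]
    y = [e / se for e, se in zip(effects, std_errors)]
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and standard error of the intercept (ordinary least squares)
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (k - 2)
    se_intercept = math.sqrt(sigma2 * (1 / k + x_bar**2 / sxx))
    if se_intercept > 0:
        t = intercept / se_intercept
    else:
        t = math.copysign(math.inf, intercept)
    return intercept, t, k - 2
```

With fewer than 10 studies the function declines to test, mirroring the review's decision not to examine publication bias in those analyses.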

Depression

Seven studies examined group differences for depression (Cadmus 2007; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Dodd 2010; Mutrie 2007; Perna 2010; Schmidt 2014; Steindorf 2014). Meta‐analysis was possible for five studies (674 women) yielding six comparisons. Depression was assessed either with the Beck Depression Inventory (BDI) or with the Center for Epidemiological Studies‐Depression scale (CES‐D). The SMD between intervention and control was ‐0.15 (95% CI ‐0.30 to 0.01; I2 = 0%; Analysis 1.6; Figure 6). Two studies involving 51 and 119 women (Dodd 2010 and Perna 2010, respectively) did not report their data for depression, or it was not possible to extract data due to lack of information. Because the pooled result was of borderline significance, the results of these two studies would not necessarily alter it, but they could make the inclusion of the null effect more or less clear: one study reported a statistically significant effect in favour of the exercise group (Perna 2010), and the other reported that depression scores did not change in any of the groups (Dodd 2010). We rated the result as moderate‐quality evidence due to lack of blinding, low adherence, and unclear or high contamination. Five studies were too few to examine publication bias with a funnel plot, for which we required at least 10 studies. We did not downgrade for publication bias. See summary of findings Table for the main comparison.


Forest plot of comparison: 1 Exercise versus control, outcome: 1.6 Depression.

Cognitive function

Two studies including a total of 213 women examined the effects of exercise on cognitive function with the Trail Making Test (Schmidt 2014; Steindorf 2014). The MD between intervention and control was ‐11.55 (95% CI ‐22.06 to ‐1.05; I2 = 0%; Analysis 1.7), favouring the exercise group, since shorter completion times indicate better performance. Another study stated in its published protocol that it aimed to assess cognitive function as well as psychosocial well‐being, but the results paper did not mention the cognitive function outcome (Gokal 2013). Crowley 2003 reported having assessed attention performance with the Attention Functional Index and found no statistically significant difference between groups; data could not be extracted. Given the relatively small number of women in the meta‐analysis, and the missing data or lack of evidence of a difference for another 72 women outside the meta‐analysis, we can draw conclusions about the effect only with reservation. We rated the result as low‐quality evidence due to lack of blinding, low and unclear adherence, and unclear contamination, and because group similarity at baseline for Schmidt 2014 was at high risk of bias: Schmidt 2014 reported more participants with depression, which often impairs cognitive function, in the control group. Our confidence in the result was lowered further because of imprecision (a small number of women). Two studies were too few to examine publication bias with a funnel plot, for which we required at least 10 studies. We did not downgrade for publication bias. See summary of findings Table for the main comparison.

Secondary outcomes

Strength

Fourteen studies reported assessment of changes in muscular strength (Battaglini 2004; Cornette 2013; Courneya 2007 (Courneya 2007 AET and Courneya 2007 RET); Crowley 2003; Drouin 2002; Haines 2010; Hayes 2013 (Hayes 2013 FtF and Hayes 2013 Tel); Ingram 2010; Schmidt 2014; Schwartz 2007 (Schwartz 2007 AET and Schwartz 2007 RET); Steindorf 2014; Travier 2015; van Waart 2014 (van Waart 2014 high and van Waart 2014 low); Visovsky 2014). We could extract data from nine studies yielding 13 comparisons for the meta‐analysis (Battaglini 2004; Cornette 2013; Courneya 2007 (Courneya 2007 AET and Courneya 2007 RET); Drouin 2002; Haines 2010; Hayes 2013 (Hayes 2013 FtF and Hayes 2013 Tel); Schwartz 2007 (Schwartz 2007 AET and Schwartz 2007 RET); Travier 2015; van Waart 2014 (van Waart 2014 high and van Waart 2014 low)), which showed a SMD of 0.27 (95% CI 0.04 to 0.50; I2 = 59%; 912 women; Analysis 1.8). The heterogeneity may be explained by the wide range of interventions and outcome assessment protocols.

Two studies reported significant improvements in muscle strength (Schmidt 2014; Steindorf 2014) but did not report data. One study reported no significant change in upper or lower body strength between the two groups across the study period (Crowley 2003) but did not report data. One study measuring strength was stopped early and reported no results (Ingram 2010), while another study did not report results for strength (Visovsky 2014).

We rated the result of a statistically significant improvement in strength as moderate‐quality evidence due to lack of blinding, low adherence and mostly unclear amount of contamination, and because many allocation procedures were unclear. We did not detect an indication of publication bias from the funnel plot.

The studies used different assessment protocols to measure muscular strength. Data extracted for the studies in the meta‐analysis are from the following assessment protocols: leg press strength (Cornette 2013; Haines 2010), grip strength (Drouin 2002; Travier 2015 ‐ right hand; van Waart 2014), overhead press, chest (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Schwartz 2007: Schwartz 2007 AET and Schwartz 2007 RET), overall muscular strength (Battaglini 2004), and upper body function (strength and endurance) (Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel).

Subjective upper body function

Two studies reported subjective upper body function measured with the Disabilities of the Arm, Shoulder and Hand questionnaire (DASH) (Eakin 2012; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel). There was no statistically significant difference between groups in the meta‐analysis: MD ‐0.52 (95% CI ‐4.45 to 3.41; 231 women; Analysis 1.9). We rated the result as low‐quality evidence due to lack of blinding, low adherence and high contamination, and because the allocation concealment procedures were unclear. Group similarity at baseline was also at high risk of bias. Furthermore, the number of participants was small and the confidence interval was wide, raising concerns about imprecision, which further lowered our confidence in the result.

Shoulder mobility

Two studies measured shoulder range of motion (Haines 2010; Reis 2013), and one study reported a shoulder mobility score (Mutrie 2007). We could only extract data for the shoulder mobility score, and found a statistically significant difference between groups: MD 3.10 (95% CI 1.54 to 4.66; 174 women; 1 study; Analysis 1.10). Two further studies reported assessment of shoulder range of motion in their respective design papers, but did not mention the outcome in the final publications (Schmidt 2014; Steindorf 2014). We rated the result as low‐quality evidence due to lack of blinding, low adherence, and an unclear amount of contamination in the one study that reported results. The small number of participants and the lack of data reporting in four of five studies further lowered our confidence in the result.

Arm morbidity

Two studies used the FACT‐B+4 questionnaire, which adds four questions about arm morbidity and lymphoedema to the FACT‐B measure of cancer site‐specific quality of life (higher scores indicate less upper extremity impairment) (Eakin 2012; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel). Meta‐analysis of the two studies (three comparisons) showed a MD between groups of 1.11 (95% CI ‐4.07 to 6.29; I2 = 0%; 240 women; Analysis 1.11). We rated the result as low‐quality evidence due to lack of blinding, unclear allocation concealment, low adherence and high contamination, and imprecision (small number of participants).

Other psychological distress outcomes
Anxiety

Three studies assessed anxiety using the State‐Trait Anxiety Inventory (STAI) (Cadmus 2007; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Eakin 2012); Eakin 2012 used the short form of the questionnaire, and Cadmus 2007 reported using only the state anxiety scale, not the trait anxiety scale. Cadmus 2007 found a small effect that was not statistically significant. Meta‐analysis of the two studies using the whole questionnaire (three comparisons) found no statistically significant difference between groups: MD ‐1.45 (95% CI ‐4.36 to 1.46; 331 women; Analysis 1.12) (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Eakin 2012). We rated the result as low‐quality evidence due to lack of blinding, low adherence, and high or unclear contamination. The rather small number of participants further lowered our confidence in the results.

Mood disturbances

Meta‐analysis of data from three small studies assessing mood disturbances showed a SMD of ‐1.00 (95% CI ‐1.40 to ‐0.60; I2 = 0%; 111 women; Analysis 1.13) (Drouin 2002; Gokal 2013; Yang 2011). One study used the long version of the Profile of Mood States (POMS) questionnaire (Drouin 2002), whereas the other two studies used the short form of the questionnaire (Gokal 2013; Yang 2011). We rated the statistically significant result of diminished mood disturbances as low‐quality evidence due to lack of blinding, low adherence and unclear contamination in two of the three studies, and because the allocation concealment procedures of the three studies were unclear. The small number of participants (n = 111) further lowered our confidence in the results.

Psychological distress

One study used the General Health Questionnaire to assess psychological distress and did not find a statistically significant difference between groups: MD ‐1.47 (95% CI ‐9.38 to 6.44) (Moros 2010). One study measured positive and negative affect with the Positive and Negative Affect Schedule and found a statistically significant difference for positive affect (MD 4.10, 95% CI 1.38 to 6.82); for negative affect, the confidence interval only just excluded the null effect (MD ‐2.10, 95% CI ‐4.18 to ‐0.02) (Mutrie 2007). Cadmus 2007 reported perceived stress and happiness; neither stress (MD ‐3.10; 95% CI ‐6.63 to 0.43) nor happiness (MD ‐0.90; 95% CI ‐9.92 to 8.12) showed a statistically significant difference.
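
Under a normal approximation, a 95% confidence interval for a mean difference is the estimate ± 1.96 standard errors, so the reported intervals can be converted back into approximate standard errors and two‐sided p‐values. The sketch applies this to the intervals reported in this paragraph. It assumes symmetric, normal‐theory intervals (ratio measures would need to be log‐transformed first), and the resulting p‐values are approximations, not values reported by the studies. An interval that excludes zero, however narrowly, corresponds to p < 0.05.

```python
import math

Z_975 = 1.959964  # standard normal quantile behind a two-sided 95% CI


def p_from_ci(estimate, lower, upper):
    """Approximate two-sided p-value from a mean difference and its symmetric 95% CI."""
    se = (upper - lower) / (2 * Z_975)
    z = estimate / se
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))


reported = {
    "positive affect (Mutrie 2007)": (4.10, 1.38, 6.82),
    "negative affect (Mutrie 2007)": (-2.10, -4.18, -0.02),
    "perceived stress (Cadmus 2007)": (-3.10, -6.63, 0.43),
}
for outcome, (md, lo, hi) in reported.items():
    print(f"{outcome}: p ~ {p_from_ci(md, lo, hi):.3f}")
```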

Anxiety and depression

Four studies used the Hospital Anxiety and Depression Scale to examine symptoms of depression and anxiety in one questionnaire (Cornette 2013; Gokal 2013; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low). Cornette 2013 reported final values for the summary score and found a statistically significant difference between groups: MD ‐6.10 (95% CI ‐9.65 to ‐2.55; 20 women; Analysis 1.14). The other three studies reported finding no statistically significant differences between groups for the summary score. van Waart 2014 did not report any data. Travier 2015 and Gokal 2013 additionally reported final values for the depression and anxiety subscales.

Because of the differences in outcome assessment instruments and in the use of summary scores and subscales, we did not pool the data from these studies, apart from those using the POMS. We rated the results from the single studies as low‐quality evidence due to the high risk of bias in all of the studies and the small numbers of participants.

Sleep disturbances

Dodd 2010 and van Waart 2014 (van Waart 2014 high and van Waart 2014 low) reported sleep quality as an outcome, measured with the General Sleep Disturbance Scale and the Pittsburgh Sleep Quality Index, respectively, but did not report data. Dodd 2010 reported only that no group differences were detected over time.

Self esteem

Three studies examined the effects of exercise on self esteem with the Rosenberg Self‐Esteem Scale (Cadmus 2007; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Gokal 2013). We found no statistically significant difference in the meta‐analysis: MD 1.69 (95% CI ‐0.01 to 3.39; I2 = 57%; 323 women; Analysis 1.15). The heterogeneity was introduced by Gokal 2013: removing Gokal 2013 in a sensitivity analysis lowered I2 to 0%, with a MD of 0.97 (95% CI ‐0.28 to 2.21). We rated the result as low‐quality evidence due to lack of blinding, low adherence and high or unclear contamination in two studies, and unexplained heterogeneity.
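
A leave‐one‐out sensitivity analysis, like the one described for Gokal 2013, re‐pools the data with each study omitted in turn and reports how the estimate and I2 change. The minimal sketch below uses fixed‐effect inverse‐variance weights for brevity (the model actually used is not stated in this section). The data are hypothetical, constructed so that one study drives the heterogeneity.

```python
def pool_fixed(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and I2 (%)."""
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, i2


def leave_one_out(labels, effects, variances):
    """Re-pool with each study omitted in turn; maps omitted label -> (pooled, I2)."""
    results = {}
    for i, label in enumerate(labels):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results[label] = pool_fixed(kept_effects, kept_variances)
    return results


# Hypothetical mean differences on a common scale; study "C" is the outlier
for omitted, (md, i2) in leave_one_out(["A", "B", "C"], [1.0, 1.0, 4.0], [0.5, 0.5, 0.5]).items():
    print(f"without {omitted}: MD = {md:.2f}, I2 = {i2:.0f}%")
```

A study whose omission collapses I2 toward zero has been located as the source of heterogeneity. Why that study differs still has to be explained from its design.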

Physical activity behaviour

Meta‐analysis of seven studies showed a SMD of 0.29 (95% CI 0.12 to 0.47; 549 women; Analysis 1.16) (Caldwell 2009; Cornette 2013; Eakin 2012; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel; Husebo 2014; Mutrie 2007; Yang 2011). Six further studies also examined the effects of exercise on physical activity, but they did not report data that could be used for meta‐analysis (Cadmus 2007; Gokal 2013; Hornsby 2014; Mock 2004; Perna 2010; van Waart 2014: van Waart 2014 high and van Waart 2014 low). However, three of these studies reported that there were no statistically significant group differences for physical activity (Hornsby 2014; Mock 2004; van Waart 2014: van Waart 2014 high and van Waart 2014 low). On the other hand, Perna 2010 reported that there were significantly higher LTEQ (leisure time exercise questionnaire) scores in the intervention group than in controls.

We rated the result as moderate‐quality evidence due to lack of blinding, low adherence and high or unclear contamination, and because many (six of seven) allocation concealment procedures were unclear.

Multidimensional outcomes
Self efficacy

One study reported that confidence to exercise (self efficacy) increased significantly more in the intervention group than in the control group (Eakin 2012). We could not extract data. A second study reported that there were no statistically significant differences between groups for physical self efficacy (Crowley 2003). Another study reported assessment of self efficacy regarding the performance of physical activity in the design paper, but did not mention the outcome in the publication of study results (Travier 2015).

Functioning in daily life and return to work

One study reported no statistically significant differences in functioning in daily life for either of the two exercise groups (van Waart 2014: van Waart 2014 high and van Waart 2014 low). Functioning in daily life was measured with the Impact on Participation and Autonomy (IPA) instrument. Another study stated in its design paper that it would assess the perceived impact of the disease on participation and autonomy with the same instrument (IPA), but did not mention the outcome in the final publication (Travier 2015). van Waart 2014 (van Waart 2014 high and van Waart 2014 low) additionally used the study‐specific "Return to work questionnaire" and reported that, at the end of the intervention, significantly more participants in the exercise groups than in the usual‐care group were working, and that at follow‐up the intervention groups had significantly higher return‐to‐work rates than the usual‐care group and worked a significantly higher percentage of their pre‐illness working hours. We could not extract data.

Symptom severity and symptom interference

One small study used the Taiwanese version of the MD Anderson Symptom Inventory (MDASI‐T) to assess symptom severity and symptom interference with daily life and found statistically significant differences for both outcomes: symptom severity MD ‐1.49 (95% CI ‐2.36 to ‐0.62) and symptom interference MD ‐1.10 (95% CI ‐1.89 to ‐0.31) (Yang 2011). Another study used the Symptom Experience Scale, but did not report results (Visovsky 2014). We rated the results as low‐quality evidence due to lack of blinding, low adherence and an unclear amount of contamination, and because the allocation concealment procedure was unclear. The small number of participants (n = 40) further lowered our confidence in the results.

One study reported finding no statistically significant differences between groups for satisfaction with life (Campbell 2005). Another study developed a study‐specific functional wellness questionnaire, and did not find a statistically significant difference between groups (Crowley 2003).

Chemotherapy completion

Two studies assessed chemotherapy completion rates, but their different outcome measures precluded meta‐analysis (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; van Waart 2014: van Waart 2014 high and van Waart 2014 low). van Waart 2014 (van Waart 2014 high and van Waart 2014 low) reported how many women in each group required a dose adjustment of the chemotherapy and the average dose reduction among these women. Statistically significantly fewer women required a dose adjustment in the high‐intensity exercise group than in the low‐intensity exercise and control groups. There was also a statistically significant difference in the average dose reduction between the two exercise groups and the control group.

Courneya 2007 (Courneya 2007 AET and Courneya 2007 RET) assessed chemotherapy completion rate as the "average relative dose‐intensity (RDI) for the originally planned regimen based on standard formulas". The study reported the percentage of women in each group that received at least 85% of their planned RDI, and found no statistically significant differences between groups.
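
Relative dose‐intensity is conventionally defined as the dose intensity actually delivered (total dose divided by the time taken to deliver it) divided by the planned dose intensity. The "standard formulas" used by Courneya 2007 are not given here. The sketch therefore assumes this conventional definition, averages RDI across the agents of a regimen, and applies the 85% cut‐off. The class and function names and the drug course figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrugCourse:
    planned_dose: float    # total planned dose of one agent, e.g. mg/m2
    planned_weeks: float   # planned duration over which that dose is given
    delivered_dose: float  # total dose actually received
    actual_weeks: float    # time actually taken to deliver it

    def relative_dose_intensity(self) -> float:
        planned_intensity = self.planned_dose / self.planned_weeks
        delivered_intensity = self.delivered_dose / self.actual_weeks
        return delivered_intensity / planned_intensity


def average_rdi(courses):
    """Average RDI across the agents of one woman's regimen (assumed aggregation)."""
    return sum(c.relative_dose_intensity() for c in courses) / len(courses)


def meets_threshold(courses, threshold=0.85):
    """True if the regimen reached at least 85% of its planned dose intensity."""
    return average_rdi(courses) >= threshold


# Hypothetical regimen: one agent delayed by two weeks, another reduced by 20%
delayed = DrugCourse(planned_dose=240, planned_weeks=12, delivered_dose=240, actual_weeks=14)
reduced = DrugCourse(planned_dose=600, planned_weeks=12, delivered_dose=480, actual_weeks=12)
print(round(average_rdi([delayed, reduced]), 3), meets_threshold([delayed, reduced]))
```

Both treatment delays and dose reductions lower RDI, which is why it captures completion better than counting cycles alone.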

Other side effects relating to adjuvant cancer treatment: neuropathy symptoms, endocrine symptoms, pain, gait and balance, nausea

Neuropathic pain was measured with the neuropathic pain scale in one study (Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel), while another study, Visovsky 2014, measured neuropathy symptoms with the FACT‐Taxane scale in women treated with taxanes. Neither study reached statistically significant results: neuropathic pain (Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel): MD 3.64 (95% CI ‐1.32 to 8.60; 130 women; Analysis 1.17) and neuropathy symptoms (Visovsky 2014): MD ‐0.21 (95% CI ‐0.75 to 0.33; 19 women; Analysis 1.18). We rated these results as low‐quality evidence due to lack of blinding, unclear allocation concealment procedures, low or unclear adherence, and high or unclear contamination. Group similarity at baseline was unclear or at high risk of bias as well. Imprecision due to a small number of participants and wide confidence intervals further lowered our confidence in the results. Visovsky 2014 also reported assessment of cold thermal sensation and vibratory sensation in the trial registration, but did not report these outcomes in the results paper.

Hayes 2013 (Hayes 2013 FtF and Hayes 2013 Tel) measured menopausal symptoms with the Greene Climacteric Scale and found no group differences. No summary score was presented. Mutrie 2007 assessed endocrine symptoms with the FACT for endocrine symptoms (FACT‐ES): MD 1.30 (95% CI ‐1.49 to 4.09; 174 women; Analysis 1.19). We rated the result as low‐quality evidence due to lack of blinding, low adherence, and unclear contamination. The small number of participants in this one study reporting a summary score further lowered our confidence in the results.

Dodd 2010 reported pain measured with the Worst Pain Intensity Scale and found no statistically significant differences between groups. No data were reported.

Gait and balance were measured with the Timed Get‐up‐and‐Go Test in two studies (Caldwell 2009; Visovsky 2014), and with a step test in one study (Haines 2010). There was no statistically significant difference between groups in the meta‐analysis: SMD 0.10 (95% CI ‐0.25 to 0.46; 122 women; Analysis 1.20). We rated the result as low‐quality evidence due to lack of blinding, low and unclear adherence, and unclear and high contamination. Allocation concealment procedures were unclear in two studies. The small number of participants further lowered our confidence in the result.

Two studies reported nausea as an outcome (Dodd 2010; Winningham 1988). Winningham 1988 assessed nausea with the Symptom Checklist‐90 (SCL‐90) item for nausea, whereas Dodd 2010 assessed nausea intensity with a subscale of a symptom checklist of 25 commonly experienced symptoms. We did not consider either of these to be valid methods of assessment and therefore did not report data here.

Harms

Twenty‐one studies assessed adverse effects due to exercise (Battaglini 2004; Cadmus 2007; Campbell 2005; Cornette 2013; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Crowley 2003; Dodd 2010; Drouin 2002; Eakin 2012; Haines 2010; Hornsby 2014; Husebo 2014; Mock 2004; Moros 2010; Schmidt 2014; Schwartz 2007: Schwartz 2007 AET and Schwartz 2007 RET; Segal 2001: Segal 2001 SD and Segal 2001 SU; Steindorf 2014; Travier 2015; Visovsky 2014; Yang 2011). Seven studies observed adverse effects (Crowley 2003; Dodd 2010; Drouin 2002; Eakin 2012; Haines 2010; Hornsby 2014; Husebo 2014). Details of these adverse effects can be found in the Characteristics of included studies table. In general, adverse effects concerned only a very small number of women.

Seven studies described how relevant information on adverse effects was collected. In two studies this was done by the exercise trainers who supervised the intervention (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Hornsby 2014). In one of these studies (Hornsby 2014), safety of the supervised aerobic exercise intervention was the primary outcome, and all adverse effects during aerobic training were monitored and reported on the participant case report forms. In one study with a telephone group (Eakin 2012), exercise physiologists recorded adverse effects after each call in case management folders; Husebo 2014 used biweekly telephone calls to monitor adverse effects. The participants in Haines 2010 were told to document adverse effects and accidental falls in a log book. Two studies used standardised questionnaires where participants recorded adverse effects (Schmidt 2014; Steindorf 2014). In one of these two studies (Steindorf 2014), adverse effects reported spontaneously by the participant or observed by therapists were also recorded. Many of the studies did not describe how relevant information was collected or whether surveillance of adverse effects was passive (spontaneously reported by participants) or active (based on structured questionnaires or interviews).

Lymphoedema

Two studies (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel) systematically assessed the incidence of lymphoedema. Hayes 2013 (Hayes 2013 FtF and Hayes 2013 Tel) reported four objectively measured cases in each exercise group (face to face: n = 67; telephone: n = 67) and six cases in the usual‐care group (n = 60). These numbers were for all women in Hayes 2013, not only women who exercised concurrently with their adjuvant treatment. Meta‐analysis of the two studies, yielding four comparisons, showed a risk ratio for lymphoedema of 0.71 (95% CI 0.35 to 1.45; 436 women; Analysis 1.21). Crowley 2003 reported lymphoedema as an adverse effect in one woman. Haines 2010 measured the circumference of upper limb segments and reported that changes in circumference favoured the intervention group.

We rated the results as low‐quality evidence because of lack of blinding, low adherence, and a high or unclear amount of contamination; in addition, the allocation concealment procedure was unclear in one of the two studies, and group similarity at baseline was at high risk of bias in one study. Furthermore, the confidence interval for the risk ratio included no effect as well as appreciable harm and appreciable benefit. Two studies were too few to examine publication bias with a funnel plot, for which we considered at least 10 studies necessary. We did not downgrade for publication bias. See summary of findings Table for the main comparison.
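For readers less familiar with risk ratios, the sketch below shows how a risk ratio and its 95% confidence interval can be computed for a single two‐arm comparison on the log scale, and how to check whether the interval includes the null value of 1. It is a minimal illustration only: the counts are hypothetical, not data from Hayes 2013 or Courneya 2007, and the pooled estimate above came from a meta‐analysis across comparisons, which this single‐table calculation does not reproduce.

```python
import math

def risk_ratio(events_exp, n_exp, events_ctl, n_ctl, z=1.959964):
    """Risk ratio with a 95% CI computed on the log scale for one 2x2 table.

    Assumes no zero cells (a continuity correction would otherwise be needed).
    """
    risk_exp = events_exp / n_exp
    risk_ctl = events_ctl / n_ctl
    rr = risk_exp / risk_ctl
    # Standard error of log(RR) for a single comparison
    se_log_rr = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts for illustration only (not taken from any included study)
rr, lo, hi = risk_ratio(events_exp=8, n_exp=150, events_ctl=12, n_ctl=150)
print(f"RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print("CI includes the null (RR = 1):", lo < 1 < hi)
```

An interval that spans 1 and extends to clearly beneficial and clearly harmful values is what the text above describes as imprecision.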

Effectiveness and adverse effects during follow‐up

Several studies assessed effectiveness of exercise during adjuvant therapy after a follow‐up period of several months.

Five studies measured physical fitness between 18 weeks (Travier 2015) and six months after the intervention: SMD 0.26 (95% CI ‐0.06 to 0.57; 612 women; Analysis 2.1) (Cornette 2013; Husebo 2014; Mutrie 2007; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low). We rated the result as low‐quality evidence due to lack of blinding, low adherence, and unclear or high contamination. Additionally, randomisation and allocation concealment procedures were unclear in several studies. Heterogeneity (I² = 70%) could be only partly explained by Travier 2015, the only study with a follow‐up period shorter than six months: removing it in a sensitivity analysis still left an I² of 40%, with an SMD of 0.38 (95% CI 0.12 to 0.63). It remained unclear why a shorter follow‐up period would result in a smaller effect on cardiorespiratory fitness. Across all of these studies, the large variation in effect sizes and the non‐overlapping confidence intervals raised concerns about inconsistency, which further lowered our confidence in the result.
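The results in this section are reported as standardised mean differences (SMDs), which put outcomes measured on different scales into standard deviation units. The sketch below computes an SMD for one study as Hedges' g (the small‐sample corrected form that the Cochrane Handbook describes as the SMD used in RevMan), together with an approximate variance. It is an illustration under that assumption, not the review's own calculation, and all inputs are hypothetical.

```python
import math

def hedges_g(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl):
    """Standardised mean difference (Hedges' g) and its approximate variance."""
    df = n_exp + n_ctl - 2
    # Pooled within-group standard deviation
    pooled_sd = math.sqrt(((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df)
    d = (mean_exp - mean_ctl) / pooled_sd      # uncorrected SMD (Cohen's d)
    j = 1 - 3 / (4 * df - 1)                   # small-sample correction factor
    g = j * d
    # Variance of d, then scaled by j^2 for g
    var_d = (n_exp + n_ctl) / (n_exp * n_ctl) + d ** 2 / (2 * (n_exp + n_ctl))
    return g, j ** 2 * var_d

# Hypothetical walk-test distances in metres (not data from any included study)
g, var_g = hedges_g(520, 60, 40, 500, 60, 40)
print(f"SMD {g:.2f}, SE {math.sqrt(var_g):.2f}")
```

The sign depends on the direction of the scale: for fitness a positive SMD favours exercise, whereas for fatigue or depression scores, where higher is worse, a negative SMD favours exercise.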

Six studies assessed fatigue between 18 weeks (Travier 2015) and six months after the intervention period: SMD ‐0.21 (95% CI ‐0.35 to ‐0.07; 814 women; Analysis 2.2) (Cornette 2013; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Husebo 2014; Mutrie 2007; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low). We rated the result as moderate‐quality evidence due to lack of blinding, low adherence, and unclear or high contamination. Furthermore, randomisation and allocation concealment procedures were unclear in several studies. A sensitivity analysis including only the five studies with a six‐month follow‐up period also showed an SMD of ‐0.21 (95% CI ‐0.37 to ‐0.05) between intervention and control.

Six studies assessed cancer‐specific quality of life between 12 weeks (Visovsky 2014) and six months after the intervention: SMD 0.18 (95% CI 0.01 to 0.35; 583 women; Analysis 2.3) (Cornette 2013; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Mutrie 2007; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low; Visovsky 2014). van Waart 2014 (van Waart 2014 high and van Waart 2014 low) presented data only for subscales and no summary score. We rated the result as moderate‐quality evidence due to lack of blinding, low adherence, and unclear or high contamination, and because risk of attrition bias was high in two of the five studies in the meta‐analysis. Furthermore, randomisation and allocation concealment procedures were unclear in two studies. A sensitivity analysis including only the three studies with a six‐month follow‐up period showed an SMD of 0.25 (95% CI 0.04 to 0.45) between intervention and control (Cornette 2013; Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Mutrie 2007).

Two studies assessed depression six months after the intervention period: SMD ‐0.27 (95% CI ‐0.48 to ‐0.06; 378 women; Analysis 2.4) (Courneya 2007: Courneya 2007 AET and Courneya 2007 RET; Mutrie 2007). We rated the result as moderate‐quality evidence due to lack of blinding, low adherence, and unclear contamination.

Three studies assessed strength at six months' follow‐up: SMD 0.00 (95% CI ‐0.30 to 0.30; 386 women; Analysis 2.5) (Cornette 2013; Travier 2015; van Waart 2014: van Waart 2014 high and van Waart 2014 low). The heterogeneity (I² = 49%) was attributable to the high‐intensity (resistance training) group of van Waart 2014 (van Waart 2014 high) and fell to 0% when this comparison was removed in a sensitivity analysis (SMD ‐0.11; 95% CI ‐0.35 to 0.13), so the heterogeneity could be explained. We rated the result as low‐quality evidence due to lack of blinding, low adherence, and unclear or high contamination, and because there was a high risk of selection bias in two of the three studies. Furthermore, allocation concealment procedures were unclear in two studies. The wide confidence interval introduced uncertainty about the magnitude of the effect, which further lowered our confidence in the result.
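The I² statistic and the leave‐one‐out sensitivity analyses described above can be sketched as follows. This section does not state which meta‐analytic model produced the figures, so a fixed‐effect inverse‐variance model is assumed here for simplicity; I² is derived from Cochran's Q as max(0, (Q − df)/Q). The study labels, effect sizes, and variances are hypothetical and are not the data of van Waart 2014 or any other included study.

```python
import math

def pool_fixed(effects, variances, z=1.959964):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (in %)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is truncated at 0 when Q does not exceed its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled - z * se, pooled + z * se, q, i2

def leave_one_out(labels, effects, variances):
    """Re-pool after removing each study in turn (a simple sensitivity analysis)."""
    results = []
    for i, label in enumerate(labels):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        pooled, lo, hi, _, i2 = pool_fixed(rest_e, rest_v)
        results.append((label, pooled, lo, hi, i2))
    return results

# Hypothetical SMDs and variances; study "D" is deliberately an outlier
labels = ["A", "B", "C", "D"]
effects = [-0.10, -0.05, 0.00, 0.60]
variances = [0.02, 0.03, 0.025, 0.03]

pooled, lo, hi, q, i2 = pool_fixed(effects, variances)
print(f"all studies: SMD {pooled:.2f} ({lo:.2f} to {hi:.2f}), I2 {i2:.0f}%")
for label, p, l, h, i2_rest in leave_one_out(labels, effects, variances):
    print(f"without {label}: SMD {p:.2f} ({l:.2f} to {h:.2f}), I2 {i2_rest:.0f}%")
```

In this toy example, removing the outlying study drops I² to 0%. That matches the pattern reported for strength, where heterogeneity disappeared once one comparison was excluded.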

Three studies assessed physical activity at six months' follow‐up: SMD 0.28 (95% CI ‐0.05 to 0.61; 261 women; Analysis 2.6) (Cornette 2013; Husebo 2014; Mutrie 2007). We rated the result as low‐quality evidence due to lack of blinding, low adherence, and unclear or high contamination. Additionally, randomisation and allocation concealment procedures were unclear in two of the three studies, and there was attrition bias in two studies. Imprecision (a rather small number of participants, and a confidence interval for the SMD that included both no effect and appreciable benefit) further lowered our confidence in the result.

Courneya 2007 (Courneya 2007 AET and Courneya 2007 RET) also reported results for anxiety (MD ‐3.61; 95% CI ‐7.24 to 0.03; 201 women; Analysis 2.7) and self esteem (MD 1.20; 95% CI ‐0.41 to 2.81; 201 women; Analysis 2.8) six months after the intervention, and Mutrie 2007 reported results for endocrine symptoms (MD 1.30; 95% CI ‐1.65 to 4.25; 177 women; Analysis 2.9) and positive affects (MD ‐0.59; 95% CI ‐1.63 to 0.45) and negative affects (MD ‐1.70; 95% CI ‐3.62 to 0.22). One study also assessed neuropathy symptoms (MD ‐0.45; 95% CI ‐0.98 to 0.08; 19 women; Analysis 2.10) and gait and balance (MD ‐0.59; 95% CI ‐1.63 to 0.45; 19 women; Analysis 2.11) 12 weeks after the end of the intervention (Visovsky 2014). We rated all these results as low‐quality evidence due to the small number of women in the single studies, lack of blinding, low adherence, and unclear contamination. All 'Risk of bias' items for Visovsky 2014 were at high or unclear risk of bias.

One study with an intervention period of one year reported no significant differences for any outcome after one year, but did not report data (Dodd 2010).

One study (Mutrie 2007) reported long‐term follow‐up data at 18 months and five years in an additional paper from 2012; this was the longest follow‐up period among the included studies. For both follow‐up periods, fewer than 70% of the original participants were included in the analysis, so we do not present these data here.

Lymphoedema

One study reported lymphoedema incidence eight weeks after the intervention (Hayes 2013: Hayes 2013 FtF and Hayes 2013 Tel). The numbers reported were for all women, not only those who exercised concurrently with their adjuvant treatment (risk ratio 0.79; 95% CI 0.37 to 1.69; 194 women; Analysis 2.12). No other harms or adverse effects were reported after a follow‐up period.

Discussion

Summary of main results

Exercise during adjuvant treatment for breast cancer improves physical fitness and probably slightly reduces fatigue. It likely leads to little or no difference in depression and cancer‐specific quality of life. Women with breast cancer may benefit from exercise during adjuvant cancer treatment through improved cognitive function and slightly improved cancer site‐specific quality of life. Exercise may lead to little or no improvement in health‐related quality of life and may lower the risk of lymphoedema.

Exercise also probably slightly improves muscular strength and leads to a slightly higher amount of physical activity. Women with breast cancer who exercise during adjuvant treatment may experience fewer mood disturbances, and their shoulder mobility might be slightly improved. For other outcomes such as self esteem, exercise may lead to little or no difference. For some of the outcomes not all studies could be included in the meta‐analyses, therefore the results do not reflect the body of evidence as a whole. An improvement for several other outcomes is uncertain, mostly due to scarcity or lack of data.

Overall completeness and applicability of evidence

This review is based on studies with a considerable degree of clinical heterogeneity regarding adjuvant cancer treatment and exercise interventions. It remains to be explored whether differences in adjuvant cancer treatment and exercise intervention actually affect results. In spite of our comprehensive attempts to identify all relevant studies, we retrieved predominantly English language studies for inclusion in this review. This may reflect selective publication of English language studies with statistically significant findings. In addition, most of the included studies were primarily conducted among white women in high‐income countries, which makes generalisation of the results to different ethnic groups and countries questionable.

Type of intervention

Exercise interventions varied widely regarding implementation of aerobic or resistance exercise or a combination of both, supervised or home‐based, frequency, duration, and intensity. Furthermore, the reporting of the exercise intervention differed, and details were not always available.

Because exercise can itself be regarded as a complex intervention, evaluating exercise interventions raises the diverse challenges associated with evaluating complex interventions.

Key features of complex interventions (according to Craig 2008: the Medical Research Council guidance on complex interventions) applicable to exercise are the:

  • number of interacting components, e.g. exercise in a group versus alone; aerobic or resistance exercise; or the empathy of an exercise instructor;

  • number and difficulty of behaviours required by those delivering or receiving the intervention, e.g. adherence to the prescribed intensity of exercise or challenging participants to exercise at a higher heart rate;

  • number and variability of outcomes, e.g. fatigue, depression, and anxiety, each of which can be assessed with a range of scales;

  • degree of flexibility or tailoring of the intervention permitted (non‐standardisation/reproducibility), e.g. tailoring the exercise intervention according to the motivation level of the group or to the current physical condition of the woman with breast cancer.

The aim of this review was to answer quite a broad review question, namely to assess the effect of exercise per se on several patient‐relevant outcomes for women undergoing adjuvant treatment for breast cancer, which is why we included exercise interventions delivered at diverse levels and with a variety of components. Based on this evidence, further steps could be taken to identify and differentiate all the interacting components of exercise interventions.

Type of control

Most studies reported usual care as the control intervention, but they often did not describe it in detail. Because studies were conducted in different settings (home‐based or at the treatment clinic) and with different types of adjuvant treatment, variation in usual care across the included studies should be taken into account. Some studies offered control interventions to establish similar conditions with regard to time, attention and, where applicable, group interaction.

Type of adjuvant treatment

Across the included studies, women received radiotherapy only, chemotherapy only, or a combination of the two, resulting in large variation in the duration and frequency of adjuvant treatment as well as differences in possible side effects. We included studies using the latest treatment protocols as well as studies from the 1980s onward, so both treatment protocols and the drugs used to treat side effects varied considerably across the included studies. Some, but not all, studies provided details on treatment protocols, whereas drug treatment of side effects was mostly not described.

Timing of outcome assessment and follow‐up

Because psychological distress, for example, has been shown to diminish with time after diagnosis, the timing of outcome measurement can affect the magnitude and direction of the intervention effect. The same may apply to other outcomes assessed in our review. Timing of outcome assessments was very heterogeneous across the included studies, which should be kept in mind when comparing studies and interpreting results. Follow‐up ranged from none to five years in one study. Analyses were possible for only a few studies, with a maximum follow‐up period of six months, because data reporting was poor or attrition rates were too high in studies with longer follow‐up. Apart from lymphoedema, which one study reported after a short follow‐up period, no harms or adverse effects were reported after a follow‐up period.

Reporting of outcome measures

A wide range of outcome measures was assessed across the studies, making it difficult to combine outcomes in meta‐analysis. Moreover, data reporting was often poor and did not provide estimates of effect size that could be pooled. Assessment and reporting of harms‐related data from exercise intervention studies during adjuvant cancer treatment also needs improvement. Future studies should apply so‐called core outcome sets to facilitate comparison and meta‐analysis (Gargon 2014).

Quality of the evidence

Quality of studies

Due to the nature of exercise as an intervention, blinding of participants and exercise supervisors is not possible. The precise effect of the absence of blinding on the magnitude and direction of the treatment effect is unclear, but it constitutes a high risk of performance bias. Furthermore, many outcomes were self reported, which carries a high inherent risk of detection bias when participants cannot be blinded. We decided not to downgrade studies for those 'Risk of bias' items alone. However, a high risk of bias was never assigned solely because of lack of blinding, contamination, or non‐adherence; other factors such as unclear allocation procedures or high attrition rates were also present, which led us to downgrade the evidence by one level for risk of bias. We included only randomised controlled studies in this version of the review, but 11 studies reported insufficient details on random sequence generation, and 23 provided insufficient details on allocation concealment.

Statistical power

Benefits of exercise interventions may be relatively small. Consequently, studies need enough participants to detect small differences between groups. The sample sizes in the included studies ranged from 5 to 91 participants in the intervention group.
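To make the link between small effects and sample size concrete, the sketch below approximates the number of participants needed per group to detect a given SMD with a two‐sided test, using the standard normal approximation with equal group sizes, 5% significance, and 80% power. These settings are assumptions chosen for illustration. The SMD of 0.28 is the pooled fatigue effect from the summary of findings table and is used here only as an input value; the calculation is not part of the review's methods.

```python
import math
from statistics import NormalDist

def n_per_group(smd, alpha=0.05, power=0.80):
    """Approximate participants per group to detect a standardised mean difference.

    Normal approximation for a two-sided, two-sample comparison with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / smd ** 2)

for smd in (0.5, 0.28, 0.15):
    print(f"SMD {smd}: about {n_per_group(smd)} participants per group")
```

Under these assumptions, an effect of the size seen for fatigue would need roughly 200 participants per group, well above the largest intervention group (91 women) in the included studies. This is why pooling across studies matters.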

Initial fitness level

The individual's level of fitness is an important factor to consider before determining the level of exercise intensity (ACSM 2000). According to the American College of Sports Medicine (ACSM 2000), deconditioned individuals may demonstrate increases in their cardiorespiratory fitness with exercise intensities at the lower end of the intensity continuum, whereas more fit individuals need to work at the higher end of the intensity continuum to improve fitness. A small number of studies limited participation to sedentary women; however, definitions of sedentary varied.

Adherence and contamination

For sedentary individuals, a change in personal health behaviour is required in order to take up regular exercise. Thus, any exercise intervention can additionally be evaluated according to the degree of behavioural change achieved in the intervention group; a lack of adherence can compromise the training stimulus as well as the sustainability of exercise behaviour. According to the American College of Sports Medicine (ACSM 2000), the art of exercise prescription is the "successful integration of exercise science with behavioral techniques that result in long‐term program compliance". Some studies applied theory‐based methods focused on changing behaviour. Adherence problems arise not only in terms of attendance at exercise sessions and the frequency of sessions, but also in terms of the training intensity and duration actually achieved during each session. Insufficient exercise intensity or duration may compromise the training stimulus as a whole. However, these two facets of the training stimulus were poorly evaluated and reported in many of the included studies.

Besides adherence, the extent to which the control group exercises (contamination) is a second critical component in exercise studies. Exercise contamination is rarely reported, and often only when the exercise programme is home‐based. Furthermore, reports of adherence and contamination most often rely on participants' self report, which can lead to over‐reporting of both adherence and contamination.

Potential biases in the review process

Despite a well‐established methodology for conducting systematic reviews, subjective judgement is inevitable throughout the process. The main limitation of this review was that many studies lacked sufficient information or data to make a clear judgement in various bias domains. Other limitations were heterogeneity in the interventions delivered, in adjuvant treatment, and in the timing of outcome measurement, as well as the wide range of outcome measures assessed and reported.

We did not systematically evaluate assessment instruments with regard to their strengths and weaknesses. However, we noted that for the outcome physical activity, the reported activity levels of women with breast cancer were surprisingly high in some studies. Three of seven studies in the meta‐analysis for physical activity used the International Physical Activity Questionnaire (IPAQ) (Caldwell 2009; Cornette 2013; Husebo 2014), which has been reported to lead to over‐reporting of physical activity (Lee 2011).

Mostly due to limited resources, in this version of the review we did not systematically assess the training stimulus, the baseline activity and fitness levels of participants, or the application of behaviour change theories in the studies.

Agreements and disagreements with other studies or reviews

Two other Cochrane systematic reviews assessed the effect of exercise during adjuvant therapy for cancer, one on quality of life (Mishra 2012) and one on fatigue (Cramp 2012). Both reviews included adults with different cancer diagnoses, not only breast cancer. Cramp 2012 also included studies evaluating the effect of exercise after adjuvant therapy, and Mishra 2012 included 10 studies with participants both during and after adjuvant therapy. Both reviews identified benefits on fatigue for participants with breast cancer, which is in agreement with the results of our review. Mishra 2012 reported that exercise interventions resulted in improvements in overall quality of life for all participants, but found no statistically significant difference for women with breast cancer. We did not perform a meta‐analysis for overall quality of life because only one study presented a summary measure, whereas the others reported data only for subscales. The single studies reporting this outcome did not find a significant difference between groups.

Meneses‐Echavez 2015 reviewed the effect of supervised exercise during or after adjuvant therapy for breast cancer on cancer‐related fatigue, and found benefits as well.

Bourke 2014 published another Cochrane systematic review with the main goal of assessing the effects of interventions to promote exercise behaviour in sedentary people living with and beyond cancer. Eleven of 14 studies were conducted in women with breast cancer, but mostly in women who had finished adjuvant treatment. Interventions resulted in improvements in aerobic exercise tolerance at 8 to 12 weeks in intervention participants compared with controls. Aerobic exercise tolerance was also improved at six months. These findings are in agreement with our review, but participants differed with regard to cancer diagnosis and treatment status.

One systematic review assessed depression and anxiety in addition to fatigue and cancer‐specific quality of life in women with breast cancer undergoing adjuvant therapy (Carayol 2013). The authors reported that the exercise intervention led to statistically significant improvements in fatigue, cancer‐specific quality of life, and depression, while the decrease in anxiety was "borderline significant". Our results for cancer‐specific quality of life and depression were both close to a statistically significant difference between groups favouring exercise: the lower limit of the confidence interval was 0.00 for cancer‐specific quality of life, and the upper limit was 0.01 for depression. In three of the 17 studies included in the review by Carayol 2013, the intervention was yoga, which we excluded, regarding it as a complex intervention. For depression and anxiety, the authors pooled data from the Hamilton Anxiety and Depression Score with data from the Beck Depression Inventory and the Center for Epidemiological Studies‐Depression scale, which we did not do. Bearing in mind the similar but not identical inclusion criteria, the findings of that review are thus mostly in line with ours.

Figure 1: Study flow diagram.

Figure 2: Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Figure 3: Forest plot of comparison: 1 Exercise versus control, outcome: 1.1 Physical fitness.

Figure 4: Forest plot of comparison: 1 Exercise versus control, outcome: 1.2 Fatigue.

Figure 5: Forest plot of comparison: 1 Exercise versus control, outcome: 1.3 Cancer‐specific quality of life.

Figure 6: Forest plot of comparison: 1 Exercise versus control, outcome: 1.6 Depression.

Analysis 1.1: Comparison 1 Exercise versus control, Outcome 1 Physical fitness.

Analysis 1.2: Comparison 1 Exercise versus control, Outcome 2 Fatigue.

Analysis 1.3: Comparison 1 Exercise versus control, Outcome 3 Cancer‐specific quality of life.

Analysis 1.4: Comparison 1 Exercise versus control, Outcome 4 Health‐related quality of life.

Analysis 1.5: Comparison 1 Exercise versus control, Outcome 5 Cancer site‐specific quality of life.

Analysis 1.6: Comparison 1 Exercise versus control, Outcome 6 Depression.

Analysis 1.7: Comparison 1 Exercise versus control, Outcome 7 Cognitive function.

Analysis 1.8: Comparison 1 Exercise versus control, Outcome 8 Strength.

Analysis 1.9: Comparison 1 Exercise versus control, Outcome 9 Subjective upper body function.

Analysis 1.10: Comparison 1 Exercise versus control, Outcome 10 Shoulder mobility.

Analysis 1.11: Comparison 1 Exercise versus control, Outcome 11 Arm morbidity.

Analysis 1.12: Comparison 1 Exercise versus control, Outcome 12 Anxiety.

Analysis 1.13: Comparison 1 Exercise versus control, Outcome 13 Mood disturbances.

Analysis 1.14: Comparison 1 Exercise versus control, Outcome 14 Hospital Anxiety and Depression Scale.

Analysis 1.15: Comparison 1 Exercise versus control, Outcome 15 Self esteem.

Analysis 1.16: Comparison 1 Exercise versus control, Outcome 16 Physical activity.

Analysis 1.17: Comparison 1 Exercise versus control, Outcome 17 Neuropathic pain.

Analysis 1.18: Comparison 1 Exercise versus control, Outcome 18 Neuropathy symptoms.

Analysis 1.19: Comparison 1 Exercise versus control, Outcome 19 Endocrine symptoms.

Analysis 1.20: Comparison 1 Exercise versus control, Outcome 20 Gait and balance.

Analysis 1.21: Comparison 1 Exercise versus control, Outcome 21 Lymphoedema incidence.

Analysis 2.1: Comparison 2 Exercise versus control follow‐up, Outcome 1 Physical fitness.

Analysis 2.2: Comparison 2 Exercise versus control follow‐up, Outcome 2 Fatigue.

Analysis 2.3: Comparison 2 Exercise versus control follow‐up, Outcome 3 Cancer‐specific quality of life.

Analysis 2.4: Comparison 2 Exercise versus control follow‐up, Outcome 4 Depression.

Analysis 2.5: Comparison 2 Exercise versus control follow‐up, Outcome 5 Strength.

Analysis 2.6: Comparison 2 Exercise versus control follow‐up, Outcome 6 Physical activity.

Analysis 2.7: Comparison 2 Exercise versus control follow‐up, Outcome 7 Anxiety.

Analysis 2.8: Comparison 2 Exercise versus control follow‐up, Outcome 8 Self esteem.

Analysis 2.9: Comparison 2 Exercise versus control follow‐up, Outcome 9 Endocrine symptoms.

Analysis 2.10: Comparison 2 Exercise versus control follow‐up, Outcome 10 Neuropathy symptoms.

Analysis 2.11: Comparison 2 Exercise versus control follow‐up, Outcome 11 Gait and balance.

Analysis 2.12: Comparison 2 Exercise versus control follow‐up, Outcome 12 Lymphoedema incidence.

Summary of findings for the main comparison. Exercise compared with control for women receiving adjuvant therapy for breast cancer


Population: women receiving adjuvant therapy (chemo‐ or radiotherapy or both) for breast cancer

Settings: supervised or home based

Intervention: aerobic or resistance exercise or a combination of both

Comparison: control intervention (usual care or intervention that was not exercise, such as stretching)

| Outcomes | Relative effects* (95% CI), exercise vs control | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- |
| Physical fitness; assessed with: 6‐ or 12‐minute walk test, peak oxygen uptake, and other scales; follow‐up: 18 weeks to 6 months | The mean physical fitness in the intervention group was 0.42 standard deviations higher (0.25 to 0.59 higher) | 1310 (15 RCTs) | ⊕⊕⊕⊝ moderate¹ | SMD 0.42 (95% CI 0.25 to 0.59) |
| Fatigue; assessed with: FACIT‐F scale, (revised) Piper Fatigue Scale, Multidimensional Fatigue Inventory and other scales; follow‐up: 18 weeks to 6 months | The mean fatigue in the intervention group was 0.28 standard deviations lower (0.41 lower to 0.16 lower) | 1698 (19 RCTs) | ⊕⊕⊕⊝ moderate² | SMD ‐0.28 (95% CI ‐0.41 to ‐0.16) |
| Cancer‐specific quality of life; assessed with: FACT‐G, EORTC QLQ‐C30 and other scales; follow‐up: 12 weeks to 6 months | The mean cancer‐specific quality of life in the intervention group was 0.12 standard deviations higher (0.00 to 0.25 higher) | 1012 (12 RCTs) | ⊕⊕⊕⊝ moderate³ | SMD 0.12 (95% CI 0.00 to 0.25) |
| Health‐related quality of life; assessed with: EQ‐5D visual analogue scale (higher scores indicate higher quality of life; score range 0 to 100); MID: 7 points; follow‐up: end of intervention | The mean health‐related quality of life in the intervention group was 1.10 points higher (5.28 lower to 7.48 higher) | 68 (1 RCT) | ⊕⊕⊝⊝ low⁴,⁵ | MD 1.10 (95% CI ‐5.28 to 7.48) |
| Cancer site‐specific quality of life; assessed with: FACT‐B (higher scores indicate better quality of life; score range 0 to 144); MID: 7 to 8 points; follow‐up: end of intervention | The mean cancer site‐specific quality of life in the intervention group was 4.24 points higher (1.81 lower to 10.29 higher) | 262 (4 RCTs) | ⊕⊕⊝⊝ low⁶,⁷ | MD 4.24 (95% CI ‐1.81 to 10.29) |
| Depression; assessed with: BDI, CES‐D; follow‐up: 6 months | The mean depression in the intervention group was 0.15 standard deviations lower (0.30 lower to 0.01 higher) | 674 (5 RCTs) | ⊕⊕⊕⊝ moderate⁸ | SMD ‐0.15 (95% CI ‐0.30 to 0.01) |
| Cognitive function; assessed with: Trail Making Test (less time in seconds needed to complete the test means less cognitive dysfunction); follow‐up: end of intervention | The mean time needed to complete the test in the intervention group was 11.55 seconds less (22.06 seconds less to 1.05 seconds less) | 213 (2 RCTs) | ⊕⊕⊝⊝ low⁹,¹⁰ | MD ‐11.55 (95% CI ‐22.06 to ‐1.05) |
| Lymphoedema; assessed with: volumetric arm measurements and bioimpedance spectroscopy; follow‐up: 8 weeks | Assumed risk¹¹: 85 per 1000; corresponding risk: 60 per 1000 (30 to 123) | 436 (2 RCTs) | ⊕⊕⊝⊝ low¹²,¹³ | RR 0.71 (95% CI 0.35 to 1.45) |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
BDI: Beck Depression Inventory; CES‐D: Center for Epidemiological Studies‐Depression Scale; CI: confidence interval; FACIT‐F: Functional Assessment of Chronic Illness Therapy‐Fatigue Scale; FACT‐B: Functional Assessment of Cancer Therapy‐Breast; FACT‐G: Functional Assessment of Cancer Therapy‐General; MD: mean difference; MID: minimally important difference; RCT: randomised controlled trial; RR: risk ratio; SMD: standardised mean difference
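For the lymphoedema row, the corresponding risk is the assumed control‐group risk multiplied by the risk ratio and by each of its 95% confidence limits. A minimal Python sketch reproducing that arithmetic; the function name corresponding_risk is ours and is not part of any Cochrane software:

```python
def corresponding_risk(assumed_per_1000: float, rr: float,
                       rr_low: float, rr_high: float) -> tuple[int, int, int]:
    """Apply a risk ratio and its 95% CI to an assumed control-group risk.

    Returns the intervention-group risk per 1000 with its 95% CI,
    rounded to whole numbers as in a summary-of-findings table.
    """
    point = round(assumed_per_1000 * rr)
    low = round(assumed_per_1000 * rr_low)
    high = round(assumed_per_1000 * rr_high)
    return point, low, high

# Lymphoedema: assumed risk 85 per 1000, RR 0.71 (95% CI 0.35 to 1.45)
print(corresponding_risk(85, 0.71, 0.35, 1.45))  # (60, 30, 123)
```

Here 85 × 0.71 ≈ 60, 85 × 0.35 ≈ 30 and 85 × 1.45 ≈ 123 per 1000, which matches the table.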

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.
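Under this scheme, evidence from randomised controlled trials starts at high quality and each level of downgrading moves it one grade down, which is how the footnotes below arrive at "moderate" (one level) or "low" (two levels). A minimal sketch of that bookkeeping, assuming a simple count of downgrades; the helper names are illustrative, and neither GRADE's criteria for upgrading nor the judgements behind each downgrade are modelled:

```python
# Hypothetical helpers illustrating GRADE bookkeeping (not an official tool)
LEVELS = ["very low", "low", "moderate", "high"]

def grade_quality(downgrades: int, start: str = "high") -> str:
    """Return the quality level after a number of one-level downgrades.

    RCT evidence starts at 'high'; a very serious concern can count as two
    downgrades. The result never falls below 'very low'.
    """
    if downgrades < 0:
        raise ValueError("downgrades must be non-negative")
    return LEVELS[max(LEVELS.index(start) - downgrades, 0)]

def symbols(level: str) -> str:
    """Render a level with the filled/empty circles used in the table."""
    filled = LEVELS.index(level) + 1
    return "⊕" * filled + "⊝" * (4 - filled)

# Physical fitness: one downgrade for risk of bias (footnote 1)
print(grade_quality(1), symbols(grade_quality(1)))  # moderate ⊕⊕⊕⊝
# Health-related quality of life: risk of bias plus imprecision (footnotes 4 and 5)
print(grade_quality(2), symbols(grade_quality(2)))  # low ⊕⊕⊝⊝
```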

¹ Lack of blinding, low adherence, high or unclear contamination, and unclear procedures for randomisation (several studies) and allocation concealment (many studies); therefore, we downgraded by one level.
² Lack of blinding, low adherence, a high or unclear amount of contamination, and unclear allocation concealment procedures in many studies; therefore, we downgraded by one level.
³ Lack of blinding, low adherence, a high or unclear amount of contamination, and a high rate of incomplete outcome data; therefore, we downgraded by one level.
⁴ Lack of blinding, low adherence, a high amount of contamination, a high rate of incomplete outcome data, and high risk of bias for group similarity at baseline; therefore, we downgraded by one level.
⁵ Small number of participants, and a confidence interval for the mean difference that includes both no effect and appreciable benefit (imprecision); therefore, we downgraded by a further level.
⁶ Lack of blinding, low adherence, a high or unclear amount of contamination in three of the four trials in the meta‐analysis, and unclear allocation concealment procedures in two of the four; therefore, we downgraded by one level.
⁷ Small number of participants, wide confidence intervals in two of the four trials, and a confidence interval for the mean difference that includes both no effect and appreciable benefit (imprecision); therefore, we downgraded by a further level.
⁸ Lack of blinding, low adherence, and unclear or high contamination; in addition, two published studies could not contribute to the meta‐analysis, and one of those reported no change in depression scores in any of the groups. Therefore, we downgraded by one level.
⁹ Lack of blinding, low or unclear adherence, unclear contamination, and high risk of bias for group similarity at baseline in one study; therefore, we downgraded by one level.
¹⁰ Small number of participants (imprecision); therefore, we downgraded by a further level.
¹¹ Assumed risk based on the mean control group risk in the included studies.
¹² Lack of blinding, low adherence, unclear or high contamination, an unclear allocation procedure in one of the two studies, and high risk of bias for group similarity at baseline in one study; therefore, we downgraded by one level.
¹³ Small number of participants, and a confidence interval for the risk ratio that includes no effect as well as appreciable harm and appreciable benefit (imprecision); therefore, we downgraded by a further level.
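Footnotes 5 and 7 rest on comparing the confidence interval with both the null value and the minimally important difference (MID). The sketch below makes that comparison explicit for the two quality‐of‐life outcomes that report an MID. It assumes higher scores mean benefit, applies the MID symmetrically around zero, and uses the lower end (7 points) of the 7 to 8 point FACT‐B range; ci_spans is a hypothetical helper name:

```python
def ci_spans(low: float, high: float, mid: float, null: float = 0.0) -> dict[str, bool]:
    """Report which reference values a 95% CI for a mean difference includes.

    Assumes higher scores mean benefit: 'appreciable benefit' is a
    difference of at least +mid, 'appreciable harm' one of at most -mid.
    """
    return {
        "null effect": low <= null <= high,
        "appreciable benefit": high >= null + mid,
        "appreciable harm": low <= null - mid,
    }

# Health-related QoL (EQ-5D VAS): MD 1.10 (95% CI -5.28 to 7.48), MID 7 points
print(ci_spans(-5.28, 7.48, mid=7))
# Cancer site-specific QoL (FACT-B): MD 4.24 (95% CI -1.81 to 10.29), MID taken as 7 points
print(ci_spans(-1.81, 10.29, mid=7))
```

Both intervals include no effect and appreciable benefit but not appreciable harm, consistent with the further downgrading for imprecision.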

Summary of findings for the main comparison. Exercise compared with control for women receiving adjuvant therapy for breast cancer
Comparison 1. Exercise versus control

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
1 Physical fitness | 20 | 1310 | Std. Mean Difference (IV, Random, 95% CI) | 0.42 [0.25, 0.59]
2 Fatigue | 22 | 1698 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.28 [‐0.41, ‐0.16]
3 Cancer‐specific quality of life | 13 | 1012 | Std. Mean Difference (IV, Random, 95% CI) | 0.12 [‐0.00, 0.25]
4 Health‐related quality of life | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected
5 Cancer site‐specific quality of life | 4 | 262 | Mean Difference (IV, Random, 95% CI) | 4.24 [‐1.81, 10.29]
6 Depression | 6 | 674 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.15 [‐0.30, 0.01]
7 Cognitive function | 2 | 213 | Mean Difference (IV, Random, 95% CI) | ‐11.55 [‐22.06, ‐1.05]
8 Strength | 13 | 912 | Std. Mean Difference (IV, Random, 95% CI) | 0.27 [0.04, 0.50]
9 Subjective upper body function | 3 | 231 | Mean Difference (IV, Random, 95% CI) | ‐0.52 [‐4.45, 3.41]
10 Shoulder mobility | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected
11 Arm morbidity | 3 | 240 | Mean Difference (IV, Random, 95% CI) | 1.11 [‐4.07, 6.29]
12 Anxiety | 3 | 331 | Mean Difference (IV, Random, 95% CI) | ‐1.45 [‐4.36, 1.46]
13 Mood disturbances | 3 | 111 | Std. Mean Difference (IV, Random, 95% CI) | ‐1.00 [‐1.40, ‐0.60]
14 Hospital Anxiety and Depression Scale | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected
15 Self esteem | 4 | 323 | Mean Difference (IV, Random, 95% CI) | 1.69 [‐0.01, 3.39]
16 Physical activity | 8 | 549 | Std. Mean Difference (IV, Random, 95% CI) | 0.29 [0.12, 0.47]
17 Neuropathic pain | 2 | 130 | Mean Difference (IV, Random, 95% CI) | 3.64 [‐1.32, 8.60]
18 Neuropathy symptoms | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected
19 Endocrine symptoms | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected
20 Gait and balance | 3 | 122 | Std. Mean Difference (IV, Random, 95% CI) | 0.10 [‐0.25, 0.46]
21 Lymphoedema incidence | 4 | 436 | Risk Ratio (M‐H, Random, 95% CI) | 0.71 [0.35, 1.45]
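Most rows in this table are labelled "Std. Mean Difference (IV, Random, 95% CI)": each trial contributes a standardised mean difference, and trials are combined by inverse‐variance weighting under a random‐effects model. The sketch below assumes Hedges' adjusted g for the SMD and the DerSimonian and Laird estimator for the between‐study variance, as described for RevMan in the Cochrane Handbook. It illustrates the technique only and is not a re‐analysis: the three trials are invented, and Arm, hedges_g and dersimonian_laird are our own names.

```python
import math
from dataclasses import dataclass

@dataclass
class Arm:
    """Summary statistics for one trial arm."""
    mean: float
    sd: float
    n: int

def hedges_g(exp: Arm, ctl: Arm) -> tuple[float, float]:
    """Hedges' adjusted g (exercise minus control) and its variance."""
    n = exp.n + ctl.n
    pooled_sd = math.sqrt(((exp.n - 1) * exp.sd ** 2 + (ctl.n - 1) * ctl.sd ** 2) / (n - 2))
    g = (exp.mean - ctl.mean) / pooled_sd * (1 - 3 / (4 * n - 9))  # small-sample correction
    var = n / (exp.n * ctl.n) + g ** 2 / (2 * (n - 3.94))
    return g, var

def dersimonian_laird(effects: list[float], variances: list[float]) -> tuple[float, float, float, float]:
    """Inverse-variance random-effects pooling (DerSimonian and Laird).

    Returns (pooled estimate, lower 95% limit, upper 95% limit, tau^2).
    """
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2

# Invented summary data for three trials (NOT taken from the included studies)
studies = [
    (Arm(52.0, 10.0, 40), Arm(48.0, 11.0, 38)),
    (Arm(30.5, 6.0, 25), Arm(29.0, 6.5, 27)),
    (Arm(101.0, 20.0, 60), Arm(92.0, 19.0, 58)),
]
gs = [hedges_g(e, c) for e, c in studies]
pooled, low, high, tau2 = dersimonian_laird([g for g, _ in gs], [v for _, v in gs])
print(f"SMD {pooled:.2f} (95% CI {low:.2f} to {high:.2f}), tau^2 = {tau2:.3f}")
```

With exercise minus control in the numerator, a negative SMD (as in the fatigue row) means lower scores in the exercise group.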

Comparison 2. Exercise versus control follow‐up

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
1 Physical fitness | 6 | 612 | Std. Mean Difference (IV, Random, 95% CI) | 0.26 [‐0.06, 0.57]
2 Fatigue | 8 | 814 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.21 [‐0.35, ‐0.07]
3 Cancer‐specific quality of life | 6 | 583 | Std. Mean Difference (IV, Random, 95% CI) | 0.18 [0.01, 0.35]
4 Depression | 3 | 378 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.27 [‐0.48, ‐0.06]
5 Strength | 4 | 386 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.00 [‐0.30, 0.30]
6 Physical activity | 3 | 261 | Std. Mean Difference (IV, Random, 95% CI) | 0.28 [‐0.05, 0.61]
7 Anxiety | 2 | 201 | Mean Difference (IV, Random, 95% CI) | ‐3.61 [‐7.24, 0.03]
8 Self esteem | 2 | 201 | Mean Difference (IV, Random, 95% CI) | 1.20 [‐0.41, 2.81]
9 Endocrine symptoms | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected
10 Neuropathy symptoms | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected
11 Gait and balance | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected
12 Lymphoedema incidence | 2 | 194 | Risk Ratio (M‐H, Random, 95% CI) | 0.79 [0.37, 1.69]
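Comparison 2 repeats several analyses at follow‐up, and for some outcomes the interval now includes the null; physical fitness, for example, was 0.42 [0.25, 0.59] at the end of the intervention in Comparison 1 but 0.26 [‐0.06, 0.57] at follow‐up. One quick way to compare rows is to back‐calculate the standard error, z statistic and two‐sided p value from the reported interval. This sketch assumes each interval was built as estimate ± 1.96 × SE, and works on the log scale for risk ratios; because the published limits are rounded, the results are approximate. The function name se_z_p_from_ci is ours.

```python
import math

def se_z_p_from_ci(estimate: float, low: float, high: float,
                   ratio: bool = False) -> tuple[float, float, float]:
    """Back-calculate SE, z and two-sided p from a reported 95% CI.

    Assumes a normal approximation (estimate ± 1.96 × SE). Ratio measures
    such as risk ratios are symmetric only on the log scale, so they are
    log-transformed first.
    """
    if ratio:
        estimate, low, high = math.log(estimate), math.log(low), math.log(high)
    se = (high - low) / (2 * 1.96)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return se, z, p

# Rows from Comparisons 1 and 2 (published limits are rounded)
rows = [
    ("Physical fitness, end of intervention", 0.42, 0.25, 0.59, False),
    ("Physical fitness, follow-up", 0.26, -0.06, 0.57, False),
    ("Fatigue, follow-up", -0.21, -0.35, -0.07, False),
    ("Lymphoedema incidence, follow-up", 0.79, 0.37, 1.69, True),
]
for name, est, lo, hi, ratio in rows:
    se, z, p = se_z_p_from_ci(est, lo, hi, ratio)
    print(f"{name}: SE {se:.3f}, z {z:.2f}, p {p:.3f}")
```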
