Physical rehabilitation for older people in long‐term care

Abstract

Background

The worldwide population is progressively ageing, with an expected increase in morbidity and demand for long‐term care. Physical rehabilitation is beneficial in older people, but relatively little is known about effects on long‐term care residents. This is an update of a Cochrane review first published in 2009.

Objectives

To evaluate the benefits and harms of rehabilitation interventions directed at maintaining, or improving, physical function for older people in long‐term care through the review of randomised and cluster randomised controlled trials.

Search methods

We searched the trials registers of the following Cochrane entities: the Stroke Group (May 2012), the Effective Practice and Organisation of Care Group (April 2012), and the Rehabilitation and Related Therapies Field (April 2012). In addition, we searched 20 relevant electronic databases, including the Cochrane Central Register of Controlled Trials (The Cochrane Library, 2009, Issue 4), MEDLINE (1966 to December 2009), EMBASE (1980 to December 2009), CINAHL (1982 to December 2009), AMED (1985 to December 2009), and PsycINFO (1967 to December 2009). We also searched trials and research registers and conference proceedings; checked reference lists; and contacted authors, researchers, and other relevant Cochrane entities. We updated our searches of electronic databases in 2011 and listed relevant studies as awaiting assessment.

Selection criteria

Randomised studies comparing a rehabilitation intervention designed to maintain or improve physical function with either no intervention or an alternative intervention in older people (over 60 years) who have permanent long‐term care residency.

Data collection and analysis

Two review authors independently assessed risk of bias and extracted data. We contacted study authors for additional information. The primary outcome was function in activities of daily living. Secondary outcomes included exercise tolerance, strength, flexibility, balance, perceived health status, mood, cognitive status, fear of falling, and economic analyses. We investigated adverse effects, including death, morbidity, and other events. We synthesised estimates of the primary outcome with the mean difference; mortality data, with the risk ratio; and secondary outcomes, using vote‐counting.

Main results

We included 67 trials, involving 6300 participants. Fifty‐one trials reported the primary outcome, a measure of activities of daily living. The estimated effects of physical rehabilitation at the end of the intervention were an improvement in Barthel Index (0 to 100) scores of six points (95% confidence interval (CI) 2 to 11, P = 0.008, seven studies), Functional Independence Measure (0 to 126) scores of five points (95% CI ‐2 to 12, P = 0.1, four studies), Rivermead Mobility Index (0 to 15) scores of 0.7 points (95% CI 0.04 to 1.3, P = 0.04, three studies), Timed Up and Go test times of five seconds (that is, five seconds faster; 95% CI ‐9 to 0, P = 0.05, seven studies), and walking speed of 0.03 m/s (95% CI ‐0.01 to 0.07, P = 0.1, nine studies). Synthesis of secondary outcomes suggested a beneficial effect on strength, flexibility, and balance, and possibly on mood, although the size of any such effect is unknown. There was insufficient evidence of the effect on other secondary outcomes. Based on 25 studies (3721 participants), rehabilitation does not increase the risk of mortality in this population (risk ratio 0.95, 95% CI 0.80 to 1.13). However, it is possible that bias has resulted in overestimation of the positive effects of physical rehabilitation.

Authors' conclusions

Physical rehabilitation for long‐term care residents may be effective, reducing disability with few adverse events, but effects appear quite small and may not be applicable to all residents. There is insufficient evidence to reach conclusions about improvement sustainability, cost‐effectiveness, or which interventions are most appropriate. Future large‐scale trials are justified.

Physical rehabilitation for older people in long‐term care

Rehabilitation treatments may be effective in improving the physical health of older people in long‐term care. In 2010, 7.6% of the world's population were over 65 years old, and this is predicted to increase to 13% by 2035. It is expected that this will lead to a rise in demand for long‐term residential care. This has increased interest in ways to prevent deterioration in health and activities of daily living, for example, walking and dressing, among care home residents. Physical rehabilitation (interventions based on exercising the body) may have a role, and this review examines the evidence available. This review included 67 trials, 36 of which were conducted in North America, 20 in Europe, and seven in Asia. In total, 6300 participants with an average age of 83 years were involved. Most interventions in some way addressed difficulties in activities of daily living. This review investigates the effects of physical rehabilitation on activities of daily living, strength, flexibility, balance, mood, cognition (memory and thinking), exercise tolerance, fear of falling, death, illness, and unwanted effects associated with the intervention, such as injuries. While variations between trials meant that we could not make specific recommendations, individual studies were often successful in demonstrating benefits to physical health from participating in different types of physical rehabilitation.

Authors' conclusions

Implications for practice

The included studies provide evidence that physical rehabilitation interventions for elderly people residing in long‐term care may be both safe and effective, improving physical and possibly mental state. However, the size and duration of the effects of physical rehabilitation interventions are unclear. Although physical rehabilitation may be beneficial for care‐home residents, the specific type(s) with most benefit, and how these relate to resident characteristics, is unclear.

Implications for research

Current research suggests rehabilitation improves short‐term function in activities of daily living (ADL) and is safe among elderly residents of long‐term care, but the evidence for this is limited by plausible risk of bias, inconsistency, and incompleteness in the outcomes reported. Further research is needed to establish the sustainability of any improvements, to demonstrate the effect of interventions on quality of life and caregiver satisfaction, to optimise interventions, to establish how individual differences (for example, age, gender, frailty, mental state) may affect treatment outcomes, and to determine whether different interventions should be applied to disability‐based subgroups. The provision of rehabilitation services to this client group requires robust health economic evaluation. Among the ongoing studies and those awaiting assessment, a variety of measures of well‐being, life satisfaction, and perceived health status are in use, and one study is conducting a cost‐effectiveness evaluation (Gerritsen 2011). We described the characteristics of the participants and the interventions. The interventions may be applicable to this frail elderly client group regardless of location of care, but this hypothesis requires testing in future research. Future research should utilise mechanisms such as cluster randomisation and placebo interventions as part of an explicit strategy to blind participants and personnel to the experimental intervention. Publication of pre‐study protocols for analysis and reporting of all outcome measures is particularly important given the wide variety of outcome measures used in these studies. Outcome measures should be chosen with care, for their relevance, sensitivity, feasibility, validity, and reliability and to allow comparison between studies. Future research should report outcomes per group for mortality, fall incidence, number of participants who fell at least once, hospitalisation incidence, number of participants hospitalised at least once, and incidence of minor injuries.

Background

Physical function in older people in long‐term care

Older residents of long‐term care are amongst the frailest in our population, with significant healthcare and social care needs (Bowman 2004; Continuing Care Conference 2006). Increasing age is associated with increasing disability. In developed countries, long‐term care for older people is often provided in institutional settings for those with physical or mental conditions that preclude independent living (Continuing Care Conference 2006). Care‐home residents are reported to spend the majority of their time inactive, with low levels of interaction with staff (Holthe 2007; Sackley 2006a).

Decreasing mobility and increasing dependency have many adverse effects. For residents in care homes, it may lead to increased incidence of pressure sores, contractures, cardiovascular deconditioning, urinary infections, and loss of independence (Butler 1998). Sedentary behaviour is adversely associated with chronic disease risk factors and all‐cause mortality (Balboa‐Castillo 2011; DH 2011). Mobility problems and reduced physical activity compound health difficulties by directly affecting physical and psychological health and reducing opportunities to participate in social activities; social isolation negatively impacts on mood and self‐esteem, which can then further adversely affect physical health (Marmot 2003; NICE 2008). Residents identify mobility as of central importance to quality of life and well‐being (Bourret 2002), and residents with dementia wish for more day‐time activities (Hancock 2006). Physical ill‐health and disability are the most consistent risk factors for depression in later life, with reports suggesting that, rather than illness per se, it is the resulting functional limitations, including social participation and meaningful relationships, that increase the risk of depression (Braam 2005; Zeiss 1996).

Physical rehabilitation

Physical rehabilitation is defined as those interventions that aim to maintain or improve physical function of an individual. In a care‐home setting, this typically involves increasing the physical exertions of an individual (active), although passive rehabilitation involving external stimulation (e.g. whole body vibration) is also in use. The focus of this review is active rehabilitation, which may be in the form of specific exercises or physical activity as a part of some other purposeful or leisure activity. It may be provided in a group format or individually; generic or tailored; and delivered by rehabilitation professionals (e.g. physiotherapist), care staff, or self‐directed.

How the intervention might work

Physical activity provides positive benefits for people over 65 years old for a range of outcomes: mood (Blake 2009; Windle 2010), decreased disease risk, and overall health (DH 2011). For frail institutionalised older people, systematic reviews indicate that physical training can positively affect fitness for some participants (Chin A Paw 2008; Rydwik 2004a; Weening‐Dijksterhuis 2011); the level of effect may be related to level of frailty (Chin A Paw 2008). A recent review of the effects of physical activity for older people with dementia (not all of whom were in institutions) reports some benefits to walking, getting out of chairs, lower limb strength, and flexibility (Potter 2011). Included studies in the reviews were generally small and of variable quality.

Why it is important to do this review

Dramatic increases in life expectancy over the last century are likely to result in a significant increase in the demand for long‐term care. Between 1985 and 2010 the proportion of the world's population over 65 years old grew by a quarter, from 6.0% (291 million) to 7.6% (524 million), and is expected to increase to 13% by 2035, exceeding a billion people globally (United Nations 2011). However, this prospect of longevity may be associated with a concomitant increase in morbidity and requirement for long‐term care in a residential setting. In the USA in 1998, annual healthcare costs among those living in long‐term care (USD 45,400) were over four times greater than the average for the elderly population (Lubitz 2003). This means that, despite much shorter life expectancy, total costs of care for those institutionalised at 70 are much greater than for the rest of the population (Lubitz 2003). Of those aged 65 or over, 1.3 million (3.6%) were living in nursing homes in the USA in 2004 (Jones 2009), while 310,000 (3.7%) were living in care homes in England and Wales in 2001 (ONS 2003). Projections of the use of long‐term care are unreliable (US Department of Health and Human Services 2003) as they rely on a variety of factors other than population projections, including finances; changes in the prevalence of disability; and social, technical, and organisational changes to the provision of assistance with independent living, including informal care. However, even if usage rates reduced by a third, approximately 2 million people would require nursing‐home care in the USA by 2030, a significant increase on current numbers (Sahyoun 2001).

An encouraging evidence base is being developed about rehabilitation programmes appropriate to the circumstances and needs of older people. In addition, governing bodies worldwide are responding to the pressures exerted by current demographic patterns by placing increased emphasis on promoting health and independence in old age, which may result in greater investment in rehabilitation services. This review examines the evidence for the effectiveness of physical rehabilitation for older people in long‐term care. This is an update of a Cochrane review first published in 2009; it includes an additional 18 studies and now formally quantifies some of the pooled results using meta‐analytical methods.

Objectives

To evaluate the benefits and harms of rehabilitation interventions directed at maintaining, or improving, physical function for older people in long‐term care through review of randomised and cluster randomised controlled trials.

Methods

Criteria for considering studies for this review

Types of studies

We included all studies that were randomised controlled trials (RCTs) or cluster RCTs that evaluated physical rehabilitation programmes for older people in long‐term care.

Types of participants

Older people who reside in a care home or hospital as their place of permanent abode. We defined older people as those aged 60 years or over, and we included all participants in studies where the mean age was 60 or over. The term 'care home' was as defined in a previous review (Ward 2003):

  • provides communal living facilities for long‐term care;

  • provides overnight accommodation;

  • provides nursing or personal care; and

  • provides for people with illness, disability, or dependence.

We included studies that addressed a defined subgroup of care‐home residents, such as stroke survivors or residents with dementia. We excluded trials in which only a proportion of participants met the inclusion criteria, unless outcome data pertaining to these participants were reported separately.

Types of interventions

Physical rehabilitation was defined as those interventions that aim to maintain or improve physical function. We included studies that compared a rehabilitation intervention designed to maintain or improve physical function with either no intervention or an alternative intervention. We excluded interventions that primarily addressed cognitive deficits, mood disorders, or both, unless they also aimed to improve the physical state. We evaluated interventions by content, not by the personnel implementing them (e.g. physiotherapist, occupational therapist). We excluded studies where the intervention and control groups received the same physical rehabilitation intervention with the only differential being a non‐rehabilitative component. We reported comparisons of physical rehabilitation versus control (no physical rehabilitation, but including other interventions such as social visits) and comparisons of physical rehabilitation (experimental) versus physical rehabilitation (control), where the experimental intervention is hypothesised by the study authors to be more rehabilitative than the control. During the review process, the review team reached consensus to exclude those trials in which physical exercise was a component of a multifaceted intervention primarily aimed at falls prevention as this topic is addressed in other Cochrane reviews (Cameron 2005; Gillespie 2003).

Types of outcome measures

Outcome measures did not form part of the eligibility criteria for studies in this review. Outcomes of interest are listed below.

Primary outcomes

  • Function in activities of daily living (ADL) measured either with an independence scale (e.g. the Barthel Index (BI), the Functional Independence Measure (FIM)) or tests of ability in ADL, such as mobility or transfers (e.g. Timed Up and Go (TUG) test, 6‐metre walk test). Activities of daily living typically include eating, bathing, dressing, continence, personal care, mobility, and transfers.

Secondary outcomes

  • Exercise tolerance (e.g. number of repetitions)

  • Muscle power (e.g. isokinetic and isometric dynamometry)

  • Flexibility (e.g. joint range of movement)

  • Balance (e.g. Berg Balance Scale, Functional Reach test)

  • Perceived health status (e.g. Sickness Impact Profile, Nottingham Health Profile)

  • Mood (e.g. Geriatric Depression Scale)

  • Cognitive status (e.g. Mini‐Mental State Examination (MMSE))

  • Fear of falling (e.g. Falls Efficacy Scale)

  • Economic analyses

Adverse outcomes

  • Deaths from all causes

  • Morbidity

  • Falls and other serious adverse events

Timing of outcome assessment

Our original intention was to focus on those studies that comprised a minimum of one month of follow up. However, only a minority of studies reported any follow up. Therefore, for consistency, the outcomes were assessed at the end of the intervention. We also reported follow up in the narrative synthesis. We anticipated disparity between studies, and this was given due consideration in the review.

Search methods for identification of studies

See the 'Specialized register' section in the Cochrane Stroke Group module.

The extensive nature of this topic was reflected in the search of a wide range of resources, both electronic and non‐electronic. We searched for trials in all languages and arranged translation of papers published in languages other than English. The search dates given below are those up to which the trials found have been fully incorporated into the review.

Electronic searches

We searched the trials registers of the following Cochrane Groups: the Stroke Group (last searched 17 May 2012), the Effective Practice and Organisation of Care Group (last searched 2 April 2012), and the Rehabilitation and Related Therapies Field (last searched 4 April 2012). In addition, we searched the following databases:

  • the Cochrane Central Register of Controlled Trials (The Cochrane Library, 2009, Issue 4) (Appendix 1);

  • the Cochrane Database of Systematic Reviews (searched 21 December 2009);

  • Cochrane Other Reviews (DARE) and Methods Studies resources (The Cochrane Library, 2009, Issue 4);

  • MEDLINE (1966 to 18 December 2009) (Appendix 2);

  • EMBASE (1980 to 18 December 2009) (Appendix 3);

  • Cumulative Index to Nursing and Allied Health Literature (CINAHL) (1982 to 21 December 2009) (Appendix 4);

  • Allied and Complementary Medicine Database (AMED) (1985 to 21 December 2009) (Appendix 5);

  • PsycINFO (1967 to 21 December 2009) (Appendix 6);

  • Physiotherapy Evidence Database (PEDro) (searched 4 April 2012);

  • British Nursing Index (1994 to 1 October 2007);

  • Applied Social Sciences Index and Abstracts (ASSIA) (1987 to 21 December 2009);

  • International Bibliography of the Social Sciences (IBSS) (1951 to 21 December 2009);

  • Database of Abstracts of Reviews of Effects (DARE) (searched 21 December 2009);

  • Health Management Information Consortium (HMIC) database (searched 21 December 2009);

  • NHS Economic Evaluation Database (NHS EED) (searched 21 December 2009);

  • Health Technology Assessment (HTA) database (searched 21 December 2009);

  • ISI Web of Knowledge (searched 21 December 2009);

  • Google Scholar (searched 2006 to 14 January 2010);

  • Index to Theses (http://www.theses.com/) (searched 7 January 2010); and

  • ProQuest Dissertations & Theses (PQDT) database (searched 22 December 2009).

For this update, we stopped searching the British Nursing Index, because its collection is similar to CINAHL, and our institution no longer subscribes to it.

We developed the MEDLINE search strategy with the help of the Cochrane Stroke Group Trials Search Co‐ordinator and adapted it for the other databases.

On 19 August 2011, we again searched the Cochrane Central Register of Controlled Trials, the Cochrane Database of Systematic Reviews, Cochrane Other Reviews and Methods Database, MEDLINE, EMBASE, Cumulative Index to Nursing and Allied Health Literature (CINAHL), Allied and Complementary Medicine Database (AMED), Applied Social Sciences Index and Abstracts (ASSIA), International Bibliography of the Social Sciences (IBSS), PsycINFO, Database of Abstracts of Reviews of Effects (DARE), Health Management Information Consortium (HMIC) database, NHS Economic Evaluation Database (NHS EED), Health Technology Assessment (HTA) database, ISI Web of Knowledge, Google Scholar, Index to Theses, and ProQuest Dissertations & Theses (PQDT). We did not fully assess the records retrieved from these searches, but we screened the titles, sought the full text of potentially eligible studies, and assessed them further for eligibility. We added potentially relevant trials to the 'Characteristics of studies awaiting classification' tables.

In addition, we searched the National Research Register (www.nrr.nhs.uk/; now defunct) in December 2007, and in January 2010 we searched Current Controlled Trials (www.controlled‐trials.com) and HSRProj (Health Services Research Projects in Progress, www.nlm.nih.gov/hsrproj/).

Searching other resources

In an effort to identify further published, unpublished, and ongoing trials, we:

  1. scanned reference lists of relevant studies;

  2. contacted investigators and subject area experts and requested additional information from authors of relevant trials;

  3. searched the following available proceedings of the Chartered Society of Physiotherapy Annual Congress (1990, 1995, 1997, 2000, 2003, and 2005); and

  4. searched the following available proceedings of the World Congress of Physical Therapy (1953, 1963, 1967, and 1982).

In view of the comprehensive nature of the electronic search, we did not handsearch journals. We also contacted the Cochrane Dementia and Cognitive Improvement Group (August 2006) and the Cochrane Health Promotion and Public Health Field (August 2006; now the Cochrane Public Health Group), both of which indicated that their own field registers would not contain studies of relevance to this topic.

Data collection and analysis

Selection of studies

Two review authors independently assessed titles and abstracts (where necessary) of the records identified from the electronic searches and excluded obviously irrelevant studies. We obtained the full texts of all remaining studies, and at least two members of the review team assessed these for eligibility based on the predetermined inclusion criteria. We resolved disagreements at a consensus meeting.

Data extraction and management

Two review authors independently extracted and recorded data using a standardised electronic data collection form. A third author combined these data sets; we combined numerical data automatically where there was consensus. We resolved discrepancies by discussion and, where possible, we contacted study authors to provide clarification or additional data if necessary.

For continuous outcome data and ordinal outcome data, we converted the results from all studies into estimated difference in means, and the standard error for this difference.

Assessment of risk of bias in included studies

Two review authors independently assessed risk of bias in included studies using The Cochrane Collaboration's tool for assessing risk of bias (Higgins 2011). We assessed risk in the categories of sequence generation (was assignment truly random?), allocation concealment (could group assignment be foreseen and therefore subverted?), blinding of participants and personnel (could participants and care staff identify treatment allocation?), blinding of outcome assessment (could outcome assessors identify treatment allocation?), incomplete outcome data (could attrition or exclusions have resulted in bias?), selective reporting (did authors report all prespecified outcomes?), and any other risks of bias, using the criteria provided (Higgins 2011). We assessed the blinding of outcome assessment separately for observed measures of function in ADL (such as the TUG test) and reported measures of function in ADL (such as the BI), as these were entered into meta‐analyses and were likely to have involved different assessors and different difficulties with blinding. We assessed each category as having low, high, or unclear risk of bias. We resolved any disagreements by discussion and contacted study authors for clarification if appropriate. We did not actively seek pre‐study protocols unless they were referenced within a report or had been identified through our literature searches.

Measures of treatment effect

We treated ordinal data as if they were continuous. For continuous data, we combined the estimates for each study using the mean difference (MD). For dichotomous data, we combined the estimates for each study using the risk ratio (RR).
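As a rough illustration of the two effect measures named above, the following Python sketch computes a mean difference and a risk ratio with an approximate 95% confidence interval on the log scale. The function names and the example counts are hypothetical, and the normal-approximation interval is an assumption of this sketch rather than a statement of how the review's software computed its intervals.

```python
import math

def mean_difference(mean_treatment, mean_control):
    """Mean difference (MD): treatment mean minus control mean."""
    return mean_treatment - mean_control

def risk_ratio(events_t, n_t, events_c, n_c, z=1.96):
    """Risk ratio (RR) with an approximate 95% CI built on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts, not taken from any included trial.
rr, lower, upper = risk_ratio(events_t=10, n_t=100, events_c=20, n_c=100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A risk ratio below 1 indicates fewer events in the treatment group; an interval that spans 1 is compatible with no difference.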

Unit of analysis issues

In cross‐over trials, we only included data from the first period of the trial in meta‐analyses to guard against carry‐over effects. Where a trial comprised more than one exercise group (e.g. Christofoletti 2008; MacRitchie 2001), we used the group with the greatest rehabilitative component to compare with the group with the least intervention.

Where cluster randomised studies presented an estimate of effect that properly accounted for the cluster design, this was used. Where this was not the case, we assumed that the intra‐cluster correlation coefficient (ICC) was the same as for other studies included in the review for that outcome. We calculated an average ICC for the outcome and corrected the values for each unadjusted study by the design effect (see Higgins 2011). Where the ICC for an outcome was not available from the other included studies we attempted to find an appropriate estimate from external databases (e.g. Elley 2005; Health Services Research Unit 2004; Ukoumunne 1999). Where no appropriate estimate was available, we presented unadjusted estimates. In all cases, we presented sensitivity analyses excluding cluster studies.
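The design‐effect correction described above can be sketched as follows. This is a minimal illustration that assumes the common formula, design effect = 1 + (m - 1) × ICC, where m is the average cluster size, and assumes the correction is applied by inflating the unadjusted standard error by the square root of the design effect. All names and numbers are hypothetical.

```python
import math
from statistics import mean

def design_effect(mean_cluster_size, icc):
    """Design effect for a cluster randomised trial: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

def adjust_standard_error(se_unadjusted, mean_cluster_size, icc):
    """Inflate an unadjusted standard error by the square root of the design effect."""
    return se_unadjusted * math.sqrt(design_effect(mean_cluster_size, icc))

# Hypothetical ICCs borrowed from other studies reporting the same outcome.
average_icc = mean([0.04, 0.05, 0.06])

# Hypothetical unadjusted study: SE of 1.0 with about 21 residents per home.
adjusted_se = adjust_standard_error(1.0, mean_cluster_size=21, icc=average_icc)
print(f"average ICC = {average_icc:.3f}, adjusted SE = {adjusted_se:.3f}")
```

Because the design effect is at least 1 for a non‐negative ICC, the adjusted standard error is never smaller than the unadjusted one, so the corrected study receives less weight in an inverse‐variance meta‐analysis.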

Dealing with missing data

Because of the long‐term nature of the interventions and the frailty of the population, we anticipated a high rate of loss to follow up because of death, which would deviate from the intention‐to‐treat (ITT) principle. Where multiple analyses were reported, we used the data that most closely resembled an available case analysis (i.e. all available data are analysed in the intervention groups to which participants were assigned, without imputation of missing data), but we did not exclude studies that had only performed other analyses. However, as described above, we assessed incomplete outcome data as a risk of bias and, as described below, we stratified studies by risk of bias; therefore, we accounted for large deviations from the ITT principle in the analysis. We used the generic inverse‐variance approach to facilitate inclusion of studies presenting results in different ways, so we converted standard deviations, confidence intervals, or both, for each group separately to standard errors for the difference in means. Where data were missing, we made every effort to derive the appropriate measure from the available data. For example, we derived data from graphs and converted a variety of measures of time taken to cover set distances and walking speeds to metres per second.
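The conversion of group‐level standard deviations or confidence intervals into a standard error for the difference in means might look like the sketch below. It assumes independent groups and a normal approximation (z = 1.96) when back‐calculating from a 95% confidence interval; for small groups a t‐based multiplier would be more appropriate. Function names and values are hypothetical.

```python
import math

def se_from_sd(sd, n):
    """Standard error of a group mean from its standard deviation and sample size."""
    return sd / math.sqrt(n)

def se_from_ci(lower, upper, z=1.96):
    """Standard error of a group mean back-calculated from a 95% confidence interval."""
    return (upper - lower) / (2 * z)

def se_of_difference(se_group_a, se_group_b):
    """Standard error of the difference between two independent group means."""
    return math.sqrt(se_group_a ** 2 + se_group_b ** 2)

# Hypothetical study: one arm reports an SD, the other a 95% CI for its mean.
se_intervention = se_from_sd(sd=15.0, n=25)          # 3.0
se_control = se_from_ci(lower=42.16, upper=57.84)    # about 4.0
print(round(se_of_difference(se_intervention, se_control), 2))
```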

Assessment of heterogeneity

We explored heterogeneity through stratified forest plots, quantified in terms of the proportion of the total variation in study estimates that is due to heterogeneity (I² statistic) (Higgins 2002) and tested using the Q statistic, with I² > 50% or P < 0.2 used to identify significant heterogeneity.
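For readers unfamiliar with these statistics, the sketch below computes Cochran's Q and I² from study estimates and their standard errors using inverse‐variance weights. It is an illustrative implementation with hypothetical inputs; the P value for Q, which would come from a chi‐squared distribution with k - 1 degrees of freedom, is left out to keep the example free of third‐party dependencies.

```python
def heterogeneity(estimates, standard_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared (as a percentage)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    # Fixed-effect (inverse-variance) pooled estimate used as the reference point.
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared

# Hypothetical mean differences and standard errors from three studies.
q, df, i_squared = heterogeneity([6.0, 2.0, 9.0], [2.0, 1.5, 3.0])
print(f"Q = {q:.2f} on {df} df, I-squared = {i_squared:.0f}%")
```

Under the thresholds stated above, an I² above 50% (or a Q test P value below 0.2) would flag the set of studies as heterogeneous.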

Assessment of reporting biases

We assessed small study effects, e.g. publication bias, using contour‐enhanced funnel plots centred around the null hypothesis and informed by the test of the intercept from a regression of estimates on their standard errors (Egger’s test), with P < 0.1 being used to indicate significant asymmetry.
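A regression‐based asymmetry test of this kind can be sketched as below. The sketch assumes the classic formulation of Egger's test, in which each estimate divided by its standard error is regressed on precision (1 / standard error) and the intercept is examined; this is algebraically equivalent to a weighted regression of estimates on standard errors. It returns the intercept and its standard error, and the P value would then come from a t distribution with k - 2 degrees of freedom. The inputs are hypothetical.

```python
import math

def egger_regression(estimates, standard_errors):
    """Ordinary least squares of (estimate / SE) on (1 / SE).

    Returns (intercept, se_intercept, slope). An intercept far from zero
    suggests funnel plot asymmetry (small study effects).
    """
    k = len(estimates)
    if k < 3:
        raise ValueError("at least three studies are needed")
    x = [1.0 / se for se in standard_errors]                  # precision
    z = [y / se for y, se in zip(estimates, standard_errors)]  # standardised effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((zi - (intercept + slope * xi)) ** 2 for xi, zi in zip(x, z))
    residual_variance = rss / (k - 2)
    se_intercept = math.sqrt(residual_variance * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, se_intercept, slope

# Hypothetical data in which smaller (less precise) studies show larger effects.
ses = [0.1, 0.2, 0.5, 1.0]
effects = [0.7, 0.9, 1.5, 2.5]
intercept, se_intercept, slope = egger_regression(effects, ses)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

In this constructed example each effect equals 0.5 plus twice its standard error, so the fitted intercept recovers the built‐in asymmetry and the slope recovers the underlying effect.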

Data synthesis

The included studies were heterogeneous. They examined different types of intervention and evaluated them with a wide battery of outcome measures. Such variety limited the feasibility of conducting meta‐analyses. We chose to perform meta‐analyses of measures of ADL, our primary outcome, and mortality.

Where we performed meta‐analyses, for all outcomes, we presented random‐effects meta‐analyses because of the anticipated large heterogeneity caused by different populations and interventions involved in the trials. When results were presented at several time points, we used the time closest to the end of intervention unless a better analysis was available at another time point. For continuous or ordinal data, where results were presented in terms of change from baseline or adjusted for baseline, this was used in preference. We used a generic inverse‐variance approach for continuous and ordinal data. We used the Mantel‐Haenszel approach for dichotomous outcome data.
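To make the generic inverse‐variance random‐effects approach concrete, the sketch below pools study estimates using the DerSimonian and Laird estimator of between‐study variance. The choice of that estimator is an assumption of this sketch; the text above does not name the estimator used. All inputs are hypothetical.

```python
import math

def random_effects_pool(estimates, standard_errors, z=1.96):
    """Generic inverse-variance random-effects pooling (DerSimonian-Laird).

    Returns (pooled_estimate, pooled_se, ci_lower, ci_upper, tau_squared).
    """
    w = [1.0 / se ** 2 for se in standard_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau_squared = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau-squared to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau_squared) for se in standard_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, pooled - z * pooled_se, pooled + z * pooled_se, tau_squared

# Hypothetical mean differences (points on a 0 to 100 scale) and standard errors.
pooled, se, lower, upper, tau2 = random_effects_pool([6.0, 2.0, 9.0], [2.0, 1.5, 3.0])
print(f"MD = {pooled:.1f} (95% CI {lower:.1f} to {upper:.1f}), tau^2 = {tau2:.1f}")
```

When the estimated between‐study variance is zero, the random‐effects weights reduce to the fixed‐effect weights, so the two models give the same pooled estimate.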

We originally intended to combine results in a fixed‐effect meta‐analysis where sufficient homogeneity existed. However, because of the extensive heterogeneity in the interventions, we used a random‐effects meta‐analysis as our primary approach, but still report the results of fixed‐effect models as sensitivity analyses.

There were many different ways of measuring various ADL, so to reduce heterogeneity in the meta‐analysis we focused on studies reporting the BI, FIM, Rivermead Mobility Index (RMI), TUG test, and certain measures of walking speed. For walking speeds and timed walks over a fixed distance, we converted the time to walk a fixed distance into speed (m/s) over that distance, to include as many similar studies as possible. However, we decided a priori to only include distances of less than 10 metres, to reduce heterogeneity introduced by very different designs. Of the remaining studies, there were an insufficient number assessing the same outcome to include in further meta‐analyses. Those that appeared to assess similar outcomes were often measured in entirely different ways, assessing very different activities requiring varying functional ability. We therefore chose not to attempt to combine these quantitatively, even using standardised mean difference, because they were not actually assessing the same outcome.
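The conversion of a timed walk into a speed, together with the 10‐metre restriction, can be illustrated with the short sketch below. It shows the arithmetic for a single timed walk; applying it to a reported group mean time gives only an approximation of the group mean speed. The function names and values are hypothetical.

```python
def walking_speed(distance_m, time_s):
    """Convert a timed walk over a fixed distance into speed in metres per second."""
    if time_s <= 0:
        raise ValueError("time must be positive")
    return distance_m / time_s

def eligible_distance(distance_m, limit_m=10.0):
    """Apply the a priori rule of including only distances of less than 10 metres."""
    return distance_m < limit_m

# Hypothetical timed walks: (distance in metres, time in seconds).
timed_walks = [(6.0, 12.0), (8.0, 20.0), (20.0, 45.0)]
speeds = [walking_speed(d, t) for d, t in timed_walks if eligible_distance(d)]
print(speeds)
```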

For outcomes where a narrative synthesis is provided, we summarised those studies that reported a statistically significant difference in a direction that favoured the intervention or the control (P < 0.05) and those that did not. We described limitations of such comparisons where statistical significance was reached (for example, a within‐group comparison only). We provided a narrative exploration of the extent to which included studies demonstrated that their rehabilitative interventions were of benefit to the participants, and we discussed the nature and sustainability of any benefits. Some trials selected extremely frail individuals, and we considered this when assessing these interventions, as preventing or slowing decline may be the treatment goal in this situation.

Subgroup analysis and investigation of heterogeneity

For all outcome measures, potential sources of heterogeneity decided a priori were risk of bias (see Risk of bias in included studies); duration of intervention (for the BI, FIM, death, and walking speed, less than three months compared with three or more months; for the TUG test and RMI, less than six months compared with six or more months); mode of delivery (group, individual, or group and individual); mean age of participants (less than 85 years compared with 85 years or more); and the percentage of participants in the study who were female (less than 80% compared with 80% or more). For ADL outcome measures, we also specified the level of function at baseline as measured by the relevant outcome measure (above or below the median function). For walking speed, we also included the fixed distance walked (less than six metres compared with six metres or more), in case this was a source of heterogeneity. We investigated these through subgroup analysis.
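The prespecified cut‐offs above can be expressed as a small classification routine. This is only a sketch: the dictionary keys, labels, and function name are hypothetical, and risk of bias, mode of delivery, and baseline function are omitted for brevity.

```python
# Duration cut-offs in months, as listed in the text for each outcome.
DURATION_CUTOFF_MONTHS = {
    "BI": 3, "FIM": 3, "death": 3, "walking speed": 3,
    "TUG": 6, "RMI": 6,
}

def assign_subgroups(outcome, duration_months, mean_age, percent_female):
    """Label a study with its duration, age, and sex-mix subgroups."""
    cutoff = DURATION_CUTOFF_MONTHS[outcome]
    return {
        "duration": "shorter" if duration_months < cutoff else "longer",
        "age": "under 85" if mean_age < 85 else "85 or over",
        "female": "under 80%" if percent_female < 80 else "80% or over",
    }
```

Note that the same three‐month intervention falls in the 'longer' subgroup for the BI but in the 'shorter' subgroup for the TUG test, because the cut‐off depends on the outcome.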

Our original intention, if sufficient data existed, was to conduct analyses on the basis of methodological quality and the effect of dropouts, but this was replaced by risk of bias. We also specified age, pathology‐specific interventions, mode of delivery, and residential category. However, we neither conducted analyses based on pathology‐specific interventions, because insufficient data existed, nor conducted analyses based on residential category, because we replaced this with measured function at baseline (see Differences between protocol and review).

We wanted to consider type of intervention as a potential source of heterogeneity, e.g. physiotherapy, strength training, mobility training, balance training, occupational therapy, but the interventions were often complex, containing many combinations of the above, and with great variation within each broad type. Given the small number of studies available for each meta‐analysis, there were insufficient studies of each type to explore this interesting aspect further.

Sensitivity analysis

For all outcomes included in a meta‐analysis, we presented a fixed‐effect sensitivity analysis. For dichotomous outcomes, we also calculated odds ratios and risk differences. Where a meta‐analysis included studies that were cluster‐randomised, we presented a sensitivity analysis excluding such studies.
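For a single trial with a dichotomous outcome such as death, the three effect measures mentioned here can be derived from the same two‐by‐two table. The sketch below uses hypothetical argument names and made‐up counts; it does not handle zero cells, which would need a continuity correction.

```python
def dichotomous_effects(events_t, n_t, events_c, n_c):
    """Return risk ratio, odds ratio, and risk difference for one trial.

    events_t / n_t: events and participants in the intervention group.
    events_c / n_c: events and participants in the control group.
    Raises ZeroDivisionError if a required cell is zero.
    """
    risk_t = events_t / n_t
    risk_c = events_c / n_c
    odds_ratio = (events_t * (n_c - events_c)) / (events_c * (n_t - events_t))
    return {
        "risk_ratio": risk_t / risk_c,
        "odds_ratio": odds_ratio,
        "risk_difference": risk_t - risk_c,
    }
```

With 10 events among 100 intervention participants and 20 among 100 controls (illustrative numbers only), the risk ratio is 0.5, the odds ratio about 0.44, and the risk difference −0.1.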

Results

Description of studies

Results of the search

Several searches contributed to this review. The results of the searches are outlined in a PRISMA diagram in Figure 1. Searches from the original review in 2007 and searches from December 2009 produced approximately 30,000 references, from which 67 studies fulfilled the eligibility criteria and were included in this review. An additional search (August 2011) produced 7969 references, from which there are 27 potentially eligible studies awaiting classification.


Review update flow diagram


The original review included 49 studies from a search that produced over 20,000 references. The search from December 2009 produced 10,621 references, from which 26 new articles fulfilled the eligibility criteria and were included in this update. This represented 18 new studies (22 articles) and an additional four articles that report on two existing studies. Four studies remain awaiting classification from this search because the articles were unavailable. The characteristics of Included studies and Excluded studies are discussed below. We conducted an additional search in August 2011. Because of the scale of this review and updates to the methods (introduction of an electronic database and meta‐analyses), we did not fully assess the results of these searches (i.e. we did not include any new studies). Of the 7969 references, an additional 25 new studies (28 references) and five new references across two existing studies (Resnick 2009; Rosendahl 2006) are awaiting classification (see the 'Characteristics of studies awaiting classification' tables). These studies awaiting classification are likely to be classified as included or ongoing in future updates of the review.

Included studies

The 67 included studies randomised a total of 6300 participants, prior to any attrition. We give a general overview below; further details can be found in the 'Characteristics of included studies' tables.

Design

Forty‐eight studies randomised individuals into experimental groups; the remaining 19 used cluster designs, in which they randomised facilities, not individuals (Brittle 2009; Brown 2004; Choi 2005; Faber 2006; Gillies 1999; Kerse 2008; Lee 2009; McMurdo 1993; McMurdo 1994; Mihalko 1996; Morris 1999; Peri 2008; Resnick 2009; Rosendahl 2006; Sackley 2006; Sackley 2008; Sackley 2009; Sung 2009; Taboonpong 2008). One study followed cluster randomisation of exercise type with randomisation of individual participants to exercise or control conditions (Faber 2006). Nine studies stratified participants before randomisation to ensure even distribution of certain participant characteristics between groups, for example, older, more sick, or less mobile individuals (Baum 2003; Bautmans 2005; Lazowski 1999; MacRitchie 2001; Makita 2006; Mulrow 1994; Przybylski 1996; Santana‐Sosa 2008; Sihvonen 2004). Five studies used a 'matched pairs' design, where participants were systematically matched on characteristics of interest and then randomly allocated into intervention groups (Au‐Yeung 2002; de Bruin 2007; Dorner 2007; Schoenfelder 2000; Schoenfelder 2004). Of the cluster randomised trials, two studies stratified facilities (Rosendahl 2006; Sackley 2006), and two matched facilities (Morris 1999; Peri 2008) prior to randomisation.

Five trials used a counterbalanced cross‐over design, where all participants received all conditions, but the order in which they were received was randomised. In three of these, the outcome measures were measures of performance during single‐session interventions, such as number of repetitions (DeKuiper 1993; Lang 1992; Riccio 1990), while in two they followed long‐term interventions that risked carry‐over of treatment effects between periods (Ouslander 2005; Pomeroy 1993). Four trials also used a semi‐cross‐over design (Baum 2003; Brown 2004; Kinion 1993; Sauvage 1992) where participants allocated to the control group also received the intervention. However, in Sauvage 1992 this was a post‐hoc design following attrition from the intervention group.

Of the cluster trials, six (Brittle 2009; Kerse 2008; Peri 2008; Resnick 2009; Sackley 2006; Sackley 2009) explicitly reported statistical analyses that were adjusted for the effect of clustering.

Eligibility criteria

All of the studies except Przybylski 1996 stated some eligibility criteria, which on average limited eligibility to half of all residents. This often related to the safety and feasibility of including such participants in the planned intervention or the likelihood of it showing an effect, and in 27 studies, it limited the focus to populations with specific functional limitations.

General eligibility criteria

Thirty studies had a minimum age limit (typically 65 years). Thirteen studies excluded participants that were engaged in physical therapy or activity. Six studies required participants to have been a resident for a minimum time that varied between one and four months; seven studies specified an expected duration of stay for at least as long as the intervention. Six studies excluded those with challenging behaviours, including abusive and aggressive behaviour.

Physical functioning or disorders

Overall, 45 studies excluded residents with insufficient physical function or physical disorders. The ability to walk or be mobile was a requirement of 22 studies, of which two disallowed the use of walking aids; one allowed one carer to assist; five specified at least six metres and two at least five metres; one, 250 feet; and one, five minutes. Alternative requirements included the ability to independently stand (three studies), stand or transfer with assistance (five studies), or to be independent in all but one basic activity of daily living (ADL) (one study). Thirteen studies excluded participants on the basis of musculoskeletal disorders or other physical impairments, including paralysis and amputation.

Cognitive functioning and communication

In total, 39 studies only included participants with a minimum level of cognitive function, often citing the ability to follow simple instructions; an additional four studies excluded participants because of communication‐specific difficulties. Exclusion criteria were often stated as severe dementia or severe cognitive impairment, but where specific measures were given, these varied widely. Nine studies excluded participants on the basis of their Mini‐Mental State Examination (MMSE) score: Five required a minimum score between 20 and 23, indicating participants were cognitively intact or had mild dementia; one excluded those scoring less than 50% (typically 15); and three excluded those scoring less than 10 or 11, indicating severe dementia. Four studies excluded those with very low communication and physical skills using the Parachek Geriatric Rating Scale.

Other health conditions

A variety of other health‐related criteria were reasons for exclusion. Sixteen studies ruled out participants on the broad grounds of medical contraindications or at the discretion of a physician. Twenty‐two studies excluded individuals with acute or unstable conditions, while 19 studies excluded those with a terminal condition or short life expectancy. Eight studies excluded individuals on the basis of recent medical events, for example, a fracture within the past six months. Twenty‐seven studies identified a variety of specific diseases as reasons for exclusion, often including cardiac disorders (14 studies). Medical implants, including pacemakers and hip replacements, or specific medications were exclusion criteria in six studies. Seven studies excluded those with significant visual impairments. Four studies excluded individuals with psychological or psychiatric disorders.

Focus on specific conditions

While most studies required participants to have some minimum level of physical or mental functioning, 27 studies only examined participants with some form of impairment or limitation. These included a degree of dependence in ADL (Brittle 2009; Karl 1982; Meuleman 2000; Mulrow 1994; Rosendahl 2006; Sackley 2009), stroke‐related dependence in ADL (Sackley 2006), dementia and dependence in ADL (Christofoletti 2008; Pomeroy 1993), dementia (Buettner 1997; Stevens 2006; Tappen 1994), Alzheimer’s disease (Cott 2002; Rolland 2007; Santana‐Sosa 2008; Tappen 2000), mental illness (Stamford 1972), those who were physically restrained (Schnelle 1996), incontinence (Alessi 1999; Ouslander 2005; Schnelle 1995; Schnelle 2002), visual impairment (Cheung 2008), those at a risk of falling (Choi 2005; Donat 2007), and those with poor balance and weak muscles (Sauvage 1992). Finally, in the feasibility study of Sackley 2008, staff purposively selected residents with a range of functional, cognitive, and continence impairments prior to randomisation.

Representativeness of participants

Approximately half of the population of participating facilities were eligible for entry into the trials, but only one quarter participated. Twenty‐two studies reported the total population of the participating facilities, and the number of those who were eligible for participation. Across these, the total population included 14,384 (median = 423) individuals, of whom 6853 (47.6%, median = 204) were eligible, but only 3426 (23.8%, median = 104) were allocated to groups in the trials; 1618 (11.2%, median = 63) did not consent to participate, and in 14 trials, residents were excluded for other reasons, including insufficient capacity within the trial or individuals becoming unavailable (e.g. illness) before the trial began (total = 1849 (12.9%), median = 7).
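The percentages in this paragraph are all expressed relative to the total facility population of 14,384. A short sketch of that arithmetic, using the counts quoted above (the dictionary keys are illustrative labels only):

```python
TOTAL_POPULATION = 14384  # residents across the 22 studies reporting this

counts = {
    "eligible": 6853,
    "allocated": 3426,
    "did_not_consent": 1618,
    "excluded_other": 1849,
}

# Each percentage uses the total population as the denominator,
# not the number of eligible residents.
percentages = {
    label: round(100 * n / TOTAL_POPULATION, 1) for label, n in counts.items()
}
```

Using the total population as the denominator is what makes 'one quarter participated' comparable with 'half were eligible'.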

Sample size

Included studies randomised a median of 56 participants into their trial prior to any attrition. This ranged from just 12 participants (Sauvage 1992) to 682 (Kerse 2008) (lower quartile = 28, upper quartile = 107). Only 18 studies included 100 or more participants (Chin A Paw 2004; Faber 2006; Fiatarone 1994; Kerse 2008; Lee 2009; Makita 2006; Morris 1999; Mulrow 1994; Ouslander 2005; Peri 2008; Przybylski 1996; Resnick 2009; Rolland 2007; Rosendahl 2006; Sackley 2006; Sackley 2009; Schnelle 2002; Stevens 2006). Twenty‐four studies randomised fewer than 35 participants; of these, eleven studies were particularly small with 20 or fewer participants (Baum 2003; Brill 1998; Gillies 1999; Karl 1982; Lang 1992; Naso 1990; Santana‐Sosa 2008; Sauvage 1992; Schoenfelder 2000; Stamford 1972; Urbscheit 2001). One study (Sauvage 1992) was especially problematic, reporting data from just 10 individuals. Starting with 12 participants, they allocated 6 each to the intervention and control groups. On losing two intervention participants, they allowed four control participants to complete the intervention. Therefore, they reported data for eight intervention participants and six control participants. Sample size calculations were performed for 17 studies (25%), although recruitment did not always achieve the target.

Setting

Studies were undertaken in various countries and long‐term care settings.

Location

Most studies were conducted in North America: 31 took place in the USA and five in Canada. Within Europe, eight were conducted in the UK, two each in Belgium and The Netherlands, and one each in Austria, Denmark, Finland, France, Spain, Sweden, Switzerland, and Turkey. Throughout the rest of the world, there were three studies from Hong Kong, and two studies each from New Zealand and South Korea, with single studies from Australia, Brazil, Japan, and Thailand.

Care setting

Most often, studies were undertaken in nursing and residential care homes, with 45 studies and 25 studies including facilities from these categories, respectively. In addition, four studies were undertaken exclusively in hospitals where participants were long‐term residents (Clark 1975; Dorner 2007; Pomeroy 1993; Stamford 1972).

Participants

We present a brief synopsis of the characteristics of participants here. We give further details in the 'Characteristics of included studies' tables. See also Eligibility criteria.

Sex

Overall, 76% of participants were women. Seven studies only had female participants (Cheung 2008; Crilly 1989; Makita 2006; Riccio 1990; Sihvonen 2004; Sung 2009; Yoder 1989), while two studies had exclusively male participants (Sauvage 1992; Stamford 1972).

Age

Data indicated that in each study the mean age was greater than 65 years. The grand mean (composite standard deviation (SD)) participant age was 83 (8) years across studies reporting such data. Reported means ranged from 69 years (Clark 1975; Stamford 1972) to 90 years (Bruunsgaard 2004). Only six studies reported a mean age of under 75 years, five of which were small (less than 25 participants) (Clark 1975; Karl 1982; Santana‐Sosa 2008; Sauvage 1992; Stamford 1972), and one was of average size (54 participants) (Christofoletti 2008). Three studies did not report mean age, two of which reported age range (Naso 1990; Pomeroy 1993). In total, 36 studies reported age range, and among these, the total range was from one participant aged 44 (Sackley 2006) to a participant aged 105 (Tappen 2000). Only five of these studies included any participants aged less than 60, and all but one (Clark 1975) included participants aged over 90, with 13 of the 36 studies including centenarians.

Physical status

The physical status of participants varied widely within and between studies that reported this. Nine studies reported the Barthel Index (BI) mean (SD) at baseline as 49.1 (27.5) (Sackley 2006), 51.5 (24) (Sackley 2008), 55.5 (21) (Brittle 2009), 58.8 (13) (Dorner 2007), 58.8 (21.1) (Sackley 2009), 58.9 (29.5) (Resnick 2009), 65.6 (21) (Rosendahl 2006), 71 (10) (Santana‐Sosa 2008), and 88 (12.5) (Peri 2008) out of 100, where 100 indicates independence in 10 basic ADLs. Four studies reported the Katz ADL index, with mean (SD) values of 1.9 (1.3) (Fiatarone 1994), 3.1 (1.3) (Rolland 2007), 4.7 (0.5) (Christofoletti 2008), and 5.8 (0.4) (Bautmans 2005) out of 6, where 6 indicates independence in six basic ADLs. Five studies reported the proportion of participants who used mobility assistance devices (e.g. cane, wheelchair) as 10% (Chin A Paw 2004), 19% (Donat 2007), 45% (Mihalko 1996), 60% (Sihvonen 2004), and 83% (Fiatarone 1994); as reported above, three studies had excluded such participants, and one study only included participants requiring assistance to stand.

Cognitive status

The cognitive status of participants varied widely within and between studies that reported this. Twenty‐one studies provided mean MMSE scores at baseline, four of which had a mean score less than 10, indicative of severe dementia (Buettner 1997; Cott 2002; Schoenfelder 2000; Tappen 1994); nine studies' participants had a mean score between 10 and 20, indicative of moderate dementia (Alessi 1999; Christofoletti 2008; Ouslander 2005; Rosendahl 2006; Santana‐Sosa 2008; Schnelle 1995; Schnelle 1996; Schnelle 2002; Tappen 2000); five studies' participants had a mean score between 20 and 25, indicative of mild dementia (Baum 2003; Dorner 2007; Fiatarone 1994; Mulrow 1994; Resnick 2009); while three studies' participants' mean score was in the cognitively intact range (25 to 30) (de Bruin 2007; Faber 2006; Schoenfelder 2000). Overall, mean MMSE scores ranged from 6 (Cott 2002) to 26.9 (de Bruin 2007), while for individual participants they ranged from 0 to 30.

Chronic comorbidities

The majority of participants had at least one significant comorbidity, with many having multiple comorbidities based on the 29 studies that reported on this. Commonly reported comorbidities included arthritis, osteoporosis, Alzheimer's disease, stroke, cardiovascular disease, respiratory disease, incontinence, and depression. Three studies reported the mean (SD) number of comorbidities that participants had as 2.9 (3.1) (Lee 2009), 4.9 (2.2) (Kerse 2008), and 5.6 (3.6) (Tappen 2000), while the similar Charlson Comorbidity Index was reported to average 3.8 (2.2) in Ouslander 2005.

Interventions

To provide a convenient overview, we categorised interventions according to key components. We describe individual programmes in the 'Characteristics of included studies' tables. Details of the groups with which experimental interventions were compared are provided in the 'Comparison conditions' section below.

While most studies featured only one experimental intervention, two studies featured two different experimental physical interventions. Faber 2006 compared 'functional walking' and 'in‐balance' exercise interventions, while Morris 1999 compared the 'fit for your life' exercise regime and the 'self‐care for seniors' nursing rehabilitation programme. Therefore, 69 interventions are described across the 67 studies.

Physical components

The most common physical components were strength training and walking. Forty‐nine interventions included exercises targeted at basic components of physical fitness, such as strength or flexibility (rote exercise), while 40 interventions included practice of basic ADLs, such as walking or transfers, and 21 interventions featured other recreation or leisure activities, such as ball games or dancing.

Rote exercise

Strength training, for example, using elastic resistance bands or weights, featured in 42 interventions. Balance (motor skill) exercises, such as tandem stands, were features of 21 interventions; flexibility (range of motion) exercises featured in 17 interventions; and endurance training featured in seven. Other less common features included relaxation and breathing exercises (three interventions) and posture training (two interventions).

Basic ADL practice

Mobility training (walking or wheeling) featured in 37 interventions; transfer practice featured in 21 interventions; and 10 interventions included practice of other basic ADLs, such as washing, dressing, eating, or grooming.

Recreation and leisure‐like activities

Other recreation or leisure‐like physical activities included kicking or throwing and catching balls, balloons or bean bags (10 interventions), rhythmic movement to music or dancing (5 interventions), Tai Chi (4 interventions), arts and crafts activities (1 intervention), meal preparation activities (2 interventions), and indoor gardening (1 intervention).

Combinations of physical components

Seventeen interventions featured only rote exercises; thirteen, only basic ADL practice; and five, only recreational activities. Eighteen combined basic ADL practice with rote exercises; seven combined recreational activities with rote exercises; and two combined basic ADL practice with recreational activities. In total, seven interventions included examples of all three of these types of component.
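The combination counts above can be cross‐checked against the per‐component totals given under 'Physical components' (49 rote exercise, 40 basic ADL practice, 21 recreation) and the overall total of 69 interventions. A minimal sketch, with short labels chosen here for illustration:

```python
# Number of interventions featuring exactly each combination of components.
combinations = {
    ("rote",): 17,
    ("adl",): 13,
    ("recreation",): 5,
    ("adl", "rote"): 18,
    ("recreation", "rote"): 7,
    ("adl", "recreation"): 2,
    ("adl", "recreation", "rote"): 7,
}

def count_with(component):
    """Count interventions that include the given component at all."""
    return sum(n for combo, n in combinations.items() if component in combo)

total_interventions = sum(combinations.values())
```

Summing in this way reproduces 49, 40, and 21 for the three component types and 69 interventions overall, so the breakdown is internally consistent.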

Components supplementary to physical activity

In addition to physical activity, 23 interventions contained other components. Among these was a social or communication element, for example, ‘walking and talking’ (Brittle 2009; Buettner 1997; Cott 2002; MacRitchie 2001; Tappen 2000). Twelve studies included music alongside the exercise (Chin A Paw 2004; Choi 2005; MacRitchie 2001; McMurdo 1993; McMurdo 1994; Pomeroy 1993; Rolland 2007; Sackley 2008; Santana‐Sosa 2008; Stevens 2006; Sung 2009; Taboonpong 2008). Other components were interventions to improve continence, for example, prompted voiding (Alessi 1999; Ouslander 2005; Sackley 2008; Schnelle 1995; Schnelle 2002); nutritional supplementation (Fiatarone 1994; Rosendahl 2006); and environmental adaptations designed to improve sleep (Alessi 1999). Sung 2009 included a health education programme, while Brown 2004 included a video on gardening.

Distinctive interventions

Four trials explored the potential of imagery or purposefulness for enhancing exercise participation (DeKuiper 1993; Lang 1992; Riccio 1990; Yoder 1989). Imagery (e.g. pretending to pick apples) or 'added purpose' exercise (e.g. rotary arm exercise in the form of making biscuits) were compared with rote exercise. Two studies explored 'Whole body vibration', where exercises are performed on an oscillating platform (Bautmans 2005; Bruyere 2005). One study (Sihvonen 2004) compared dynamic balance exercise visual feedback sessions on a 'Good Balance' force platform with an unspecified control activity. Przybylski 1996 did not specify particular physical components, but examined the effect of a four‐fold increase in occupational therapy and physiotherapy staffing, comparing a 1:200 (standard) and 1:50 (enhanced) staff to participant ratio.

Format of intervention

Interventions were most often delivered as supervised 45‐minute group sessions three times weekly. Forty‐one interventions included a group component, two of which were provided in pairs and three of which also had an individually delivered component. Another 18 individual interventions were described, with 10 not specifying whether they were provided on a group or individual basis. Despite the predominance of group‐based interventions, some degree of tailoring to the ability or needs of the participant was a feature of 43 interventions. In 11 trials, participants carried out the intervention seated (e.g. McMurdo 1993), and in five further studies, this was optional (e.g. Karl 1982). Sessions were time‐limited in 47 interventions, ranging from nine minutes to two and a half hours, with a median and mode of 45 minutes (10 studies). In most cases, sessions occurred on a routine basis, varying from weekly to four times daily, but most often three times weekly (median and mode, N = 30). In other cases, the intervention was continuous in nature, or was administered only once where the exercise rate or duration, rather than the effect of exercise on health, was being evaluated. In the 32 interventions for which a total time per week could be calculated, this varied widely from 20 to 750 minutes per week, with a median of 120 minutes per week.

Fifty‐six interventions involved specific sessions primarily designed to deliver physical rehabilitation (Au‐Yeung 2002; Baum 2003; Bautmans 2005; Brill 1998; Brittle 2009; Brown 2004; Bruunsgaard 2004; Bruyere 2005; Cheung 2008; Chin A Paw 2004; Choi 2005; Christofoletti 2008; Clark 1975; Cott 2002; Crilly 1989; de Bruin 2007; DeKuiper 1993; Donat 2007; Dorner 2007; Faber 2006 (both interventions); Fiatarone 1994; Gillies 1999; Hruda 2003; Karl 1982; Kinion 1993; Lang 1992; Lazowski 1999; Lee 2009; MacRitchie 2001; Makita 2006; McMurdo 1993; McMurdo 1994; Meuleman 2000; Mihalko 1996; Morris 1999 (fit for your life); Mulrow 1994; Naso 1990; Pomeroy 1993; Przybylski 1996; Riccio 1990; Rolland 2007; Sackley 2006; Sackley 2008; Santana‐Sosa 2008; Sauvage 1992; Schnelle 1996; Schoenfelder 2000; Schoenfelder 2004; Sihvonen 2004; Stamford 1972; Stevens 2006; Sung 2009; Taboonpong 2008; Urbscheit 2001; Yoder 1989). Ten interventions involved rehabilitation that was embedded within, or incidental to, resident care (Alessi 1999; Buettner 1997; Kerse 2008; Morris 1999 (self care for seniors); Ouslander 2005; Peri 2008; Resnick 2009; Schnelle 1995; Schnelle 2002; Tappen 2000). Three interventions combined specific sessions and incidental rehabilitation (Rosendahl 2006; Sackley 2009; Tappen 1994). Examples of specific sessions include an interactive group exercise class with warm‐up and cool‐down periods, flexibility, balance, strengthening and endurance exercises (Brittle 2009) or client‐centred occupational therapy (Sackley 2006). Examples of incidental rehabilitation include the Functional Incidental Training (FIT) and 'Promoting Independence' interventions described below.

Three studies evaluated FIT (Alessi 1999; Ouslander 2005; Schnelle 1995). Here, exercises targeting specific individual needs, such as standing up, were provided throughout the day, incidental to daily nursing care routines, such as toileting. The therapeutic recreation nursing team intervention (Buettner 1997) is comparable to these. Here, the nursing‐home environment was enhanced, with every aspect of daily life regarded as part of the intervention. A range of activities were provided, including cardiovascular exercise, cooking, gardening, cognitive therapy, and sensory stimulation activities. Nursing staff were involved in provision, and ADLs such as dressing were targeted. Kerse 2008 and Peri 2008 evaluated variations of a 'Promoting Independence' plan, where a functional physical goal was set with the resident, an activity plan based on ADLs was devised, and a healthcare assistant encouraged the resident to perform these.

Delivery of intervention

It appeared that all interventions involved supervised delivery, as opposed to wholly self‐directed interventions with a worksheet or video, for example. The majority were delivered by staff external to the home (54 interventions), using rehabilitation professionals (e.g. physiotherapists, occupational therapists, sports scientists, activities staff; 30 interventions), researchers (22 interventions), or a combination of these (2 interventions). Care facility staff delivered five interventions (Kinion 1993; Lazowski 1999; MacRitchie 2001; Morris 1999 (both interventions)). All of these included the healthcare staff, while two included activities staff, and two included other staff (e.g. domestic staff). In two of these studies, volunteers (e.g. family members) participated in the delivery. Ten interventions involved both internal and external staff (Baum 2003; Buettner 1997; Kerse 2008; Lee 2009; Makita 2006; Peri 2008; Przybylski 1996; Resnick 2009; Rosendahl 2006; Sackley 2009): In six, staff were external rehabilitation professionals and internal healthcare staff; in three, internal and external healthcare staff; and in one, internal and external rehabilitation professionals.

Among the 10 interventions that were incidental to the resident's care (see the 'Format of intervention' section above), research staff provided the care and rehabilitation in five interventions (Alessi 1999; Ouslander 2005; Schnelle 1995; Schnelle 2002; Tappen 2000); in four, delivery was by a combination of internal and external staff (Buettner 1997; Kerse 2008; Peri 2008; Resnick 2009); and in one, delivery was wholly by internal staff (Morris 1999 (self care for seniors)).

Duration of intervention

The interventions lasted between four weeks (Karl 1982; Sackley 2008; Sihvonen 2004) and a year (Naso 1990; Resnick 2009; Rolland 2007), with the exception of the four interventions that examined imagery or purposefulness and were only administered once (DeKuiper 1993; Lang 1992; Riccio 1990; Yoder 1989). Most typically, interventions were twelve weeks in duration (median and mode, N = 12), with 10 interventions lasting eight to nine weeks and 7 lasting six months. Total exposure to the intervention (total time per week multiplied by the duration of the intervention) ranged very widely from 240 minutes (four hours) (Karl 1982) to 15,653 minutes (approximately one and a half weeks of continuous time, delivered in two‐hour sessions, five times per week for six months) (Christofoletti 2008), with a median of 1440 minutes (24 hours) in the 32 interventions where this could be calculated.
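Total exposure is described as time per week multiplied by the duration of the intervention. The sketch below applies that definition to the longest programme; treating six months as half of a 365.25‐day year is an assumption made here because it reproduces the reported figure closely, and the function name is hypothetical.

```python
# Assumption: "six months" is taken as half of a 365.25-day year, in weeks.
WEEKS_IN_SIX_MONTHS = 365.25 / 2 / 7

def total_exposure_minutes(session_minutes, sessions_per_week, weeks):
    """Total exposure = minutes per week multiplied by number of weeks."""
    return session_minutes * sessions_per_week * weeks

# Two-hour sessions, five times per week, for six months.
longest = total_exposure_minutes(120, 5, WEEKS_IN_SIX_MONTHS)

# Express the same total as continuous weeks (7 days of 24 hours each).
longest_in_weeks = longest / (60 * 24 * 7)
```

Under this assumption the longest programme comes to roughly 15,654 minutes, or about 1.55 continuous weeks. For comparison, a hypothetical programme of 40‐minute sessions three times weekly for 12 weeks would total 1440 minutes, the median reported above.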

Comparison conditions

Most studies compared two groups: the intervention of interest and some sort of control. However, 10 studies compared three groups (Christofoletti 2008; Clark 1975; Cott 2002; Faber 2006; Gillies 1999; Lang 1992; Morris 1999; Schnelle 1995; Stevens 2006; Tappen 1994), and 4 studies compared four groups (Chin A Paw 2004; Faber 2006; Fiatarone 1994; Rosendahl 2006).

Thirty‐five studies compared their intervention(s) to a 'usual care' control group, allowing examination of whether an intervention was better or worse than their usual situation. The remaining studies supplemented 'usual care' in some way, for example, with a social meeting or different exercise. A social or recreational activity control session, for example, talking, playing cards, or reminiscing, featured in 18 studies (e.g. Baum 2003; Brown 2004). Nineteen studies compared different exercise programmes, usually a novel approach with a traditional type (Au‐Yeung 2002; Bautmans 2005; Brill 1998; Bruyere 2005; Cheung 2008; de Bruin 2007; Donat 2007; Dorner 2007; Gillies 1999; Lazowski 1999; Mihalko 1996; Riccio 1990; Urbscheit 2001; Yoder 1989). Two studies compared three exercise types (DeKuiper 1993; Lang 1992). Four studies compared four groups. Two studies crossed an exercise and a social activity control with a nutritional supplement and a placebo control to examine whether exercise alone was better than the social activity control, and whether benefit from exercise was enhanced by nutritional supplementation (Fiatarone 1994; Rosendahl 2006). For the purposes of this review, we ignored the impact of supplementation, and where possible, we combined nutrition and placebo variants of exercise and control groups for meta‐analyses. One study compared two different exercise programmes, each with their own control group (Faber 2006: controls were located in the same facilities as the relevant exercise programme). Finally, one study compared the effects of strength training and functional skills training, with the effect of both interventions combined and with an educational control group (Chin A Paw 2004).

Outcome measures

As a consequence of the considerable variation in the purpose and content of the interventions outlined above, the studies used many outcome measures (327 in total). Frequently, these were study‐specific, with 59 studies including a unique measure and 258 of the 327 measures used being unique. The studies reported only 13 measures five or more times (Timed Up and Go (TUG) test, six‐metre walk time, BI, Berg Balance Scale, Tinetti Mobility Scale, 'sit‐and‐reach' test, average number of sit‐to‐stands in 30 seconds, hand grip strength, Geriatric Depression Scale, MMSE, falls (number of falls and any per participant), and attendance). In total, 51 trials reported an outcome measure related to ADL, our primary outcome. Other common outcomes addressed by the studies included balance (29 studies), muscle power (25 studies), flexibility (16 studies), exercise tolerance (7 studies), physical activity (7 studies), mood (15 studies), cognitive performance (11 studies), quality of life (7 studies), fear of falling (6 studies), and perceived health status (6 studies). The studies also recorded morbidity, mortality, adverse events, and attendance. We report details of the methods used by individual studies to assess these outcomes in the 'Characteristics of included studies' tables.

Follow up

All studies except Brittle 2009 assessed participants immediately after intervention completion; later follow‐up was rare, undertaken by just 14 studies. In these, follow‐up was most frequently at three months after the end of the intervention (Au‐Yeung 2002; Rosendahl 2006; Sackley 2006; Sackley 2009; Schoenfelder 2000; Schoenfelder 2004). The other follow‐up periods were two weeks (Sackley 2008), one month (Clark 1975; Sihvonen 2004), two and five months (Brittle 2009), six months (Kerse 2008), and one year (Faber 2006; Meuleman 2000; Urbscheit 2001).

Excluded studies

We excluded 52 studies that may, on the surface, appear to meet the inclusion criteria, but do not: individual reasons are provided in the 'Characteristics of excluded studies' tables. We excluded these studies because the purpose was not to improve residents' physical condition (N = 14); assignment to groups was not random (N = 12); participants included those who were not residents of long‐term care, and they did not report the results separately (N = 10); they evaluated a multi‐faceted falls prevention intervention (N = 7); the aspect of the intervention that varied between groups was not physical rehabilitation (N = 4); they targeted contractures (N = 3); or there was insufficient information to include them (N = 2).

New studies found at this update

We included an additional 18 studies in this update. Half of the new studies have used a cluster‐randomised design, previously only used by 20% of the included studies. Similarly, eight new studies had over 100 participants compared to 10 of the 49 studies in the previous version of the review. In total, the number of participants has almost doubled from 3611 to 6300. It was notable that only one new study came from North America, which had previously supplied 35 studies (71%) and that nine additional countries are represented in this review, including the first South American country (Brazil).

Risk of bias in included studies

We present our 'Risk of bias' judgements, made according to The Cochrane Collaboration's tool, in the 'Characteristics of included studies' tables and summarise them here in the text, in Figure 2, and in Appendix 7. We did not judge any studies to have low risk of bias across all categories, with no studies judged to have a low risk of performance bias or reporting bias. To enable an analysis of the best available evidence, we selected the seven studies judged to have low risk of bias in all other categories (selection, detection, attrition, and other sources of bias) as a subgroup named 'lower risk of bias' for meta‐analysis (Brittle 2009; Chin A Paw 2004; Kerse 2008; McMurdo 1994; Sackley 2006; Sackley 2008; Sackley 2009) to be contrasted with all other studies (higher risk of bias).


'Risk of bias' graph: review authors' judgements about each 'Risk of bias' item presented as percentages across all included studies


Several studies caused particular concern. Karl 1982 did not report baseline or follow‐up data, or the randomisation procedure. Brill 1998 had only one room and time slot to conduct their weight‐training intervention, which meant both groups received their intervention at the same time. It is unclear how far this deviates from the intended design. Sauvage 1992 began with 12 individuals and, following the loss of two of the six intervention participants, crossed over four participants from the control group, whose results were then reported in both groups. The authors did not account for this in their statistical analysis (samples were treated as independent), nor did they discuss temporal differences or report results separately. The design used in one study (Przybylski 1996) also raised potential problems. Their intervention was implemented over two years, with 29 new participants recruited throughout to replace those who died or were discharged. The researchers had no control over who entered and left the groups and made the assumption that this was a random process.

Allocation

We judged the risk of selection bias to be unclear in the majority of studies because they reported insufficient information. We judged the risk in both categories (random sequence generation and allocation concealment) to be low for 13 studies and high for 2 studies; the latter allocated further participants after the initial randomisation without stating that this was done randomly. We judged risk of bias due to random sequence generation to be low for 32 studies, unclear for 33 studies, and high for 2 studies. We judged concealment of the allocation sequence to pose low risk of bias for 16 studies and high risk of bias for 4 studies; it was unclear for 47 studies.

Blinding

We did not judge blinding to pose low risk of bias in any of the studies, because none of them were able to achieve low risk with respect to blinding of participants and personnel (performance bias). We judged 47 studies to be at high risk of performance bias, usually because the control would have been obvious. For 20 studies, we judged the risk of performance bias to be unclear, typically where such blinding was feasible (using strategies including cluster randomisation and alternative interventions for the control groups) but was not specifically reported. By contrast, blinding of outcomes assessors was often sufficient to judge a low risk of detection bias for the outcome measures entered into meta‐analyses (for observed outcomes, 20 studies were at low risk, 16 studies were at unclear risk, and 4 studies were at high risk; for reported outcomes, 16 studies were at low risk, 9 studies were at unclear risk, and 3 studies were at high risk). Thirty‐five of the 67 studies attempted blinding of some of their outcome assessments.

Incomplete outcome data

We judged incomplete outcome data to pose low risk of bias in 26 studies, high risk of bias in 21 studies, and it was unclear in 20 studies. Typically, high risk of bias related to differential attrition rates between study groups, but also high overall attrition, inability to get measurements for a significant proportion of participants, or post‐randomisation exclusions. Overall attrition rates were reported by 59 of the 67 studies, among which the grand mean rate was 21.4% (N = 1300 of 6083). Five studies had no attrition, three of which were studies of single‐session interventions (DeKuiper 1993; Lang 1992; Yoder 1989), the other two (Cheung 2008; Kinion 1993) lasting for 12 and 8 weeks, respectively. Attrition in 29 other studies was less than 20%, between 20% and 30% in 18 studies (Buettner 1997 (21%); Chin A Paw 2004 (28%); Christofoletti 2008; de Bruin 2007 (22%); Donat 2007 (24%); Dorner 2007 (29%); Gillies 1999 (25%); Lazowski 1999 (29%); Lee 2009 (21%); Meuleman 2000 (26%); Naso 1990 (27%); Ouslander 2005 (27%); Sackley 2006 (25%); Sackley 2009 (25%); Schnelle 1996 (26%); Schnelle 2002 (22%); Schoenfelder 2004 (28%); Taboonpong 2008 (29%)), between 30% and 40% in four studies (Kerse 2008 (31%); Pomeroy 1993 (33%); Resnick 2009 (33%); Stevens 2006 (38%)), and over 40% in three studies (Au‐Yeung 2002 (42%); Bruunsgaard 2004 (46%); Przybylski 1996 (45%)). The eight studies that did not provide data on overall attrition were Brill 1998; Brown 2004; Karl 1982; Mihalko 1996; Santana‐Sosa 2008; Sauvage 1992; Stamford 1972, and Urbscheit 2001, only two of which had more than 20 participants.
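The attrition figures above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts quoted in the text; the variable names are illustrative and not part of the review.

```python
# Counts quoted in the text above; variable names are illustrative only.
lost = 1300        # participants lost across the studies reporting attrition
randomised = 6083  # participants randomised in those studies

pooled_attrition = lost / randomised
print(f"Pooled attrition: {pooled_attrition:.1%}")

# Number of studies in each attrition band, as listed in the text.
bands = {
    "none": 5,
    "under 20%": 29,
    "20% to 30%": 18,
    "30% to 40%": 4,
    "over 40%": 3,
}
studies_reporting = sum(bands.values())
print(f"Studies reporting attrition: {studies_reporting}")
```

The band counts sum to the 59 studies that reported overall attrition, leaving the eight studies listed as not providing these data.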

Selective reporting

We did not judge selective reporting to pose low risk of bias in any studies, often because a pre‐study protocol was not available and because, given the wide range of outcomes measured across studies, no study could be considered to have assessed a complete range. We judged 53 studies to have an unclear risk of reporting bias, while we judged 14 studies to have a high risk of reporting bias, usually because they did not report (or did so insufficiently) outcomes specified in the methods section. It should be noted that many of the studies judged to have unclear risk of reporting bias reported a number of outcomes that did not reach (or even come close to) statistical significance, suggesting that these studies may have reported all outcomes.

Other potential sources of bias

In three studies, we identified a potential risk of bias due to contamination (control participants receiving the intervention). We judged this to pose an unclear risk of bias in Buettner 1997, where the review authors suspected contamination, and a high risk of bias in Peri 2008 and Baum 2003, where the study authors reported contamination.

Effects of interventions

Primary outcomes: function in activities of daily living

In total, 51 studies measured our primary outcome, function in activities of daily living (ADL). However, only 33 studies measured an outcome that was included in one of our meta‐analyses, nine of which were excluded from the analysis, either because they provided insufficient information to be included (N = 8) or had a substantial baseline imbalance in the specific measure (N = 1, sensitivity analysis presented). Therefore, we included the results of 24 studies in the meta‐analyses (Au‐Yeung 2002; Baum 2003; Bautmans 2005; Brill 1998; Brittle 2009; Bruyere 2005; Cheung 2008; Chin A Paw 2004; Dorner 2007; Hruda 2003; Kerse 2008; Lazowski 1999; MacRitchie 2001; Makita 2006; McMurdo 1993; Peri 2008; Przybylski 1996; Resnick 2009; Rolland 2007; Rosendahl 2006; Sackley 2006; Sackley 2009; Santana‐Sosa 2008; Schoenfelder 2004). These studies initially randomised a total of 3139 participants. The other studies used ADL measures that were reported too infrequently for inclusion in meta‐analyses. We provide details in the 'Characteristics of included studies' tables, but they are not synthesised here.

Independence in activities of daily living
Barthel Index

The Barthel Index (BI) assesses independence in physical ADL across 10 items, rated in increments of 5, e.g. scores of 0, 5, 10, with a maximum total score of 100 (best function). Some studies scaled this to increments of 1, e.g. scores of 0, 1, 2, with a maximum total score of 20. In this case, scores were multiplied by 5 to allow comparison with the original scaling.
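The rescaling described above can be sketched as follows. The text states only that scores were multiplied by 5; applying the same factor to the standard deviation is an assumption of this sketch (it follows from the linear rescaling), and the function name and example values are hypothetical.

```python
def rescale_barthel(mean_0_20, sd_0_20, factor=5):
    """Convert a Barthel Index summary from the 0-20 scale to the 0-100 scale.

    A linear rescaling multiplies both the mean and the standard
    deviation by the same factor.
    """
    return mean_0_20 * factor, sd_0_20 * factor


# Hypothetical group summary on the 0-20 scale (not from any included study).
mean_100, sd_100 = rescale_barthel(12.0, 3.5)
print(mean_100, sd_100)
```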

Seven studies used the BI and contributed information to the meta‐analysis (Dorner 2007; McMurdo 1993; Resnick 2009; Rosendahl 2006; Sackley 2006; Sackley 2009; Santana‐Sosa 2008). Where the rules of the residential home restricted the total score, e.g. participants not being allowed to go to the toilet alone, reducing the maximum score to 95/100, we ignored this in pooling studies. In McMurdo 1993, it was unclear which scale had been used, so we assumed use of the 0 to 20 scale, because this is most common in the UK, and the standard errors would have been unfeasibly tight for such a small study if the alternative had been used. In Santana‐Sosa 2008, the BI score was derived from the graphs presented in the publication. Five of these studies were cluster trials (McMurdo 1993; Resnick 2009; Rosendahl 2006; Sackley 2006; Sackley 2009), although two only reported unadjusted results (McMurdo 1993; Rosendahl 2006). We were able to adjust these results using an estimated intra‐cluster correlation coefficient (ICC) of 0.38 based on Sackley 2006 and Sackley 2009.
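The text does not state exactly how the unadjusted cluster results were adjusted. One widely used approach is to inflate the standard error by the square root of the design effect, 1 + (m − 1) × ICC, where m is the average cluster size. The sketch below assumes that approach; the ICC of 0.38 is quoted in the text, while the cluster size and standard error are hypothetical values chosen for illustration.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Design effect for a cluster trial: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def adjust_standard_error(se_unadjusted, mean_cluster_size, icc):
    """Inflate an unadjusted standard error by sqrt(design effect)."""
    return se_unadjusted * math.sqrt(design_effect(mean_cluster_size, icc))


def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size an individually randomised trial would need for equal precision."""
    return n / design_effect(mean_cluster_size, icc)


icc = 0.38            # estimate quoted in the text
cluster_size = 10     # hypothetical average cluster size
se_unadjusted = 2.0   # hypothetical unadjusted standard error

print(design_effect(cluster_size, icc))
print(adjust_standard_error(se_unadjusted, cluster_size, icc))
print(effective_sample_size(100, cluster_size, icc))
```

With an ICC this large, even modest cluster sizes widen the confidence interval considerably, which is why unadjusted cluster results can appear misleadingly precise.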

The rehabilitation group had a BI on average six points higher than controls (95% CI 2 to 11, P = 0.008) when analysed with the random‐effects method (Analysis 1.1). We found similar results for the fixed‐effect pooled estimate, with a BI five points higher (95% CI 2 to 7, P = 0.003) at follow‐up than controls (Analysis 1.43). There was substantial between‐study heterogeneity (I² statistic = 48%, Q = 12 on 6 degrees of freedom (df), P = 0.07). Excluding cluster studies resulted in a much larger effect estimate of 18 points difference, with wide confidence intervals (95% CI 7 to 28, P = 0.001) (Analysis 1.44), although this was based on two small studies.
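The I² statistic quoted above is derived from Q and its degrees of freedom as I² = max(0, (Q − df) / Q). The following sketch applies that standard formula to the rounded values in the text; the function name is illustrative.

```python
def i_squared(q, df):
    """Higgins' I² as a percentage: variability beyond that expected by chance."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Rounded values quoted in the text for the Barthel Index analysis.
print(i_squared(12, 6))

# When Q is smaller than its degrees of freedom, I² is truncated at zero.
print(i_squared(1.5, 3))
```

Using the rounded Q of 12 gives 50%, close to the reported 48%; the small difference is presumably due to rounding of Q in the text.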

The small number of studies limited the exploration of the potential sources of heterogeneity. There was no evidence that studies with a higher risk of bias had different measures of effect than those with a lower risk of bias (Analysis 1.7) (P = 0.3). There was some evidence that studies with shorter interventions had larger effects than those with longer interventions (Analysis 1.8) (P = 0.01). There was no evidence of differential effects on BI based on mode of delivery (Analysis 1.9) (P = 0.3), baseline function (Analysis 1.10) (P = 0.5), age (Analysis 1.11) (P = 0.4), or gender (Analysis 1.12) (P = 0.5).

There was some evidence of asymmetry in the contour‐enhanced funnel plot (Figure 3) (Egger’s test P = 0.05), with larger studies indicating less benefit of rehabilitation. Six of the seven studies were not statistically significant, which suggests that this asymmetry may not be due to publication bias; however, with only seven studies contributing, this should be interpreted with caution.


Funnel plot of comparison: 1 Rehabilitation versus control, outcome: 1.1 Barthel Index.


Functional Independence Measure

The Functional Independence Measure (FIM) assesses a participant’s degree of independence in self care, toileting, mobility, communication, and social cognition functions. It consists of 18 items rated on a 7‐point scale, with higher scores indicating greater independence.

Four studies used the FIM and contributed information to the meta‐analysis (Dorner 2007; Lazowski 1999; Makita 2006; Przybylski 1996). Przybylski 1996 did not present the numbers in each intervention group at follow‐up, but did present total numbers and balanced numbers in each group at baseline, and reported that attrition was similar. We therefore assumed an equal dropout rate in each group and similar numbers in each group at follow‐up. All of these studies were randomised at the level of the individual.

The rehabilitation group had a FIM on average 5.0 points higher than controls (95% CI ‐1.6 to 11.5, P = 0.1) when analysed with the random‐effects method (Analysis 1.2). The fixed‐effect pooled estimate was lower, but with narrower confidence intervals, with a FIM on average 1.5 points higher (95% CI ‐0.4 to 3.3, P = 0.1) at follow‐up than controls (Analysis 1.45). There was substantial between‐study heterogeneity (I² statistic = 71%, Q = 10 on 3df, P = 0.02).
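The gap between the fixed‐effect and random‐effects estimates arises because the random‐effects model gives relatively more weight to small studies when heterogeneity is present. The sketch below illustrates this with inverse‐variance pooling. The text does not name the random‐effects estimator, so the DerSimonian‐Laird method is assumed here, and the effect sizes and standard errors are hypothetical rather than taken from the included studies.

```python
def pool(effects, std_errors):
    """Inverse-variance pooling: fixed-effect and DerSimonian-Laird random-effects."""
    weights = [1 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance (tau²).
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau² to each study's variance.
    weights_re = [1 / (se ** 2 + tau2) for se in std_errors]
    sum_w_re = sum(weights_re)
    random = sum(w * y for w, y in zip(weights_re, effects)) / sum_w_re

    return {
        "fixed": fixed,
        "fixed_se": (1 / sum_w) ** 0.5,
        "random": random,
        "random_se": (1 / sum_w_re) ** 0.5,
        "q": q,
        "tau2": tau2,
    }


# Hypothetical mean differences: two precise studies with small effects
# and two imprecise studies with large effects.
result = pool([1.0, 2.0, 12.0, 10.0], [1.0, 1.5, 4.0, 5.0])
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In this hypothetical example, the random‐effects estimate is larger and less precise than the fixed‐effect estimate, the same pattern as that reported for the FIM.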

The small number of studies limited the exploration of the potential sources of heterogeneity. All studies were categorised as higher risk of bias, so it was not possible to assess this as a source of heterogeneity (Analysis 1.13). There was no evidence of differential effects on FIM based on duration of intervention (Analysis 1.14) (P = 0.6) or mode of delivery (Analysis 1.15) (P = 0.3). Comparing studies with differing mean functional independence at baseline (Analysis 1.16) suggested that participants with greater functional independence benefited more from intervention than those with less function at baseline (P = 0.03). There was evidence that younger participants (less than 85 years) benefited more from rehabilitation in terms of functional independence than older participants (85 years and older) (Analysis 1.17) (P = 0.001). This also reduced the excess heterogeneity in both groups (from I² statistic = 71% to I² statistic = 0% in each group separately). There was no evidence of differential effects on FIM due to gender (Analysis 1.18) (P = 0.8).

There were too few studies to explore asymmetry in the contour‐enhanced funnel plot (Egger’s test P = 0.3).

Rivermead Mobility Index

The Rivermead Mobility Index (RMI) assesses mobility independence and performance across 15 items, with a score ranging from 0 to 15, with 15 being the best outcome.

Three studies contributed information to the meta‐analysis (Brittle 2009; Sackley 2006; Sackley 2009); four studies used the RMI, but Sackley 2008 did not present results as it was a feasibility study. All of these studies were cluster trials, and all presented appropriately adjusted analyses.

Rehabilitation groups had a RMI on average 0.7 points higher at follow‐up than controls (95% CI 0.04 to 1.3, P = 0.04) when analysed with the random‐effects method (Analysis 1.3). There was almost no excess between‐study heterogeneity (I² statistic = 0%, Q = 0.02 on 2df, P = 0.99). Therefore, the fixed‐effect pooled estimate (Analysis 1.46) was identical to the random‐effects model.

The small number of studies limited the exploration of the potential sources of heterogeneity. We had categorised all of these studies as lower risk of bias, so we were not able to assess risk of bias as a source of heterogeneity (Analysis 1.19). There was no evidence of differential effects on RMI based on duration of intervention (Analysis 1.20) (P = 0.9), mode of delivery (Analysis 1.21) (P = 0.9), baseline function (Analysis 1.22) (P = 0.9), age (Analysis 1.23) (P = 0.9), or gender (Analysis 1.24) (P = 0.9).

There were too few studies to explore asymmetry in the contour‐enhanced funnel plot (Egger’s test P = 0.09).

Tests of ability in specific activities of daily living
Timed Up and Go test

The Timed Up and Go (TUG) test assesses participant mobility, measuring the time in seconds for a participant to rise from sitting in a standard armchair, then walk three metres, turn around, walk back to the chair, and sit down again. Therefore, a lower score indicates better performance. Two studies modified the distance for the TUG test (Hruda 2003; Santana‐Sosa 2008), and one counted the number of steps taken in addition to the time taken (Christofoletti 2008). To reduce heterogeneity, the modified outcomes were not included in the meta‐analyses.

Seven studies contributed to the rehabilitation versus control meta‐analysis, and two studies contributed to the meta‐analysis of rehabilitation (experimental) versus rehabilitation (control). Twelve studies used the standard TUG test (Au‐Yeung 2002; Baum 2003; Bautmans 2005; Bruyere 2005; Cheung 2008; Christofoletti 2008; Donat 2007; Kerse 2008; Lazowski 1999; MacRitchie 2001; Peri 2008; Sackley 2009). However, we could not include Sackley 2009 in the meta‐analyses because the authors did not present TUG test results on the grounds of extensive missing data and substantial variation in individual results. We could not include Donat 2007 in the meta‐analyses because the study did not present a measure of variation in the outcome (e.g. standard error, standard deviation, or confidence interval). We excluded Christofoletti 2008 because of substantial baseline imbalance that persisted throughout the duration of the trial, with the control group taking more than twice as long to complete the TUG test before any intervention. We present below an analysis that re‐includes these data. We analysed two studies in a separate meta‐analysis because they compared exercise plus whole body vibration with exercise alone, so both groups contained a rehabilitative intervention (Bautmans 2005; Bruyere 2005). Kerse 2008 and Peri 2008 were cluster randomised trials and presented appropriately adjusted analyses.

The rehabilitation group was five seconds quicker on average at follow‐up than controls (95% CI ‐9 to 0, P = 0.05) when analysed with the random‐effects method (Analysis 1.4). We observed substantial excess heterogeneity (I² statistic = 65%, Q = 17 on 6df, P = 0.009). The fixed‐effect pooled estimate was similar: Rehabilitation groups had TUG test results four seconds quicker than controls (95% CI ‐6 to ‐1), and this was statistically significant (P = 0.001) (Analysis 1.47). The sensitivity analysis excluding cluster trials (Analysis 1.48) was significant (P = 0.02) and estimated a larger effect, with rehabilitation groups an average eight seconds faster, but with wide confidence intervals (95% CI ‐14 to ‐2). The sensitivity analysis including the study with substantial baseline imbalance (Christofoletti 2008) (Analysis 1.49) was significant (P = 0.02) and estimated a larger effect, with rehabilitation groups an average eight seconds faster, but wide confidence intervals (95% CI ‐16 to ‐1) and further increased heterogeneity (I² statistic = 89%, Q = 65 on 7df, P < 0.00001).

Exploring the heterogeneity, we categorised only one study as lower risk of bias, and there was no evidence that this study had different measures of effect on TUG test scores than those with a higher risk of bias (Analysis 1.25) (P = 0.1). There was some evidence that studies with shorter interventions had larger effects than those with longer interventions (Analysis 1.26) (P = 0.06), though numbers of studies were small, and there was still substantial heterogeneity between studies with less than six months' intervention. There was no evidence that group interventions differed in effect from individual interventions (Analysis 1.27) (P = 0.9). There was some evidence that participants with greater mobility benefited more from rehabilitation than those with less mobility at baseline (Analysis 1.28) (P = 0.06). However, the numbers of studies in each subgroup were small, and substantial heterogeneity remained between studies with lower TUG test scores. A post‐hoc analysis, moving the median study from the more mobile group to the less mobile group, found no evidence of this difference (P = 0.8). There was no evidence of difference in pooled estimates due to age (Analysis 1.29) (P = 1.0). There was some evidence that participants in studies with a higher proportion of women (more than 80% compared with 80% or less) had lower (better) TUG test scores than those with a lower proportion of women (Analysis 1.30) (P = 0.05).

There was no evidence of asymmetry in the contour‐enhanced funnel plot (Figure 4) (Egger’s test P = 0.4).


Funnel plot of comparison: 1 Rehabilitation versus control, outcome: 1.4 TUG test


The whole body vibration plus exercise (experimental rehabilitation) group was eight seconds quicker on average at follow‐up than exercise alone (control rehabilitation) (95% CI ‐19 to 3, P = 0.2) when analysed with the random‐effects method (Analysis 2.1). The fixed‐effect pooled estimate was similar, with the experimental group seven seconds quicker (Analysis 2.3) (95% CI ‐11 to ‐3, P = 0.0002). We observed substantial excess heterogeneity (I² statistic = 89%, Q = 9 on 1df, P = 0.003). However, because there were only two studies, we did not conduct subgroup analyses.

Walking time and speed over fixed distance

To investigate walking as a functional ability, we combined measures of time to walk a fixed distance with measures of speed over a fixed distance, converting these into speed in metres per second (m/s). We anticipated that varied distances may impact on speed to walk that distance. Therefore, to reduce heterogeneity, we decided a priori to only combine studies over a fixed distance (i.e. excluding studies of maximum distance walked in a fixed time) and for that fixed distance to be less than 10 metres. Where measures of 'fast' walking and 'normal' walking were available, we selected normal walking speed, again to reduce heterogeneity.
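The conversion and eligibility rule described above can be sketched as follows. This is an illustration for a single measurement only; the function names and example values are hypothetical. Converting group summaries is less direct, because the mean of the speeds is not simply the distance divided by the mean time.

```python
def walking_speed(distance_m, time_s):
    """Convert the time taken to walk a fixed distance into speed in m/s."""
    if time_s <= 0:
        raise ValueError("time must be positive")
    return distance_m / time_s


def eligible_for_pooling(distance_m, max_distance_m=10):
    """Apply the stated restriction to fixed distances of less than 10 metres."""
    return 0 < distance_m < max_distance_m


# Hypothetical measurement: six metres walked in 12 seconds.
print(walking_speed(6, 12))
print(eligible_for_pooling(6))
print(eligible_for_pooling(10))
```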

Fifteen studies met these criteria, but only nine studies contributed information to the meta‐analysis (Au‐Yeung 2002; Brill 1998; Chin A Paw 2004; Hruda 2003; Lazowski 1999; MacRitchie 2001; Rolland 2007; Rosendahl 2006; Schoenfelder 2004). One study did not report numeric results for the walking outcome (Schnelle 1996); two studies did not present any measure of variation in the outcome (Schnelle 1995; Schoenfelder 2000); and three studies only presented results as change in time, which we were unable to convert into change in speed (Choi 2005; Fiatarone 1994; Meuleman 2000). Rosendahl 2006 was a cluster randomised trial, but did not present correctly adjusted results, although they claimed results were similar. Because other trials were not cluster trials, we could not estimate an ICC from them. We were also unable to identify a suitable ICC estimate from external sources. Therefore, we presented the unadjusted results.

The rehabilitation group was on average 0.03 m/s (95% CI ‐0.01 to 0.07, P = 0.1) faster at walking a fixed distance less than 10 metres than controls when analysed with the random‐effects method (Analysis 1.5). There was very little between‐study heterogeneity (I² statistic = 9%, Q = 9 on 8df, P = 0.4). Therefore, the fixed‐effect pooled estimate was similar, also estimating that rehabilitation groups had a walking speed on average 0.03 m/s faster over a fixed distance (95% CI 0.00 to 0.06, P = 0.02) at follow‐up than controls (Analysis 1.50). While statistically significant, this is a small effect and was not significant in the random‐effects analysis. The sensitivity analysis excluding the one cluster trial (Analysis 1.51) further reduced the estimated effect to an increase of 0.01 m/s (95% CI ‐0.05 to 0.08, P = 0.7) and slightly increased between‐study heterogeneity (I² statistic = 16%, Q = 8 on 7df, P = 0.3).

We categorised only Chin A Paw 2004 as lower risk of bias, which appeared to be significantly different from the other studies, which were higher risk of bias (Analysis 1.31) (P = 0.01), implying that studies with lower risk of bias recorded less impact of the rehabilitation. However, this was based on only one lower risk study, which differed in other ways, e.g. type of intervention, duration of intervention, and distance walked to measure speed. There was no evidence of differential effects due to duration of interventions (Analysis 1.32) (P = 0.7), mode of delivery (Analysis 1.33) (P = 0.6), or baseline walking speeds (Analysis 1.34) (P = 0.6). All these studies had mean participant ages less than our predetermined threshold (less than 85 years), so we could not assess age as a potential source of heterogeneity in this outcome (Analysis 1.35). There was no evidence of differential effects due to gender (Analysis 1.36) (P = 0.2). There was no evidence that studies testing walking speeds over shorter distances measured different responses to rehabilitation than those testing over longer distances (Analysis 1.37) (P = 0.5).

There was no evidence of any asymmetry in the contour‐enhanced funnel plot (Figure 5) (Egger’s test P = 1.0).


Funnel plot of comparison: 1 Rehabilitation versus control, outcome: 1.5 Walking speed


Secondary outcomes

Strength

Twenty‐five studies reported strength as an outcome, seven of which reported no significant effect at the end of the intervention. Five studies assessed upper body strength (excluding grip), three of which found significant differences between groups (Mihalko 1996; Ouslander 2005; Schnelle 2002), while two did not (Chin A Paw 2004; Lazowski 1999). However, in the case of Mihalko 1996, this was based on an unadjusted analysis of a cluster study. Seven studies assessed hand grip strength, four of which found significant differences (Brill 1998; Buettner 1997; McMurdo 1993; Schnelle 1996), although in two of these strength was assessed separately in each hand, and differences were only significant in one hand (Brill 1998; Schnelle 1996), while McMurdo 1993 presented an unadjusted analysis of a cluster study. Three studies found no significant difference in grip strength (Bautmans 2005; Lazowski 1999; Resnick 2009). Sixteen studies assessed lower body strength, with 11 finding significant differences favouring rehabilitation at the end of the intervention (Brill 1998; Bruunsgaard 2004; Choi 2005; de Bruin 2007; Donat 2007; Fiatarone 1994; Hruda 2003; Lazowski 1999; McMurdo 1994; Ouslander 2005; Sauvage 1992) and five finding no significant difference (Bautmans 2005; Chin A Paw 2004; Rosendahl 2006; Schoenfelder 2000; Schoenfelder 2004). However, among those finding in favour of rehabilitation, one study had a significant baseline imbalance (Choi 2005); this study and McMurdo 1994 were cluster trials that did not adjust their analysis for the design; one did not find significant differences in all types of strength measure (Hruda 2003); and three were limited to within‐group improvements only (de Bruin 2007; Donat 2007; Sauvage 1992). 
Three studies assessed a global measure, combining measures of upper and lower body strength, with one finding significant difference in some measures (isometric and isokinetic concentric, not isokinetic eccentric) after training (Meuleman 2000), one finding significant difference in changes in strength, but with a large baseline imbalance likely to have produced a regression to the mean (Dorner 2007), and one finding no significant difference (Mulrow 1994).

Five studies addressed improvement sustainability (Buettner 1997; Meuleman 2000; Rosendahl 2006; Schoenfelder 2000; Schoenfelder 2004), although only two had found significant differences at the end of the intervention (Buettner 1997; Meuleman 2000). Buettner 1997 observed significant strength gains in very frail participants during the first 20 weeks of the intervention, while strength deteriorated among controls. However, during the final 10 weeks of the intervention, strength deteriorated among all participants, although the intervention group remained significantly stronger than at baseline and than the control group. The participants in Meuleman 2000 did not sustain the significant differences seen after four to eight weeks training at 6 or 12 months. Of the other studies, Schoenfelder 2000 and Schoenfelder 2004 still found no significant difference, while in Rosendahl 2006, improvement in the intervention group and deterioration in the control group led to a significant difference at six months not seen at the end of the intervention. This was in an unadjusted analysis of a cluster study.

Flexibility

Components targeting flexibility featured in 17 interventions, and 16 studies assessed it as an outcome measure (Bautmans 2005; Buettner 1997; Chin A Paw 2004; Choi 2005; Donat 2007; Kinion 1993; Lazowski 1999; Lee 2009; Makita 2006; McMurdo 1993; Mulrow 1994; Resnick 2009; Santana‐Sosa 2008; Schnelle 1996; Sung 2009; Taboonpong 2008). Ten reported significant benefits to their participants at the end of the intervention (at P < 0.05) (Buettner 1997; Choi 2005; Donat 2007; Kinion 1993; Lazowski 1999; Makita 2006; McMurdo 1993; Santana‐Sosa 2008; Schnelle 1996; Sung 2009), although three studies were cluster trials that did not adjust their analysis for the design (Choi 2005; McMurdo 1993; Sung 2009); in two studies, this was limited to within‐group assessments (Donat 2007; Lazowski 1999); and in three studies, only some joints showed significant benefit (spine but not knees McMurdo 1993; shoulders and knees but not ankles Makita 2006; shoulders, hips, and elbows but not knees Kinion 1993). The within‐group assessments of Donat 2007 found significant increases in flexibility in both the supervised and unsupervised exercise groups, but there was no usual care control group for comparison. Five other studies found no evidence of significant benefit to flexibility from their interventions (Bautmans 2005; Chin A Paw 2004; Mulrow 1994; Resnick 2009; Taboonpong 2008). Successful interventions included rowing by participants with advanced dementia and frailty (Schnelle 1996); a combination of walking, joint mobility, resistance and co‐ordination exercises (Santana‐Sosa 2008); Tai Chi (Choi 2005); a programme to increase the practice of sensorimotor activities (Buettner 1997); strengthening exercises with dancing to music and health education (Sung 2009); and exercise to music related to improvement in spinal flexion, which deteriorated in the control group (McMurdo 1993). Only Lazowski 1999 compared the effect of two types of physical rehabilitation on flexibility. 
They found their 'functional fitness' intervention significantly (P < 0.05) outperformed 'range of motion' exercises on several indices of flexibility. Studies rarely systematically assessed flexibility, and it was not clearly linked with overall activity restriction. Lee 2009 did not report results. None of the studies examined long‐term effects.

Balance

Twenty‐nine trials assessed balance as an outcome measure (Au‐Yeung 2002; Baum 2003; Bautmans 2005; Brill 1998; Bruyere 2005; Cheung 2008; Choi 2005; Christofoletti 2008; Clark 1975; Crilly 1989; de Bruin 2007; Dorner 2007; Donat 2007; Kerse 2008; Lazowski 1999; Lee 2009; MacRitchie 2001; McMurdo 1993; Morris 1999; Mulrow 1994; Resnick 2009; Rolland 2007; Rosendahl 2006; Sauvage 1992; Schoenfelder 2000; Schoenfelder 2004; Sihvonen 2004; Sung 2009; Urbscheit 2001). Thirteen trials reported significantly benefiting their participants' balance at the end of the intervention (at P < 0.05) (Bautmans 2005; Bruyere 2005; Cheung 2008; Choi 2005; Christofoletti 2008; de Bruin 2007; Donat 2007; Lazowski 1999; MacRitchie 2001; Resnick 2009; Schoenfelder 2004; Sihvonen 2004; Sung 2009). However, Choi 2005 and Sung 2009 based this on an unadjusted analysis of a cluster study; Donat 2007 only reported within‐group comparisons; in three studies, benefit was only significant for some of their measures of balance (Choi 2005; Schoenfelder 2004; Sihvonen 2004); and in one study (Resnick 2009), there was a significant baseline imbalance with possible regression to the mean. The within‐group assessments of Donat 2007 found significant increases in balance in both the supervised and unsupervised exercise groups, but there was no usual care control group for comparison. Successful interventions included a combination of strength and balance exercises (Cheung 2008; Christofoletti 2008; de Bruin 2007; Lazowski 1999), strengthening exercises with dancing to music and health education (Sung 2009), and standing and walking activities performed to music (MacRitchie 2001). However, 14 studies were unable to demonstrate any effect of their programme on balance at the end of the intervention (Au‐Yeung 2002; Baum 2003; Clark 1975; Crilly 1989; Dorner 2007; Kerse 2008; McMurdo 1993; Morris 1999; Mulrow 1994; Rolland 2007; Rosendahl 2006; Sauvage 1992; Schoenfelder 2000; Urbscheit 2001). 
Urbscheit 2001 suggested this was due to initial balance ability, with participants in poorer health unable to improve. Morris 1999 suggested some rehabilitation interventions may cause harm to the balance of elderly residents of long‐term care: They found their nursing rehabilitation intervention group's balance deteriorated significantly compared to their control and 'fit for your life' groups. Two studies did not report the results of their balance assessments (Brill 1998; Lee 2009).

Eight studies conducted long‐term follow up of balance (Au‐Yeung 2002; Clark 1975; Kerse 2008; Rosendahl 2006; Schoenfelder 2000; Schoenfelder 2004; Sihvonen 2004; Urbscheit 2001). Results at follow‐up were typically similar to those at the end of the intervention, with significant differences for some measures of balance found by Schoenfelder 2004 and Sihvonen 2004, and no evidence of effect in five studies (Au‐Yeung 2002; Clark 1975; Kerse 2008; Schoenfelder 2000; Urbscheit 2001). Only Rosendahl 2006 found a different result at follow‐up: While differences in balance were not significant at the end of the intervention, there was significant improvement in their experimental group's balance at follow‐up. This was in an unadjusted analysis of a cluster study.

Mood

Fifteen studies assessed mood (Brill 1998; Brittle 2009; Brown 2004; Buettner 1997; Chin A Paw 2004; Dorner 2007; Kerse 2008; MacRitchie 2001; McMurdo 1993; Meuleman 2000; Mihalko 1996; Morris 1999; Mulrow 1994; Rolland 2007; Sung 2009). Five studies reported significant differences in mood at the end of the intervention, favouring the experimental group (P < 0.05), for depression (Brill 1998; Buettner 1997; McMurdo 1993), anxiety (Brill 1998), self‐esteem (Sung 2009), and loneliness (Brown 2004), although two of these studies limited these conclusions to within‐group comparisons (Brill 1998; Brown 2004), while two studies were cluster trials that did not adjust their analysis for the design (McMurdo 1993; Sung 2009). By contrast, Kerse 2008 found participants became significantly more depressed during the course of the intervention, while the control group did not: this increase was concentrated among cognitively‐impaired participants. Ten studies found no significant difference in depression (Brittle 2009; Chin A Paw 2004; Dorner 2007; MacRitchie 2001; Meuleman 2000; Morris 1999; Mulrow 1994; Rolland 2007; Sung 2009) or positive and negative affect (Mihalko 1996).

Three studies conducted long‐term follow up of mood (Brittle 2009; Kerse 2008; Meuleman 2000). Results were the same as at the end of the intervention, with no significant improvement in mood, while Kerse 2008 found intervention participants became significantly more depressed.

Cognitive status

Eleven studies assessed cognitive performance (Baum 2003; Buettner 1997; Christofoletti 2008; Dorner 2007; McMurdo 1993; McMurdo 1994; Mulrow 1994; Pomeroy 1993; Schoenfelder 2000; Schoenfelder 2004; Stevens 2006), nine of which used the Mini‐Mental State Examination (MMSE). Three studies, at the end of the intervention, identified significant differences in cognitive performance (at P < 0.05) (Buettner 1997; Christofoletti 2008; Stevens 2006), although these results should be interpreted with caution. In Buettner 1997, the control group's cognition declined consistently, in contrast to the experimental group, but there was significant baseline imbalance, and at no point did the experimental group score higher than the control, suggesting regression to the mean. In Christofoletti 2008, the significant difference was only for two of the eight subscales of the Brief Cognitive Screening Battery, not its overall measure or for the MMSE, and it was described by the authors as probably fortuitous. In Stevens 2006, a comparison of their experimental group with their social‐visit control group was significant, but comparison of the experimental group with the no‐intervention control group was not. Within‐group comparisons revealed statistically significant changes in the social‐visit group only (significant decline). Five studies found no significant difference in cognition at the end of the intervention (Dorner 2007; McMurdo 1993; McMurdo 1994; Mulrow 1994; Schoenfelder 2004). Schoenfelder 2000 did not report results. Baum 2003 assessed cognition, but only tested significance in combination with three other outcomes to avoid multiple hypothesis testing. They reported an effect size of 0.54 (3.1 points better on the MMSE; 90% CI 0.15 to 0.92). Pomeroy 1993 did not analyse possible effect on cognition.

Two studies conducted long‐term follow up of cognitive status (Schoenfelder 2000; Schoenfelder 2004), although only Schoenfelder 2004 reported results, finding no significant difference, as at the end of the intervention.

Exercise tolerance

Three studies examined the effect of interventions on exercise tolerance (Naso 1990; Sauvage 1992; Schnelle 1996); four other studies examined the effect of interventions on the quantity of exercise conducted (see the section below, 'Approaches to increase intervention compliance or quantity') (DeKuiper 1993; Lang 1992; Riccio 1990; Yoder 1989). The intervention condition had significantly greater exercise tolerance than the control group in one study (Schnelle 1996). In two studies, there was no significant difference between groups (Naso 1990; Sauvage 1992). None of the studies examined long‐term effects.

Perceived health status

Six studies examined perceived health status (Bruyere 2005; Chin A Paw 2004; Kerse 2008; Lee 2009; Mulrow 1994; Peri 2008). The rehabilitation group had significantly greater perceived health (P < 0.05) than the control group at the end of the intervention in three studies (Bruyere 2005; Lee 2009; Peri 2008). However, in Bruyere 2005, there was significant difference for eight subscales of the Short Form‐36 (SF‐36), but not health change; while in Peri 2008, it was limited to the physical, but not mental, component; and in Lee 2009, it was only after adjustment for resident satisfaction, in a cluster trial that did not adjust the analysis to account for the design. Two studies showed no significant difference between groups (Kerse 2008; Mulrow 1994). In Chin A Paw 2004, there was a significant decline in perceived health among the intervention group, although this was not significant among regular attenders to the exercise sessions. Kerse 2008 and Peri 2008 examined long‐term effects following the withdrawal of external nursing support; neither found a significant difference.

Fear of falling

Six studies measured fear of falling (Brill 1998; Choi 2005; Donat 2007; Kerse 2008; Schoenfelder 2000; Schoenfelder 2004). In Choi 2005, there was a significant difference between the groups in favour of the experimental group at the end of the intervention, but this was a cluster trial that did not adjust the analysis to account for the design. Three studies reported no significant difference (Donat 2007; Schoenfelder 2000; Schoenfelder 2004). Two studies did not report a statistical comparison (Brill 1998; Kerse 2008), which was explained by Kerse 2008 as being due to significant missing data because participants found it difficult to assign a number to their fear of falling. Three studies conducted long‐term follow up (Kerse 2008; Schoenfelder 2000; Schoenfelder 2004), but as at the end of the intervention, it was not significant for two (Schoenfelder 2000; Schoenfelder 2004) and not analysed in Kerse 2008, as described above.

Economics

No study performed a full cost‐benefit analysis, but three studies assessed costs (Mulrow 1994; Przybylski 1996; Schnelle 2002). Mulrow 1994 compared average costs of their one‐to‐one physical therapy intervention (USD 1220, 95% CI USD 412 to USD 1832) and their control, friendly visits, (USD 189, 95% CI USD 80 to USD 298) over four months. They also found that other healthcare charges did not differ significantly between the groups (average USD 11,398), the majority of which (81%) were nursing‐home charges. Przybylski 1996 calculated the cost of providing their enhanced level physiotherapy and occupational therapy service as well as direct‐care nursing costs from case‐mix measures and found that reductions in nursing costs outweighed the cost of their service by USD 283 per bed per year. However, they did not test significance or perform sensitivity analyses. Schnelle 2002 compared the costs of evaluating and treating acute events between groups and found no significant difference as a result of their intervention. They also calculated that there would be insufficient staff resources to implement their FIT intervention at a ratio of 10 residents to one nursing aide.

Intervention compliance and feasibility

Many studies failed to report either intervention or control session attendance. Twenty‐four studies reported experimental intervention session attendance, with a mean of 83% and only Cheung 2008 reporting 100%. Twelve studies reported control session attendance, with a mean of 82%; only Fiatarone 1994 reported 100% attendance. Varying attendance levels may enhance the apparent treatment effect in favour of the experimental intervention. Session attendance was irrelevant where interventions were not provided in discrete sessions, for example, in the FIT studies and repeated measures designs, or for control groups that used 'usual care'. Resnick 2009 suggested that future studies use additional measures of treatment fidelity at each stage in the process, for example, training of providers and delivery as well as receipt.

Taboonpong 2008 reported that 4 of the 35 participants in the exercise group could not maintain the Tai Chi schedule. Similarly, Chin A Paw 2004 reported that 8 of 173 participants found the intervention "too intensive" and discontinued it. Brittle 2009 reported that cognitive impairment in 9 of 28 participants rendered them either unable to follow the instructions or disruptive. Peri 2008 reported that varying adherence across sites, in a programme implemented by care‐home staff, appeared to be related to resources.

Approaches to increase intervention compliance or quantity

Four trials investigated different ways of maximising compliance, the amount of exercise a participant took, or both. Two studies (Riccio 1990; Yoder 1989) found verbally elicited imagery of purposeful activity resulted in more exercise than rote repetitions (P < 0.05). Two studies (DeKuiper 1993; Lang 1992) found that participants exercised more when engaged in activity with a real object compared to an imaginary one. This suggests that adding purpose and asking participants to work with an actual object is an effective way of increasing exercise quantity. Similarly, including conversation during walking exercises improved compliance (Tappen 1994; Tappen 2000), preventing the physical decline observed in the conversation‐only and walk‐only groups. Donat 2007 compared supervised and unsupervised exercise, with four unsupervised and two supervised participants giving up (21 in each group). Karl 1982 argued that perceived irrelevance of the intervention to participants' lives was the main cause of lack of success, and proposed that individualised interventions might have been more effective.

Adverse events

Few studies reported adverse events that were directly attributable to their intervention. Many reported morbidity and mortality for their participants during the trial period. However, morbidity and mortality should be expected among this population because of their age and often poor physical condition, making causality difficult to establish. The studies assessing whole body vibration reported some adverse events: one participant developed a phobia of the treatment room (Bautmans 2005), one had groin pain (Bautmans 2005), and two had lower limb tingling (Bruyere 2005). Among other intervention types, few reported any problems. One of the few other studies to report adverse events was Rosendahl 2006 (191 participants), a trial of high intensity functional exercise and nutritional supplementation. They reported that adverse events occurred in 9% of 1906 sessions. Of these, they classified only two as major: one case of chest pain and another of loss of balance, neither of which led to manifest injury or disease. Mulrow 1994 found the intervention group suffered more, and more serious, falls, although this was not statistically significant. Rolland 2007 reported five falls occurring during the exercise sessions, one causing a head injury, although there was no significant difference in the number of falls between the groups over the 12‐month programme. Six other studies found no significant difference in the number of falls between groups (Cheung 2008; Choi 2005; Faber 2006; Kerse 2008; MacRitchie 2001; Peri 2008).

Morbidity and mortality

Twenty‐nine studies reported mortality within each group at the end of the intervention period, or we inferred it from reports of attrition. Fourteen of these studies were cluster trials, and we did not identify a suitable ICC by which to adjust the results; therefore, we presented unadjusted counts and events. Brittle 2009 reported mortality per group, but we did not include this in the meta‐analysis because reports were at three and six months post‐baseline, rather than at the end of the five‐week intervention period. The meta‐analysis for the 25 rehabilitation studies versus the control studies showed no evidence of an effect from a physical rehabilitation intervention (Analysis 1.6) (P = 0.5), with the risk ratio slightly favouring the rehabilitation group (0.95, 95% CI 0.8 to 1.1). There was almost no excess between‐study heterogeneity (I² = 0%, Q = 11 on 17 df, P = 0.9). There is little evidence of asymmetry in the funnel plot (Figure 6). Prespecified sensitivity analyses also yielded no evidence of an effect with alternative methods (odds ratio (OR), Analysis 1.52; risk difference, Analysis 1.53; fixed‐effect, Analysis 1.54; and Peto odds ratio, Analysis 1.55). Excluding cluster trials resulted in a similar risk ratio, but with wider confidence intervals (0.93, 95% CI 0.60 to 1.44, P = 0.8) (Analysis 1.56). Results of a post‐hoc sensitivity analysis including Brittle 2009 were very similar to the primary analysis (Analysis 1.57) (P = 0.5). None of the prespecified subgroup analyses suggested differential mortality between studies based on risk of bias (Analysis 1.38) (P = 0.4), duration of intervention (Analysis 1.39) (P = 0.5), mode of delivery (Analysis 1.40) (P = 0.6), age (Analysis 1.41) (P = 0.4), or gender (Analysis 1.42) (P = 0.9).
Four studies contributed to the meta‐analysis of rehabilitation (experimental) versus rehabilitation (control), but only one death was reported across these studies, leading to no evidence of an effect (Analysis 2.2) (risk ratio = 2.7, 95% CI 0.1 to 61, P = 0.5).
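The heterogeneity and P values quoted for the primary mortality analysis can be checked approximately from the rounded figures in the text. The following is a minimal sketch, not the review's own computation: it assumes the usual definition of I² as the excess of Q over its degrees of freedom (truncated at zero) and a normal approximation on the log risk ratio scale. The function names are illustrative only, and because the inputs are rounded, the results are approximate.

```python
import math


def i_squared(q: float, df: int) -> float:
    """I-squared as a percentage: share of Q in excess of its df, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def p_from_ratio_ci(estimate: float, lower: float, upper: float) -> float:
    """Approximate two-sided P value from a ratio estimate and its 95% CI.

    Assumes the interval is symmetric on the log scale (normal approximation).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    z = math.log(estimate) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal
    return math.erfc(abs(z) / math.sqrt(2))


# Rounded values quoted in the text for the primary mortality analysis
print(i_squared(11, 17))                 # Q below its df, so I-squared is 0
print(p_from_ratio_ci(0.95, 0.8, 1.1))   # roughly 0.5
```

Because Q (11) is smaller than its degrees of freedom (17), I² is truncated to 0%, which matches the reported value; the back-calculated P value is close to the reported P = 0.5.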


Funnel plot of comparison: 1 Rehabilitation versus control, outcome: 1.6 Death


Several studies reported hospitalisations. Rolland 2007 reported a significantly increased number of hospitalisations per participant within the exercise group (at 12 months, 0.6 (1.3) versus 0.2 (0.6), P = 0.04). Meuleman 2000 found significantly fewer hospitalisations and significantly fewer days admitted to hospital among the intervention group compared to the control group (at 12 months, 0.2 versus 0.7, P = 0.005; and 2.3 versus 7.6, P = 0.005, respectively). Kerse 2008 and Schnelle 2002 found no significant difference.

Discussion

Summary of main results

The present studies provide preliminary evidence that physical rehabilitation interventions may be associated with significant improvements across various measures of physical and mental functioning, without increasing the mortality risk in elderly care‐home residents. This is traditionally regarded as a group that is hard to research, but this review has found a substantial body of evidence. Many studies concluded that their intervention was both successful and safe, achieving their study goals. However, these are mostly explanatory trials that require replication in routine care and direct comparison between different interventions. At present, there is no clear indication of the optimum type of intervention.

Activities of daily living

There is some evidence that activities of daily living (ADL) independence and performance in this population are enhanced, or decline less, through physical rehabilitation interventions when compared with usual care. All of the point estimates for measures of ADL for which we performed meta‐analyses favour rehabilitation, and two have statistically significant random‐effects estimates (Barthel Index (BI), Rivermead Mobility Index (RMI)). The fixed‐effect models are significant for all measures except the Functional Independence Measure (FIM), although the heterogeneity expected and observed in many of the analyses suggests we should consider fixed‐effect estimates with caution. The RMI estimate, which is significant in both models, is notable because it pools only studies at lower risk of bias. In each of the analyses of independence scales, the point estimates of the effect were approximately 5% of the scale total. For the Timed Up and Go (TUG) test, the point estimate was approximately 15% of the mean baseline time, while for walking speed, the point estimate was approximately 5% of the mean baseline speed. While these are not large effects, they at least imply a stabilisation of function. It should be remembered that these are estimates of the average intervention effect, which may vary widely with some interventions resulting in smaller or larger effects. Of interest is the large difference between the TUG test estimate and the walking speed estimate, which could conceivably suggest a greater effect of rehabilitation on standing up, sitting down, or turning around than on walking speed. Alternatively, it may relate to differences in the participants, interventions, or other study features.

The subgroup analyses did not provide clear evidence across measures of sources of heterogeneity in effects, although the small number of studies in each subgroup hampered this. For all but one of the measures, there were greater estimates of effect in studies with participants with better baseline function, but this was only significant in one analysis (FIM). There was some evidence that shorter interventions had larger effects than longer interventions based on the BI and TUG test, but not other outcome measures. We did not perform subgroup analyses of interventions, which varied widely. It is plausible that some of the heterogeneity observed is related to differences in effect between interventions.

Secondary outcomes

Many of the studies measuring strength, flexibility, and balance found significant differences favouring the intervention. There was little evidence about the effect on exercise tolerance and perceived health status. There was some evidence of effect on mood and little evidence of effect on cognition and fear of falling. However, it should be noted that we excluded interventions primarily targeting improvements in cognitive, psychological, or psychosocial outcomes and multi‐faceted falls interventions from this review. Therefore, it is possible that physical interventions other than those in the included studies would show greater evidence of effect on these outcomes or outcomes such as quality of life. There was very limited economic evidence and no cost‐benefit analyses among the trials. Evidence from several trials suggests that ensuring an intervention is perceived as relevant and important by participants may be crucial to its success.

Adverse outcomes

The meta‐analysis of mortality provides good evidence that rehabilitation does not increase mortality risk. Subgroup analyses also suggested there were not different effects among different types of participant (age or gender). There was relatively little evidence about other adverse outcomes. Most trials included very frail elderly individuals, among whom relatively high rates of morbidity and mortality would be expected, and high morbidity was often reported at baseline.

Overall completeness and applicability of evidence

Dominance of North American research

Of the 67 included studies, 36 took place in North America. This may be problematic if there are large differences in the nature of long‐term care in North America or in the characteristics, such as age and physical condition, of the people who receive the intervention, when compared with Europe or the rest of the world. As a consequence, the present findings may be difficult to apply to long‐term care settings elsewhere. However, the increase in nationalities represented in this update is welcome. We have described the characteristics of the participants and the interventions. The interventions may be effective in this frail elderly client group regardless of location of care, but this hypothesis remains to be tested.

Participant representativeness

The extent to which participants in the included studies are representative of the wider population residing in long‐term care is unclear. This may present more of a problem where sample sizes were small, participant attrition was high, or both. It is notable that where studies did report the number of eligible individuals within the facility, on average they excluded more than half of its residents, and fewer than one quarter of residents ultimately participated. This might suggest that the participant sample is not representative of the wider long‐term care population. However, some studies included participants with multiple comorbidities and severe physical and cognitive disabilities.

Participant variation

There is substantial variation in the physical condition and mental health of people aged over 65 years in long‐term care. It is improbable that the same intervention will be appropriate for all people. However, the subgroup analyses failed to identify clear differences in effect between different studies based on participant characteristics.

Economics

A convincing economic case for rehabilitation has yet to be made. Conceptually, it seems reasonable: improving physical condition should reduce ill health, reducing the burden of the individual on health care and the need for hospital treatment and intensive personal care. Evidence for this would have to demonstrate that the absolute cost of the intervention is less than the amount the individual would cost if they remained in the same condition or deteriorated. A further effect to consider in an economic analysis is the additional cost of increased length of stay in long‐term care that may result from a rehabilitative intervention increasing life expectancy. Consideration would also have to be given to the variety of funding models. Because of the variation between individuals in resource use, we will require large trials to evaluate economic arguments. Widespread provision of interventions, however effective they are in practical terms, is only likely to occur once a viable financial case has been demonstrated. However, benefits may go beyond reductions in healthcare costs to improvements in quality of life; these should be quantified and accounted for in future economic analysis.

Research conducted among the long‐term care population may also be informative and applicable to similarly frail elderly people residing in the community. While none of the present trials investigated this adequately, it is reasonable to include it in future research.

Quality of the evidence

Overall, we included 67 studies, featuring 6300 participants, in this review. Within the analyses of specific outcomes, these numbers were reduced as each study only contributed data to some comparisons. Between three and nine studies contributed to each meta‐analysis of ADL outcome measures. The direction of the effect estimates in these meta‐analyses was consistently in favour of rehabilitation, though not always statistically significant. Twenty‐five studies contributed to the meta‐analysis of mortality where rehabilitation was compared with control.

Risk of bias

It is possible that biases have resulted in overestimation of the effects. Most of the included studies had unclear or high risk of bias across most categories. Blinding of participants and personnel was particularly problematic, a common limitation of trials of rehabilitative interventions. The risk of selective reporting was also often unclear, in part due to the range of different outcomes measured. A large number of studies also had substantially incomplete outcome data, often due to high and differential rates of attrition. However, there was evidence of an effect on the RMI among studies with the lowest risk of bias in this review. Yet, for the three measures where lower and higher risk of bias studies could be compared (BI, TUG test, and walking speed), lower estimates of effect were found in studies with lower risk of bias. However, this was only significant for one analysis (walking speed), and each comparison included only one or two lower risk studies. Based on funnel plots and Egger's test, there was little evidence of small‐study effects.

Trial diversity

It was disappointing that the huge variety of outcome measures used precluded a comprehensive meta‐analysis. While creative variation in interventions is desirable for promoting innovation, the extent of the diversity among these trials, in both interventions and in the extensive number of outcome measures used, is highly problematic. A particular obstacle was the small number of trials replicating previous work. Where replications occurred, most often the same research group within the same location undertook them.

Intervention fidelity

High levels of participant attrition and poor compliance with the intervention's demands were a fairly frequent problem among these trials. This is understandable; many participants would have been unused to activity and physically frail, making them vulnerable to illness and limiting their life expectancy. Many researchers reported reluctance to comply with intervention demands and felt this apathy adversely affected the trial. While it is impossible to prevent attrition through illness and death, it should be possible to improve motivation and compliance with interventions; enjoyment of, and satisfaction with, the intervention among participants should be a priority, especially if long‐term and widespread provision is ultimately intended. Ways of achieving this might include ensuring that participants perceive the intervention to be both relevant and beneficial to their lives. Many trials included social elements in both the intervention and the control group; the relationship between use of such methods and compliance requires further exploration. Incorporating the therapy into daily activities as opposed to discrete sessions also warrants closer attention.

Long‐term follow up

The lack of postintervention follow up is problematic. Among the trials that did follow participants after the intervention (for a maximum of one year, most often three months), there was frequently no finding of intervention benefits. However, this was also often the case in these studies at the end of the intervention. It is hard to justify provision of any short‐term rehabilitation intervention if any benefits the individual gains dissipate as soon as it ends. However, if benefits are sustained while the intervention remains in place, the economic and practical viability of long‐term or indefinite provision needs to be assessed. Moreover, some studies addressed interventions that were designed to become self‐sustaining, delivered by care‐home staff after an initial training and support period. Future research should follow participants for a reasonable period postintervention to clarify the durability of improvements and whether some participants require some type of long‐term maintenance. If this is the case, interventions should be designed with long‐term provision as a clear consideration and sustainability of the programmes evaluated.

Cluster trials

While 19 of the studies used cluster randomisation, only six of the studies adjusted for this in their analysis. Where we have been unable to adjust these estimates (walking speed and mortality outcomes), the cluster studies are likely to have overly narrow confidence intervals and receive excess weight in the meta‐analyses. Excluding cluster trials from the meta‐analyses typically resulted in an increase in effect size, although for walking speed the estimated difference decreased. The use of cluster randomisation for this type of intervention and setting will often be appropriate, as the approach can help researchers to guard against contamination and identification of the experimental intervention by staff and residents. It is also possible that some interventions may have an effect at the group level, perhaps acting through culture or opportunities to socialise, although there was no evidence of this in the sensitivity analyses conducted.
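The reason unadjusted cluster trials receive excess weight can be illustrated with the standard design effect, 1 + (m − 1) × ICC, where m is the mean cluster size and ICC is the intracluster correlation coefficient. The sketch below is illustrative only: the review states that no suitable ICC was identified, so the ICC (0.05), cluster size (20), and sample size (200) used here are hypothetical values chosen for demonstration, and the function names are assumptions rather than part of the review's methods.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent participants the clustered sample is 'worth'."""
    return n / design_effect(mean_cluster_size, icc)


def se_inflation(mean_cluster_size: float, icc: float) -> float:
    """Factor by which an unadjusted standard error should be multiplied."""
    return math.sqrt(design_effect(mean_cluster_size, icc))


# Hypothetical trial: 200 residents in care homes of about 20, ICC = 0.05
# (none of these values come from the review)
deff = design_effect(20, 0.05)
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {effective_sample_size(200, 20, 0.05):.1f}")
print(f"standard error inflation: {se_inflation(20, 0.05):.2f}")
```

Under these hypothetical values, a trial of 200 residents carries the information of roughly 100 independent participants, so an analysis that ignores clustering would report confidence intervals that are too narrow and would be weighted too heavily in a pooled estimate.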

Potential biases in the review process

We identified a considerable amount of literature for this systematic review, providing confidence in our search strategy and indicating the wealth of innovative research. The 67 included studies comprised 6300 participants. We have not included possible evidence from two studies in this review because they are awaiting translation (de Greef 2006; Sung 2007). A further 29 studies are awaiting assessment, and the additional information contained could have an important impact on the conclusions of the review. Identification of this volume of literature created its own problems. The included studies present an almost overwhelming number of different interventions, ranging from traditional exercise programmes to those requiring access to machinery, and the huge variety of outcome measures used hampered our ability to synthesise the evidence and compare the effectiveness of different interventions in different types of participant and in different circumstances. Two authors extracted all data, and this was combined automatically for numerical data where there was consensus and manually for qualitative data and conflicting results, giving confidence in its quality. We performed a variety of sensitivity analyses to evaluate the robustness of the outcomes of meta‐analyses. A relatively low number of studies contributed to our analysis of ADL outcomes because many studies reported a measure that could not be quantitatively combined with others. However, we do not believe this has biased results. We included five of the seven lower risk of bias studies. The 24 contributing studies represented almost half of the participants from the 67 included studies (3139/6300). The new analyses provided an estimate of the effect size, reducing the optimism about the effectiveness of physical rehabilitation in this population that we expressed in the original version of this review.

Agreements and disagreements with other studies or reviews

Two other systematic reviews (Rydwik 2004a; Weening‐Dijksterhuis 2011) evaluated the effects of physical rehabilitation on elderly residents in long‐term care. Both suggest there is moderate to good evidence of effects on strength, mobility, and flexibility. Weening‐Dijksterhuis 2011 also concluded there were significant positive effects on balance and ADL, while Rydwik 2004a found contradictory evidence for these outcomes. The current review, including more studies overall and excluding multi‐faceted falls interventions, finds significant positive effects on all of these outcomes, although the effect size appears small. The current review also synthesises data on adverse outcomes and reports the results of meta‐analyses, which were not included in those reviews.

Figure 1. Review update flow diagram.
Figure 2. 'Risk of bias' graph: review authors' judgements about each 'Risk of bias' item presented as percentages across all included studies.
Figure 3. Funnel plot of comparison: 1 Rehabilitation versus control, outcome: 1.1 Barthel Index.
Figure 4. Funnel plot of comparison: 1 Rehabilitation versus control, outcome: 1.4 TUG test.
Figure 5. Funnel plot of comparison: 1 Rehabilitation versus control, outcome: 1.5 Walking speed.
Figure 6. Funnel plot of comparison: 1 Rehabilitation versus control, outcome: 1.6 Death.
Figure 7. 'Risk of bias' summary: review authors' judgements about each 'Risk of bias' item for each included study.

Analysis 1.1. Comparison 1 Rehabilitation versus control, Outcome 1 Barthel Index.
Analysis 1.2. Comparison 1 Rehabilitation versus control, Outcome 2 Functional Independence Measure (FIM).
Analysis 1.3. Comparison 1 Rehabilitation versus control, Outcome 3 Rivermead Mobility Index (RMI).
Analysis 1.4. Comparison 1 Rehabilitation versus control, Outcome 4 Timed Up and Go (TUG) Test.
Analysis 1.5. Comparison 1 Rehabilitation versus control, Outcome 5 Walking speed.
Analysis 1.6. Comparison 1 Rehabilitation versus control, Outcome 6 Death.
Analysis 1.7. Comparison 1 Rehabilitation versus control, Outcome 7 Barthel Index (by risk of bias).
Analysis 1.8. Comparison 1 Rehabilitation versus control, Outcome 8 Barthel Index (by duration of intervention).
Analysis 1.9. Comparison 1 Rehabilitation versus control, Outcome 9 Barthel Index (by mode of delivery).
Analysis 1.10. Comparison 1 Rehabilitation versus control, Outcome 10 Barthel Index (by baseline Barthel Index score).
Analysis 1.11. Comparison 1 Rehabilitation versus control, Outcome 11 Barthel Index (by age).
Analysis 1.12. Comparison 1 Rehabilitation versus control, Outcome 12 Barthel Index (by gender).
Analysis 1.13. Comparison 1 Rehabilitation versus control, Outcome 13 Functional Independence Measure (by risk of bias).
Analysis 1.14. Comparison 1 Rehabilitation versus control, Outcome 14 Functional Independence Measure (by duration of intervention).
Analysis 1.15. Comparison 1 Rehabilitation versus control, Outcome 15 Functional Independence Measure (by mode of delivery).
Analysis 1.16. Comparison 1 Rehabilitation versus control, Outcome 16 Functional Independence Measure (by baseline FIM score).
Analysis 1.17. Comparison 1 Rehabilitation versus control, Outcome 17 Functional Independence Measure (by age).
Analysis 1.18. Comparison 1 Rehabilitation versus control, Outcome 18 Functional Independence Measure (by gender).
Analysis 1.19. Comparison 1 Rehabilitation versus control, Outcome 19 Rivermead Mobility Index (by risk of bias).
Analysis 1.20. Comparison 1 Rehabilitation versus control, Outcome 20 Rivermead Mobility Index (by duration of intervention).
Analysis 1.21. Comparison 1 Rehabilitation versus control, Outcome 21 Rivermead Mobility Index (by mode of delivery).
Analysis 1.22. Comparison 1 Rehabilitation versus control, Outcome 22 Rivermead Mobility Index (by baseline RMI score).
Analysis 1.23. Comparison 1 Rehabilitation versus control, Outcome 23 Rivermead Mobility Index (by age).
Analysis 1.24. Comparison 1 Rehabilitation versus control, Outcome 24 Rivermead Mobility Index (by gender).
Analysis 1.25. Comparison 1 Rehabilitation versus control, Outcome 25 TUG Test (by risk of bias).
Analysis 1.26. Comparison 1 Rehabilitation versus control, Outcome 26 TUG Test (by duration of intervention).
Analysis 1.27. Comparison 1 Rehabilitation versus control, Outcome 27 TUG Test (by mode of delivery).
Analysis 1.28. Comparison 1 Rehabilitation versus control, Outcome 28 TUG Test (by baseline TUG score).
Analysis 1.29. Comparison 1 Rehabilitation versus control, Outcome 29 TUG Test (by age).
Analysis 1.30. Comparison 1 Rehabilitation versus control, Outcome 30 TUG Test (by gender).
Analysis 1.31. Comparison 1 Rehabilitation versus control, Outcome 31 Walking speed (by risk of bias).
Analysis 1.32. Comparison 1 Rehabilitation versus control, Outcome 32 Walking speed (by duration of intervention).
Analysis 1.33. Comparison 1 Rehabilitation versus control, Outcome 33 Walking speed (by mode of delivery).
Analysis 1.34. Comparison 1 Rehabilitation versus control, Outcome 34 Walking speed (by baseline walking speed).
Analysis 1.35. Comparison 1 Rehabilitation versus control, Outcome 35 Walking speed (by age).
Analysis 1.36. Comparison 1 Rehabilitation versus control, Outcome 36 Walking speed (by gender).
Analysis 1.37. Comparison 1 Rehabilitation versus control, Outcome 37 Walking speed (by distance walked).
Analysis 1.38. Comparison 1 Rehabilitation versus control, Outcome 38 Death (by risk of bias).
Analysis 1.39. Comparison 1 Rehabilitation versus control, Outcome 39 Death (by duration of intervention).
Analysis 1.40. Comparison 1 Rehabilitation versus control, Outcome 40 Death (by mode of delivery).
Analysis 1.41. Comparison 1 Rehabilitation versus control, Outcome 41 Death (by age).
Analysis 1.42. Comparison 1 Rehabilitation versus control, Outcome 42 Death (by gender).
Analysis 1.43. Comparison 1 Rehabilitation versus control, Outcome 43 Sensitivity analysis: Barthel Index (fixed‐effect).
Analysis 1.44. Comparison 1 Rehabilitation versus control, Outcome 44 Sensitivity analysis: Barthel Index (cluster trials).
Analysis 1.45. Comparison 1 Rehabilitation versus control, Outcome 45 Sensitivity analysis: Functional Independence Measure (fixed‐effect).
Analysis 1.46. Comparison 1 Rehabilitation versus control, Outcome 46 Sensitivity analysis: Rivermead Mobility Index (fixed‐effect).
Analysis 1.47. Comparison 1 Rehabilitation versus control, Outcome 47 Sensitivity analysis: TUG Test (fixed‐effect).
Analysis 1.48. Comparison 1 Rehabilitation versus control, Outcome 48 Sensitivity analysis: TUG Test (cluster trials).
Analysis 1.49. Comparison 1 Rehabilitation versus control, Outcome 49 Sensitivity analysis: TUG Test (re‐including Christofoletti 2008).
Analysis 1.50. Comparison 1 Rehabilitation versus control, Outcome 50 Sensitivity analysis: Walking speed (fixed‐effect).
Analysis 1.51. Comparison 1 Rehabilitation versus control, Outcome 51 Sensitivity analysis: Walking speed (cluster trials).
Analysis 1.52. Comparison 1 Rehabilitation versus control, Outcome 52 Sensitivity analysis: Death (random‐effects: odds ratio).
Analysis 1.53. Comparison 1 Rehabilitation versus control, Outcome 53 Sensitivity analysis: Death (random‐effects: risk difference).
Analysis 1.54. Comparison 1 Rehabilitation versus control, Outcome 54 Sensitivity analysis: Death (fixed‐effect).
Analysis 1.55. Comparison 1 Rehabilitation versus control, Outcome 55 Sensitivity analysis: Death (fixed‐effect: Peto odds ratio).
Analysis 1.56. Comparison 1 Rehabilitation versus control, Outcome 56 Sensitivity analysis: Death (cluster trials).
Analysis 1.57. Comparison 1 Rehabilitation versus control, Outcome 57 Sensitivity analysis: Death (including Brittle 2009).
Analysis 2.1. Comparison 2 Rehabilitation (experimental) versus rehabilitation (control), Outcome 1 TUG Test.
Analysis 2.2. Comparison 2 Rehabilitation (experimental) versus rehabilitation (control), Outcome 2 Death.
Analysis 2.3. Comparison 2 Rehabilitation (experimental) versus rehabilitation (control), Outcome 3 Sensitivity analysis: TUG Test (fixed‐effect).

Comparison 1. Rehabilitation versus control

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 Barthel Index | 7 | 857 | Mean Difference (Random, 95% CI) | 6.38 [1.63, 11.12] |
| 2 Functional Independence Measure (FIM) | 4 | 303 | Mean Difference (Random, 95% CI) | 4.98 [‐1.55, 11.51] |
| 3 Rivermead Mobility Index (RMI) | 3 | 323 | Mean Difference (Random, 95% CI) | 0.69 [0.04, 1.33] |
| 4 Timed Up and Go (TUG) Test | 7 | 885 | Mean Difference (Random, 95% CI) | ‐4.59 [‐9.19, 0.01] |
| 5 Walking speed | 9 | 590 | Mean Difference (Random, 95% CI) | 0.03 [‐0.01, 0.07] |
| 6 Death | 25 | 3721 | Risk Ratio (M‐H, Random, 95% CI) | 0.95 [0.80, 1.13] |
| 7 Barthel Index (by risk of bias) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 7.1 lower risk of bias | 2 | 275 | Mean Difference (Random, 95% CI) | 3.38 [‐2.10, 8.86] |
| 7.2 higher risk of bias | 5 | 582 | Mean Difference (Random, 95% CI) | 8.25 [1.15, 15.34] |
| 8 Barthel Index (by duration of intervention) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 8.1 shorter (< 3 months intervention) | 2 | 46 | Mean Difference (Random, 95% CI) | 17.55 [6.97, 28.13] |
| 8.2 longer (3+ months intervention) | 5 | 811 | Mean Difference (Random, 95% CI) | 3.08 [‐0.03, 6.19] |
| 9 Barthel Index (by mode of delivery) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 9.1 group | 4 | 256 | Mean Difference (Random, 95% CI) | 10.99 [1.51, 20.48] |
| 9.2 individual | 2 | 275 | Mean Difference (Random, 95% CI) | 3.38 [‐2.10, 8.86] |
| 9.3 not reported | 1 | 326 | Mean Difference (Random, 95% CI) | 2.19 [‐4.35, 8.73] |
| 10 Barthel Index (by baseline Barthel Index score) | 6 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 10.1 better (baseline Barthel Index score > median) | 3 | 511 | Mean Difference (Random, 95% CI) | 7.94 [‐1.77, 17.64] |
| 10.2 worse (baseline Barthel Index score < median) | 3 | 305 | Mean Difference (Random, 95% CI) | 3.97 [‐0.83, 8.78] |
| 11 Barthel Index (by age) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 11.1 younger (mean age < 85 years) | 4 | 552 | Mean Difference (Random, 95% CI) | 8.02 [‐0.25, 16.30] |
| 11.2 older (mean age 85+ years) | 3 | 305 | Mean Difference (Random, 95% CI) | 3.97 [‐0.83, 8.78] |
| 12 Barthel Index (by gender) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 12.1 < 80% female | 4 | 402 | Mean Difference (Random, 95% CI) | 7.93 [0.18, 15.69] |
| 12.2 80%+ female | 3 | 455 | Mean Difference (Random, 95% CI) | 4.29 [‐1.25, 9.83] |
| 13 Functional Independence Measure (by risk of bias) | 4 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 13.1 lower risk of bias | 0 | 0 | Mean Difference (Random, 95% CI) | 0.0 [0.0, 0.0] |
| 13.2 higher risk of bias | 4 | 303 | Mean Difference (Random, 95% CI) | 4.98 [‐1.55, 11.51] |
| 14 Functional Independence Measure (by duration of intervention) | 4 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 14.1 shorter (< 3 months intervention) | 1 | 30 | Mean Difference (Random, 95% CI) | 2.0 [‐10.26, 14.26] |
| 14.2 longer (3+ months intervention) | 3 | 273 | Mean Difference (Random, 95% CI) | 5.85 [‐2.22, 13.93] |
| 15 Functional Independence Measure (by mode of delivery) | 4 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 15.1 group | 3 | 240 | Mean Difference (Random, 95% CI) | 3.90 [‐3.08, 10.88] |
| 15.2 individual | 1 | 63 | Mean Difference (Random, 95% CI) | 11.76 [‐2.66, 26.18] |
| 16 Functional Independence Measure (by baseline FIM score) | 3 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 16.1 better (baseline FIM score > median) | 2 | 95 | Mean Difference (Random, 95% CI) | 7.77 [1.39, 14.14] |
| 16.2 worse (baseline FIM score < median) | 1 | 145 | Mean Difference (Random, 95% CI) | 0.3 [‐1.73, 2.33] |
| 17 Functional Independence Measure (by age) | 4 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 17.1 younger (mean age < 85 years) | 2 | 128 | Mean Difference (Random, 95% CI) | 9.91 [4.41, 15.42] |
| 17.2 older (mean age 85+ years) | 2 | 175 | Mean Difference (Random, 95% CI) | 0.35 [‐1.65, 2.34] |
| 18 Functional Independence Measure (by gender) | 4 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 18.1 < 80% female | 2 | 93 | Mean Difference (Random, 95% CI) | 6.11 [‐3.33, 15.55] |
| 18.2 80%+ female | 2 | 210 | Mean Difference (Random, 95% CI) | 4.51 [‐4.56, 13.58] |
| 19 Rivermead Mobility Index (by risk of bias) | 3 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 19.1 lower risk of bias | 3 | 323 | Mean Difference (Random, 95% CI) | 0.69 [0.04, 1.33] |
| 19.2 higher risk of bias | 0 | 0 | Mean Difference (Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20 Rivermead Mobility Index (by duration of intervention) | 3 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 20.1 shorter (< 3 months intervention) | 1 | 49 | Mean Difference (Random, 95% CI) | 0.6 [‐1.48, 2.68] |
| 20.2 longer (3+ months intervention) | 2 | 274 | Mean Difference (Random, 95% CI) | 0.69 [0.02, 1.37] |
| 21 Rivermead Mobility Index (by mode of delivery) | 3 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 21.1 group | 1 | 49 | Mean Difference (Random, 95% CI) | 0.6 [‐1.48, 2.68] |
| 21.2 individual | 2 | 274 | Mean Difference (Random, 95% CI) | 0.69 [0.02, 1.37] |
| 22 Rivermead Mobility Index (by baseline RMI score) | 3 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 22.1 better (baseline RMI score > median) | 2 | 235 | Mean Difference (Random, 95% CI) | 0.70 [0.01, 1.39] |
| 22.2 worse (baseline RMI score < median) | 1 | 88 | Mean Difference (Random, 95% CI) | 0.6 [‐1.17, 2.37] |
| 23 Rivermead Mobility Index (by age) | 3 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 23.1 younger (mean age < 85 years) | 1 | 49 | Mean Difference (Random, 95% CI) | 0.6 [‐1.48, 2.68] |
| 23.2 older (mean age 85+ years) | 2 | 274 | Mean Difference (Random, 95% CI) | 0.69 [0.02, 1.37] |
| 24 Rivermead Mobility Index (by gender) | 3 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 24.1 < 80% female | 2 | 235 | Mean Difference (Random, 95% CI) | 0.70 [0.01, 1.39] |
| 24.2 80%+ female | 1 | 88 | Mean Difference (Random, 95% CI) | 0.6 [‐1.17, 2.37] |
| 25 TUG Test (by risk of bias) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 25.1 lower risk of bias | 1 | 556 | Mean Difference (Random, 95% CI) | 0.6 [‐5.36, 6.56] |
| 25.2 higher risk of bias | 6 | 329 | Mean Difference (Random, 95% CI) | ‐5.92 [‐11.29, ‐0.54] |
| 26 TUG Test (by duration of intervention) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 26.1 shorter (< 6 months intervention) | 4 | 185 | Mean Difference (Random, 95% CI) | ‐7.34 [‐13.93, ‐0.75] |
| 26.2 longer (6+ months intervention) | 3 | 700 | Mean Difference (Random, 95% CI) | 0.13 [‐4.28, 4.53] |
| 27 TUG Test (by mode of delivery) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 27.1 group | 4 | 154 | Mean Difference (Random, 95% CI) | ‐4.98 [‐10.74, 0.77] |
| 27.2 individual | 3 | 731 | Mean Difference (Random, 95% CI) | ‐4.56 [‐14.02, 4.90] |
| 28 TUG Test (by baseline TUG score) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 28.1 better (baseline TUG score < median) | 4 | 185 | Mean Difference (Random, 95% CI) | ‐7.34 [‐13.93, ‐0.75] |
| 28.2 worse (baseline TUG score > median) | 3 | 700 | Mean Difference (Random, 95% CI) | 0.13 [‐4.28, 4.53] |
| 29 TUG Test (by age) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 29.1 younger (mean age < 85 years) | 5 | 741 | Mean Difference (Random, 95% CI) | ‐5.39 [‐10.77, ‐0.00] |
| 29.2 older (mean age 85+ years) | 2 | 144 | Mean Difference (Random, 95% CI) | ‐5.40 [‐25.75, 14.96] |
| 30 TUG Test (by gender) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 30.1 < 80% female | 3 | 594 | Mean Difference (Random, 95% CI) | 0.17 [‐3.90, 4.24] |
| 30.2 80%+ female | 4 | 291 | Mean Difference (Random, 95% CI) | ‐7.55 [‐14.28, ‐0.82] |
| 31 Walking speed (by risk of bias) | 9 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 31.1 lower risk of bias | 1 | 75 | Mean Difference (Random, 95% CI) | ‐0.10 [‐0.21, 0.01] |
| 31.2 higher risk of bias | 8 | 515 | Mean Difference (Random, 95% CI) | 0.04 [0.01, 0.07] |
| 32 Walking speed (by duration of intervention) | 9 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 32.1 shorter (< 3 months intervention) | 3 | 59 | Mean Difference (Random, 95% CI) | 0.24 [‐0.74, 1.22] |
| 32.2 longer (3+ months intervention) | 6 | 531 | Mean Difference (Random, 95% CI) | 0.02 [‐0.03, 0.08] |
| 33 Walking speed (by mode of delivery) | 9 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 33.1 group | 7 | 475 | Mean Difference (Random, 95% CI) | 0.03 [‐0.02, 0.07] |
| 33.2 individual | 1 | 48 | Mean Difference (Random, 95% CI) | 0.26 [‐0.32, 0.83] |
| 33.3 not reported | 1 | 67 | Mean Difference (Random, 95% CI) | ‐0.03 [‐0.19, 0.13] |
| 34 Walking speed (by baseline walking speed) | 9 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 34.1 better (baseline walking speed > median) | 5 | 198 | Mean Difference (Random, 95% CI) | ‐0.00 [‐0.15, 0.14] |
| 34.2 worse (baseline walking speed < median) | 4 | 392 | Mean Difference (Random, 95% CI) | 0.04 [0.01, 0.07] |
| 35 Walking speed (by age) | 9 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 35.1 younger (mean age < 85 years) | 9 | 590 | Mean Difference (Random, 95% CI) | 0.03 [‐0.01, 0.07] |
| 35.2 older (mean age 85+ years) | 0 | 0 | Mean Difference (Random, 95% CI) | 0.0 [0.0, 0.0] |
| 36 Walking speed (by gender) | 9 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 36.1 < 80% female | 5 | 437 | Mean Difference (Random, 95% CI) | 0.01 [‐0.04, 0.07] |
| 36.2 80%+ female | 4 | 153 | Mean Difference (Random, 95% CI) | 0.13 [‐0.02, 0.28] |
| 37 Walking speed (by distance walked) | 9 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 37.1 less far (< 6 metres) | 2 | 185 | Mean Difference (Random, 95% CI) | 0.04 [0.01, 0.07] |
| 37.2 further (6+ metres) | 7 | 405 | Mean Difference (Random, 95% CI) | 0.01 [‐0.06, 0.09] |
| 38 Death (by risk of bias) | 25 | | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only |
| 38.1 lower risk of bias | 6 | 1366 | Risk Ratio (M‐H, Random, 95% CI) | 1.05 [0.76, 1.46] |
| 38.2 higher risk of bias | 19 | 2355 | Risk Ratio (M‐H, Random, 95% CI) | 0.88 [0.71, 1.10] |
| 39 Death (by duration of intervention) | 25 | | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only |
| 39.1 shorter intervention (< 3 months) | 10 | 663 | Risk Ratio (M‐H, Random, 95% CI) | 0.64 [0.18, 2.29] |
| 39.2 longer intervention (3+ months) | 15 | 3058 | Risk Ratio (M‐H, Random, 95% CI) | 0.95 [0.80, 1.14] |
| 40 Death (by mode of delivery) | 25 | | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only |
| 40.1 group | 12 | 1007 | Risk Ratio (M‐H, Random, 95% CI) | 0.82 [0.46, 1.49] |
| 40.2 individual | 9 | 2172 | Risk Ratio (M‐H, Random, 95% CI) | 0.91 [0.70, 1.19] |
| 40.3 group and individual | 1 | 24 | Risk Ratio (M‐H, Random, 95% CI) | 5.0 [0.27, 94.34] |
| 40.4 not reported | 3 | 518 | Risk Ratio (M‐H, Random, 95% CI) | 1.00 [0.73, 1.36] |
| 41 Death (by age) | 25 | | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only |
| 41.1 younger (mean age < 85 years) | 16 | 3001 | Risk Ratio (M‐H, Random, 95% CI) | 0.97 [0.81, 1.17] |
| 41.2 older (mean age 85+ years) | 9 | 720 | Risk Ratio (M‐H, Random, 95% CI) | 0.79 [0.49, 1.27] |
| 42 Death (by gender) | 25 | | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only |
| 42.1 < 80% female | 12 | 2366 | Risk Ratio (M‐H, Random, 95% CI) | 0.98 [0.77, 1.25] |
| 42.2 80%+ female | 12 | 1340 | Risk Ratio (M‐H, Random, 95% CI) | 0.91 [0.71, 1.18] |
| 42.3 not reported | 1 | 15 | Risk Ratio (M‐H, Random, 95% CI) | 0.88 [0.16, 4.68] |
| 43 Sensitivity analysis: Barthel Index (fixed‐effect) | 7 | 857 | Mean Difference (Fixed, 95% CI) | 4.54 [1.59, 7.49] |
| 44 Sensitivity analysis: Barthel Index (cluster trials) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 44.1 cluster (adjusted) | 5 | 811 | Mean Difference (Random, 95% CI) | 3.08 [‐0.03, 6.19] |
| 44.2 individual | 2 | 46 | Mean Difference (Random, 95% CI) | 17.55 [6.97, 28.13] |
| 45 Sensitivity analysis: Functional Independence Measure (fixed‐effect) | 4 | 303 | Mean Difference (Fixed, 95% CI) | 1.46 [‐0.42, 3.34] |
| 46 Sensitivity analysis: Rivermead Mobility Index (fixed‐effect) | 3 | 323 | Mean Difference (Fixed, 95% CI) | 0.69 [0.04, 1.33] |
| 47 Sensitivity analysis: TUG Test (fixed‐effect) | 7 | 885 | Mean Difference (Fixed, 95% CI) | ‐3.66 [‐5.86, ‐1.45] |
| 48 Sensitivity analysis: TUG Test (cluster trials) | 7 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 48.1 cluster (adjusted) | 2 | | Mean Difference (Random, 95% CI) | 0.51 [‐3.93, 4.95] |
| 48.2 individual | 5 | | Mean Difference (Random, 95% CI) | ‐7.85 [‐14.34, ‐1.37] |
| 49 Sensitivity analysis: TUG Test (re‐including Christofoletti 2008) | 8 | 914 | Mean Difference (Random, 95% CI) | ‐8.41 [‐15.53, ‐1.29] |
| 50 Sensitivity analysis: Walking speed (fixed‐effect) | 9 | 590 | Mean Difference (Fixed, 95% CI) | 0.03 [0.00, 0.06] |
| 51 Sensitivity analysis: Walking speed (cluster trials) | 9 | | Mean Difference (Random, 95% CI) | Subtotals only |
| 51.1 cluster (unadjusted) | 1 | | Mean Difference (Random, 95% CI) | 0.04 [0.01, 0.07] |
| 51.2 individual | 8 | | Mean Difference (Random, 95% CI) | 0.01 [‐0.05, 0.08] |
| 52 Sensitivity analysis: Death (random‐effects: odds ratio) | 25 | 3721 | Odds Ratio (M‐H, Random, 95% CI) | 0.93 [0.75, 1.15] |
| 53 Sensitivity analysis: Death (random‐effects: risk difference) | 25 | 3721 | Risk Difference (M‐H, Random, 95% CI) | ‐0.01 [‐0.02, 0.01] |
| 54 Sensitivity analysis: Death (fixed‐effect) | 25 | 3721 | Risk Ratio (M‐H, Random, 95% CI) | 0.95 [0.80, 1.13] |
| 55 Sensitivity analysis: Death (fixed‐effect: Peto odds ratio) | 25 | 3721 | Peto Odds Ratio (Peto, Fixed, 95% CI) | 0.93 [0.75, 1.14] |
| 56 Sensitivity analysis: Death (cluster trials) | 25 | | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only |
| 56.1 cluster (unadjusted) | 13 | 2644 | Risk Ratio (M‐H, Random, 95% CI) | 0.95 [0.79, 1.15] |
| 56.2 individual | 12 | 1077 | Risk Ratio (M‐H, Random, 95% CI) | 0.93 [0.60, 1.44] |
| 57 Sensitivity analysis: Death (including Brittle 2009) | 26 | 3777 | Risk Ratio (M‐H, Random, 95% CI) | 0.94 [0.79, 1.11] |

Comparison 2. Rehabilitation (experimental) versus rehabilitation (control)

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 TUG Test | 2 | 57 | Mean Difference (Random, 95% CI) | ‐7.95 [‐19.22, 3.31] |
| 2 Death | 4 | 118 | Risk Ratio (M‐H, Random, 95% CI) | 2.67 [0.12, 60.93] |
| 3 Sensitivity analysis: TUG Test (fixed‐effect) | 2 | 57 | Mean Difference (Fixed, 95% CI) | ‐7.19 [‐10.92, ‐3.46] |
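
As a reading aid for the effect sizes tabulated above, the following minimal sketch recovers an approximate standard error, z statistic, and two-sided P value from a reported estimate and its 95% CI. It assumes a normal approximation and, for ratio measures such as the risk ratio, works on the log scale. The function names are illustrative and are not part of the review's methods, and because the tabulated values are rounded the results are approximate.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def se_from_ci(lower: float, upper: float, log_scale: bool = False) -> float:
    """Approximate standard error implied by a 95% confidence interval."""
    if log_scale:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * Z_95)


def z_statistic(estimate: float, lower: float, upper: float,
                log_scale: bool = False) -> float:
    """Estimate divided by its implied standard error (log scale for ratios)."""
    point = math.log(estimate) if log_scale else estimate
    return point / se_from_ci(lower, upper, log_scale)


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Values copied from Comparison 1: Barthel Index (mean difference) and Death (risk ratio).
z_barthel = z_statistic(6.38, 1.63, 11.12)
z_death = z_statistic(0.95, 0.80, 1.13, log_scale=True)
print(f"Barthel Index: z = {z_barthel:.2f}, P = {two_sided_p(z_barthel):.3f}")
print(f"Death:         z = {z_death:.2f}, P = {two_sided_p(z_death):.3f}")
```

The same check can be read directly from the table: an interval for a mean difference that excludes 0, or an interval for a ratio that excludes 1, corresponds to P < 0.05 under this approximation.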