Introduction

Health care decisions are frequently directed by the best available evidence from research studies. As both diagnostic and treatment options multiply rapidly in health care, the growing volume of research brings with it the challenge of identifying which studies are of the highest quality. This challenge is compounded when different studies support conflicting conclusions, or when reporting bias leads a field astray.

To answer these challenges, researchers have developed methods to synthesize and evaluate research from multiple studies. Systematic reviews represent one method to rigorously compile scientific evidence to answer questions regarding the state of science in an area. They can help clinicians make decisions when similar studies present apparently confusing or conflicting results [1].

Different research designs shed different amounts of light on how treatments work under controlled conditions. Some designs provide information on the association between treatments and their outcomes but do not control for the myriad intervening or confounding factors surrounding treatment application, e.g., age, gender, disease severity, co-interventions, and enthusiasm. One design that provides definitive evidence of intervention efficacy is the randomized controlled trial (RCT) [2].

The RCT is one of the simplest but most powerful research designs available for evaluating the efficacy of an intervention. Its key feature is that, after an assessment of eligibility, subjects are randomly and independently assigned to receive one of the treatments under investigation. Once randomized, the groups are followed in exactly the same way, so that the only planned difference between them is the treatment they receive. The power of the design lies in its ability to minimize bias in treatment allocation and to balance prognostic factors between the groups: well-conducted RCTs control for both known and unknown factors (confounders) that may distort treatment effects. Unfortunately, it is estimated that fewer than 10 % of all published studies are well-conducted RCTs, with fewer than 5 % reported from the area of rehabilitation [3, 4]. Despite the strengths of the design, poorly conducted or poorly reported trials can yield misleading data and misdirect the knowledge and development of a field.

As in other areas of health care, dysphagia intervention methods are diverse and rapidly advancing. Common categories of dysphagia rehabilitation methods include behavioral maneuvers, behavioral compensations, exercise interventions, dietary interventions (including modified diets and enteral feeding methods), pharmaceutical applications, electro-physical applications, and sensory-motor applications, to mention but a few. To date, evaluations of the strength of data supporting swallowing interventions have been equivocal: some reviews have identified an increase in the quality of supportive data, whilst others have admonished the field for a lack of supportive evidence and called for more rigorous study designs [5, 6] (Fig. 1).

Fig. 1 Flow chart of study selection process

Recently, a published review of abstracts presented to an international dysphagia meeting (Dysphagia Research Society) over the last decade (2001–2011) suggested a surge in the use of more rigorous research designs to evaluate swallowing. The authors reported that the use of RCT methodology had tripled since 2001, and that RCTs now constituted 3.3 % of all abstracts presented at this meeting [7]. However, that review was based on study abstracts only, and the authors did not evaluate the rigor of each trial using a validated quality-assessment tool.

To further evaluate the rigor of recent RCTs in dysphagia rehabilitation, we undertook a systematic review of all RCTs of behavioral interventions for oro-pharyngeal dysphagia during the period from January 2010 to June 2013.

Methods

Study Identification

Studies were identified through searches of MEDLINE (PubMed), PsycINFO, Google Scholar, EBSCO, ProQuest, Web of Science, and the grey literature. In addition, meeting abstracts, Cochrane Reviews, and the reference lists of related book chapters and journal articles from January 2010 to June 2013 were screened. Search terms included “swallowing intervention,” “swallowing treatment trial,” “randomized clinical trial,” “RCT,” “randomized controlled clinical trial,” “swallowing rehabilitation trial,” and “dysphagia rehabilitation.” MeSH terms (PubMed) included “humans,” “dysphagia,” “swallowing trial,” and “treatment.” For databases allowing advanced search techniques, results were restricted to RCTs. Potentially eligible articles were first screened for relevance by the first author, using article titles and then abstracts. Articles that appeared to meet the eligibility criteria after this initial screen were then reviewed independently by the first and second authors, using the measures described below.

Inclusion Criteria

The current review included RCTs that assessed the impact of behavioral interventions to reduce or ameliorate oro-pharyngeal dysphagia. Because intervention formats and treatment goals differ for programs involving children and adolescents, the review was restricted to studies of adults (age ≥18 years). To be eligible for inclusion, a study had to: (1) use a randomized controlled trial design; (2) be published in English or be capable of being translated; (3) be accessible in full text; (4) evaluate an oro-pharyngeal swallowing intervention; (5) assess a measurable swallowing outcome; (6) not evaluate a surgical intervention; (7) not evaluate a pharmaceutical intervention; and (8) not be a device trial.

RCT

For the purposes of this systematic review, an RCT was defined a priori as a study in which subjects are randomly allocated (by chance alone) to receive one of several clinical swallowing interventions, one of which must be a standard of comparison or control. The control group should receive either no treatment or the current standard treatment. A trial using only a placebo comparison, without a standard control comparison, was considered quasi-experimental, as the placebo effect alone cannot be adequately identified. Furthermore, for the purposes of qualitative grading, reviewers were advised that an acceptable randomization procedure should use either computer-generated random numbers or random number tables. In addition, an acceptable description of concealed allocation should report use of a centralized or independently controlled randomization procedure, serially numbered identical containers, or randomization sequences not readable until allocation.
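The two requirements above, a computer-generated sequence and an allocation that cannot be read until a participant is enrolled, can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used by any reviewed trial: the class name `ConcealedAllocation`, the arm labels, and the seed are hypothetical, and in practice concealment is achieved procedurally (e.g., by a central service or sealed, serially numbered containers) rather than by code alone.

```python
import random
from typing import List, Optional, Sequence, Tuple


class ConcealedAllocation:
    """Hypothetical sketch: a computer-generated allocation list whose entries
    are revealed one at a time, only when a participant is enrolled."""

    def __init__(self, n_participants: int,
                 arms: Sequence[str] = ("intervention", "control"),
                 seed: Optional[int] = None) -> None:
        rng = random.Random(seed)  # seeded PRNG so the list can be audited later
        # Simple randomization: each assignment is an independent draw.
        self._sequence = [rng.choice(arms) for _ in range(n_participants)]
        self._next = 0
        # Audit trail of (participant_id, arm) pairs in enrolment order.
        self.log: List[Tuple[str, str]] = []

    def enrol(self, participant_id: str) -> str:
        """Reveal the next assignment; upcoming entries stay hidden from recruiters."""
        if self._next >= len(self._sequence):
            raise RuntimeError("allocation sequence exhausted")
        arm = self._sequence[self._next]
        self._next += 1
        self.log.append((participant_id, arm))
        return arm


allocator = ConcealedAllocation(n_participants=10, seed=2013)
for pid in ("P01", "P02", "P03"):
    print(pid, allocator.enrol(pid))
```

Because each draw is independent, group sizes can drift apart in small samples; blocked or stratified schemes are the usual remedy.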

Data Extraction

The first author (G.C.) systematically abstracted the following data from all articles reviewed in full text: authors, year of publication, country in which the study was conducted, sample characteristics (e.g., sample size, gender, age range, diagnostic group, and treatment type), intervention and control group methodology, length of intervention, length of follow-up, and all relevant dysphagia outcome measures. For studies employing more than one treatment arm (or factorial designs), only data relevant to the current review were used. To calculate intervention effect sizes, we abstracted the mean difference between intervention and control groups with its associated standard deviation where available; otherwise, we abstracted either pre- and post-test means and standard deviations or group sizes and F ratios. If more than one intervention arm met the inclusion criteria, data were combined to create pooled estimates. Effect data were then standardized, using accepted methods, to a single comparable metric, Cohen’s d.
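The conversions described above can be sketched as follows. The text does not name the exact formulas used, so this is an illustration under assumptions: it uses the pooled-SD form of Cohen's d, the two-group identity F = t² (one numerator degree of freedom) for studies reporting only an F ratio, and the standard formula for merging two arms into a single group (as given, for example, in the Cochrane Handbook). The function names are hypothetical.

```python
from math import sqrt
from typing import Tuple


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def d_from_means(m_int: float, sd_int: float, n_int: int,
                 m_ctl: float, sd_ctl: float, n_ctl: int) -> float:
    """Cohen's d: intervention minus control mean, divided by the pooled SD."""
    return (m_int - m_ctl) / pooled_sd(sd_int, n_int, sd_ctl, n_ctl)


def d_from_f(f_ratio: float, n_int: int, n_ctl: int, sign: int = 1) -> float:
    """Cohen's d from a two-group F ratio with 1 numerator df (F = t**2).
    F carries no direction, so `sign` must come from the reported means."""
    return sign * sqrt(f_ratio * (1 / n_int + 1 / n_ctl))


def combine_arms(n1: int, m1: float, sd1: float,
                 n2: int, m2: float, sd2: float) -> Tuple[int, float, float]:
    """Merge two eligible intervention arms into one group (n, mean, SD)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
           + n1 * n2 / n * (m1 - m2) ** 2) / (n - 1)
    return n, mean, sqrt(var)


# Illustrative numbers only (not data from the reviewed trials).
n, m, sd = combine_arms(10, 4.0, 1.0, 10, 6.0, 1.0)
print(d_from_means(m, sd, n, 4.0, 1.5, 20))
```

The merging step matters: averaging the two arms' SDs instead would ignore the spread between the arm means and understate the pooled variance.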

Quality Assessment

Two reviewers, blinded to each study’s authors, author affiliations, and journal, independently used the van Tulder scale to assess trial quality. Any study that included an author of this paper was redirected to a third independent reviewer. The van Tulder scale is a quality assessment tool that rates 11 components of RCT design: randomization method, allocation concealment, baseline characteristics, patient blinding, therapist blinding, observer blinding, co-intervention control, compliance, drop-out rate, timing of end-point assessment, and intention-to-treat analysis (see Table 3 in Appendix). For each item, the reviewer selects ‘yes’, ‘no’, or ‘don’t know’. A score of ‘1’ is allocated for each affirmative response and ‘0’ for ‘no’ or ‘don’t know’. When ≥5 items are satisfied (≥5 points), the quality of the report is deemed high [8]. The van Tulder scale has previously been evaluated for interrater reliability and for face, content, and concurrent validity [8]. In the present study, only studies rated ‘high quality’ were included in the analysis of primary effect within the systematic review (see Table 4 in Appendix).
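The scoring rule above reduces to counting affirmative responses. The sketch assumes the 11 items in the order listed in this section and the ≥5-point threshold stated here; the item labels, constant names, and function names are illustrative, not part of the published scale.

```python
from typing import Dict

# The 11 van Tulder items, in the order listed in the text (labels are illustrative).
VAN_TULDER_ITEMS = (
    "randomization method", "allocation concealment", "baseline characteristics",
    "patient blinding", "therapist blinding", "observer blinding",
    "co-intervention control", "compliance", "drop-out rate",
    "end-point assessment timing", "intention-to-treat analysis",
)
VALID_RESPONSES = {"yes", "no", "don't know"}
HIGH_QUALITY_MIN = 5  # >=5 points deemed high quality, per the threshold stated above


def van_tulder_score(responses: Dict[str, str]) -> int:
    """1 point per 'yes'; 'no' and "don't know" both score 0."""
    missing = [item for item in VAN_TULDER_ITEMS if item not in responses]
    if missing:
        raise ValueError(f"unrated items: {missing}")
    invalid = {r for r in responses.values() if r not in VALID_RESPONSES}
    if invalid:
        raise ValueError(f"unexpected responses: {invalid}")
    return sum(responses[item] == "yes" for item in VAN_TULDER_ITEMS)


def is_high_quality(score: int) -> bool:
    return score >= HIGH_QUALITY_MIN


# Example: five items satisfied, the rest unclear from the report.
ratings = {item: ("yes" if i < 5 else "don't know")
           for i, item in enumerate(VAN_TULDER_ITEMS)}
score = van_tulder_score(ratings)
print(score, is_high_quality(score))  # prints: 5 True
```

Scoring ‘don’t know’ as 0 means that poor reporting is penalized in the same way as a missing safeguard.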

Reliability

Two independent raters used a standardized rating sheet to record study details for the van Tulder quality grading (Table 1). The intraclass correlation coefficient (ICC) for van Tulder scores between the reviewers was 0.976 (95 % CI 0.938–0.991). To further explore concordance in ratings across studies, a fixed-marginal kappa analysis [9] was conducted, which revealed 94 % agreement in coding across studies. Kappa values of 0.75 or higher reflect excellent agreement between raters [10]. Overall rater agreement was strong across the categories of the scale, ranging from 86 to 100 %. The item with the greatest discrepancy between raters was item H, “were co-interventions avoided or controlled”; this was often not explicitly stated by authors, and agreement for this item was 86 %. Inconsistencies between raters were later discussed and resolved by consensus.
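The agreement statistics above can be illustrated with a short sketch. It assumes that the fixed-marginal kappa corresponds to Fleiss' multirater kappa, in which chance agreement is estimated from category proportions pooled across raters, and that each rated unit is one van Tulder item for one study. The example ratings are invented for illustration and are not the reviewers' data.

```python
from collections import Counter
from typing import List, Sequence


def percent_agreement(ratings: Sequence[Sequence[str]]) -> float:
    """Share of rated units on which every rater chose the same category."""
    return sum(len(set(unit)) == 1 for unit in ratings) / len(ratings)


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' (fixed-marginal) kappa; ratings[i] lists each rater's code for unit i."""
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(unit) != n_raters for unit in ratings):
        raise ValueError("every unit needs the same number (>=2) of ratings")
    totals: Counter = Counter()
    observed = 0.0
    for unit in ratings:
        counts = Counter(unit)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this unit.
        observed += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    observed /= len(ratings)
    n_codes = len(ratings) * n_raters
    expected = sum((c / n_codes) ** 2 for c in totals.values())  # chance agreement
    if expected == 1:
        raise ValueError("kappa undefined: all ratings fall in one category")
    return (observed - expected) / (1 - expected)


# Two raters, four items (invented data).
example: List[List[str]] = [["yes", "yes"], ["yes", "no"], ["no", "no"], ["no", "no"]]
print(percent_agreement(example), round(fleiss_kappa(example), 3))  # prints: 0.75 0.467
```

The example shows why a high raw agreement can coexist with a modest kappa when one category dominates the ratings.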

Table 1 Van Tulder [8] quality assessment ratings

Results

Initial queries using the listed search terms yielded a total of 164 articles, whose titles were screened. Of these, 128 studies did not meet the inclusion criteria and were excluded (Table 1). Thirty-three studies were evaluated by abstract review, of which ten were deemed ineligible for the following reasons: failure to use an RCT design (n = 2), lack of a control arm (n = 2), lack of randomization (n = 3), and review article only (n = 3). After review of the full texts, a further three studies were eliminated for failing to provide randomization details (n = 2) or providing a protocol only (n = 1). In total, 20 studies met all eligibility criteria and were included in the systematic review. Although all included studies self-identified as RCTs, only 15 met the RCT criteria by including a true comparison (control) arm [11–25]. A further five studies used sham/placebo comparators only (without a standard control comparison) or non-randomized controls and were deemed quasi-randomized trials [26–30].

Quality Ratings

The 15 studies identified as RCTs had a mean quality rating of 4.46 (SD 2.6), with scores ranging from 1 [21] to 9 [24]; the modal score was 4. Of the 15 included RCTs, five were conducted in Asia (mean quality rating 2.6, SD 0.89), four in the USA (mean 7, SD 2.1), two in Europe (mean 3, SD 1.4), and two in Australia (mean 6, SD 2.8), and a single trial from the UK scored 7.

Only five RCTs met the van Tulder criterion for high quality (≥5 points) [11, 14, 19, 22, 24]. Only 47 % (7) of studies provided adequate detail on randomization and concealment procedures, whereas 87 % (13) identified and confirmed baseline group comparability (item C) and 93 % (14) used similar timing of evaluation points between groups (item J). Only 27 % (4) of studies reported post-intervention data with <20 % attrition [13, 14, 16, 23], and only 20 % (3) reported subjects’ compliance with the treatment provided [11, 22, 24]. Blinding was limited across studies: only 13 % (2) of studies blinded subjects and 7 % (1) blinded therapists. Further, only 53 % (8) of studies [11, 13, 14, 17, 19, 22, 24, 25] reported that assessors who measured the main outcome were blinded to treatment allocation. Lastly, 47 % (7) of studies used intention-to-treat analysis [11, 14, 17, 19, 22, 24] (Table 1).

Study Characteristics

Ten studies were two-arm RCTs (intervention and control), while five were three-arm trials comparing either intervention, placebo, and control [19, 24] or two interventions and a control [12, 22, 23]. Only one study was a cluster trial, involving 19 centers [14]. The mean sample size was 64.5 (SD 37.4), ranging from 16 [21] to 130 [22] subjects. The mean age of enrolled patients ranged across the RCTs from 25.4 [18] to 80 [13] years. Recruited subjects were 1.5 times more likely to be male (n = 1,190) than female (n = 793).

The most frequent diagnosis in the intervention groups was head and neck cancer (40 %), followed by stroke (27 %). Two studies were conducted with acute care patients [11, 14], four with sub-acute rehabilitation patients [12, 13, 21, 24], and eight with either patients with chronic conditions [23] or patients presenting for outpatient-based medical interventions [15–17, 19, 20, 22, 25]. An additional study used healthy volunteers only [18].

In four studies, confirmation of dysphagia status was made on the basis of a non-validated clinical examination [13, 14, 23] or patient history [20]. Seven studies utilized a validated clinical exam only [11], videofluoroscopic examination only [21], or both methods [12, 18, 19, 25]. Additionally, two studies utilized patient report or a quality of life survey [15, 17] and only one study did not specifically report the process for dysphagia status confirmation at study onset [16].

Nature of the Interventions

The 15 RCTs included a diverse range of swallowing intervention methodologies (Table 2; Table 5 in Appendix), including dietary [11, 13, 21], electrotherapeutic [12, 23–25], preventative behavioral exercise [16, 17, 19, 22], behavioral maneuvers and compensations [15, 18], program effectiveness [14], and behavioral exercise alone [20]. Within the quasi-experimental trials, three studies evaluated alternative medicine approaches [28–30], one evaluated a postural adjustment [27], and another evaluated respiratory strength training [26].

Table 2 Study demographics

Recruitment varied across studies. Three studies reported consecutive recruitment on admission to medical care [11, 14, 24], a further five reported consecutive recruitment from outpatient admission [16, 17, 19, 22, 25], five did not report time to recruitment [12, 13, 20, 21, 23], one reported recruitment on average 4.7 years after medical treatment [15], and one used volunteers [18] (Table 2).

Intervention Timing

The duration of intervention was highly variable, ranging from 3 days to 4 months. Across all studies, treatment was provided for a mean of 40.47 days (SD 36.0), i.e., more than 5 weeks. The total amount of intervention prescribed within a treatment period was also variable, ranging from 1.5 to 189 h, with a mean across studies of 55.7 h (SD 61). Treatment frequency averaged 1.7 sessions/day (SD 0.9): seven studies provided treatment once daily [11–14, 21, 23, 24], three twice daily [19, 20, 22], and four three times a day [15–17, 25]. One study provided intervention on alternate days [18]. In general, studies evaluating dietary interventions provided care on a daily basis. Studies evaluating electrotherapeutic interventions provided care for 30–60 min daily, while prophylactic exercise interventions were more often provided 2–3 times daily for 15–45 min. Studies of behavioral interventions only [15, 18, 20] were highly variable in frequency, ranging from 1 to 3 times a day.
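As a worked check on the frequency figures above, the reported mean of 1.7 sessions/day (SD 0.9) can be reproduced from the study counts. The sketch assumes the alternate-day study is coded as 0.5 sessions/day; the text does not state which coding, or whether a sample or population SD, was used, but both SDs round to 0.9.

```python
from statistics import mean, pstdev, stdev

# Sessions per day for the 15 RCTs, from the counts in the text:
# 7 once daily, 3 twice daily, 4 three times daily, 1 on alternate days.
# Coding alternate days as 0.5 sessions/day is an assumption.
sessions_per_day = [1.0] * 7 + [2.0] * 3 + [3.0] * 4 + [0.5]

print(len(sessions_per_day))                # 15 trials
print(round(mean(sessions_per_day), 2))     # 1.7
print(round(stdev(sessions_per_day), 2))    # 0.92 (sample SD)
print(round(pstdev(sessions_per_day), 2))   # 0.89 (population SD)
```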

Outcome Assessment

Outcomes in the 15 RCTs were assessed at varying time points. Nine studies evaluated outcomes pre- and post-intervention only [11–16, 18, 21, 25], three completed pre- and post-intervention measures and followed patients out to 3 months [22–24], and three followed patients for ≥6 months [17, 19, 20]. The most frequently evaluated outcome (seven studies) was swallowing function measured by a clinical swallowing evaluation [12, 15, 17, 20, 23, 25]. Two studies evaluated swallowing physiology as the primary target using videofluoroscopic analysis [16, 18], and two evaluated swallowing muscle composition using T2-weighted MRI [19, 22]. Two studies evaluated broader functional outcomes: lung infection from clinical report, and death/dependency using the modified Rankin Scale [13, 14]. Lastly, two studies evaluated fluid and enteral intake [11, 13]. In total, six trials used non-validated primary outcome tools [12, 13, 16, 18, 20, 23]; of these, four were non-validated clinical assessment measures and two were non-validated instrumental approaches. Nine studies used validated clinical and instrumental outcome assessment tools [11, 14, 15, 17, 19, 22, 24, 25].

Secondary outcomes were evaluated in 93 % of trials; only one study assessed a single primary outcome alone [18]. Secondary outcomes included: need for additional medical intervention [11, 17, 21], weight change [11, 16, 19], occurrence of dysphagia-related adverse events [11, 14, 19–21, 24], change in core temperature [13, 14], dietary/fluid intake [13, 16, 19, 22–24], quality of life [12, 13, 21–23], swallow physiology/biomechanics [12, 24, 25], chemosensory function [19], clinical swallowing ability [14, 19, 22], mouth opening [15, 16, 19], mouth/neck pain [16, 19, 20, 22], psycho-social measures (fatigue, depression, mood, anxiety, fear) [19, 22], and length of stay [14].

Statistical Approach

Of the 15 RCTs, four provided descriptive and univariate analyses only [12, 13, 15, 25], four provided descriptive and non-parametric analyses [16–18, 23], and four [11, 14, 19, 22] provided descriptive, univariate, and multivariate results. Eight studies conducted statistical analyses appropriate to their reported sample size and statistical plan [11, 12, 14–16, 19, 22, 24]. Of this group, however, five did not meet statistical reporting standards, providing only a p value without specific parametric or non-parametric details [31]. A further seven studies did not conduct appropriate statistical analyses, owing to: inappropriate analysis of the reported stratification [13], a sample size too small for randomization to ensure balanced groups [17, 18], incorrect application of parametric analyses [21], and failure to adjust for multiple comparisons (familywise error) [17, 20, 25]. Lastly, one study’s results could not be confidently evaluated because the numbers reported in the text and tables conflicted [23].
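Three studies were faulted above for not adjusting for multiple comparisons. One common adjustment, the Holm–Bonferroni step-down procedure, is sketched below as an illustration; the text does not state which correction the reviewers expected, and the p values are invented.

```python
from typing import List, Sequence


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each hypothesis, whether it is rejected under Holm's step-down
    procedure, which controls the familywise error rate at `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p value (k = rank + 1) with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p values are retained
    return reject


# Four secondary outcomes tested in one hypothetical trial.
p = [0.01, 0.04, 0.03, 0.20]
print([x < 0.05 for x in p])   # unadjusted: three "significant" results
print(holm_bonferroni(p))      # adjusted: only the first survives
```

Holm's procedure is uniformly at least as powerful as a plain Bonferroni correction while giving the same familywise protection.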

Reported Outcomes

Eleven (73 %) of the RCTs reported a positive outcome from the intervention used to remediate dysphagia [11–15, 17, 19–22, 25]. Positive outcomes included improved nutritional intake [11, 24], increased fluid intake [13, 21], improved swallowing ability [12, 15, 19, 20, 22, 24], improved quality of life [22], improved swallow physiology [25], reduced death or disability [14], increased mouth opening [19, 22], maintenance of chemosensory function [19], and maintenance of swallowing muscle composition [19, 22].

Two studies reported negative outcomes for their primary variable [13, 24], and three reported no change in outcome following intervention [16, 18, 23]. On review of the design quality rating and statistical conduct of each study, the positive outcomes reported by five studies could not be justified because of methodological and statistical issues [13, 17, 20, 21, 25]. An additional two studies with low methodological rigor and identified statistical issues did not report improved outcomes for their samples and were judged inconclusive [18, 23].

Discussion

Systematic reviews are conducted to appraise the volume and strength of a body of research surrounding a topic. As the name implies, they are systematic, organized, comprehensive, and structured investigations. A thorough systematic review can assist researchers and clinicians in outlining the benefits of available treatments, and provide direction for future work.

We conducted a systematic review of dysphagia rehabilitation and evaluated whether the RCTs identified met recommended standards of methodological rigor. Our review identified 15 studies meeting the a priori inclusion criteria for a randomized trial of oro-pharyngeal dysphagia in adults over the years 2010–2013. Of these, only five met the criteria for high quality on a validated evaluation scale [8]. Specific weaknesses of the lower-rated studies included incomplete details of randomization procedures, lack of information on allocation concealment, limited blinding of subjects and therapists to the intervention, inadequate control of co-intervention contamination, and poor management and reporting of sample attrition. Importantly, an additional five studies were designated quasi-experimental trials because they did not include a standard control comparator. If a study compares only an active arm with a placebo, critical information about the true and placebo effects of an intervention cannot be fully obtained. For example, where the natural progression of a disease is variable or unknown, true responders cannot be identified, as some patients will benefit from any intervention at all (enthusiasm bias). Without a true control, the impact of the placebo strategy cannot be determined, which may lead to exaggerated estimates of treatment effect.

Another potential problem identified in the current review is the continued publication of trials with small sample sizes. Small clinical trials (n < 80) using conventional simple randomization may produce imbalanced covariate distributions between treatment and control groups. Smaller trials are also often hampered by limited statistical power. However, alternative randomization methods that support balance in smaller trials, including blocked, stratified, and covariate-adaptive designs, remain little used in the dysphagia rehabilitation literature.
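Blocked and stratified randomization, mentioned above, can be sketched briefly. This is a minimal illustration under stated assumptions: two arms, a fixed block size of 4, and hypothetical strata labels and sizes; real trials often vary the block size so that upcoming assignments are harder to predict.

```python
import random
from typing import Dict, List, Optional, Sequence


def permuted_blocks(n: int, block_size: int = 4,
                    arms: Sequence[str] = ("intervention", "control"),
                    rng: Optional[random.Random] = None) -> List[str]:
    """Allocation list built from shuffled blocks, each holding every arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = rng or random.Random()
    sequence: List[str] = []
    while len(sequence) < n:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block, balance across it
        sequence.extend(block)
    return sequence[:n]


def stratified_lists(strata: Dict[str, int], block_size: int = 4,
                     seed: Optional[int] = None) -> Dict[str, List[str]]:
    """One independent permuted-block list per prognostic stratum."""
    rng = random.Random(seed)
    return {name: permuted_blocks(size, block_size, rng=rng)
            for name, size in strata.items()}


# Hypothetical strata for a small dysphagia trial (n = 40).
lists = stratified_lists({"stroke": 24, "head_and_neck_cancer": 16}, seed=7)
for name, seq in lists.items():
    print(name, seq.count("intervention"), seq.count("control"))
```

Within each stratum the arms can never differ by more than half a block, however the random draws fall, which is the property simple randomization lacks in small samples.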

No matter how well designed, an RCT is only as good as its outcome assessment. In this review, only 60 % of studies used validated outcome assessment tools. Most studies used clinical dysphagia assessment methods, with only 27 % using both validated clinical and instrumental methods. Assessment with proven methods reduces investigator bias and improves the ability to compare findings across studies. Clearly, dysphagia researchers must continue to strive to use outcome metrics that improve the accuracy and clarity of measurement within trials in order to advance science in this area.

This review identified a diverse range of dysphagia interventions and applications. Without accepted and consistently used standards, comparison of outcomes or treatment effects is not possible. The RCTs reviewed here showed little concordance in the timing, duration, or frequency of interventions. The only intervention area showing any concordance was neuromuscular electrical stimulation (NMES): although its outcomes remain controversial, it was applied daily in all four studies reviewed. Unfortunately, the exact timing, duration, and configuration of NMES varied haphazardly across studies, underscoring the lack of accepted standards in treatment approaches. RCTs in dysphagia rehabilitation would therefore benefit greatly from procedures that reduce variability in investigation and promote comparison across studies.

Most RCTs in this review presented pre- to post-intervention comparisons. Only 20 % of trials provided any outcome data beyond the post-intervention time point, and only one study evaluated outcomes out to 12 months. Limited follow-up and the absence of methods to manage attrition of study samples are further areas that need to be addressed in future research. Likewise, the use of statistical methods that match the trial design and recruited sample size needs further consideration. Many researchers did not conform to statistical reporting standards for publication, providing only p values without details of the direction or variability of effects. Continued inadequate statistical reporting limits the ability to synthesize data across swallowing studies (meta-analysis) or to inform effect size calculations for future research protocols.
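Reported effect sizes feed directly into sample size planning for future trials. The sketch below uses the common normal-approximation formula for a two-arm comparison of means, n per group = 2 (z_(1-α/2) + z_(1-β))^2 / d^2; it is an illustration, not a method used by the reviewed studies, and because it ignores the t distribution it slightly underestimates the exact requirement (by about one participant per group in these examples).

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm to detect Cohen's d with a
    two-sided test at level `alpha` (normal approximation)."""
    if d <= 0:
        raise ValueError("d must be positive")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


for d in (0.5, 0.8):
    print(d, n_per_group(d))  # 0.5 -> 63 per group, 0.8 -> 25 per group
```

At a medium effect (d = 0.5), a two-arm trial would need roughly 126 participants in total, well above the n < 80 range that this discussion flags as small.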

In this review, we sought to identify both published and unpublished trials of dysphagia rehabilitation to evaluate the scope and quality of recently completed research. Our final sample included 13 published RCTs and two grey (unpublished) papers. The literature search was comprehensive, including Academic Search Premier and conference proceedings and abstracts; we therefore believe it reflects the current knowledge base in this area, although it is possible that not all studies were captured. Our review differs from previously published systematic reviews [5–7] in that we included grey literature in our sample. Grey literature is important because it may be the only source of up-to-date information, and it may offer insight into new directions for a burgeoning field.

Although our study differs from previous systematic reviews of dysphagia interventions, our results are consistent with their finding that the rigor of swallowing intervention trials remains lacking. It is important to note, however, that the number of RCTs on swallowing rehabilitation published over the last 2.5 years has increased steadily: previous reviews identified a publication rate of 0.36 trials/year, which has now risen to an average of 6 per year [5, 6]. Moreover, the use of methodological control strategies within current trials continues to advance.

Conclusion

Emerging evidence demonstrates that the range of dysphagia rehabilitation intervention methods is rapidly expanding. The use of more advanced study designs has also increased, and the publication rate of RCTs appears to be growing. Despite this, newly published RCTs continue to show significant weaknesses in design and heterogeneity in treatment methods, which limit comparison across studies and the data available to support the efficacy of dysphagia rehabilitation approaches.