Explanatory evidence generation
Much of the discussion in the literature and amongst focus group participants concerning evaluative clinical research in rare diseases centered on problems associated with the inherently small numbers of patients available for study and the difficulty of recruiting adequately for conventional RCTs, long considered the gold standard explanatory design with a low risk of bias [
5].
“I do find it quite difficult when the clinical trials are very short, very small numbers, and the endpoints are something like the six minute walk test in regards to really being confident that that is going to be an effective treatment for the patients that I’m seeing.” – Physician 4
“For common diseases, there’s no reason for not doing a randomized controlled trial. I mean, that’s one of the big points in the paper that we published a few years ago was that in order to be called rare, you should not have enough patients to confidently determine whether a treatment is efficacious or not.” – Policy Advisor 1
“Well the ex-[profession] in me looks at things like, you know, the size of the study, well MPS is [laughter around the table], okay that’s not going to happen. You know, so you have to, it’s hard when you’re looking at MPS because the things that you would normally look for in a good study aren’t going to be there because of the size of the sample…” – Patient/caregiver 3
This paradigm was discussed by more than half (35/60, 58%) of the studies we reviewed, and was the first to emerge in the literature, in 1992 (Fig.
2). Most of the reports that discussed this paradigm were methodological review articles that focused on rare diseases in general or a group of rare diseases (24/35, 68%), rather than a single specific rare disease. The first author to highlight the challenges associated with small numbers of participants in clinical studies was Haffner, who took the perspective of a regulatory agency responsible for reviewing the safety and efficacy of orphan medicines [
30,
31]. Haffner argued that orphan medicines should be as well-scrutinized as medicines for more common diseases but recognized that conventional RCTs are not always feasible due to small numbers [
30,
31]. Some alternative research methods or design features for demonstrating safety and efficacy that may be acceptable to a regulatory agency were suggested, including the use of multicenter studies, crossover trials, randomized withdrawal trials, open label studies, open protocol studies, and incorporating historical controls or composite or surrogate endpoints [
30,
31]. The discussion concerning explanatory evidence generation for clinical interventions for rare diseases continued from these early publications to the present day (Fig.
2). Others elaborated on the issues brought forth by Haffner and offered more suggestions to overcome the challenges related to small numbers and limited feasibility of conventional RCTs, while preserving internal validity and protecting against bias and confounding [
2,
4,
18,
32‐
61].
While participants in our focus groups highlighted the limited feasibility of conventional RCTs because of small sample sizes, there was little emphasis in the focus group discussions on specific strategies that might be used to overcome this challenge. Thus, most of the results presented under the paradigm of explanatory evidence generation are derived from our meta-narrative literature review.
In general, the research methods or study design features that have been proposed in the literature to address small numbers while retaining internal validity and thus an explanatory focus have concentrated on three overarching strategies: (i) enhancing statistical efficiency at the design phase, so that fewer participants are required to conduct a robust evaluation; (ii) using Bayesian rather than frequentist analysis methods, also to reduce the number of participants required; and (iii) making participation more appealing to patients and families by maximizing time spent on the active treatment. Several methodological reviews were published on this topic in the last decade [
36,
39,
40,
42,
45,
46,
61], some of which provided more detail about the methods described below; here we focus on the most commonly suggested research designs that aim to minimize bias and thereby maximize internal validity and explanatory power.
Strategies that have been proposed for enhancing statistical efficiency at the design phase for clinical evaluative studies of rare disease treatments include factorial trials and adaptive designs. Factorial trials are designed to test multiple treatments simultaneously using the same study population, thus reducing the overall number of participants needed [
2,
33,
39,
40,
46,
49,
53,
57]. For example, in a 2 × 2 factorial design participants are randomized to either treatment A or control group A, and then randomized again to treatment B or control group B, which effectively reduces the sample size needed to test these two treatments by 50% because the same participants are being randomized [
40]. However, authors have pointed out that this reduction in sample size only holds assuming there is no interaction between the treatments being administered concurrently; otherwise, statistical efficiency is lost [
40]. Adaptive designs allow flexibility in trial procedures such that changes (“adaptations”) based on interim analyses can be made after trial initiation without undermining the validity of the trial [
59]. Two commonly discussed adaptive trial strategies are response-adaptive randomization and group sequential design [
36,
40,
46,
53,
59,
61]. Response-adaptive randomization involves modifying treatment assignment probabilities with the accrual of data so that the number of participants randomized to the best-performing treatment arm (“play-the-winner”) is increased and overall sample size is decreased [
Group sequential designs do not have a predetermined sample size; rather, small groups of participants are recruited over several phases and data are analyzed at the end of each phase to assess safety, futility, efficacy, or a combination of these until enough data have been accrued to justify study termination [
59,
61]. Simulation studies have shown that sequential design approaches may, but do not always, reduce the eventual sample size compared to fixed sample size designs [
35,
53,
62]. While adaptive trial strategies are often reported as a means to enhance statistical efficiency, some authors have questioned their usefulness based on the paucity of published practical applications in the context of rare diseases [
40,
59].
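The "play-the-winner" logic of response-adaptive randomization can be illustrated with a minimal simulation. The sketch below is a hypothetical toy, not drawn from any of the cited trials: it uses a simple urn scheme in which each observed success adds a ball for the assigned arm and each failure adds a ball for the other arm, so allocation probabilities drift toward the better-performing treatment as outcomes accrue.

```python
import random

def play_the_winner(outcomes_a, outcomes_b, n_patients, seed=0):
    """Randomized play-the-winner urn: a success on an arm adds a
    ball for that arm, a failure adds a ball for the other arm, so
    treatment assignment probabilities adapt to accruing data."""
    rng = random.Random(seed)
    urn = ["A", "B"]  # one ball per arm at the outset
    assignments = []
    for i in range(n_patients):
        arm = rng.choice(urn)
        assignments.append(arm)
        # hypothetical outcome for patient i on the assigned arm
        success = (outcomes_a if arm == "A" else outcomes_b)[i]
        if success:
            urn.append(arm)  # reinforce the winning arm
        else:
            urn.append("B" if arm == "A" else "A")
    return assignments

# Illustration: arm A succeeds far more often than arm B, so most
# patients end up allocated to A.
assignments = play_the_winner([True] * 100, [False] * 100, 100, seed=1)
```

Note that this adaptation only works when each patient's outcome can be observed quickly relative to the pace of accrual, which is one practical reason such designs are not always feasible.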
For conventional RCTs with small sample sizes, achieving sufficient statistical power to detect differences in treatment effects, especially when the treatment effect is expected to be modest, is challenging [
52]. Several authors have argued (as early as 1995) that Bayesian techniques would be better suited to this context than standard frequentist approaches to analysis, because a Bayesian analysis is not as compromised by small numbers and offers more direct conclusions [
32,
34,
41,
44,
45,
48,
50‐
52]. In such approaches, previously collected data or expert opinion is used to generate a prior probability distribution for the unknown treatment effect, and Bayes' theorem is applied as new data are accumulated to update the posterior distribution for the new treatment and inform clinical practice [
48,
52]. As an example, Johnson and colleagues reanalyzed data from an RCT of methotrexate versus placebo in 73 patients with scleroderma, and demonstrated that methotrexate had more favorable odds of being beneficial for patients when a Bayesian approach was applied compared to the non-statistically significant findings obtained through a frequentist approach [
32]. While several authors argued that Bayesian statistics offer an alternative approach to the analysis of small numbers of participants, some criticized the subjectivity in establishing prior distributions and were skeptical of the acceptance of results obtained using Bayesian statistics at the regulatory level [
34,
36,
45,
48].
It was reported in the literature and in our focus group discussions that there can be a lack of patient/family/clinician acceptance of the possibility of being randomized to a control group, particularly for placebo-controlled studies of treatments for rare diseases where few treatment alternatives exist. Therefore, study designs that make participation more appealing by maximizing time spent on, or guaranteeing provision of, the active treatment have been suggested [
4,
33,
36,
38‐
42,
44‐
47,
49,
51,
56,
57,
60].
“I agree with [name]‘s comments that it’s hard to have a placebo-controlled trial. I mean, certainly there has been trials to try to do that. …however, they're very short and really with these, like almost, like even to encounter, to have families agreeable to participate being a placebo for long-term, I think would be very difficult. I think for the short-term, for a few months or a year, families are agreeable, but after that I don’t think they would be agreeable.” – Physician 2
The randomized placebo-phase design has the same design features as a conventional RCT, except that the time from enrollment in the study to the start of the experimental treatment is randomized for all participants [
56]. All participants eventually receive the experimental treatment, and effectiveness is determined based on whether a response is observed sooner among those who received the treatment earlier [
56]. Similarly, randomized withdrawal, early escape, and stepped wedge trials reduce time spent in a control arm or ensure that all participants eventually receive the intervention being studied, and have been proposed as alternative approaches to evaluate clinical interventions for rare diseases [
40]. Crossover trials and n-of-1 trials also guarantee that participants receive the active treatment, but differ from conventional RCTs in that the treatment sequence is randomized with a washout period between treatment regimens, such that each participant acts as his or her own control [
2,
36,
41,
53,
57]. As some authors reported, n-of-1 trials are often embedded in clinical practice to help healthcare providers determine the best treatments for their patients [
2,
36,
57]. While several authors have examined the advantages of crossover and n-of-1 trials, others have discussed the risk of carryover and period effects between phases, and have argued that these designs are generally not suitable for diseases that have an unstable disease course or for interventions that are not fast-acting with reversible effects [
2,
18,
33,
36,
39,
44,
46,
53].
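The within-patient logic of an n-of-1 trial can be sketched in a few lines. In this toy example (the treatment labels, outcome values, and number of period pairs are all invented), the order of active treatment and placebo is randomized within each pair of periods, a washout between periods is assumed, and the treatment effect is estimated as the patient's mean outcome on active treatment minus their mean outcome on placebo.

```python
import random
import statistics

def n_of_1_sequence(n_pairs, seed=0):
    """Randomize the treatment order within each pair of periods
    (AB or BA), as in an n-of-1 design; a washout between periods
    is assumed but not modeled here."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_pairs):
        pair = ["active", "placebo"]
        rng.shuffle(pair)
        sequence.extend(pair)
    return sequence

def within_patient_effect(sequence, outcomes):
    """Mean outcome on active treatment minus mean outcome on
    placebo: the patient acts as his or her own control."""
    active = [y for trt, y in zip(sequence, outcomes) if trt == "active"]
    placebo = [y for trt, y in zip(sequence, outcomes) if trt == "placebo"]
    return statistics.mean(active) - statistics.mean(placebo)

sequence = n_of_1_sequence(n_pairs=3, seed=42)
# invented outcomes: a stable baseline of 10 plus a benefit of 2
# during active-treatment periods
outcomes = [10 + (2 if trt == "active" else 0) for trt in sequence]
effect = within_patient_effect(sequence, outcomes)
```

The caveats noted above apply directly to this estimator: if the disease course were unstable, or the treatment effect carried over into the following period, the simple difference in period means would be biased.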
The three overarching strategies and associated research methods discussed above are not mutually exclusive; rather, there is significant overlap among them in the literature. For example, in addition to being an attractive option for participants, crossover trials are also considered statistically efficient and reduce the number of participants needed because each participant acts as his or her own control [
2,
18,
33,
36,
39,
40,
44,
46]. Huang and colleagues have suggested that statistical efficiency could be further enhanced in crossover trials by allowing participants to “escape early” [
41]. Similarly, authors have stated that trials using adaptive randomization can be attractive to participants because the likelihood of being randomized to the less effective treatment arm is reduced over time [
36,
40,
46,
53,
59,
61]. Bayesian methods are also reported as a common design feature of adaptive trials as a means of improving statistical efficiency [
34,
42,
59]. They have also been proposed as a means to combine results from multiple n-of-1 trials and enhance the usability of n-of-1 trial data in answering population-level questions about treatment efficacy and effectiveness [
51].
A criticism of explanatory evidence generation reported both in the literature and in focus group discussions was that studies designed to evaluate the efficacy of an intervention typically limit enrollment to a very homogeneous group of participants, which strengthens the robustness of the causal interpretation of the findings, but at the expense of a reduction in the external validity or generalizability of study results [
4,
18,
44,
60]. Because rare diseases typically exhibit substantial clinical heterogeneity (discussed in the following section), some authors have questioned the suitability of the above-mentioned approaches for evaluating clinical interventions for rare diseases [
4,
18,
44,
60]. Additionally, authors have argued that many conventional RCTs and other explanatory studies are short in duration, often due to resource constraints, and do not allow for adequate assessment of long-term treatment effects, further compromising external validity [
4,
18,
57]. Finally, some authors were concerned that unfamiliar approaches to research design, such as adaptive randomization or n-of-1 trials, would not be accepted by regulatory agencies and other policy decision-making bodies [
36]. Partly in response to some of these concerns, other research paradigms for evaluating clinical interventions for rare diseases have evolved.
Comparative effectiveness/pragmatic evidence generation
It is well established that there is a high degree of clinical heterogeneity among rare disease patients, such that patients with the same specific disease might have drastically different clinical manifestations based on patient characteristics such as age, disease characteristics such as residual enzyme activity levels, or for unknown reasons, and may respond differently to a given intervention [
18,
42]. As several authors have discussed, this clinical heterogeneity is often not accounted for in conventional RCTs, and has raised concern among stakeholders about the applicability of study results to patients with clinical manifestations different from those included in RCTs [
4,
18,
44,
60].
“And I find it frustrating in terms of research what I’ve found, and you guys know this, that each case is so unique and different, and so when you read a study or evidence-based research, I find that it, it’s not a guarantee that it’s going to directly correlate to your particular unique situation. So, you have to take that at face value and not think that ‘oh because I read that study and that it is evidence based that this is exactly what’s going to pertain to my situation.” – Patient/caregiver 4
“…there’s a huge heterogeneity of this population. There’s people with very severe diseases, people with very mild disease, and this is the nature of enzyme deficiencies. There’s some people that have zero and some people will have, a lot, near normal enzyme activity, so we’re going to get this heterogeneity. And this is one of the big problems, like [name] mentioned, how do we apply this clinically to a larger population of these patients? Are the results, for instance, with infantile-Pompe, how do we relate that to an adult Pompe patient?” – Physician 5
“…the way that the trials are designed, very sub, select populations with the actual disease of concern, which is already a narrow disease as it is. It makes it very difficult for us to know where and when these therapies are going to work. And so, when we’re talking about rare diseases, it really has to be linked to not just research, but effectiveness research about natural history and epidemiology. And given the wide degree of heterogeneity with the diseases that we’re dealing with, we’re going into this with a huge degree of uncertainty about whether or not there really is any evidence to support that these therapies are going to work.” – Policy advisor 2
In response to concerns about the external validity of study results, several authors and focus group participants have advocated for study designs that may compromise internal validity to some extent, by shifting away from the explanatory RCT, in order to address real-world effectiveness [
2,
4,
7,
18,
42,
44‐
47,
55,
57,
58,
63‐
80].
“…I think the effort like the Canadian group CIMDRN to look at long-term outcomes, where there’s natural selection of various treatment groups, I think will be very helpful over the long term because of the challenges we have in doing strict study designs, and lack of financial supports for long-term studies. This effect to observational studies and looking at outcome differences in naturally, sort of, selected difference maybe as helpful in rare diseases I think as the designed studies.” – Physician 1
Almost 10 years after discussions about explanatory evidence generation for clinical interventions for rare diseases emerged in the literature, the research paradigm of comparative effectiveness/pragmatic evidence generation started to develop. This paradigm was discussed by half (30/60, 50%) of the studies included in this review, and was first mentioned by Wilcken in 2001 [
7]. Like the previous research paradigm, most of the reports that discussed this paradigm were methodological review articles that focused on rare diseases in general or a group of rare diseases (21/30, 70%), rather than a single rare disease. Wilcken suggested that for some rare diseases, conventional RCTs remained possible, but for others, observational studies with historical controls could be used to evaluate treatment effectiveness [
7]. Since that initial publication, many authors have discussed research designs that take a more pragmatic approach to evaluating treatment effectiveness in rare diseases, and often explicitly attempt to include a broader patient population and longer-term observation in natural settings. These designs include: pragmatic clinical trials, observational studies (e.g., cohort studies and registries, case series, case reports), and hybrid designs that incorporate both randomization and systematic observation [
2,
4,
18,
42,
44‐
47,
55,
57,
58,
63‐
80].
While participants in our focus groups questioned the suitability of explanatory RCTs for establishing effectiveness of clinical interventions for rare diseases, little of the discussion focused on specific solutions to overcome this challenge. Like the previous research paradigm, most of the results presented under the paradigm of comparative effectiveness/pragmatic evidence generation are derived from our meta-narrative literature review.
Incorporating more pragmatic features into RCTs has been suggested as a means to improve external validity while maintaining randomization, which helps control for unmeasured confounding, and other standard methodological features of explanatory RCTs, such as blinded outcome assessments [
18,
45,
57]. These pragmatic RCTs feature design elements that better reflect actual clinical practice, including: enrolling participants with differing clinical presentations, taking into consideration the system of care in which the new treatment will be delivered (e.g., using standard-of-care as a comparator instead of placebo), following participants for a longer period of time, and incorporating outcomes that are meaningful from a patient/care provider standpoint (patient-oriented research will be discussed in the following section) [
18,
45,
57]. Authors have criticized pragmatic RCTs because they still estimate average treatment effects and thus are not necessarily better suited than explanatory RCTs to investigating potential heterogeneity of treatment effects [
18].
Among the most common observational rare disease research designs discussed in the studies we reviewed are patient registries [
4,
18,
42,
47,
58,
64,
65,
67,
72‐
74,
77,
80] and cohort studies [
68,
78]. Because these observational studies do not typically have strict inclusion or exclusion criteria for participants, nor do investigators manipulate participants’ treatment(s), some authors have argued that these studies better reflect real-world clinical practice and the clinical heterogeneity that typifies many rare diseases [
18,
42,
67,
72]. As reported in the literature, registries have multiple purposes including: evaluating clinical- and/or cost-effectiveness of therapies; monitoring safety of new or existing therapies; evaluating diagnostic tools; monitoring quality of care; and assessing natural history over time [
67]. We identified several examples of registries being used to evaluate treatment effectiveness of interventions for rare diseases, for example, enzyme replacement therapy for lysosomal storage disorders [
72]. The International Collaborative Gaucher Group Registry was established in 1991 and, at the time of the publication of a paper by Jones and colleagues (2011), had collected longitudinal clinical data for almost 6000 patients [
72]. Several authors stated that an additional advantage of registries is that they can be used to identify potential participants for recruitment into future research studies, including clinical trials [
18,
67,
73,
76,
77]. Some authors have also suggested that observational patient registries may play an important role in post-market evaluation of interventions for rare diseases by serving as a platform to collect longitudinal clinical and quality of life data [
47]. While observational patient registries are an attractive method for the evaluation of longer-term outcomes in real-world settings, some authors reported that results remain prone to residual confounding in the absence of randomization, especially confounding by indication (when patient characteristics influencing the choice of treatment also influence the outcome) [
18,
44]. A few authors discussed variability in the quality of registry data, as observational patient registries tend to be heterogeneous in the depth of data collection and the definitions applied to included data elements, particularly in the context of the multi-center and sometimes multi-national nature of rare disease research [
42,
65]. In addition, some authors described the potentially important influence of complete case ascertainment and data collection on the accuracy of study results, particularly given that registry participation may be associated with receipt of particular treatments or lead to different investigations [
67,
73,
81].
In recent years (since 2009), some authors have suggested that elements of both explanatory and observational studies can be combined into “hybrid” study designs that attempt to mitigate challenges faced by both approaches [
18,
63,
75]. For example, Vickers and colleagues suggested that the “clinically-integrated randomized trial,” which seeks to integrate randomization into standard clinical care, would be suitable for rare disease research, addressing the threat of confounding while maintaining an element of pragmatism and enhancing generalizability [
63]. The key feature of the clinically-integrated randomized trial is that a patient's routine care, follow-up, payment, and documentation (e.g., charting) are unchanged, apart from the fact that treatment is assigned randomly with informed consent from participants [
63]. In the context of rare diseases, the authors argued that the clinically-integrated randomized trial is attractive because there is often considerable uncertainty about the most effective course of treatment for patients and that trials could easily be conducted worldwide to maximize the number of participants [
63]. Another design that incorporates elements of both explanatory and observational approaches and has been suggested in the context of rare diseases is the “cohort multiple randomized controlled trial (cmRCT)” [
75]. The cmRCT seeks to enroll an observational cohort of patients, with participants routinely reporting on a minimum set of core outcomes [
75,
82]. At the time of enrollment in the cohort, participants give their consent for 1) their longitudinal data to be used in aggregate; and 2) to be randomly selected to participate in potential RCTs of new or existing interventions with the understanding that only those who have been selected to be offered the intervention under study will be contacted [
75,
82]. Those who are eligible for the RCT, but who were not randomly selected to be offered the intervention serve as the control group and are not contacted about the study [
75,
82]. According to the literature, launching RCTs using this design increases the efficiency of research by accommodating multiple trials and comparison of multiple treatments, allows for longer follow-up of participants, provides pragmatic/real-world evidence, and accommodates clinical heterogeneity by enrolling participants across the clinical spectrum [
18,
75,
82]. Concerns that have been raised with these “hybrid” study designs include: potential for confounding and bias in the observational component of the study, and the feasibility of implementing such a study design [
18,
75,
82].
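The cmRCT selection mechanism described above can be sketched in a few lines. Everything in this example is hypothetical (patient identifiers, the eligibility rule, and the number offered the intervention are invented): from the eligible members of an observational cohort, a random subset is selected to be approached about the intervention, and the remaining eligible patients serve as the control group without being contacted.

```python
import random

def cmrct_select(cohort, eligible, n_offer, seed=0):
    """Randomly select n_offer eligible cohort members to be offered
    the intervention; eligible members who are not selected form the
    control group and are not contacted about the trial."""
    rng = random.Random(seed)
    pool = sorted(pid for pid in cohort if pid in eligible)
    offered = set(rng.sample(pool, n_offer))
    controls = [pid for pid in pool if pid not in offered]
    return sorted(offered), controls

# Hypothetical cohort of 100 patients, 40 of whom meet the trial's
# eligibility criteria; 15 are randomly offered the intervention.
cohort = list(range(100))
eligible = set(range(0, 80, 2))  # 40 eligible patients
offered, controls = cmrct_select(cohort, eligible, n_offer=15, seed=7)
```

Because the same cohort can host repeated selections of this kind, multiple trials can draw on one standing data-collection infrastructure, which is the efficiency argument made for the design.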
Finally, there is discussion in this literature about other observational designs such as case-control studies, small case series and case reports; however, these approaches are not commonly suggested as potential solutions for improving pragmatic evidence generation for establishing effectiveness of treatments for rare diseases. Some authors have suggested that case-control designs, where individuals who have experienced a certain outcome (cases) are matched to and compared with individuals who have not experienced the outcome of interest (controls), are well suited for studying rare diseases, particularly in instances where there could be a long lag time between the treatment and outcome of interest [
2,
80]. However, there are concerns about the potential for introducing selection bias in choosing controls [
2]. Other authors have emphasized the importance of case series and case reports in the context of establishing treatment effectiveness for rare diseases [
47,
66]. Case series and case reports typically include in-depth information related to clinical manifestations of disease, treatment, and follow-up for a single patient or small group of patients [
47,
66]. While authors have acknowledged there are clear limitations in terms of establishing treatment effectiveness, they have argued that this evidence can provide a better understanding of natural history for many rare diseases, and can identify unexpected harms or benefits of treatments, which could be of particular importance for diseases considered “ultra-rare” [
47,
66]. Similar to the concept of using case reports as pragmatic evidence, several focus group participants reported relying on some anecdotal evidence to help inform medical decision-making:
“I think all of the different information is important, and including anecdotal, right? Because we deal with very rare disorders sometimes, and you often go to clinicians who have seen these conditions and have treated them, and may take their point of view about a certain treatment. So, you may say that’s anecdotal, but it may be extremely valuable if there’s only a handful of patients who have received that treatment. So, I think all of the studies and designs, including anecdotal evidence, I personally use that in determining whether I think about a treatment for a patient.” – Physician 1
“…sometimes it all depends on the experience of what other people lived. Sometimes people tell you not to go there because they’ve has a bad experience. So, I like to have the bad and the good ones too, and then make my mind and take better decisions.” – Patient 2
The main criticism in the literature for comparative effectiveness/pragmatic evidence generation is the inherent risk for bias and confounding because of the lack of randomization; however, there have been efforts made to mitigate this risk. As previously discussed, some authors have suggested incorporating pragmatic elements into RCTs [
18,
45,
57], while others have proposed methods to overcome challenges in non-randomized studies. For example, Cole and colleagues demonstrated the use of case-control matching using the risk-set method for participants enrolled in the International Collaborative Gaucher Group Registry [
69]. The authors applied this method to balance “cases”, i.e., Gaucher patients with skeletal avascular necrosis, and controls according to demographic and clinical factors [
69]. Use of propensity scores to match participants has also been suggested as a means of reducing the risk of bias in observational studies of rare diseases [
44].
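Propensity-score matching can be illustrated with a small, dependency-free sketch. Everything here is invented for illustration (a single covariate, toy data, and a hand-rolled logistic fit; this is not the method of any cited study): the propensity score, i.e., the modeled probability of receiving treatment given the covariate, is estimated by gradient descent, and each treated patient is then greedily paired with the unmatched control whose score is closest.

```python
import math

def fit_logistic(x, treated, lr=0.1, steps=2000):
    """Fit P(treated | x) = sigmoid(b0 + b1*x) by gradient descent;
    the fitted probability is the propensity score."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, ti in zip(x, treated):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - ti
            g1 += (p - ti) * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def propensity_match(x, treated):
    """Greedy 1:1 nearest-neighbor matching on the propensity score:
    each treated patient is paired with the unmatched control whose
    score is closest."""
    b0, b1 = fit_logistic(x, treated)
    score = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    treated_idx = [i for i, t in enumerate(treated) if t == 1]
    control_idx = [i for i, t in enumerate(treated) if t == 0]
    pairs, used = [], set()
    for i in treated_idx:
        best = min(
            (j for j in control_idx if j not in used),
            key=lambda j: abs(score[i] - score[j]),
        )
        used.add(best)
        pairs.append((i, best))
    return pairs

# Invented data: one covariate, with treated patients concentrated
# at higher covariate values.
x = [i / 20 for i in range(20)]
treated = [0] * 12 + [1] * 8
pairs = propensity_match(x, treated)
```

The design choice matters here: matching on the score balances measured covariates between the treated and control groups, but, as the authors above note, it cannot remove confounding by factors that were never measured.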
Patient-oriented evidence generation
One of the main criticisms, both in the literature and by focus group participants, of highly internally valid, explanatory study designs is their tendency to rely on short-term, and often surrogate, outcomes that are not necessarily clinically meaningful [
9].
“Most of the time study with rare diseases rely on surrogates and the surrogates are selected usually on the basis of biochemical indicators of some biological activity of the treatment. And so, for enzyme replacement therapy, the reduction in the concentration of a substrate in urine or blood is regarded as evidence of a biological pivot, a biological activity, but there’s far too many examples where a surrogate, such as the one I’ve just described, are really, there’s no relationship to what the clinical outcomes are.” – Policy Advisor 1
“…I have a concern that sometimes outcome measures are defined by what funding and drug approval [agencies] like FDA want to see, right? [chuckles]. Rather than what the clinician may feel for a particular rare disease is far more important. …it becomes challenging to design appropriate studies and pharma is at the end of the day interested in getting approval and funding approval, and may target outcome measures that are demanded by various bodies rather than perhaps going for the most clinically appropriate outcome measures.” – Physician 1
Only in the last decade (Fig.
2) has a discussion in the literature emerged regarding the importance of patient-oriented evidence generation in rare diseases (the first appearing in 2010). This discussion emphasizes the need for outcomes that are of direct importance to patients and caregivers. Fifteen of 60 reports (15/60, 25%) discussed issues related to the paradigm of patient-oriented evidence generation, making it the research paradigm with the smallest proportion of literature. The majority of reports that discussed this paradigm were again methodological review articles (13/15, 87%), and the remaining two articles described case examples specific to one rare disease.
Connected to the paradigm of explanatory evidence generation, some authors have suggested the use of surrogate outcomes as proxies for patient-oriented outcomes such as survival or quality of life because they can be measured relatively quickly and require fewer participants to achieve adequate statistical power [
33,
83‐
85]. For example, in 2010, Kinder and colleagues reported that functional outcomes such as exercise tolerance, survival, and quality of life were the most salient outcomes to consider for rare lung disease studies because they have undeniable meaning for patients; however, the authors also described the limited feasibility of conducting explanatory RCTs that include these outcomes and argued that surrogate outcomes could therefore be developed and used as proxies for patient-oriented outcomes [
33]. Several authors and focus group participants expressed concern about the lack of validation of surrogate outcomes; a clear understanding of the natural history of disease and proposed causal mechanism of a treatment in relation to the disease is needed in order to establish, with reasonable certainty, the relationship between surrogate and patient-oriented outcomes [
33,
70,
73,
85,
86].
“…in order to identify reasonable outcomes measures for any clinical trial, one has to know the what the natural history of the disease is. So, those are major challenges, and what we’re faced with in the pharmaceutical industry, who are anxious to do as short a study as possible, for rare disease almost always use surrogate markers as evidence of effectiveness and the relationship between the surrogate marker and clinical outcome is often completely unknown.” – Policy Advisor 1
For example, the six-minute walk test (6MWT) is a common surrogate outcome measure used in clinical evaluative studies for many rare diseases [
83,
84,
87]. The 6MWT was originally developed for patients with moderate to severe lung disease as a means of assessing overall functional status and as a predictor of morbidity and mortality [
88] but has since been used in studies of many rare diseases, including late-onset Pompe disease and Duchenne muscular dystrophy, among others [
84,
87]. An important criticism of this extension of its use is the lack of adequate validation to determine if observed changes in the 6MWT reflect meaningful changes for patients [
83,
84,
87].
“For me I think one of the big issues is the outcome measures that we’re trying to document. For instance, with the lysosomal storage diseases, what is the relevance of a 6 minute walk test? What is the clinical relevance of this type of test?” – Physician 5
Partly in response to concerns about the relevance and validity of surrogate outcomes being used in clinical research for interventions for rare diseases, there has been a shift towards incorporating patient-oriented outcomes in clinical research [
4,
42,
45,
74,
89].
“…because yes, scientific research is important too, but it’s this push-pull dichotomy between the happiness, the living life, just the simple moments, you know, going outside, sitting in the sun, that type of, going down to the beach, those things need to be equally measured…” – Patient/caregiver 4
“We need to know more what’s going to happen in terms of lifespan, in terms of morbidity, in terms of the operations these patients are getting, in terms of growth as well. Is this something that we’re seeing improvement?” – Physician 5
“I think that [name] made reference to this earlier about the importance of evaluating quality of life. And unfortunately, this is not really done. I don’t know of a single study that has done this rigorously for the diseases that I happen to be involved with or have been. And so, for example, the fact that a child may require an intravenous infusion of some medication that takes six hours of infusion and needs it every week. They’re missing a day of school every week. That’s twenty percent of their schooling! This is never, in my experience, never evaluated. Now that’s not a direct measure of quality of life, but you could easily imagine that it would have a significant indirect impact on quality of life.” – Policy Advisor 1
In the literature and among our focus group participants, much of the discussion regarding patient-oriented outcomes has focused on developing outcomes that are meaningful based on the lived experiences of patients and their caregivers [
18,
42,
74,
89]. Tudur Smith and colleagues used the example of juvenile idiopathic arthritis to demonstrate that clinical research initially focused on outcomes related to clinical disease activity and disease damage, but has more recently shifted to identifying and validating the outcomes most important to patients and parents, such as health-related quality of life, functional assessments, and pain assessments [
45]. Basch and Bennett advocated for the use of patient-reported outcomes in clinical studies for interventions for rare disease as the best measurement tools for how a patient feels and functions [
89]. Participants in our focus groups also expressed a desire for researchers to incorporate outcomes beyond those directly related to the patient, including parent- and family-related outcomes.
“One quick comment about the whole family because I know, obviously, a lot of this is directed towards the patient, the person with [disease], but it’s, you know, so linked and so connected, that I find there’s a direct, you know, effect on the child through the parents, so I’d like to see more supports, research for the parents that are also kind of surviving through this…” – Patient/caregiver 4
A common criticism is that many outcome measures, including patient-oriented outcome measures, have not been validated or standardized for the population of interest, leading to questions about the applicability of study results [
4,
42,
70].
“… we know that some of these tests or some of the questionnaires have not been standardized for these particular populations, and we’re faced with always the question is it clinically relevant for these patients? I think overall, there’s agreement that they are, but we run into this problem all the time with, you know, Pompe or the different MPS’ because there hasn’t been long enough natural history studies, there has not been standardization of these tests, so we’re choosing these measuring tools for these particular studies without really knowing if they’re the best tools. And this is very relevant for the quality of life questionnaires, we sometimes use the SF36 or we use specific pain criteria, APPT or something like that, but we haven’t actually standardized this for these populations, so we don’t actually know if what we’re measuring is clinically relevant.” – Physician 5
In response to this criticism, some researchers have begun to identify, develop, and validate standard sets of outcome measures that can be used in clinical research evaluating treatment effectiveness in their populations [
4,
45,
76]. Another concern that has been raised with respect to outcomes is that it may not be possible to use the same outcome measure within the same disease if there is substantial clinical heterogeneity among patients [
4,
42,
45,
84,
89]. Some authors and focus group participants also noted that clinical heterogeneity has implications for identifying the minimal clinically important difference [
42].
“…the main trial showed an improvement of 22.5 meters after 6 months in the six-minute walk test, which there’s quite a variability in outcomes depending on which patients you’re looking at, but that’s the average improvement. What does that really mean is a very difficult decision because for somebody that is walking perhaps 300 meters in six minutes and improves by 22.5 meters, that’s probably not clinically significant, if we’re just looking at a six-minute walk test. But, if someone is not very mobile at all and has that improvement, we might actually have a more clinically significant impact with that treatment.” – Physician 5
Finally, some focus group participants expressed concern about balancing subjective outcomes (e.g., patient-reported quality of life) with more objective outcomes (e.g., biomarkers of disease progression) because of possible placebo effects with patient-reported outcomes.
“I think there needs to be a combination of objective and subjective outcome measures and quality of life measures because, certainly, quality of life is extremely important, but my sense is that it’s a lot more vulnerable to placebo effect. As well, just in the sense that a lot of these families are extremely invested in being on their therapy because it is their only therapeutic option. And so by relying on quality of life measures very heavily, I think we can end up advocating for treatment for patients that aren’t really clinically benefitting.” – Physician 6