Background
The European Union (EU) defines a disease as rare if its prevalence is not more than 5 in 10,000, which corresponds to no more than approximately 254,500 people across the EU member countries, whose total population is approximately 509 million [1]. The United States (US) defines a disease as rare if it affects fewer than 200,000 people in the US [2]. This was equivalent to 62 people in 100,000 in 2015 [1]. In such circumstances, one may still be able to design a randomised controlled trial (RCT) based on the classical frequentist framework where, for example, the total sample size for a two-sample t-test with a 0.05 two-sided type I error rate and 0.90 power to detect a standardized effect size of 0.20 is 1052.
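The figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the usual normal approximation to the two-sample t-test and equal allocation between the two groups, and the function names are hypothetical rather than taken from any analysis code used for this paper.

```python
from math import ceil
from statistics import NormalDist


def per_group_sample_size(effect_size, alpha=0.05, power=0.90):
    """Per-group n for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size. Equal allocation is assumed.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def max_cases_at_threshold(population, cases, per):
    """Largest number of affected people consistent with a prevalence threshold."""
    return population * cases // per


n_per_group = per_group_sample_size(0.20)   # 526 patients per group
total_n = 2 * n_per_group                   # 1052 patients in total

# EU threshold of 5 in 10,000 applied to a population of 509 million
eu_max_cases = max_cases_at_threshold(509_000_000, 5, 10_000)  # 254,500

print(n_per_group, total_n, eu_max_cases)
```

An exact calculation based on the noncentral t distribution would give a marginally larger number than the normal approximation, but the order of magnitude, roughly a thousand patients, is the relevant point for rare disease populations.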
As stated in the “Guideline on clinical trials in small populations” by the European Medicines Agency/Committee for Medicinal Products for Human Use (EMA/CHMP), most orphan indications submitted for regulatory approval are based on RCTs [3]. Deviation from the perceived gold standard RCT is uncommon. This statement is supported by Buckley [4], who presented a short summary of clinical trials of drugs for rare diseases approved by the European regulator between 2001 and 2007. Some of these studies had as few as 12 patients and some had several hundred. For example, the marketing authorisation of carglumic acid for hyperammonaemia due to N-acetyl glutamate synthase deficiency was supported by one pharmacokinetic study with 12 patients and one retrospective study with 20 patients. In contrast, the marketing authorisation of sorafenib tosilate for renal cell and hepatocellular carcinomas was supported by one phase III renal trial with 903 patients and one phase III hepatic trial with 602 patients.
Bell and Tudur Smith compared the characteristics of rare and non-rare disease clinical trials registered in ClinicalTrials.gov [5]. In their review, 64% of rare disease trials had fewer than 50 patients compared to 38% of non-rare disease trials. Only 14% of rare disease trials had more than 100 patients compared to 36% of non-rare disease trials. These results suggest that large studies are possible when studying indications for rare diseases. However, many rare diseases affect 1 in 100,000 people or fewer [6], limiting the potential pool of patients who would be eligible and willing to be recruited to trials. Accordingly, the design and analysis of clinical trials for these diseases become more challenging. In addition, as stated in the EMA/CHMP guideline, the prevalence of the disease may constrain to varying degrees the design, conduct, analysis and interpretation of these trials.
In this paper we examine the association between disease prevalence and sample size for clinical trials in rare diseases, allowing for other factors. This extends the work of Bell and Tudur Smith, but without a comparison between non-rare and rare disease trials. Our analysis is based on data from the Aggregate Analysis of ClinicalTrials.gov database (AACT) [7], a registry of more than 180,000 clinical studies, and Orphadata [8], a portal for information on rare diseases and their prevalence.
Discussion
We found that a majority of trials were conducted in one country only regardless of the disease prevalence. This is slightly surprising given the opportunity in multi-nation trials to recruit more patients. Further investigation may be necessary to understand why multi-nation trials were not conducted more frequently.
We also found that the actual sample size for completed trials was generally smaller than the anticipated trial size for ongoing trials. This is consistent with the results of Bell and Tudur Smith, who reported that 35% of rare disease trials had an actual enrolment of 50 or fewer whereas 29% of rare disease trials had an anticipated enrolment of 50 or fewer [5]. This could be indicative of an ambition to complete large trials in rare disease populations that is difficult to achieve in practice.
Sample sizes for trials in rare diseases were statistically significantly related to gender, age, whether or not the trial had a DMC, whether or not the intervention was FDA regulated, intervention model, trial regions with at least one participating centre, number of countries participating in the trial, year that enrolment to the protocol began and number of treatment arms.
Trials enrolling males only were on average smaller than those that enrolled either females only or both sexes. Trials enrolling females only were slightly larger than those that enrolled both sexes, but this difference was not statistically significant. We expected trials enrolling males only or females only to be smaller because, when the eligibility criteria are restrictive, the population is more homogeneous and less variable in treatment effectiveness, so a smaller sample size may be sufficient. Further inspection revealed that of the 79 trials with females only, 78% (m = 62) were in phase 2 and 89% (m = 70) were for diseases with prevalence 1-5/10,000. There were only 25 trials with males only, of which 76% (m = 19) were in phase 2 and only 36% (m = 9) were for diseases with prevalence 1-5/10,000. The small number of less rare diseases for males might have influenced the average sample size in male-only trials, as shown in Table 7 (Appendix 10), a list of diseases by phase for females and males only. Of note is that most of these trials were in diseases that affect one sex only; all of the male-only trials were in X-linked disorders, whereas almost all of the female-only trials were in diseases that affect females only. A few of these trials were in disorders affecting pregnant women only. Further research is necessary to investigate and identify other factors that could explain this difference.
Similarly, we expected trials enrolling various age groups to have larger sample sizes than those that recruited children only, adults only or elderly only because by expanding the sampling pool more patients could be recruited. However, on average trials recruiting multiple age groups were slightly smaller than adults-only and elderly-only trials.
Unsurprisingly, trials with a factorial design had larger sample sizes than single group and crossover trials, since in a factorial design several combinations of interventions are tested at the same time. Diseases for which the factorial design was employed had prevalence greater than 1/100,000 (the less rare diseases), suggesting that sophisticated designs could be used when possible. However, the most frequently used intervention model for the rarer diseases (prevalence <1/100,000) was single group assignment, and the average sample size was less than 35. The evidence from these trials may not be of as high quality as that from the gold standard RCT. The EMA has indicated that prevalence of disease could constrain the design, conduct and analysis of trials for small populations, and the EMA/CHMP guideline suggested that novel approaches could be considered in situations where it is difficult to recruit large numbers of patients [3]. This in turn presents the challenge of developing new methodology for trials in small populations. In response to this challenge, three collaborative research projects (Asterix, IDeAl and InSPiRe) are working on methods for clinical trials in the small population setting [13].
The main analysis and the sensitivity analyses, restricted to parallel 2-arm trials only and to single group (1-arm) trials only, showed that in general the mean sample size increased as prevalence increased. The increase was noticeably larger in phase 3 trials compared to phase 2 trials. However, due to the small number of trials in some classes, it is difficult to make comparisons.
The generalisability of the results obtained in this study relies on the extent to which the trials included in the database are representative. Although institutions such as the International Committee of Medical Journal Editors (ICMJE) require certain studies to be registered either in ClinicalTrials.gov or in other equivalent registries [14, 15], it seems likely that certain types of trials are more likely to be registered, especially efficacy trials in serious or life-threatening diseases with investigational new drugs regulated by the FDA and EMA. This is a strength of this research, as we concentrated on interventional phase 2 and/or 3 trials, where there would be better coverage. However, phase 2 and/or 3 trials taking place at EU site(s) and initiated after 2011 may not be registered in ClinicalTrials.gov but in the EU Clinical Trials Register, which was launched on 22 March 2011 [16].
A limitation of our study is the potential selection bias arising because we included only trials conducted in the US and/or the EU. This was a necessary measure to exclude trials studying diseases with low prevalence in the US/EU but high prevalence elsewhere. For example, there was a multi-centre interventional trial on tuberculosis with locations in the US, the United Kingdom and Peru. The annual incidence in these countries is 1-9/100,000, 1-5/10,000 and >1/1,000, respectively [8, 17].
Another possible limitation of our study is that we considered a condition to be rare if information on its prevalence was listed in Orphadata. This database is updated on a regular basis, and some conditions may have been missed or may have had no prevalence information. Table 8 in Appendix 11 provides a list of trials in the AACT database where the conditions studied were listed in Orphadata but for which no value of prevalence is given. The prevalence of some diseases changes over time and, because the prevalence information in Orphadata is updated regularly, old prevalence data are not retained. This is a weakness of the study, as trials that studied rare diseases prior to 2016 were assumed to have the updated prevalence.
As explained in the methods section, we used point prevalence to classify diseases into prevalence classes where this was available. In some cases, some other measure of prevalence was used. In this project diseases are classified into groups according to their prevalence value, and by categorising a continuous variable we have lost some information. However, this is a necessary pragmatic approach so that ultra rare diseases for which only the number of cases or families is known could be included in the analysis. For these diseases it is unknown which denominator should be used to calculate the prevalence value, but they could be classified as having prevalence <1/1,000,000, as is the practice in Orphadata. Our results depend to some extent on the type of prevalence used, but as the results presented are based on means from a number of studies, it is likely that the conclusions are relatively robust.
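To make the categorisation step concrete, the following minimal sketch maps a prevalence estimate to a prevalence class. It is not the code used in this study: the function name is hypothetical, the class labels are assumed to follow the Orphadata-style classes referred to in this paper, and the treatment of values that fall between two labelled classes (for example between 5 and 6 per 10,000) is an assumption made for illustration.

```python
from fractions import Fraction

# Assumed class boundaries (upper bound exclusive), ordered from rarest to most common.
# Values between two labelled classes are assigned to the lower class (assumption).
PREVALENCE_CLASSES = [
    (Fraction(1, 1_000_000), "<1/1,000,000"),
    (Fraction(1, 100_000), "1-9/1,000,000"),
    (Fraction(1, 10_000), "1-9/100,000"),
    (Fraction(6, 10_000), "1-5/10,000"),
    (Fraction(1, 1_000), "6-9/10,000"),
]


def prevalence_class(cases, population=None):
    """Return the prevalence class for `cases` affected people out of `population`.

    If the denominator is unknown (only a count of cases or families is available),
    the disease is placed in the rarest class, as described in the text.
    """
    if population is None:
        return "<1/1,000,000"
    prevalence = Fraction(cases, population)
    for upper_bound, label in PREVALENCE_CLASSES:
        if prevalence < upper_bound:
            return label
    return ">1/1,000"


print(prevalence_class(5, 10_000))     # EU threshold falls in the 1-5/10,000 class
print(prevalence_class(30))            # only 30 known cases, denominator unknown
```

Exact fractions are used instead of floating point so that values lying on a class boundary are assigned deterministically. The information lost by categorising is visible here: two diseases with prevalences of 1 and 9 per 100,000 receive the same label.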
In our analysis we have grouped trials described as phase 2/3 by investigators with trials described as phase 3. This is a reasonable assumption because the eventual objective of both phase 2/3 and phase 3 trials is to test the hypothesis that the treatment is more effective, with a plan to subsequently submit for regulatory approval. However, the data entered by investigators may have been inconsistent with the definition given by the US FDA. This is likely to introduce systematic bias. These inconsistencies are difficult to rectify, as the registry does not require investigators to give details of the design and sample size calculation, which would allow a detailed examination of whether the objective of the design corresponds to the US FDA definition.
The number of patients eligible for trials may also depend on whether the rare condition is acute or life threatening, so that only new cases can be recruited, or chronic, when it may be possible to sample from a larger population that depends on the prevalence rather than the incidence rate. Further work should investigate the association between the acuteness or chronicity of the condition and the trial sample size.
We have focussed attention on the sample size of trials in rare diseases. The AACT database also contains additional data, for example on trial design features such as the intervention model (crossover, factorial, parallel or single group assignment), masking (double blind, single blind or open label), allocation (randomized, non-randomized), primary endpoint (e.g., efficacy, safety, pharmacodynamics, pharmacokinetics) and number of interventions in a trial. These might also vary with disease prevalence among rare disease trials. Investigation of such effects could be the subject of further work.