Background
Random-effects meta-analysis is most commonly performed based on an underlying hierarchical model including two unknown parameters: the effect μ, which is the figure of primary interest, and the between-study variance (heterogeneity) τ², which is a nuisance parameter. Inference is then usually done sequentially, by first deriving an estimate of the heterogeneity variance, \(\hat{\tau}^{2}\), and then determining the effect estimate \(\hat{\mu}\) by conditioning on the estimate \(\hat{\tau}^{2}\) [1, 2]. A large number of different estimators for the heterogeneity variance is available (see e.g. [3–6]), and effect estimation may be done based on a simple normal approximation, or by utilizing a Student-t distribution [7] with an additionally refined estimator of the variance of \(\hat{\mu}\) [8–12]. While the normal model may be motivated by asymptotic arguments, in actual applications the number of estimates to be combined is commonly small [13, 14], and hence the estimation uncertainty in the between-study variance τ² is substantial, so that an adjustment is appropriate and in fact improves operating characteristics [7–12, 15].
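To make the sequential procedure concrete, the following is a minimal sketch of one common choice among the many available heterogeneity estimators, the DerSimonian–Laird moment estimator, followed by inverse-variance pooling under the simple normal approximation; it is an illustration of the general two-step scheme, not the specific implementation used in this paper, and the function name is chosen for illustration.

```python
import math

def dersimonian_laird(y, s):
    """Two-step random-effects pooling: first estimate tau^2
    (DerSimonian-Laird moment estimator), then compute the pooled
    effect mu_hat conditional on that estimate.

    y : per-study effect estimates;  s : their standard errors.
    Returns (tau2_hat, mu_hat, se_mu).
    """
    k = len(y)
    w = [1.0 / si**2 for si in s]                         # inverse-variance weights
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w) # fixed-effect pooled mean
    Q = sum(wi * (yi - mu_fe)**2 for wi, yi in zip(w, y)) # Cochran's Q statistic
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)                    # moment estimate, truncated at 0
    w_re = [1.0 / (si**2 + tau2) for si in s]             # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))                       # normal-approximation SE
    return tau2, mu, se
```

As discussed above, the uncertainty in \(\hat{\tau}^{2}\) is ignored in the second step, which is what motivates the Knapp–Hartung-type adjustments considered later.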
The problem of deriving estimates from only a small number of data sources arises especially in fields of application where empirical information is sparse due to the rarity of the condition in question. The rarity of a disease is often accompanied by low (commercial) interest or incentive, which is why such diseases are also known as orphan diseases. According to the European Commission, a disease is designated orphan status when its prevalence is ≤5 in 10 000 [16]. While by definition any individual rare disease has a low prevalence, there is a large number of them, together affecting an estimated 6–8 % of the total population [17], and with that posing a challenge to health care systems worldwide.
The European Medicines Agency acknowledges the particular obstacles in rare diseases research but points out that there is no fundamental difference between rare and more common diseases and hence no “paradigm change” when it comes to regulatory issues. Because of the common small-sample settings, the importance of sophisticated methods is emphasized, and meta-analyses of good-quality randomised controlled clinical trials are still considered the highest level of evidence [18]. The problems encountered in rare diseases research often call for special statistical methods, especially with respect to study designs [17, 19, 20]. Meta-analyses are particularly important in this field due to the lack of large trials, while they will commonly still be faced with the problem of small numbers of available studies. Between-study heterogeneity is then anticipated, since the gathered pieces of evidence are likely to differ with respect to study designs, types of control groups or treatment allocation [17, 19, 20; Unkel S, Röver C, Stallard N, Benda N, Posch M, Zohar S, et al.: Systematic reviews in paediatric multiple sclerosis and Creutzfeldt-Jakob disease exemplify shortcomings in methods used to evaluate therapies in rare conditions. Submitted]. Small studies have in fact empirically been found to exhibit more heterogeneity than large trials [21]. Consequently, the use of methods suitable for few studies and marginally significant findings is of crucial importance here.
With an estimated incidence of 2–20 cases per 100 000 population, juvenile idiopathic arthritis (JIA) is an example of a rare disease [22]. In the following, we will use a meta-analysis in JIA [23] as a case study to illustrate the different methods discussed below.
In the following sections, we will first describe the methods used, then show the results of a simulation study, and demonstrate the different types of analyses in an example data set, before closing with conclusions and recommendations.
Conclusions
The HKSJ procedure maintains the nominal coverage probability only when the included studies’ standard errors \(s_{i}\) are similar; in unbalanced settings, the actual error probability tends to exceed the targeted one. With the standard definition of the correction factor q, the results may sometimes be counterintuitive, since the corresponding CIs may turn out shorter than those from the simple normal approximation; in fact, they may become arbitrarily short. In case of no heterogeneity (τ=0) the HKSJ method also works well; in practice, however, this is of limited relevance, as one can rarely tell (or convincingly argue) whether this condition holds.
The ad hoc modification underlying the mKH method aims at fixing these shortcomings and results in type-I error probabilities that are not grossly in excess of the pre-specified ones. Especially when the standard errors \(s_{i}\) are of dissimilar magnitude, the mKH method can therefore be recommended. For few studies (small k), however, the modified procedure tends to be very conservative, with very small error probabilities especially in the extreme case of a meta-analysis of only k = 2 studies. In this extreme case, the choice of methods may therefore be considered a matter of a power vs. type-I error probability tradeoff.
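The correction factor and its ad hoc truncation can be sketched as follows. This assumes the common parameterization of the (HK)SJ approach, in which the squared standard error of the pooled effect is scaled by a factor q computed from the weighted residuals; whether this matches the paper's exact formulation is an assumption, and the function name is chosen for illustration.

```python
import math

def hksj_q(y, s, tau2, modified=False):
    """Correction factor q of the HKSJ variance estimator (assumed common
    parameterization): the standard error of the pooled effect becomes
    sqrt(q / sum(w)).  The ad hoc modification truncates q at 1, so the
    scaled variance can never fall below that of the simple normal
    approximation (which would otherwise allow arbitrarily short CIs).
    """
    k = len(y)
    w = [1.0 / (si**2 + tau2) for si in s]                 # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)     # pooled effect
    q = sum(wi * (yi - mu)**2 for wi, yi in zip(w, y)) / (k - 1)
    if modified:
        q = max(1.0, q)                                    # ad hoc truncation at 1
    se = math.sqrt(q / sum(w))                             # HKSJ standard error
    return q, mu, se
```

With nearly identical estimates the unmodified q drops far below 1, shrinking the interval, while the modified version falls back to the normal-approximation scale; a t-quantile with k−1 degrees of freedom would then be applied to se in either case.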
While meta-analyses of few studies are a particular problem in indications where there is only little evidence available (such as rare diseases), such circumstances are not as uncommon as one might expect. Turner et al. [13] and Kontopantelis et al. [14] investigated the analyses archived in the Cochrane Database and actually found a majority of them to be based on as few as k = 2 or k = 3 studies; these therefore constitute highly relevant cases for which the proper control of error rates is crucial.
The properties of either the unmodified or the modified method in the extreme case of k = 2 may be considered unsatisfactory, as it seems one has the choice of either falling short of or exceeding the targeted error probability; the problem has in fact been regarded as effectively unsolved [41]. The poor behaviour may be explained by the fact that performing a random-effects meta-analysis effectively means estimating first- and second-order statistics, and it is not overly surprising that this is a hard task when the data consist of as few as two samples that are themselves only measured with uncertainty. Bearing this in mind, the use of Bayesian methods [42] and the consideration of external evidence on the likely magnitude of the heterogeneity [43] may be the way forward.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
TF conceived the concept of this study. CR carried out the simulations and drafted the manuscript. GK critically reviewed and made substantial contributions to the manuscript. All authors commented on and approved the final manuscript.