Background
The Delphi method is used to reach a consensus between experts on a specific topic via a series of questionnaires interspersed with feedback of group answers [1]. Originally developed by the RAND Corporation to forecast the impact of technology on warfare [2], the method has since been applied to many fields, including medical research [3-6], in order to compensate for the lack of empirical data on specific topics in the medical literature.
The aim of this method is not to collect knowledge about a subject, but rather to gather opinions [7]. This implies that erroneous estimates might occur in the evaluation of some quantities by a panel of experts, even when a convergence of opinion is observed. In particular, quantities such as the probability of disease progression according to health status involve reasoning mechanisms that may produce cognitive bias. Here we demonstrate this, and we explain why the Delphi method may not be accurate for estimating probabilities of disease evolution, using the example of a Delphi study we conducted in Egypt to estimate disease progression in HCV-infected patients with cirrhosis.
The primary objective of this study was to use expert opinion to determine transition probabilities in a decision model of the natural history of HCV-related cirrhosis in Egypt, namely the probability of death and the probabilities of transition between the different stages of HCV disease (compensated cirrhosis, decompensated cirrhosis, hepatocellular carcinoma (HCC), etc.). Studies of the natural history of HCV-related cirrhosis have previously been published [8,9], but they focused on populations in northern countries with different genotypes (in Egypt, HCV infections are mainly genotype 4 [10], while in northern countries this genotype is uncommon [11]); moreover, the Egyptian health care system differs, as does the population. In Egypt, the absence of alcohol consumption is favorable to patients, while co-infection with bilharzia or hepatitis B, as well as overweight, increases the risk of complications. For these reasons, we felt that estimates from the literature were inappropriate for an Egyptian population.
Discussion
No data exist on the natural history of HCV disease in developing countries, and Egypt is no exception. We therefore conducted this Delphi analysis specifically to estimate progression to HCC and death among patients with cirrhosis in that country.
The natural history of HCV disease has been estimated for developed countries. We assume that, in Egypt, the management of HCV-infected patients with cirrhosis differs from that of developed countries. Moreover, patient characteristics and co-morbidities are not necessarily similar in these settings, and it was for these reasons that we conducted our study. However, we were surprised at the differences between the estimates in our Delphi study and those reported in developed countries [8,9,13,14]. Transition probabilities for some stages, in particular early stages of cirrhosis, seemed unexpectedly high in our study (e.g. annual death probabilities for patients with compensated cirrhosis were 10- to 24-fold higher in our study than those found in the medical literature from developed countries [8]). In contrast, transition probabilities for other stages, such as late stages of cirrhosis, seemed lower in our study (e.g. the annual death probability for patients with a first cirrhosis decompensation was 1- to 2-fold lower in our study [8]).
Moreover, we found very strong variability between the responses of the different participants: certain transition probability estimates ranged from 5% to 90%. In addition, the intraclass correlation coefficient estimating the overall degree of agreement between experts was low. This lack of consensus calls into question the estimates obtained by the Delphi process, especially as other studies conducted in the past to estimate transition probabilities by eliciting expert opinion produced similar results. Soares et al. attempted to estimate transition probabilities for a cost-effectiveness transition model of negative pressure wound therapy for severe pressure ulceration [15]. The authors observed variability in experts’ answers and felt that this type of result is desirable, since it ensures that all views are represented. However, that exercise was not designed as a Delphi study, which specifically aims to obtain a consensus between experts. Schultz et al. applied the Delphi technique to lung cancer progression [6], with 5-year survival probabilities estimated by 14 experts. They made no comparison with expected values for this disease because of the absence of empirical data, but high variability in responses between experts was observed. The authors concluded that these differing beliefs may explain variations between practitioners in the management of patients with solitary pulmonary nodules. They mentioned small sample size as one of the study limitations and stated that such a result (that is, variability in opinions) may not be generalizable. Lubell et al. conducted a Delphi survey to estimate transition probabilities in the natural history and medical management of malaria and acute febrile illness [5]. Twenty-one panellists participated in that study. They too noted wide dispersion on several questions, with responses ranging from 5% to 100%, and an overall lack of agreement between experts. Our survey suggests that this type of variability can be found for other diseases. This is worrisome, since erroneous perception of the risk of disease progression by practitioners may influence medical choices concerning these diseases as well. In a broader context, Cahan et al. examined the estimation of probabilities by physicians in the context of a “threshold approach” to decision-making in medicine [16,17]. They asked physicians to assess the probability of various diagnoses from a case description in a single anonymous questionnaire. Their results indicated that practitioners generally overestimate the probability of each diagnosis, resulting in a total of more than 100% (“subadditivity”) for a non-exhaustive list of mutually exclusive diagnoses [17]. They also found wide variability in responses among experts [16,17]. They did not use a Delphi process, which specifically aims to increase agreement between responders, yet the same type of problem was observed. The authors highlighted the “support theory” of Tversky and Koehler [18,19], which claims that the probability assigned to an event depends on its description: two different descriptions of the same event can lead to different estimates. The results of these studies suggest that what is involved is a lack of probabilistic reasoning, rather than the type of probability elicited or a lack of observation of specific cases.
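The degree of agreement between raters can be quantified with an intraclass correlation coefficient. As a minimal sketch, the one-way random-effects form ICC(1) can be computed from a matrix of responses; the data below are hypothetical, not the study's actual responses, and ICC(1) is only one of several ICC forms (the article does not specify which was used):

```python
# Toy matrix of elicited probabilities (percent): rows = questions,
# columns = experts. Hypothetical values, NOT the study's actual responses.
X = [
    [ 5.0, 90.0, 20.0, 10.0],
    [30.0, 60.0, 25.0, 80.0],
    [10.0, 15.0, 70.0,  5.0],
    [40.0, 85.0, 10.0, 55.0],
    [ 2.0, 50.0, 30.0, 95.0],
]
n, k = len(X), len(X[0])  # n questions rated by k experts

# One-way random-effects ICC(1): compares between-question variance
# with within-question (between-expert) variance.
grand_mean = sum(sum(row) for row in X) / (n * k)
row_means = [sum(row) / k for row in X]
ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
ms_within = sum(
    (x - m) ** 2 for row, m in zip(X, row_means) for x in row
) / (n * (k - 1))
icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1) = {icc1:.3f}")  # values near or below 0 indicate poor agreement
```

When experts scatter widely on the same question, as in the dispersion described above, the within-question variance dominates and the coefficient falls toward (or below) zero.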
Other explanations may be proposed for the wide variability in responses and the errors in transition probability estimates. First, the infrequency of events and the rarity of profiles may lead to imaginary rather than statistical estimates (i.e. cognitive-based estimates) [20]. For example, for patients with compensated cirrhosis, the orders of magnitude of the probabilities are around 1 percent, and sometimes 1 per mille, and overestimation is clearly accentuated for such a state. Under such conditions, we presume that events are not frequent enough to be familiar to practitioners, and profiles may be too specific to give practitioners a good representation of the situation described in the questionnaire. Other potential explanations concerning cognitive bias and the perception of risk are present in the literature [21,22]. Kahneman and Tversky, in particular, describe several cognitive biases and thought mechanisms involved in human judgment. The “representativeness heuristic” implies that probability estimates by humans are not based on statistical or probabilistic reasoning, but rather on judgment by representativeness, i.e. on stereotypes based on the description of patient characteristics. This mode of reasoning leads to erroneous evaluations, which are sometimes in contradiction with elementary probabilistic properties [23]. Other cognitive biases may occur because of the “availability heuristic”, which states that reasoning tends to be based on immediately available information. Information availability may be influenced by various factors; Tversky and Kahneman suggest, among others, salience: “it is a common experience that the subjective probability of traffic accidents rises temporarily when one sees a car overturned by the side of the road” [22]. It is reasonable to assume that a similar cognitive bias occurs when we ask physicians to estimate event probabilities regarding their patients’ deaths. Abstraction also has an impact on availability, because abstract quantities, such as probabilities, are not immediately available and require reasoning. Evaluation of such quantities can cause what Tversky and Kahneman call “biases of imaginability” [22]: instances are generated according to a given rule. In our study, questions were ordered by gender, current stage of cirrhosis, transition stage and age (Table 1). The linearity of the different probability functions of age in Figure 1 suggests that experts responded according to a predetermined model (a linear increase of risk with age) and not according to their practical experience.
In a Delphi survey, the iteration process can theoretically be repeated in order to increase the convergence of opinion between experts. However, this does not in itself create a consensus [24], because of the risk of an artificial consensus: Delphi increases the convergence of judgments, but not their accuracy [1,20]. In our study, the relative convergence observed between the two rounds does not seem to have led responses toward more realistic values; on the contrary, overestimated values became even higher (Figure 1). If experts have no idea of the answer in the first round, there is no reason why iteration would lead to better estimates. In addition, anonymity might encourage experts to answer even if they are uncertain [25]. In our study, the mean number of missing values decreased between the two rounds, from 2.1 (9%) to 0.4 (2%). It is reasonable to assume that this difference might be explained by some form of group pressure.
The results mentioned above suggest that recommendations be made for the solicitation of expert opinion. Delphi can be a useful tool for compensating for the lack of empirical data in medical research [26]. However, for Dalkey, knowledge is more reliable than opinion [7], and this is why, in the presence of available data, Delphi serves no purpose: at best, its results are consistent with the empirical results; if they differ, then the empirical estimates are a priori better. Moreover, the nature of the information requested must be taken into account. While information concerning medical practices can be expected to be readily evaluated by a panel of experts, probabilistic measurements and other abstract information can introduce cognitive bias into the estimates [21,22]. Thus, we recommend that Delphi not be used to obtain information on the natural history of a disease or on survival probabilities. Finally, the relevance of the questions warrants caution: questions that are too precise could paradoxically be counterproductive, and the familiarity of experts with the different aspects of the questionnaire should be subjected to preliminary discussion.
Our study suffers from several limitations. First, the data on transition probabilities used to evaluate the quality of the results were taken from studies in northern countries. Second, alternative formulations of the questions may have been more appropriate for eliciting transition probabilities from experts without a probabilistic background [15,27]. Finally, opinion (unlike knowledge) is imperfect, and this uncertainty must be measured in the questionnaire [28]. We left experts two options when responding to questions: either respond with a single value or, if unable to provide a single value, respond with a range that takes uncertainty into account. That only 33% of the questions were answered in the form of an interval suggests that we may have failed to capture this individual uncertainty [15,27].
Acknowledgments
This study was funded by the French Agence Nationale de Recherche sur le Sida et les Hépatites virales (ANRS, http://www.anrs.fr), grant number 12215.
We would like to express our gratitude to experts in Egypt for giving their time and sharing their expertise by participating in the Delphi: Samy Zaky El Sayed, Maissa El Raziky, Mohamed A. Mohey, M. Magdi Atta, Basem El Sayed Eysa, Mohamed Kamal Shaban, Mohamed Atallah, Mai Esmaiel Mehrez, Wafaa Ahmed El Akel, Mohamed Adel, Tamer Mahmoud Elbaz, Ayman Salem Amer, Abdelkader Farrage, Mohamed El Ateek, Mohamed Said Abdel Aziz, Hassan Hamdy, Abobakr Mohamed, Emad Abdel Sattar, Ahmed Mohamed El Deeb and Mahmoud Ahmed Shedid.
Competing interests
SDB received grants from Roche, Janssen-Cilag and Schering-Plough and consultancy honoraria from Merck and GlaxoSmithKline. GE received funds from BioGenesic Phasma and BMS. YY received travel grants, honoraria for presentations at workshops and consultancy honoraria from Abbott, Bristol-Myers Squibb, Gilead, Merck, Roche, Tibotec and ViiV Healthcare. None of the other authors report any association that might pose a conflict of interest.
Authors’ contributions
YY had the idea for the study. DO, SBD, VC and YY contributed to the conception and design of the analysis. GE and MKM took part in the selection of experts. GE, MEK, MES, and MKM were part of the panel of experts. AM, DO, and AC performed first-round statistical analysis during the Delphi process. AC performed the second round and additional statistical analysis. All authors contributed to interpretation of data. AC drafted the article and all authors critically revised it for important intellectual content. All authors approved the final version of the manuscript to be published.