Findings in context of previous research
A recent Cochrane review by Ndounga Diakou et al. [11], which investigated how the blinding status of on-site assessors affected the benefit of independent, blinded outcome assessment, reached similar conclusions. The review found that blinded outcome assessment had little to no impact on treatment-effect estimates. However, as mentioned previously, Ndounga Diakou et al. suggest that open-label trials could benefit most from additional blinded assessment. It is therefore important to note that, as our findings do not appear to agree with this notion, they may not be generalisable. This disagreement may, however, be because Ndounga Diakou et al. examined a range of studies with subjective outcomes, whereas we investigated a single trial with an objective primary outcome.
When comparing the trial and unblinded measurements used to determine the STOP GAP trial’s primary outcome, we found the estimated treatment effects to be identical, contrary to other studies [17–19], which have found that unblinded trials often overestimate treatment effects. However, this may be partially attributed to the fact that our primary analysis did not directly compare blinded versus unblinded outcome assessment; only 80% of the trial measurements were blinded. The exploratory analysis, using the reduced population of participants assessed using blinded measurements, found the treatment-effect estimate from the unblinded measurements to be at least ten times larger in magnitude than that from the blinded measurements alone. Nevertheless, this analysis led to the same conclusion as our primary analysis: digital image assessment did not change the trial findings.
The potential for bias tends to be higher when the primary outcome is a subjective measure, such as quality of life, rather than a clearly defined objective one [1, 5, 20, 21], such as lesion size in STOP GAP. Moreover, the overestimation in the unblinded physical measurements was consistent across both treatment groups rather than differential, and was likely due to their crude nature. Additionally, there is no recommended initial treatment for PG, nor were there any preconceived ideas about the superiority of either drug, as the disease is very rare. This could explain why we observed no difference between assessment methods: nondifferential error can be expected, which would dilute the treatment estimates rather than introduce bias [22].
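The dilution mechanism can be illustrated with a small numerical sketch. The healing proportions and error rates below are entirely hypothetical (not STOP GAP data), chosen only to show that nondifferential misclassification of a binary ‘healed’ outcome shrinks the estimated risk difference towards the null without favouring either arm:

```python
# Illustrative sketch with hypothetical numbers: nondifferential
# misclassification of a binary "healed" outcome dilutes the estimated
# risk difference but does not bias it towards either arm.
def observed_risk(true_risk, sensitivity, specificity):
    # P(classified as healed) = sens * P(healed) + (1 - spec) * P(not healed)
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)

true_a, true_b = 0.47, 0.43   # hypothetical true healing proportions per arm
sens, spec = 0.85, 0.90       # same error rates applied to BOTH arms

obs_a = observed_risk(true_a, sens, spec)
obs_b = observed_risk(true_b, sens, spec)

print(round(true_a - true_b, 3))  # true risk difference: 0.04
print(round(obs_a - obs_b, 3))    # observed: 0.03 (diluted, not reversed)
```

Because the same sensitivity and specificity apply to both arms, the observed difference is simply (sensitivity + specificity − 1) times the true difference, so error of this kind attenuates the comparison rather than biasing it in either direction.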
A recent study [1], which reviewed the use of digital photographs for blinded outcome assessment in a clinical trial of treatments for verrucae [23], found that blinded digital image assessment did not have an impact on the trial conclusions. Similarly, the conclusions of STOP GAP would not have been altered even if the observed difference between the unblinded and trial measurements had been increased differentially in the ciclosporin arm by a factor of 5. However, this may be because the speed of healing was already very similar between the two treatment groups, so it would be hard for detection bias to introduce enough variation to change the result. Had the trial initially provided stronger evidence of a treatment effect, our study might have reached a different conclusion.
It is important to consider the cost of digital image assessment, which we estimated to be £20,000, approximately 2% of the total budget for this trial. This includes the cost of equipment and software, training of both image assessors, travel for the specialist trainers, and payment for all staff involved in carrying out the image processing and assessment. Alongside costs, it is also important to recognise the time involved in these processes. For instance, whilst the unblinded assessors found the measuring process relatively straightforward, the blinded image assessors initially had difficulties using the software and found the actual measurement process time-consuming.
Additionally, in STOP GAP, there was difficulty in measuring photographs of lesions that were particularly large or circumferential, such as when stretched around the curvature of a limb. This finding agrees with a study [24] that compared wound measurement using two techniques: a manual tracing process and computer software that calculated the measurement after the wound was photographed. The study found that, as digital photographs are 2D images attempting to capture a 3D structure, ‘discrepancy may also occur when tracing circumferential wounds’. It can also be hard to measure digital photographs of lesions when a participant exhibits subtle symptoms, such as redness or swelling, which can affect outcome measures. The outlying participants referred to earlier as A and B are cases in point. An inspection of their digital images revealed that participant A had a large circumferential lesion covering most of their forearm, whilst participant B’s lesion was healing in patches with a large amount of surrounding redness. It is probable that these properties caused the disparity between their blinded and unblinded measurements.
Furthermore, blinded digital measurements were obtained for only 80% of patients; 20% of the sample would have had no primary outcome data had unblinded physical measurements not also been taken as a back-up, and so would have been excluded from the primary analysis. An alternative method of incorporating blinding in STOP GAP would have been to use a ‘double dummy’ design [25], with each participant receiving one placebo and one active treatment. However, whilst participants and assessors may be blinded at baseline, this approach would not mask the difference in side effects that would be evident a short time after receiving either treatment. This could lead to unblinded participants, which, in turn, could lead to unblinded assessors. Additionally, this design is potentially more expensive to implement than digital image assessment. Another possible approach to facilitate blinding would be the use of an additional dermatologist in participant follow-up visits, employed purely to conduct measurements, without any further participant interaction or exposure to participant data. However, whilst this would have avoided the complications of digital photography, it may not have been feasible in a trial involving a rare disease such as STOP GAP. Moreover, the use of digital image assessment may have increased the accuracy of the measurements, as crude physical measurements have been seen to overestimate wound area by 10% [26]. This is desirable regardless of the fact that the results remained unaffected. It also enabled global assessment of lesion severity and made it possible for experts to check the diagnosis, which was important for a rare condition that recruiting physicians rarely see.
We found that agreement on measurements between the two assessors was high at both baseline and 6 weeks. As blinded measurements were obtained using computerised assessment and a clear set of instructions, this was not unexpected. The implication of this finding is that the cost of assessment could have been reduced by having only one assessor. However, we observed that had only one assessor been used, on average 49% of the participants would have been assessed using blinded measurements, rather than the 80% observed in STOP GAP. This is due to the low agreement between assessors on image usability and provides some justification for the use of multiple assessors; if digital image assessment is seen as an important element of the trial design to facilitate blinding, it is vital to ensure that the majority, if not all, of the outcome data received are blinded. Additionally, the use of multiple assessors adds to the validity of the measurements; if only one assessor were used and were consistently measuring the images incorrectly, there would be no verification, and their measurements could lead to misleading results and conclusions.
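These coverage figures can be reconciled with a back-of-the-envelope calculation. Assuming, purely for illustration, that each assessor alone obtained usable blinded measurements for the same 49% of participants, inclusion-exclusion implies how often both assessors’ images were usable, and hence how rarely they agreed on usability:

```python
# Back-of-the-envelope sketch of the assessor-coverage figures, assuming
# (for illustration only) equal single-assessor coverage for both assessors.
p_single = 0.49  # average coverage with one assessor
p_joint = 0.80   # coverage reported with two assessors

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
p_both = 2 * p_single - p_joint  # implied proportion usable to both assessors

# Proportion of images on whose usability the assessors agreed
# (both found them usable, or neither did)
p_agree_usability = p_both + (1 - p_joint)

print(round(p_both, 2))            # 0.18
print(round(p_agree_usability, 2)) # 0.38
```

Under these assumptions, the assessors would have agreed on usability for only around 38% of images, consistent with the low agreement on image usability noted above.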
An additional benefit that independent, blinded digital image assessment had on STOP GAP was that it improved the credibility of the trial findings. Furthermore, it ensured that the trial would be scored as being of high quality in any subsequent systematic reviews [27]. As blinded outcome assessment is seen as the ‘gold standard’, researchers often strive to ensure that at least the primary outcome is blinded. In fact, a study by Olson et al. [28] has shown that manuscripts reporting on trials with some form of blinding are three times more likely to be published than those that could have been blinded but were not. Therefore, whilst we have shown in this single case that blinded outcome assessment did not impact the trial results, it would be of interest to know whether journal editors or the wider scientific community would have accepted the findings had blinded digital image assessment of the primary outcome not been implemented.
Strengths and limitations
A limitation of this study is that the comparison of blinded and unblinded assessment was confounded by the measurement method. To make a direct comparison, we would have required blinded assessors in clinics taking physical measurements and/or unblinded assessors calculating lesion area using digital images and the specialist software. Furthermore, we were restricted by the small dataset available. As PG is a very rare disease and our analysis was limited to 108 participants, exploring digital image assessment in this setting should be treated as a hypothesis-generating process rather than a hypothesis-confirming one. Therefore, caution should be exercised before our results are generalised to other situations or disease areas. However, we have shown that in some circumstances blinded outcome assessment may not be a necessity to preserve trial quality. A further constraint of our research is that we have investigated only a single trial. Nevertheless, our findings can be used together with other studies to add to the pool of current knowledge.
One strength of this study is that our results remained robust to a variety of assumptions. With the observed difference between unblinded and trial measurements increased both nondifferentially and differentially by up to five times, our primary analysis still suggested that blinded outcome assessment was not necessary in STOP GAP. In fact, we found that the observed difference would need to be increased by more than 20 times to meaningfully shift the primary outcome, which we feel is implausible. Furthermore, two sensitivity analyses were performed: one excluding extreme observations and an exploratory analysis. Both concluded that digital image assessment did not have an impact on the primary outcome, reinforcing the robustness of our results.
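As a sketch of the inflation scheme used in these robustness checks, the following applies a factor k to the observed unblinded-minus-trial difference, either in both arms (nondifferential) or in a single arm (differential). The lesion-area measurements and arm labels below are hypothetical, not STOP GAP data:

```python
# Minimal sketch of the inflation scheme: each trial measurement is shifted
# by k times its observed unblinded-minus-trial difference, applied either
# to both arms (nondifferential) or to one arm only (differential).
# All values are hypothetical lesion areas in cm^2.
def inflate(trial, unblinded, arm, k, differential_arm=None):
    adjusted = []
    for t, u, a in zip(trial, unblinded, arm):
        if differential_arm is None or a == differential_arm:
            adjusted.append(t + k * (u - t))
        else:
            adjusted.append(t)  # other arm left at the trial measurement
    return adjusted

trial     = [4.0, 6.0, 5.0, 7.0]            # blinded (trial) measurements
unblinded = [4.4, 6.5, 5.3, 7.8]            # crude physical measurements
arm       = ["cic", "cic", "pred", "pred"]  # hypothetical arm labels

print([round(x, 1) for x in inflate(trial, unblinded, arm, k=5)])
# nondifferential, k = 5: [6.0, 8.5, 6.5, 11.0]
print([round(x, 1) for x in inflate(trial, unblinded, arm, k=5,
                                    differential_arm="cic")])
# differential (one arm only), k = 5: [6.0, 8.5, 5.0, 7.0]
```

Re-deriving the primary outcome from such adjusted measurements, for increasing k, is what allows a statement of the form ‘the observed difference would need to be inflated more than 20-fold to change the conclusion’.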