Introduction

With the introduction of new biometry techniques such as the partial coherence interferometry-based IOLMaster (version 5, Carl Zeiss, Jena, Germany) and the optical low coherence reflectometry-based LENSTAR LS900 (Haag-Streit, Bern, Switzerland), accurate, fast and easy measurement of ocular variables is now routine at many eye centres in industrialised nations. The optical biometry device IOLMaster, for example, can achieve axial length (AL) measurement precision within 20 μm,1 compared to 100 μm in ultrasound biometry.2 More importantly, the built-in computer algorithms in these devices allow intraocular lens (IOL) power calculation using a range of formulae, some of which require sophisticated optimisation procedures.3 This greatly extends the usefulness of these devices in both research and consultation work.

In contrast, ophthalmologists in developing countries continue to rely on applanation and immersion ultrasound biometry in clinical consultation. Skilled doctors or technicians can usually obtain accurate AL and IOL power measurements using these methods. Nevertheless, variations in probe position, degree of corneal indentation, and patient movement during a biometry session can affect the final outcome. Furthermore, unlike the ‘one-stop’ optical biometry devices, these methods require several steps before an IOL power value can be obtained, and mistakes can creep in at each step (eg, when keying the K readings into the A-scan machine). Patients may also experience discomfort from the eye-probe contact in applanation biometry, which can expose them to corneal abrasion or eye infection.4 Despite these shortcomings, ultrasound biometry may be the only means of obtaining suitable biometry in problematic cases: eyes with dense ocular media,5 low visual acuity,6 or fixation instability caused by macular degeneration.7 IOLMaster has been reported to be ineffective in 20% of cataract patients8 in UK public hospitals, many of whom have posterior subcapsular cataracts.

When different ways of measuring the same variable are available, it is of interest to find out how well two methods agree, as strong agreement implies that they can be used interchangeably.9 Assessing method agreement can also be useful for evaluating the relative importance of instrument variation as a source of error in IOL power. Several studies10, 11, 12, 13, 14 have examined LENSTAR-IOLMaster agreement in AL, anterior chamber depth (ACD), IOL power, K1, and K2 measurements of phakic eyes; all suggest that measurements obtained from LENSTAR and IOLMaster agree strongly. These studies, however, do not tell us how good the agreement between LENSTAR and IOLMaster is relative to the LENSTAR-applanation and LENSTAR-immersion agreement. In addition, the statistical arguments used to justify the conclusions are in some cases specious. Misconceptions about what constitutes method agreement remain prevalent: high correlation15, 16 and high R² values from a linear regression of the differences on the means plotted in the Bland–Altman plot17, 18, 19 are often taken as synonymous with strong agreement. Neither is sufficient, because two methods can be perfectly correlated and still differ systematically by a clinically important amount. The present study aims to address these gaps by comparing all three method agreement analyses for AL, average K, and IOL power using a coherent statistical approach.
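The gap between correlation and agreement can be made concrete with a small sketch. The readings below are hypothetical, not study data: method B reads every eye exactly 0.5 mm longer than method A. The Pearson correlation is 1, yet every difference is larger than 0.33 mm.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical AL readings (mm): method B has a constant +0.5 mm bias.
al_a = [22.0, 23.0, 24.0, 25.0, 26.0]
al_b = [x + 0.5 for x in al_a]

r = pearson_r(al_a, al_b)
diffs = [a - b for a, b in zip(al_a, al_b)]
print(f"r = {r:.3f}")                    # perfect correlation
print("differences (mm):", diffs)        # every eye differs by the same 0.5 mm
print("all beyond 0.33 mm:", all(abs(d) > 0.33 for d in diffs))
```

A constant bias leaves the correlation untouched, which is why agreement has to be judged from the differences themselves.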

Materials and methods

The present study was done at the Ophthalmology Clinic of the University of Malaya Medical Centre (UMMC), Kuala Lumpur, Malaysia. The first author, a trainee ophthalmologist with 3 years of experience in biometry, performed all biometry work. To collect data, he approached clinic patients who had varying degrees of cataract severity in both eyes and informed them about the purpose of the study. Patients who gave informed consent and did not have conditions restricting their mobility (ie, being wheelchair-bound or having limb injuries) were included. Patients were examined first with IOLMaster (version 5), followed by LENSTAR, immersion, and finally applanation biometry. Eyes in which IOLMaster could not measure any of the ocular variables because of very dense media along the visual axis were excluded. Measurements of AL, ACD (not used), K1, and K2 were taken from both eyes; to calculate IOL power, we set emmetropia as the refractive target and used the SRK/T formula.20 For each eye (under miosis), five AL and three keratometry measurements were taken and averaged. IOL power calculation from applanation and immersion A-scan ultrasound biometry requires a separate assessment of keratometry values; for this, we used the TOPCON KR-8100 autorefractor-keratometer (Topcon Europe Medical BV, Capelle a/d IJssel, the Netherlands) to obtain K1 and K2 readings. The average time taken to obtain all readings for one eye was 2 min for IOLMaster, 3.5 min for LENSTAR, and 4 min for applanation and immersion ultrasound biometry.
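As a rough illustration of the IOL power step, the sketch below implements the SRK/T formula in the form commonly reproduced from its original publication (the formula cited above). It is not the implementation used inside LENSTAR, IOLMaster, or the A-scan machines. The eye (AL 23.5 mm, average K 44 D) and the A-constant of 118.7 are hypothetical, and the coefficients should be checked against the original paper before any clinical use.

```python
from math import sqrt

def srkt_power(al_mm, k_d, a_const, target_d=0.0, vertex_mm=12.0):
    """IOL power (D) from the SRK/T formula for a target refraction target_d (D)."""
    na, ncm1 = 1.336, 0.333                # aqueous/vitreous index; corneal index (1.333) - 1
    r = 337.5 / k_d                        # corneal radius (mm) from average K
    # Axial length correction applied to long eyes only
    lcor = al_mm if al_mm <= 24.2 else -3.446 + 1.716 * al_mm - 0.0237 * al_mm ** 2
    cw = -5.41 + 0.58412 * lcor + 0.098 * k_d          # computed corneal width (mm)
    h = r - sqrt(max(r ** 2 - cw ** 2 / 4.0, 0.0))     # corneal dome height (mm)
    offset = (0.62467 * a_const - 68.747) - 3.336      # offset derived from the A-constant
    acd = h + offset                                   # estimated postoperative ACD (mm)
    lopt = al_mm + (0.65696 - 0.02029 * al_mm)         # AL plus retinal thickness (mm)
    t = 0.001 * target_d
    num = 1000 * na * (na * r - ncm1 * lopt
                       - t * (vertex_mm * (na * r - ncm1 * lopt) + lopt * r))
    den = (lopt - acd) * (na * r - ncm1 * acd
                          - t * (vertex_mm * (na * r - ncm1 * acd) + acd * r))
    return num / den

# Hypothetical eye and A-constant (not study data); emmetropic target
base = srkt_power(23.5, 44.0, 118.7)
print(f"emmetropic IOL power: {base:.2f} D")
print(f"AL +0.33 mm: {srkt_power(23.83, 44.0, 118.7) - base:+.2f} D")
print(f"AL +0.10 mm: {srkt_power(23.60, 44.0, 118.7) - base:+.2f} D")
print(f"K  +1 D:     {srkt_power(23.5, 45.0, 118.7) - base:+.2f} D")
```

For this hypothetical eye the sketch gives about 20.4 D, and a 0.33 mm increase in AL lowers the power by roughly 1 D, of the same size as the AL cut-off rationale given under Statistical analysis.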

Data collection ceased when 76 consecutive patients who came for cataract evaluation had been sampled. We approached 104 patients, giving a response rate of 73%. Reasons for declining to participate included obesity, lack of time, and uneasiness about undergoing four biometry sessions at one sitting. The sampling period was from 12 May to 1 June 2010. The mean age of the patients (27 females; 49 males) was 54 years (SD 15; range 18–86; see Supplementary Information). We certify that all applicable institutional and governmental regulations concerning the ethical use of human volunteers were followed during this research.

Statistical analysis

We compared how well measurements of AL and IOL power agreed in three pairwise comparisons: LENSTAR-IOLMaster, LENSTAR-applanation, and LENSTAR-immersion. For average K ((K1+K2)/2), we made two comparisons: LENSTAR with IOLMaster and LENSTAR with the TOPCON autorefractor-keratometer. To determine the sample size for the present study (76 patients), we used information from a preliminary assessment of measurement variation in the method comparison groups. First, we estimated the mean difference and SD of AL for all three comparison groups using 22 eyes from 13 patients in a similar study. We then computed, for each group, the sample size required to detect the estimated mean difference, given the estimated SD, with at least 80% power (two-sided paired t-test, 5% significance level), and used the largest of the three estimated sample sizes (74 pairs of eyes) as a guide. Pooling left and right eyes for subsequent analysis seems reasonable, as most of the variation in the ocular variables considered lies within, rather than between, the left-eye and right-eye groups (analysis of variance P-values: 0.8 (AL), 0.8 (IOL power), 0.9 (average K)). The total number of eyes in each comparison group varies from 142 to 147 for three reasons: mistakes in data recording (misplaced printouts), single-eyed patients, and dense cataract cases.
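The sample-size step can be sketched with the usual normal approximation for a paired comparison, n = ((z(1−α/2) + z(1−β)) × SD / mean difference)², optionally adding z(1−α/2)²/2, a common small-sample adjustment that approximates the t-test. This is an assumption about the calculation: the text does not say which routine was used, and the pilot means and SDs below are hypothetical because the preliminary estimates are not reported.

```python
from math import ceil
from statistics import NormalDist

def paired_sample_size(mean_diff, sd_diff, alpha=0.05, power=0.80, t_correction=True):
    """Approximate number of pairs for a two-sided paired t-test.

    Normal approximation; t_correction adds z_{1-alpha/2}^2 / 2 as a common
    adjustment for the SD being estimated from the sample.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    n = ((z_a + z_b) * sd_diff / mean_diff) ** 2
    if t_correction:
        n += z_a ** 2 / 2
    return ceil(n)

# Hypothetical pilot estimates of AL differences (mm): (mean, SD)
pilot = {
    "LENSTAR-IOLMaster": (0.02, 0.05),
    "LENSTAR-immersion": (0.05, 0.20),
    "LENSTAR-applanation": (0.10, 0.30),
}
sizes = {g: paired_sample_size(m, s) for g, (m, s) in pilot.items()}
print(sizes, "-> plan for", max(sizes.values()), "pairs")
```

Taking the maximum across groups, as described above, guarantees the target power for every comparison under these assumptions.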

We used Bland–Altman plots to assess the level of agreement (width of the 95% limits of agreement, LoA). To relate observations from the Bland–Altman plots to a clinical context, we have to decide how narrow the LoA should be to conclude that two methods agree well. This is a clinical decision that is context-dependent and needs to be made before the analysis.9 Our approach is to evaluate the proportion of differences falling into particular ranges that reflect varying levels of clinical impact. We therefore defined clinically acceptable ranges for IOL power (within 1 D, between 1 D and 2 D, greater than 2 D), AL (within 0.33 mm, within 0.10 mm), and average K (within 1 D, within 0.5 D) differences, and estimated the proportion of differences falling within those ranges from the empirical distribution of differences. We chose these ranges based on the following reasoning. A difference within 1 D in IOL power corresponds to the case where patients can generally see well without corrective glasses; with a difference between 1 D and 2 D, the patient would need low-power corrective glasses, but the small error is not hard to explain to the patient; a difference above 2 D leaves the patient with an unacceptably high postoperative refractive error. Our choice of the 0.33 mm and 0.10 mm cut-offs for AL is based on the fact that these differences translate to 1 D and 0.3 D differences in IOL power, respectively, under the SRK/T formula. Similarly, differences of 1 D and 0.5 D in average K translate to differences of 1 D and 0.5 D in IOL power.
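Both summaries, the 95% LoA and the proportions of differences in each range, can be computed from paired readings as in the minimal sketch below. It assumes differences are taken as method 1 minus method 2, uses the sample SD for the LoA (bias ± 1.96 SD), and counts a difference lying exactly on a cut-off as within it; the boundary convention and the readings in the usage lines are assumptions, not details from the study.

```python
from statistics import mean, stdev

def limits_of_agreement(m1, m2):
    """Bias and 95% limits of agreement (Bland–Altman) for paired readings."""
    diffs = [a - b for a, b in zip(m1, m2)]
    bias, sd = mean(diffs), stdev(diffs)     # stdev uses the n - 1 denominator
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def band_proportions(diffs, inner, outer):
    """Proportions with |d| <= inner, inner < |d| <= outer, and |d| > outer."""
    n = len(diffs)
    within = sum(abs(d) <= inner for d in diffs)
    between = sum(inner < abs(d) <= outer for d in diffs)
    return within / n, between / n, (n - within - between) / n

# Hypothetical IOL power differences (D): within 1 D, 1-2 D, beyond 2 D
print(band_proportions([0.0, 0.5, -0.5, 1.5, -2.5], 1.0, 2.0))  # -> (0.6, 0.2, 0.2)

# Hypothetical paired AL readings (mm) from two devices
lenstar = [23.12, 24.05, 22.87, 25.40, 23.66]
other = [23.10, 24.01, 22.90, 25.35, 23.64]
print(limits_of_agreement(lenstar, other))
```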

To judge whether the proportions of interest in two comparison groups are similar, we estimated their discrepancy, $\hat{p}_1 - \hat{p}_2$ (the larger value minus the smaller one), and reported its SE, which is given by the formula9

$$\mathrm{SE}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},$$

where $\hat{p}_1$ and $\hat{p}_2$ are the estimated proportions, and n1 and n2 are the sample sizes of the first and second comparison groups, respectively. We used Fisher's exact test to detect association between comparison groups and the categories of clinically acceptable difference ranges for IOL power. All statistical work was done using R21 (version 2.10.1).
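The discrepancy and its SE can be evaluated directly, assuming the standard SE for a difference between two proportions, sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2). As written, this formula treats the two estimated proportions as coming from independent samples. The proportions in the usage line are hypothetical.

```python
from math import sqrt

def discrepancy_se(p1, n1, p2, n2):
    """Discrepancy (larger minus smaller proportion) and its SE."""
    d = abs(p1 - p2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d, se

d, se = discrepancy_se(0.95, 146, 0.71, 146)   # hypothetical proportions, n = 146 eyes each
print(f"{d:.2f} ± {se:.2f}")
```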

Results

The LENSTAR-IOLMaster comparison has the smallest magnitude of mean and SD of difference in AL (0.01 mm±0.03 mm) and IOL power (0.07 D±0.26 D), followed by the LENSTAR-immersion comparison (0.04 mm±0.17 mm; −0.08 D±0.75 D) and the LENSTAR-applanation comparison (0.18 mm±0.23 mm; −0.52 D±0.93 D). IOL power values obtained using LENSTAR therefore tend to be smaller than those from ultrasound biometry, particularly applanation. For average K, the LENSTAR-IOLMaster comparison gives a slightly larger magnitude of mean difference but a much smaller SD (−0.11 D±0.18 D) than the LENSTAR-TOPCON comparison (0.04 D±0.41 D). The distributions of the differences in all these comparison groups are approximately symmetric (see Supplementary Material).

The Bland–Altman plots (Figure 1) show that the width of the 95% LoA for AL difference is narrowest (about 0.1 mm) in the LENSTAR-IOLMaster comparison, and about 0.7 mm and 0.9 mm in the LENSTAR-immersion and LENSTAR-applanation comparisons, respectively. Figure 2 shows that the width of the 95% LoA for IOL power difference is narrowest (1 D) for the LENSTAR-IOLMaster comparison, followed by the LENSTAR-immersion (3 D) and LENSTAR-applanation (3.6 D) comparisons. We note an exceptionally large difference (close to −5 D) in the LENSTAR-applanation comparison, which belongs to a subject with a long AL (about 28 mm). The same subject also produced the positive AL outliers in the LENSTAR-applanation (+1.11 mm) and LENSTAR-immersion (+0.78 mm) comparisons in Figure 1. Figure 3 shows the Bland–Altman plots for average K in the LENSTAR-IOLMaster and LENSTAR-TOPCON comparisons. The width of the 95% LoA is narrowest in the LENSTAR-IOLMaster comparison (about 0.7 D, compared with 1.6 D for LENSTAR-TOPCON).

Figure 1

Bland–Altman plots for assessing agreement of LENSTAR-IOLMaster, LENSTAR-applanation, and LENSTAR-immersion comparisons in AL. (95% LoA for AL difference: LENSTAR-IOLMaster (−0.04, 0.07); LENSTAR-immersion (−0.29, 0.37); LENSTAR-applanation (−0.27, 0.63))

Figure 2

Bland–Altman plots for assessing agreement of LENSTAR-IOLMaster, LENSTAR-applanation, and LENSTAR-immersion comparisons in IOL power. When a coordinate has more than one data point, the number of whiskers emanating from a square indicates the number of points stacked together. (95% LoA for IOL power difference: LENSTAR-IOLMaster (−0.45, 0.59); LENSTAR-immersion (−1.56, 1.39); LENSTAR-applanation (−2.35, 1.31))

Figure 3

Bland–Altman plots for assessing agreement of LENSTAR-IOLMaster and LENSTAR-TOPCON in average K. (95% LoA: LENSTAR-IOLMaster (−0.44, 0.24); LENSTAR-TOPCON (−0.75, 0.84))

Table 1 shows the proportions of differences falling within three ranges of IOL power in the three comparisons. The LENSTAR-IOLMaster comparison gives the most desirable result, followed by the LENSTAR-immersion and LENSTAR-applanation comparisons. Fisher's exact test for the 3 × 3 table is statistically significant (P-value ≈ 10⁻¹⁴), indicating an association between comparison group and the range into which IOL power differences fall, that is, the proportion of differences that indicate clinically important levels of disagreement. The negligible P-value is expected because there are two adjacent cells with zero counts in the LENSTAR-IOLMaster comparison. To see whether the association remains after removing this comparison, we performed Fisher's exact test on the 2 × 3 table without the LENSTAR-IOLMaster comparison; the result is still statistically significant (P-value=0.02), supporting the ranking suggested by informal inspection of the estimated proportions in Table 1.
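For readers without R, a two-row test such as the 2 × 3 follow-up can be reproduced with the minimal sketch below. It enumerates every 2 × k table with the observed row and column totals (the Freeman–Halton extension of Fisher's exact test) and sums the probabilities of tables no more likely than the observed one. It handles only two-row tables, so the 3 × 3 test needs R's `fisher.test` or similar; since the Table 1 counts are not reproduced here, the usage line uses a classic 2 × 2 table instead.

```python
from math import exp, lgamma

def fisher_exact_2xk(table):
    """Two-sided exact P-value for a 2 x k contingency table with fixed margins."""
    row1, row2 = table
    cols = [a + b for a, b in zip(row1, row2)]
    r1, n = sum(row1), sum(row1) + sum(row2)
    # log of (row totals)! (column totals)! / n!, shared by every table
    const = (lgamma(r1 + 1) + lgamma(n - r1 + 1)
             + sum(lgamma(c + 1) for c in cols) - lgamma(n + 1))

    def log_prob(first_row):
        # Subtract log(cell!) for both rows; the second row is implied by the margins
        return const - sum(lgamma(x + 1) + lgamma(c - x + 1)
                           for x, c in zip(first_row, cols))

    def first_rows(j, remaining):
        # Yield every first row that respects the column totals and sums to r1
        if j == len(cols) - 1:
            if remaining <= cols[j]:
                yield [remaining]
            return
        for x in range(min(cols[j], remaining) + 1):
            for rest in first_rows(j + 1, remaining - x):
                yield [x] + rest

    observed = log_prob(row1)
    p = sum(exp(lp) for lp in map(log_prob, first_rows(0, r1))
            if lp <= observed + 1e-7)   # small tolerance so tied tables are counted
    return min(1.0, p)

print(fisher_exact_2xk([[3, 1], [1, 3]]))   # 34/70 ≈ 0.486
```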

Table 1 Agreement between methods for IOL power as judged by the estimated proportion of differences falling within clinically acceptable difference ranges from zero (n=142 in all comparisons)

Table 2 gives the estimated proportion of differences falling within clinically acceptable ranges for the three comparisons. The magnitude of the discrepancy in this proportion between two comparison groups (the larger proportion minus the smaller) indicates its practical significance; discrepancies are reported below as estimate ± SE. The proportion of AL differences falling within 0.33 mm of zero in the LENSTAR-IOLMaster comparison is 0.24±0.04 more than in the LENSTAR-applanation comparison, but only 0.05±0.02 more than in the LENSTAR-immersion comparison. For the more stringent cut-off of 0.10 mm, however, the proportion for LENSTAR-IOLMaster is 0.48±0.04 more than for LENSTAR-immersion and 0.74±0.04 more than for LENSTAR-applanation. For the LENSTAR-immersion comparison, the proportion is 0.19±0.04 more than for the LENSTAR-applanation comparison in the within 0.33 mm category, and 0.26±0.05 more in the within 0.10 mm category. For average K, the proportion in the LENSTAR-IOLMaster comparison is only 0.02±0.02 more than in the LENSTAR-TOPCON comparison for the within 1 D category, a practically insignificant result; however, it is 0.10±0.03 more for the within 0.5 D category. For IOL power difference, the proportion of differences falling within 1 D of zero for the LENSTAR-IOLMaster comparison (Table 1) is 0.32±0.04 more than for the LENSTAR-applanation comparison, but just 0.18±0.03 more than for the LENSTAR-immersion comparison; the same proportion is 0.14±0.05 more in the latter than in the LENSTAR-applanation comparison.

Table 2 The empirical proportion of differences that fall within clinically acceptable difference ranges for AL (n=146) and average K (n=146 in LENSTAR-IOLMaster; n=147 in LENSTAR-TOPCON)

Discussion

In this study, we have presented estimates of a clinically meaningful quantity: the proportion of differences falling within a clinically acceptable range for AL, IOL power, and average K. We suggest that this approach complements Bland–Altman plots in assessing method agreement in a clinical context, and leads to a more coherent interpretation of method agreement between LENSTAR and other biometry techniques. The mean difference is informative in method agreement studies if two comparisons differ substantially in it; otherwise, it is inadequate. For example, two comparisons can both have zero mean difference but substantially different SDs; agreement between methods is clearly stronger in the comparison with the smaller SD. The proportion of differences falling within a clinically acceptable range (Tables 1 and 2) provides a simple means of judging method agreement that takes both the mean difference and the SD into account, while remaining easy to interpret from a clinical point of view. As far as we know, no previous authors have stressed this point in agreement analyses between LENSTAR and other biometry techniques.
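The example of equal means but different SDs can be illustrated with a normal model for the differences. This is an added assumption for illustration only: the study estimated proportions from the empirical distribution of differences, not from a fitted normal distribution. The last line plugs in the LENSTAR-applanation IOL power mean and SD reported in Results.

```python
from statistics import NormalDist

def prop_within(mu, sigma, c):
    """P(|D| <= c) if the differences D follow Normal(mu, sigma)."""
    d = NormalDist(mu, sigma)
    return d.cdf(c) - d.cdf(-c)

# Same (zero) mean difference, different SDs (D): within-1 D proportions differ markedly
print(f"{prop_within(0.0, 0.26, 1.0):.3f} vs {prop_within(0.0, 0.75, 1.0):.3f}")
# A nonzero mean lowers the proportion further (LENSTAR-applanation summaries, D)
print(f"{prop_within(-0.52, 0.93, 1.0):.3f}")
```

Because the proportion depends on both mu and sigma, a single number captures the two features that the mean difference alone misses.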

In judging the merit of our findings, some aspects of the study design need to be considered. Owing to tight clinic schedules, we did not restrict comparisons to patients with the same degree of cataract severity, nor did we control for potential effects of age, gender, or ethnicity. As lens density in cataractous eyes affects the speed of ultrasound, we cannot completely rule out bias in the agreement results involving ultrasound biometry. We also did not randomise the order of biometry. In this setting, however, applanation had to be performed last to avoid the possibility of corneal indentation affecting the outcome of the other biometry techniques, and immersion was performed third to limit the potential effect of the immersion liquid on measurement variation between techniques. The order of LENSTAR and IOLMaster, which were performed first and second, could therefore have been randomised and should have been. However, we think the interpretation of our results is not seriously affected by this shortcoming, as our LENSTAR-IOLMaster agreement results are similar to those of Holzer et al,10 who randomised the order of these two devices. The lack of repeatability tests is also a weakness, but does not seem critical, as the repeatability of LENSTAR has been shown to be equal to or better than that of IOLMaster and ultrasound biometry.22

Although we did not compare preoperative IOL power predictions with postoperative outcomes, the prospect of LENSTAR achieving high accuracy in targeted refraction seems good because of its strong agreement with IOLMaster. In a comparative study of 50 cataractous eyes assessed pre- and postoperatively, Packer et al23 found that 92% were within 0.50 D and 100% were within 1 D of the targeted refraction, with IOL power computed from IOLMaster measurements using the Holladay II formula. Two other studies7, 24 showed that a high percentage of patients (87 and 91%) assigned to IOLMaster biometry achieved a refraction within 1 D of the target, compared with about 80% in the applanation group. In another study,25 the two figures were reported to be as high as 96 and 93%, respectively. For immersion biometry, Haigis et al.26 reported that postoperative refraction was correctly predicted within 1 D in 86% of patients and within 2 D in 99%.

Our results reinforce earlier studies10, 11, 12, 13, 14 reporting strong agreement between LENSTAR and IOLMaster in AL, average K, and IOL power. An exception is a recent study,27 which reported less satisfactory average K agreement between IOLMaster and LENSTAR: the mean difference was 0.67 D, with 95% LoA of (0.07, 1.20). The same study also reported similar 95% LoA in IOL power and AL for its LENSTAR-IOLMaster and LENSTAR-applanation comparisons, which differs from the present findings. We believe this discrepancy arises from the corneal refractive indices the authors used: the same index in IOLMaster and their Javal-type keratometer, but different indices in LENSTAR and IOLMaster (see Table 6 in their paper). Consequently, the interpretation of their findings is not straightforward. Indeed, Buckhurst et al.22 cautioned that using the default corneal refractive indices of LENSTAR (1.332) and IOLMaster (1.3375) can lead to a mean difference of 0.76 D in average K in their study population.
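The size of this index effect can be checked with the standard keratometric conversion from anterior corneal radius r (mm) to power, K = 1000 (n − 1)/r. Using the two default indices quoted above and hypothetical radii, the same cornea reads about 0.7 D lower with 1.332 than with 1.3375 (0.73 D at r = 7.5 mm), the same order as the 0.76 D mean difference reported by Buckhurst et al.

```python
def keratometric_power(radius_mm, n_k):
    """Corneal power (D) from anterior radius (mm) using keratometric index n_k."""
    return 1000.0 * (n_k - 1.0) / radius_mm

for r in (7.2, 7.5, 7.8):                       # hypothetical corneal radii (mm)
    k_high = keratometric_power(r, 1.3375)      # index quoted above for IOLMaster
    k_low = keratometric_power(r, 1.332)        # index quoted above for LENSTAR
    print(f"r={r} mm: {k_high:.2f} D vs {k_low:.2f} D "
          f"(difference {k_high - k_low:.2f} D)")
```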

Our study suggests that IOL power disagreement between LENSTAR and applanation or immersion ultrasound biometry is a potential source of error contributing to postoperative refractive surprises. Aside from AL error, keratometry reading error is one of the most common causes of preoperative IOL power error resulting in postoperative lens exchange.28 As the built-in keratometers of LENSTAR and IOLMaster estimate K1 and K2 differently from a manual keratometer,29 some discrepancy is expected. In our study, about nine-tenths of average K differences in the LENSTAR-TOPCON comparison were within 0.5 D of zero (Table 2). Thus, despite good agreement in AL between LENSTAR and immersion (95% within 0.33 mm), poorer agreement in average K can reduce the final agreement in IOL power calculated using the SRK/T formula. On the other hand, proper optimisation of lens constants (Holladay 2, Hoffer Q, SRK/T, and Haigis formulae) has been reported30 to improve the accuracy of IOL power calculation when ocular variables are measured in highly myopic and hyperopic eyes using IOLMaster.

One of the clearest benefits of new-generation optical biometry techniques like LENSTAR and IOLMaster is the potential to minimise measurement variation due to operator experience. Kielhorn et al.31 showed that experienced and inexperienced IOLMaster operators obtained essentially similar IOL power measurements. In contrast, Goel et al.32 reported that AL measurement variation between expert and non-expert operators was 10 times higher for ultrasound biometry than for IOLMaster. In overcrowded ophthalmology clinics worldwide, we see clear advantages in adopting optical low-coherence reflectometry (OLCR) or partial coherence interferometry (PCI) devices in consultation sessions. Cataract patients would experience shorter waiting times, with no risk of contamination and no need for anaesthetic eye drops. These improvements will be well received by patients, as hospital visits are often time consuming, and can be costly and inconvenient if unanticipated side effects occur. The strong agreement between LENSTAR and IOLMaster for the ocular variables considered suggests that their measurements are biometrically equivalent. However, the speed difference between them could translate into considerable time savings over large numbers of patients if IOLMaster is used (about 60 min for every 40 eyes measured, given the average times per eye of 2 min and 3.5 min reported in Materials and methods). The difference arises because LENSTAR computes AL, IOL power, and K readings simultaneously, so if outlying measurements are detected in any of these variables, the whole procedure has to be repeated. IOLMaster does not have this problem, as its measurements are taken sequentially. Nevertheless, the ability of LENSTAR to measure additional variables, such as central corneal thickness and lens thickness, which IOLMaster cannot measure, may be useful to ophthalmologists who need such information.

Taking previous and present findings into account, we believe the case for supplanting ultrasound biometry with either LENSTAR or IOLMaster for routine use in cataract patient treatment is strong. Indeed, mean absolute error in IOL power prediction has been shown to be small and comparable between LENSTAR and IOLMaster.13 Unfortunately, hospitals in developing countries continue to face fiscal hurdles in procuring these advanced instruments. Consequently, there is a danger that eye care standards will remain stagnant in these places while eye specialists in advanced countries continue to raise standards by taking full advantage of the accuracy and additional features offered by LENSTAR and IOLMaster. It is time that health policy makers consider investing in these powerful instruments to narrow the gulf in eye care standards between developing and industrialised countries.

Conclusions

AL and average K measurements and IOL power calculations (SRK/T formula) obtained from the OLCR device LENSTAR are biometrically equivalent to those of the PCI device IOLMaster, in the sense that interchanging measurements of the same variables has a clinically negligible effect. However, measurements cannot be interchanged between LENSTAR and applanation or immersion ultrasound biometry without a substantial reduction in the proportion of IOL power differences falling within 1 D of zero.