Publicly Available Datasets for DR
The initial dataset for the development of AI for DR screening should be sufficiently large that the algorithms can achieve robust performance in the in silico evaluation phase. The algorithms can then be tuned to be more precise and generalizable with training data that are diverse in terms of patient demographics and ethnicity, image acquisition methods, and image quality. Most open datasets were developed on the premise that the lack of large, publicly available datasets of high-quality images for training DL models is a barrier to the development and application of automated DR detection programs in clinical practice. Public datasets provide researchers with invaluable information; several such databases are shown in Table 1.
Table 1
Comparison of publicly available datasets of diabetic retinopathy
Dataset | No. of images (split) | Country | Grading scale | Mydriatic images (%) | Camera | Field of view (°) | Resolution (pixels)
Messidor | 1200 (660:540) | France | N/A | 66.7% | Topcon TRC NW6 | 45 | 1440 × 960, 2240 × 1488, 2304 × 1536 |
Messidor-2 | 1748 (528:1217) | France | N/A | N/A | Topcon TRC NW6 | 45 | Various |
DIARETDB0 | 130 (110:20) | Finland | N/A | N/A | N/A | 50 | 1500 × 1152 |
DIARETDB1 | 89 (84:5) | Finland | N/A | N/A | N/A | 50 | 1500 × 1152 |
EyePACS | | USA | ICDR | N/A | Various | N/A | Various |
IDRiD | 516 (168:348) | India | ICDR | 100% | Kowa VX 0 10a | 50 | 4288 × 2848 |
APTOS 2019 | 5590 | India | ICDR | N/A | Various | N/A | N/A |
DDR | 13,673 (6256:6266) | China | ICDR | N/A | Various | 45 | N/A |
The Messidor database was created within the context of the Messidor project to facilitate studies on computer-assisted diagnosis of DR and has been available for public use since 2008. It includes not only the images but also the diagnoses of DR severity and risk of macular edema provided by medical experts; however, no annotations are provided [15].
DIARETDB0 and DIARETDB1 are public databases provided by Kauppi et al. [16, 17] with the purpose of creating a unified framework for evaluating and comparing automatic DR detection methods. Images were captured with unknown camera settings, which the creators state corresponds to practical situations. Together with the images, the datasets include the “ground truth”: annotations provided by four medical professionals with expertise in medical education and ophthalmology, who marked retinal areas containing microaneurysms, hemorrhages, and exudates [16, 17]. The annotations make it possible to verify that extracted DR findings are at the same locations as those marked by the experts. The DIARETDB datasets were used for the development of an AI algorithm for automated segmentation and detection of retinal hemorrhages in retinal photographs [18].
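The location-matching idea behind such ground-truth annotations can be illustrated with a toy sketch. Everything below is hypothetical for illustration: the pixel-set representation, the 20% overlap criterion, and the function name are assumptions, not the DIARETDB datasets' actual evaluation protocol.

```python
def lesion_matches_ground_truth(pred_pixels, expert_pixels, min_overlap=0.2):
    """Return True if a predicted lesion region overlaps the expert-marked
    region by at least `min_overlap`, measured as a fraction of the
    predicted area. Regions are sets of (row, col) pixel coordinates."""
    if not pred_pixels:
        return False  # nothing predicted, nothing to match
    overlap = len(pred_pixels & expert_pixels)
    return overlap / len(pred_pixels) >= min_overlap

# Toy example: a 4-pixel predicted microaneurysm vs. an expert-marked patch
pred = {(2, 2), (2, 3), (3, 2), (3, 3)}
expert = {(r, c) for r in range(3, 6) for c in range(3, 6)}
print(lesion_matches_ground_truth(pred, expert))  # 1/4 overlap -> True
```

Real evaluations on these datasets use the published annotation masks and confidence maps rather than a single hard threshold.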
EyePACS is a telescreening program for DR that collects retinal photographs from many primary care clinics. The dataset comprises retinal photographs captured under both mydriatic and non-mydriatic conditions. DR severity grading is provided based on readings by trained graders.
The Asia–Pacific Teleophthalmology Society (APTOS) 2019 Challenge provides another database. It is available at the APTOS 2019 Blindness Detection website; the retinal photographs were collected by the Aravind Eye Hospital in India, gathered under varied conditions and clinical environments, and later labeled by trained ophthalmologists. Another public dataset from India is the IDRiD dataset, from an ophthalmology clinic in Nanded. In addition to data on DR severity and the presence of DME, this dataset provides annotations of DR lesions and the optic disc [19].
China’s public dataset, the DDR, provides three types of annotations: image-level DR grading annotations, pixel-level annotations, and bounding-box annotations of lesions associated with DR, all labeled by ophthalmologists [20].
Retrospective Validation Studies from ML to DL
Table 2 summarizes selected studies on the retrospective validation of available AI models for DR screening. The early ML-based systems (Retinalyze, Retmarker, EyeArt v1.2, and IDP, validated in 2003–2015) achieved high sensitivity, from 91.0% to 96.7%, but their specificities were relatively lower, from 51.7% to 71.6%. It was only when DL was applied in IDx-DR X2.1 (the new version of IDP; see Table 2) in 2016 that retrospective validation for screening of rDR achieved both robust sensitivity and specificity [21].
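The sensitivity, specificity, PPV, and NPV values reported throughout Tables 2 and 3 all derive from the same confusion-matrix counts. A minimal sketch, using made-up counts that are not taken from any study in the tables:

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the diagnostic metrics of a screening validation from
    confusion-matrix counts (true/false positives and negatives)."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of diseased eyes detected
        "specificity": tn / (tn + fp),  # fraction of healthy eyes cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Illustrative counts only: 100 diseased and 900 healthy eyes
m = screening_metrics(tp=92, fp=130, fn=8, tn=770)
print(f"sens={m['sensitivity']:.1%} spec={m['specificity']:.1%} "
      f"ppv={m['ppv']:.1%} npv={m['npv']:.1%}")
```

The toy numbers also show why several studies in Table 2 report low PPV alongside high NPV: when disease prevalence is low, even a modest false-positive rate dominates the positive calls.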
Table 2
Retrospective validation studies of artificial intelligence in diabetic retinopathy
Study | Algorithm | Method | Validation dataset | No. of images | Outcome | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | AUC (95% CI)
| Retinalyze | ML | WCDRS | 400 | DR/no DR | 96.7% | 71.4% | N/A | N/A | 0.903 |
| Retinalyze | ML | Steno Diabetes Center, Denmark | 365 | DR/no DR | 93.1% | 71.6% | N/A | N/A | 0.936 |
| Retmarker | ML | ARS-Centro | 21,544 | DR/no DR | 96.1% (94.4–97.9) | 51.7% (50.3–53.1) | N/A | N/A | 0.849 |
| IDP | ML | Messidor-2 | 1748 | rDR | 96.8% (94.4–99.3) | 59.4% (55.7–63.0) | 39.8% (35.2–44.3) | 98.5% (97.4–99.7) | 0.937 (0.916–0.959) |
| IDP | ML | Nakuru Eye Study | 6788 | DR/no DR | 91.0% (88.0–93.4) | 69.9% (68.3–71.6) | 32.1% (29.6–34.7) | 98.0% (97.4–98.6) | 0.878 (0.850–0.905) |
| IDx-DR X2.1 | DL | Messidor-2 | 1748 | rDR | 96.8% (93.3–98.8) | 87.0% (84.2–89.4) | 67.4% (61.5–72.9) | 99.0% (97.8–99.6) | 0.980 (0.968–0.992) |
| | | | | VTDR | 100% (96.1–100) | 90.8% (88.5–92.7) | 56.4% (48.4–64.1) | 100% (99.5–100) | 0.989 (0.984–0.994) |
| EyeArt v1.2 | ML | EyePACS | 40,542 | rDR | 90.0% (88.0–92.0) | 63.2% (61.7–64.6) | N/A | N/A | 0.879 (0.865–0.893) |
| ARDA | DL | Messidor-2 | 1748 | rDR | 87.0% (81.1–91.0) | 98.5% (97.7–99.1) | N/A | N/A | 0.990 (0.986–0.995) |
| | | EyePACS-1 | 9963 | rDR | 90.3% (87.5–92.7) | 98.1% (97.8–98.5) | N/A | N/A | 0.991 (0.988–0.993) |
| SELENA+ | DL | SIDRP | 71,896 | rDR | 90.5% (87.3–93.0) | 91.2% (88.0–93.6) | N/A | N/A | 0.936 (0.925–0.943) |
| | | | | VTDR | 100% (94.1–100) | 91.1% (90.7–91.4) | N/A | N/A | 0.958 (0.956–0.961) |
| DLA | DL | LabelMe, China | 35,201 | rDR | 92.5% | 98.5% | N/A | N/A | 0.955 |
| EyeArt v2.0 | DL | EyePACS | 850,908 | rDR | 91.3% (90.9–91.7) | 91.1% (90.9–91.3) | 72.5% (71.9–73.0) | 97.6% (97.5–97.7) | 0.965 (0.963–0.966) |
Ruamviboonsuk et al. [31] | ARDA | DL | Thailand’s national screening program | 25,326 | rDR | 96.8% (89.3–99.3) | 95.6% (98.3–98.7) | N/A | N/A | 0.987 (0.977–0.995) |
Grzybowski and Brona [36] | Retinalyze strategy 1 | ML | Poznan, Poland | 680 | rDR | 89.7% (78.8–96.1) | 71.8% (62.4–80.0) | 62.7% | 92.9% | 0.807 |
| Retinalyze strategy 2 | ML | Poznan, Poland | 680 | rDR | 74.1% (61.0–84.7) | 93.6% (87.3–97.4) | 86.0% | 87.3% | 0.839 |
| IDx-DR | DL | Poznan, Poland | 680 | rDR | 93.3% (83.8–98.2) | 95.5% (89.7–98.5) | 91.8% | 96.3% | 0.944 |
In the era of ML, prior to DL, three ML algorithms for DR screening (iGradingM, EyeArt v1.2, and RetMarker) were validated in a single study of > 20,000 patients with DM in the National Health System (NHS) Diabetic Eye Screening Programme (DESP) in the UK. The investigators found that iGradingM classified the photographs as either having retinopathy or being ungradable, which limited further analysis. The sensitivities of EyeArt and Retmarker were comparable: 93.8% and 85.0%, respectively, for detecting rDR, and 99.6% and 97.9%, respectively, for detecting PDR. The sensitivity and false-positive rate of EyeArt were not affected by ethnicity, sex, or camera type, but did decline with increasing patient age, whereas the performance of Retmarker was affected by patient age, ethnicity, and camera type [22].
Retmarker is an automated DR screening algorithm using ML that was developed in Portugal. The system extracts features, such as the presence of microaneurysms, and outputs disease or no disease. It also provides a co-registration component that combines images from two visits and compares them at the same retinal locations, using the retinal vascular tree as landmarks; this allows the algorithm to estimate microaneurysm turnover rates [23]. The algorithm has been applied to monitor treatment of clinically significant macular edema (CSME) with intravitreal injections of ranibizumab by demonstrating a decrease in the absolute number of microaneurysms after treatment [24]. Retmarker was found to lower the grading burden by 48.8% in the DR screening program in Portugal [25].
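Once two visits are co-registered, microaneurysm (MA) turnover reduces to set differences between the lesions present at each visit. The sketch below is an illustrative assumption, not Retmarker's actual implementation: MAs are represented as registered (x, y) coordinates, and rates are simply counts per year.

```python
def ma_turnover(baseline: set, followup: set, years: float = 1.0) -> dict:
    """Estimate MA formation and disappearance rates between two
    co-registered visits. Each MA is identified by its retinal
    coordinate after registration (here, a plain (x, y) tuple)."""
    new_mas = followup - baseline   # MAs that appeared since baseline
    resolved = baseline - followup  # MAs that disappeared
    return {
        "formation_rate": len(new_mas) / years,
        "disappearance_rate": len(resolved) / years,
        "turnover": (len(new_mas) + len(resolved)) / years,
    }

visit1 = {(120, 88), (200, 150), (310, 95)}
visit2 = {(120, 88), (310, 95), (250, 170), (90, 60)}
print(ma_turnover(visit1, visit2))  # 2 formed, 1 resolved per year
```

A real pipeline would match MAs with a spatial tolerance after vascular-tree registration rather than requiring exact coordinate equality.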
Retinalyze is another ML-based algorithm using feature extraction to detect DR. Its sensitivity and specificity for detecting DR versus no DR were approximately 95% and 71%, respectively, in validations under mydriasis [26] in populations of patients with DM from the Welsh Community DR study in the UK [27] and the Steno Diabetes Center in Denmark [28].
IDx-DR (Digital Diagnostics, Coralville, IA, USA) was initially developed at the University of Iowa as an ML algorithm named the Iowa Detection Program (IDP). The later version, IDx-DR, combines convolutional neural networks (CNNs) with DL enhancements. The model detects different types of DR lesions and assesses image quality and imageability. Validation of the IDP on the Messidor-2 dataset, which includes Caucasian populations, achieved a sensitivity of 96.8% and specificity of 59.4% in detecting rDR [29]. The IDP was later validated on a population from the Nakuru Study in Kenya and achieved a comparable sensitivity of 91.0% and specificity of 69.9%, implying that race might not affect its performance [30]. IDx-DR X2.1, the newer version enhanced by DL components, was validated on the same Messidor-2 dataset and achieved the same sensitivity of 96.8% but a markedly higher specificity of 87.0% for rDR, and a sensitivity of 100.0% and specificity of 90.8% for VTDR [21].
ARDA (Automated Retinal Disease Assessment) is a DL algorithm developed by Verily Life Sciences LLC (South San Francisco, CA, USA) using datasets of approximately 130,000 retinal photographs of patients with DM from the USA and India. The algorithm was initially validated on the Messidor-2 and EyePACS-1 datasets, comprising approximately 10,000 photographs, and was subsequently validated retrospectively on a distinct dataset of retinal photographs from the national registry of diabetic patients in Thailand. In that validation, a comparison with human graders, the algorithm demonstrated a sensitivity of 96.8%, higher than that of the human graders (approx. 74%), while specificity was comparable (96–97%) [31].
The Singapore Eye Research Institute and the National University of Singapore developed a DL system called SELENA+ to detect rDR. The system was also designed to detect other vision-threatening eye diseases, such as glaucoma and age-related macular degeneration (AMD), from retinal photographs. It was validated on datasets of > 70,000 photographs from ten countries and various ethnic groups. The investigators of SELENA+ proposed two scenarios for integrating AI into clinical screening programs. The first was a fully automated model with no human assistance, which achieved a sensitivity of 93.0% and a specificity of 77.5% in detecting referable cases, including glaucoma and AMD. The second was a semi-automated model in which human graders worked alongside the AI; in this scenario, specificity increased to 99.5%, with sensitivity remaining similar at 91.3% [32].
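The semi-automated scenario can be thought of as a two-stage triage: the AI clears negative cases automatically, and a human grader overreads only AI-positive cases, which is why specificity rises sharply with little loss of sensitivity. The sketch below is schematic, assuming this triage logic rather than reproducing SELENA+'s actual workflow:

```python
def triage(ai_referable: bool, human_confirms_referable) -> str:
    """Two-stage semi-automated screening sketch. The AI screens every
    image; a human grader reviews only AI-positive cases, so the
    grader's workload is the AI-flagged minority of images."""
    if not ai_referable:
        return "no referral"  # auto-cleared by the AI, no human review
    # human overread removes the AI's false positives
    return "refer" if human_confirms_referable() else "no referral"

# An AI false positive is caught by the human grader:
print(triage(ai_referable=True, human_confirms_referable=lambda: False))
```

Note the asymmetry: the human can correct AI false positives (raising specificity) but never sees AI negatives, so any AI false negative remains missed.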
EyeArt v2.1 (Eyenuk Inc., Los Angeles, CA, USA) is another DL model for detecting rDR as a fully automated DR screening system, combining novel morphological image analysis with DL techniques. The earlier, ML-based version of EyeArt was validated on 5084 diabetic patients from EyePACS and on another set of 40,542 images from an independent EyePACS dataset, achieving a sensitivity of 90.0% and specificity of 63.2% [33]. The REVERE 100k study demonstrated that the DL-based EyeArt v2.1 could achieve higher sensitivity (91.3%) and higher specificity (91.1%), unaffected by patient ethnicity, sex, or camera type, in a validation of photographs of > 800,000 patients from the routine DR screening protocol of EyePACS [34].
A DL algorithm in China was developed from 71,043 de-identified retinal photographs on a web-based platform, LabelMe (Guangzhou, China), provided by 36 ophthalmology departments, optometry clinics, and screening units in China. External validation was performed using over 35,000 images from population-based cohorts of Malaysians, Caucasian Australians, and Indigenous Australians, yielding a sensitivity of 92.5% and specificity of 98.5% [35].
A recent study compared the performance of Retinalyze (strategies 1 and 2) and IDx-DR v2.1 for detecting rDR in retinal images captured without mydriasis in the same group of patients with DM. Both automated algorithms were able to analyze most of the images. However, the sensitivities and specificities of Retinalyze (89.7% and 71.8% for strategy 1; 74.1% and 93.6% for strategy 2) were lower than those of IDx-DR (93.3% and 95.5%, respectively). The investigators noted that Retinalyze’s ability to annotate images is helpful for human verification, but concluded that the algorithm could not be used to diagnose patients without direct clinician oversight [36].
Another retrospective validation study compared seven different DL algorithms for detecting rDR. The investigators found that most of the algorithms performed no better than human graders. Sensitivities varied widely (51.0–85.9%), although high negative predictive values (82.7–93.7%) were observed. Notably, one algorithm was significantly worse than human graders and would miss up to one fourth of advanced retinopathy cases (72.4% sensitivity for PDR), a limitation that could potentially lead to vision loss [37].
Prospective Validation Studies of DL
One of the first prospective validation studies on DL for DR screening was the pivotal trial of IDx-DR, which was submitted in support of U.S. Food and Drug Administration (FDA) approval of the AI model. In this study, in which Early Treatment Diabetic Retinopathy Study (ETDRS) photographs served as the reference standard, IDx-DR was prospectively validated in primary care units in the USA and found to have a sensitivity of 87.2% and specificity of 90.7%, both exceeding the pre-specified superiority endpoints of 85% sensitivity and 82.5% specificity [38]. However, these diagnostic parameters were lower than those found in the retrospective validation of IDx-DR (approx. 97% sensitivity and approx. 87% specificity) [21]. IDx-DR was then prospectively validated in the Hoorn Diabetes Care System in the Netherlands for detecting rDR and VTDR based on two DR classification systems, the International Clinical Diabetic Retinopathy Severity Scale (ICDR) and EURODIAB. The sensitivity of IDx-DR for VTDR was approximately 60% under both classifications, but its sensitivity for rDR was 91% under EURODIAB and 68% under the ICDR. This discrepancy may arise from the different grade definitions in the two classification systems [39].
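One common source of such discrepancies is where each classification system draws the referable threshold. The sketch below encodes a widely used ICDR-based convention as an assumption for illustration; the exact cut-offs in any given study or program may differ.

```python
# ICDR grades: 0 = no DR, 1 = mild NPDR, 2 = moderate NPDR,
#              3 = severe NPDR, 4 = PDR
def referable_icdr(grade: int, dme: bool) -> bool:
    """rDR under a common ICDR-based definition: moderate NPDR or
    worse, and/or diabetic macular edema (DME)."""
    return grade >= 2 or dme

def vtdr_icdr(grade: int, dme: bool) -> bool:
    """VTDR under the same convention: severe NPDR, PDR, and/or DME."""
    return grade >= 3 or dme

print(referable_icdr(1, dme=False))  # mild NPDR alone -> False
print(referable_icdr(2, dme=False))  # moderate NPDR -> True
```

Shifting the rDR cut-off by a single grade, as different classification systems effectively do, changes which borderline eyes count as positives and therefore changes the measured sensitivity.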
There have been a few prospective validation studies of EyeArt, another autonomous AI model approved by the U.S. FDA for DR screening. A large prospective validation study was conducted in > 30,000 patients in the NHS DESP in the UK using EyeArt v2.1 with DL incorporation. When EyeArt was combined with manual grading, the system achieved a sensitivity of 95.7% and a specificity of 54.0% for triaging rDR [40]. Another prospective study validated EyeArt in multiple primary care centers in the USA for the detection of more-than-mild DR (mtmDR) and VTDR; the referral rate in this study was 31.3%. Comparing 2-field non-dilated retinal photographs against standard dilated 4-wide-field stereoscopic photographs as the reference, the investigators found that 12.5% of images were classified as ungradable; this proportion dropped to 2.7% under the dilate-if-needed protocol. Although imageability increased with pupillary dilation, EyeArt achieved similar sensitivities and specificities for detecting both mtmDR (95.5% sensitivity and 85.3% specificity) and VTDR (95.2% sensitivity and 89.5% specificity) in the non-dilated and dilate-if-needed protocols [41].
In the prospective validation of SELENA+ in diabetic patients attending mobile screening in Zambia, grading 4504 images, the DL system achieved sensitivities of 92.3% and 99.4% for rDR and VTDR, respectively. This algorithm, developed in Singapore, showed excellent generalizability across patient ethnicities in both the retrospective and prospective validations [42].
Another prospective validation study evaluated ARDA within the existing workflow of nine primary care centers in Thailand’s national DR screening program. ARDA achieved 91.4% sensitivity and 95.4% specificity for detecting VTDR. This study applied a semi-automated approach in which local retinal specialists overread the DL results. Various fundus cameras were used according to routine practice at the primary care sites; DL performance was consistent across the different camera models [43].
Prospective studies have been carried out on the ML-based Retmarker to analyze microaneurysm turnover rates on retinal photographs as a biomarker of DR progression. The investigators found that higher microaneurysm turnover at the macula correlated with earlier development of CSME [44, 45]. Changes in microaneurysms during the first year in patients with mild NPDR were associated with the development of VTDR over 5 years [46].
The DL algorithm developed by Li et al. in China, previously trained on images from the Chinese dataset and later validated on Indigenous Australian and Caucasian populations in Australia, was prospectively evaluated in Aboriginal Medical Services within Australian healthcare settings. A sensitivity of 96.9% and specificity of 87.7% were found for detecting rDR in this population. Beyond the performance of the DL algorithm, this study also investigated the experience and acceptance of automated screening among the patients, clinicians, and organizational stakeholders involved, and found high acceptance of AI [47].
Another DL system for automated DR screening, VoxelCloud, was validated in > 150 diabetes centers in a screening program in China after training on a private retinal image database comprising > 140,000 images; it was tested on both public and private datasets (one being the APTOS 2019 Blindness Detection dataset). The algorithm achieved a sensitivity of 83.3% and a specificity of 92.5% for detecting rDR from 31,498 images [48].
AI Analysis on Smartphone Photographs
A few studies have been performed on retinal photographs taken using smartphone cameras for analysis by AI. These platforms were not handheld alone but smartphones attached to desktop cameras for use in low-resource settings. All published studies validating AI with smartphone photography have been prospective, since the retinal photographs were captured in real time and analyzed at the point of care. A pilot study conducted in India used the Remidio Fundus on Phone (FOP) device to capture retinal images after pupillary dilation, graded by the offline EyeArt algorithm. This study achieved a sensitivity and specificity of 95.8% and 80.2%, respectively, for any DR, and 99.1% and 80.4%, respectively, for VTDR [49]. Another smartphone-based AI study for DR screening, also conducted in India, used the Remidio FOP for image capture with offline automated analysis by Medios AI; the sensitivity and specificity for detecting rDR were 100.0% and 88.4%, respectively [50]. A further study conducted in the USA used EyeArt for automated analysis of retinal photographs from a smartphone-based RetinaScope camera; the sensitivity and specificity for detecting rDR were 77.8% and 71.5%, respectively [51].
In summary, across the retrospective and prospective studies, including those using smartphone retinal photographs, presented in Tables 2 and 3, numerous algorithms have been developed and validated with differing approaches, datasets, cameras, images, and levels of DR detected. The overall performance of AI was high, although performance was generally lower in some prospective validation studies.
Table 3
Prospective validation studies of AI in DR
Algorithm | Study | Setting | No. of patients | Mydriasis | Grading standard | Outcome | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI)
Conventional retinal photography cameras |
iGrading | | Primary care, Scotland | 6722 | As needed | Scottish DR Grading Scheme | Any DR | 90.5% (89.3–91.6) | 67.4% (66.0–68.8) | N/A | N/A |
| | Primary care, Spain | 5253 | Yes | MA detection | Any DR | 94.52% (92.6–96.5) | 68.77% (67.2–70.4) | 34.1% (31.7–36.5) | 98.66% (89.2–99.2) |
Bosch | | India | 560 | No | AAO PPP 2019 | Any DR | 91.2% (86.4–94.7) | 96.9% (94.5–98.5) | 94.4% (90.4–96.8) | 95.0% (92.5–96.8) |
IDx-DR X2.1 | | Ten primary care sites, USA | 819 | As needed | ETDRS | mtmDR | 87.2% (81.8–91.2) | 90.7% (88.3–92.7) | N/A | N/A |
| | | | | | rDR | 99.3% (96.1–99.9) | 68.8% (61.5–76.2) | 74.6% (68.4–80.8) | 99.1% (97.2–100) |
| | | | | | VTDR | 99.1% (95.1–99.9) | 80.4% (73.9–85.9) | 75.3% (68.4–82.3) | 99.3% (96.3–100) |
IDx-DR 2.0 | van der Heijden et al. [39] | Hoorn DCS center, The Netherlands | 898 | As needed | ICDR | rDR | 68% (56–79) | 86% (84–88) | 30% (24–38) | 97% (95–98) |
| | | | | ICDR | VTDR | 62% (32–85) | 95% (93–96) | 14% (7–27) | 99% (99–100) |
| | | | | EURODIAB | rDR | 91% (69–98) | 84% (81–96) | 12% (8–18) | 100% (99–100) |
| | | | | EURODIAB | VTDR | 64% (36–86) | 95% (93–96) | 16% (8–29) | 99% (99–100) |
SELENA+ | | Zambia, Africa | 1574 | N/A | ICDR | rDR | 92.3% (90.1–94.1) | 89.0% (87.9–90.3) | N/A | N/A |
| | | | | | VTDR | 99.4% (99.2–99.7) | N/A | N/A | N/A |
| | | | | | DME | 97.2% (96.6–97.8) | N/A | N/A | N/A |
VoxelCloud Retina | | Nationwide DR screening, China | 15,805 | No | ICDR | rDR | 83.3% (81.9–84.6) | 92.5% (92.1–92.9) | 61.8% (60.3–63.3) | 97.4% (97.2–97.7) |
DLA | | Australia | 203 | No | NHS | rDR | 96.9% (83.8–99.9) | 87.7% (81.8–92.2) | 59.6% (45.1–73.0) | 99.3% (96.4–100) |
ARDA | Ruamviboonsuk et al. [43] | Nationwide DR screening, Thailand | 7651 | As needed | ICDR | VTDR | 91.3% (85.1–97.4) | 96.3% (95.1–97.4) | 79.2% (73.8–84.3) | 95.5% (92.8–97.9) |
EyeArt v2.1 | | NHS DESP, UK | 30,405 | N/A | ETDRS | rDR | 95.7% (94.8–96.5) | 54.0% (53.4–54.5) | N/A | N/A |
EyeArt v2.1 | | Multicenter, USA | 893 | No | ETDRS | mtmDR | 95.5% (92.4–98.5) | 85.0% (82.6–87.4) | 59.5% (53.9–63.9) | 98.8% (98.2–99.4) |
| | | | Yes | | mtmDR | 95.5% (92.6–98.4) | 85.3% (83.0–87.5) | 59.1% (53.8–64.4) | 98.8% (98.2–99.5) |
| | | | No | | VTDR | 95.1% (90.1–100) | 89.0% (87.0–91.1) | 26.7% (19.5–33.0) | 99.8% (99.5–100) |
| | | | Yes | | VTDR | 95.2% (90.4–100) | 89.5% (87.6–91.4) | 26.1% (19.6–32.6) | 99.8% (99.5–100) |
Smartphone-based retinal photography |
EyeArt v2.1 (FOP) | | Tertiary care hospital, India | 301 | Yes | ICDR | Any DR | 95.8% (92.9–98.7) | 80.2% (72.6–87.8) | 89.7% (85.5–93.8) | 91.4% (85.7–97.1) |
EyeArt v2.0 (FOP) | | Two university hospitals, USA | 72 | Yes | ICDR | rDR | 77.8% (67.3–85.7) | 71.5% (48.7–86.9) | N/A | N/A |
Remidio | | Dispensaries in Mumbai, India | 213 | Yes | ICDR | Any DR | 85.2% (66.3–95.8) | 92.0% (97.1–95.4) | N/A | N/A |
| | | | | | rDR | 100% (78.2–100) | 88.4% (83.2–92.5) | N/A | N/A |
Different Types of Retinal Photographs
Of all the retinal imaging modalities, color retinal photographs acquired from conventional retinal cameras using simple flash camera technology are ubiquitous and practical for screening DR in primary care settings, particularly in remote or underserved areas. Another type of retinal camera, recently gaining popularity, uses white light-emitting diodes (LEDs) combined with confocal scanning laser technology and enhanced color fidelity [52]. These white LED cameras produce color retinal photographs that differ in appearance from those obtained with conventional cameras; the differences include the viewing angle (45° for conventional cameras and 60° for LED cameras), image resolution, and color discrimination [52].
In one study, EyeArt v2.1.0 was used to analyze retinal photographs obtained from both conventional cameras and white LED cameras in 1257 patients with DM from the UK NHS DESP. The authors found that the diagnostic accuracy of detecting any retinopathy and PDR from the photographs obtained from both cameras was similar, at 92.3% sensitivity and 100.0% specificity, using human grading as the standard [53]. The published article did not clarify whether EyeArt applied the same algorithm to the photographs from both cameras, and the authors stated that EyeArt was not optimized for photographs from the LED cameras and that reference patterns might not be properly recognized. Wongchaisuwat et al. applied separate DL algorithms, one for conventional photographs and another for photographs from white LED cameras, to prospectively validate DL on both camera types for DR screening. They achieved sensitivities of 82% and 89%, and specificities of 92% and 84%, for conventional and LED photographs, respectively, for detecting rDR, using retinal examination by retinal specialists as the standard [54].
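The two-algorithm approach of Wongchaisuwat et al. amounts to routing each photograph to the model trained on its camera domain. A schematic sketch, in which the model objects, labels, and camera-type keys are placeholders rather than the study's actual components:

```python
def grade_image(image, camera_type: str, models: dict) -> str:
    """Route a photograph to the DL model trained on its camera domain,
    e.g. one model for conventional flash photographs and another for
    white-LED confocal photographs."""
    try:
        model = models[camera_type]
    except KeyError:
        raise ValueError(f"no model for camera domain {camera_type!r}")
    return model(image)

# Stand-in "models" for illustration only (real ones would be CNNs)
models = {
    "conventional": lambda img: "rDR",
    "white_led": lambda img: "no rDR",
}
print(grade_image(object(), "white_led", models))
```

The practical cost of this design is that every supported camera domain needs its own labeled training data and its own maintained model, which is why a single domain-robust model would be preferable.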
Ideally, a single AI model would be able to analyze retinal photographs from both conventional cameras and white LED cameras and achieve robust performance across these domains. The effect of image domain on AI performance may be even more pronounced for segmentation of optical coherence tomography (OCT) images, and there have been attempts to develop AI models that perform well on OCT images from the devices of different manufacturers [55].