Early diagnosis of glaucoma is vital for timely treatment. Medical practitioners have proposed a number of criteria for early diagnosis, and these criteria mostly focus on or around the OD region. If the position, centre, and size of the OD are calculated accurately, further automated analysis of the image becomes considerably easier. The rest of this subsection discusses various image processing and machine learning approaches that employ these diagnostic criteria for disc localization and glaucoma identification.
Localization of optic disc
Although the optic disc can be spotted manually as a round bright spot in a retinal image, large-scale manual screening is tiresome, time consuming, and prone to human fatigue and bias. CAD can provide an efficient and reliable alternative with near-human accuracy (as shown in Table
4). Usually the disc is the brightest region in the image. However, ambient light that finds its way into the image during capture can appear brighter than the optic disc. Furthermore, shiny reflective areas occasionally appear in the fundus image during acquisition; these reflections can also look very bright and mislead a heuristic algorithm into treating them as candidate regions of interest. Researchers have laid out many approaches for OD localization that exploit different image characteristics. Some of these are briefly covered below.
Intensity variations in the image can help locate the optic disc in fundus images. To make use of this variation, the image contrast is first improved using a locally adaptive transform. The OD then stands out through rapid variation in intensity, since the disc contains dark blood vessels alongside bright nerve fibres. The image is normalized and the average intensity variance is calculated within a window of size roughly equal to the expected disc size; the disc centre is marked at the point of maximum variance. Eswaran et al. [
12] used such an intensity variation based approach. They applied a 25 × 35 averaging filter with equal weights to smooth the image, suppressing low intensity variations while preserving the ROI. Chràstek et al. [
13] used a 31 × 31 averaging filter with the ROI assumed to be 130 × 130 pixels, and applied a Canny edge detector [
14] to plot the edges in the image. To localize the optic disc region they used only the green channel of the RGB image. Abràmoff et al. [
15] proposed selecting only the top 5% brightest pixels with hue values in the yellow range. The surrounding pixels are then clustered into candidate regions, and clusters below a certain threshold are discarded. Liu et al. [
16] used a similar approach: they first divided the image into a grid of 8 × 8 pixel blocks and selected the block containing the most of the top 5% brightest pixels as the disc centre. Nyúl [
17] employed adaptive thresholding with a window sized to approximately match the vessel thickness. A large-kernel mean filter is then used with threshold probing for rough localization.
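The intensity-variance localization described above can be sketched as follows. This is a minimal illustration, not any cited author's implementation: the window size, stride, and function name are our own assumptions.

```python
import numpy as np

def localize_od_by_variance(green, win=65):
    """Rough OD localization via local intensity variance.

    green : 2-D float array (green channel of the fundus image).
    win   : window size, chosen to roughly match the expected
            disc diameter (an assumption for this sketch).
    """
    # Normalize the channel to zero mean, unit variance.
    g = (green - green.mean()) / (green.std() + 1e-8)
    h, w = g.shape
    half = win // 2
    best, centre = -1.0, (half, half)
    # Slide a win x win window over the image. The disc region mixes
    # dark vessels with bright nerve fibres, so its local variance
    # is high; the window with maximum variance marks the centre.
    for y in range(half, h - half, 4):          # stride 4 for speed
        for x in range(half, w - half, 4):
            patch = g[y - half:y + half + 1, x - half:x + half + 1]
            v = patch.var()
            if v > best:
                best, centre = v, (y, x)
    return centre   # (row, col) of the estimated disc centre
```

A full implementation would first apply the averaging filter mentioned above to suppress spurious low-scale variation before scanning.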
Another extensively used approach is threshold based localization. A quick look at a retinal image shows that the optic disc is usually the brightest region in the image, an observation exploited by many, including Siddalingaswamy and Prabhu [
18]. The green channel of RGB generally offers the greatest contrast of the three channels [19‐21]; however, the red channel has also been used [22] because it contains fewer blood vessels that could confuse a rule-based localization algorithm. The optimal threshold is chosen by approximating the image histogram: the histogram is scanned downward from a high intensity value I1, slowly decreasing until it reaches a lower value I2 that contains at least 1000 pixels of the same intensity. This yields a subset of the histogram, and the optimal threshold is taken as the mean of the two intensities I1 and I2. Applying this threshold produces a number of connected candidate regions, and the region with the highest number of pixels is taken as the optic disc. Dashtbozorg et al. [
23] used the Sliding Band Filter (SBF) [
24] on downsampled versions of high resolution images, since the SBF is computationally very expensive. They first apply the SBF to a larger region of interest in the downsampled image to obtain a rough localization. The position of this roughly estimated ROI is then used to establish a smaller ROI in the original sized image for a second application of the SBF. The maximum filter response yields k candidates pointing to potential OD regions; a regression algorithm then smooths the disc boundary. Zhang et al. [
25] proposed a fast method to detect the optic disc. Three vessel distribution features are used to estimate possible horizontal coordinates of the disc: local vessel density, compactness of the vessels, and their uniformity. The vertical coordinate of the disc is then calculated with the Hough Transform according to the global vessel direction characteristics.
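The histogram-scanning threshold selection described above can be sketched as follows. The 1000-pixel criterion and the mean-of-I1-and-I2 rule follow the text; the function names are our own.

```python
import numpy as np

def optimal_od_threshold(channel, min_pixels=1000):
    """Histogram-scan threshold for OD candidate regions.

    channel : 2-D uint8 array (e.g. the green channel).
    Scan downward from the brightest occupied bin (I1) until a bin
    I2 holds at least `min_pixels` pixels of the same intensity;
    the threshold is the mean of I1 and I2.
    """
    hist = np.bincount(channel.ravel(), minlength=256)
    i1 = int(np.max(np.nonzero(hist)))        # brightest occupied intensity
    i2 = i1
    while i2 > 0 and hist[i2] < min_pixels:   # descend to a dense bin
        i2 -= 1
    return (i1 + i2) // 2

def brightest_region_mask(channel, thresh):
    # Candidate OD pixels: everything at or above the threshold.
    # The largest connected component of this mask would then be
    # taken as the optic disc, as described in the text.
    return channel >= thresh
```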
The Hough Transform (HT) has been widely utilized to detect the OD [25‐27] due to the disc's inherent circular shape and bright intensity. The technique is applied to binary images after morphological operations have removed noise and reflections of light from the ocular fundus that could interfere with the calculation of Hough circles. The HT maps any point (x, y) in the image to a circle in a parameter space characterized by centre (a, b) and radius r that passes through (x, y), following the equation of a circle, (x - a)^2 + (y - b)^2 = r^2. Consequently, the feature points in the binary image are associated with circles that are almost concentric around any circular shape in the image for some given value of the radius r. This value of r must be known a priori from experience or experiments. Akyol et al. [
28] presented an automatic method to localize the OD in retinal images. They employ keypoint detectors to extract discriminative information from the image and the Structural Similarity (SSIM) index for textural analysis. They then use a visual dictionary and a random forest classifier [
29] to detect the disc location.
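A minimal circular Hough Transform with the radius r fixed a priori, as the text describes, can be sketched as follows. The voting resolution and function name are our own assumptions.

```python
import numpy as np

def hough_circle_centre(edge_mask, radius):
    """Vote for circle centres at a fixed, a-priori radius.

    edge_mask : 2-D boolean array of edge points (e.g. produced by
    an edge detector on the thresholded binary image).
    Each edge point (x, y) votes for every centre (a, b) satisfying
    (x - a)^2 + (y - b)^2 = radius^2.
    """
    h, w = edge_mask.shape
    acc = np.zeros((h, w), dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, 180, endpoint=False)
    ys, xs = np.nonzero(edge_mask)
    for y, x in zip(ys, xs):
        # Candidate centres lie on a circle of the same radius
        # around the edge point.
        a = np.round(x - radius * np.cos(thetas)).astype(int)
        b = np.round(y - radius * np.sin(thetas)).astype(int)
        ok = (a >= 0) & (a < w) & (b >= 0) & (b < h)
        np.add.at(acc, (b[ok], a[ok]), 1)
    # The accumulator peak is the most likely disc centre.
    b0, a0 = np.unravel_index(np.argmax(acc), acc.shape)
    return a0, b0   # (x, y) of the detected centre
```

In practice the radius is swept over a small range around the expected disc size rather than fixed to a single value.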
Glaucoma classification
Automatic detection and classification of glaucoma has also long been studied by researchers. A brief overview of some current works is presented below; for thorough coverage of glaucoma detection techniques, [30‐32] may be consulted.
Fuente-Arriaga et al. [
33] proposed measuring blood vessel displacement within the disc for glaucoma detection. They first segment the vascular bundle in the OD to set a reference point on the temporal side of the cup. The centroid positions of the inferior, superior, and nasal vascular bundles are then determined and used to calculate the L1 distance between each centroid and the normal position of its vascular bundle. They applied their method to a set of 67 images carefully selected from a private dataset for clarity and quality of the retina, and report 91.34% overall accuracy. Ahmad et al. [
34] and Khan et al. [
35] used very similar techniques to detect glaucoma. They calculate the CDR and the ISNT quadrants and classify an image as glaucomatous if the CDR is greater than 0.5 and the image violates the ISNT rule. Ahmad et al. applied the method to 80 images taken from the DMED dataset, the FAU data library, and the Messidor dataset and reported 97.5% accuracy, whereas Khan et al. used 50 images from the same datasets and reported 94% accuracy. Though these reported accuracies are well above 90%, the test images were handpicked and so few in number that the results are not statistically significant and cannot reliably be generalized to large public datasets.
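The rule-based decision used in the works above can be written down directly. The thresholds follow the text (CDR > 0.5 plus an ISNT violation); the function names and the use of rim widths as inputs are our own assumptions.

```python
def violates_isnt(inferior, superior, nasal, temporal):
    """ISNT rule: in a healthy disc the neuroretinal rim widths
    satisfy Inferior >= Superior >= Nasal >= Temporal."""
    return not (inferior >= superior >= nasal >= temporal)

def is_glaucomatous(cdr, inferior, superior, nasal, temporal):
    # Per the criterion in the text: flag glaucoma when the
    # cup-to-disc ratio exceeds 0.5 AND the ISNT rule is violated.
    return cdr > 0.5 and violates_isnt(inferior, superior, nasal, temporal)
```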
ORIGA [
36] is a publicly available dataset of 650 retinal fundus images for benchmarking computer aided segmentation and classification. Xu et al. [
37] formulated a reconstruction based method for localizing and classifying optic discs. They generate a codebook by random sampling from manually labelled images; the codebook is then used to calculate OD parameters based on their similarity to the input and their contribution to reconstructing the input image. They report an AUC of 0.823 for glaucoma diagnosis. Noting that classification based approaches outperform segmentation based approaches for glaucoma detection, Li et al. [
38] proposed integrating local features with holistic features to improve glaucoma classification. They evaluated various CNNs such as AlexNet, VGG-16, and VGG-19 [
39] and found that combining holistic and local features with AlexNet as the classifier gives the highest AUC, 0.8384, under 10-fold cross validation, while manual classification gives an AUC of 0.8390 on the ORIGA dataset. Chen et al. [
6] also used a deep convolutional network for glaucoma classification on the ORIGA dataset. Their method inserts micro neural networks within a more complex model so that the receptive field carries a more abstract representation of the data. They also use a contextualization network to obtain a hierarchical, discriminative representation of the images. Their achieved AUC is 0.838 with 99 randomly selected training images and the rest used for testing. In another publication, Chen et al. [
5] used a six layer CNN to detect glaucoma in ORIGA images. With the same strategy of 99 random training images and the rest for testing, they obtained an AUC of 0.831.
Recently, Al-Bander et al. [
40] used a deep learning approach to segment the optic cup and OD from fundus images. Their segmentation model has a U-shaped architecture inspired by U-Net [
41], with densely connected convolutional blocks inspired by DenseNet [
42]. They outperformed state-of-the-art segmentation results on various fundus datasets including ORIGA. For glaucoma diagnosis, however, despite combining the commonly used vertical CDR with the horizontal CDR, they achieved an AUC of only 0.778. Similarly, Fu et al. [
43] proposed a U-Net like architecture, named M-Net, for joint segmentation of the optic cup and OD. They added a multi-scale input layer that feeds the input image at various scales, giving receptive fields of corresponding sizes, while the main U-shaped convolutional network learns a hierarchical representation. So-called side-output layers generate prediction maps for the early layers; these layers not only relieve the vanishing gradient problem, by back propagating the side-output loss directly to the early layers, but also improve the output by supervising the output maps at each scale. For glaucoma screening on the ORIGA dataset they trained their model on 325 images and tested on the remaining 325. Using the vertical CDR of their segmented discs and cups they achieved an AUC of 0.851.
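The vertical CDR that these segmentation-based screening methods feed into the final decision can be computed directly from binary cup and disc masks. This is a minimal sketch; the mask convention and function name are our own assumptions.

```python
import numpy as np

def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio from binary segmentation masks.

    Each height is the vertical extent (number of rows) spanned by
    the mask; the ratio of cup height to disc height is the vertical
    CDR used for glaucoma screening.
    """
    def height(mask):
        rows = np.nonzero(mask.any(axis=1))[0]
        return 0 if rows.size == 0 else rows[-1] - rows[0] + 1
    return height(cup_mask) / height(disc_mask)
```

A screening pipeline would then compare this value against a cut-off (or feed it to a classifier) to produce the final glaucoma decision.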