  • Deep Learning Algorithm for Glaucoma Diagnosis

    By Arthur Stone
    Selected By: Richard K. Parrish II, MD

    Journal Highlights

    American Journal of Ophthalmology, March 2020

    Jammal et al. compared diagnostic performance of human graders to predictions generated by a machine-to-machine deep learning algorithm trained to quantify retinal nerve fiber layer (RNFL) damage.

    The algorithm was trained with RNFL thickness parameters from spectral-domain optical coherence tomography. It was then applied to a subset of 490 fundus photos of 490 eyes (370 subjects) that had been graded by two glaucoma specialists for the probability of glaucomatous optic neuropathy (GON) and estimates of cup-to-disc (C/D) ratios. Spearman correlations with standard automated perimetry (SAP) global indices were compared between the human gradings and the RNFL thickness values predicted by the algorithm. The area under the receiver operating characteristic curve (AUC) and the partial AUC for the region of clinically meaningful specificity (85%-100%) were used to compare the ability of each output to discriminate eyes with repeatable glaucomatous SAP defects from eyes with normal fields.
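    For readers unfamiliar with the two statistics named above, the following is a minimal Python sketch of how a Spearman correlation and a partial AUC restricted to the 85%-100% specificity range might be computed. It is not the study's code. The function names (`average_ranks`, `spearman_rho`, `roc_points`, `partial_auc`) are hypothetical, the toy values in the demo are invented for illustration and are not study data, and dividing the partial area by the width of the specificity window is an assumed normalization, since this summary does not state how the authors scaled their partial AUC.

    ```python
    from typing import List, Sequence, Tuple


    def average_ranks(values: Sequence[float]) -> List[float]:
        """Rank values from 1..n, giving tied values their average rank."""
        order = sorted(range(len(values)), key=lambda i: values[i])
        ranks = [0.0] * len(values)
        i = 0
        while i < len(order):
            j = i
            # Extend j over the run of values tied with position i.
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            avg = (i + j) / 2.0 + 1.0
            for k in range(i, j + 1):
                ranks[order[k]] = avg
            i = j + 1
        return ranks


    def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
        """Spearman correlation: the Pearson correlation of the ranks."""
        rx, ry = average_ranks(x), average_ranks(y)
        n = len(rx)
        mx, my = sum(rx) / n, sum(ry) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
        vx = sum((a - mx) ** 2 for a in rx)
        vy = sum((b - my) ** 2 for b in ry)
        return cov / (vx * vy) ** 0.5


    def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
        """ROC points as (false positive rate, true positive rate).

        A higher score is taken to mean "more suspicious for disease".
        """
        pos = sum(labels)
        neg = len(labels) - pos
        pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
        points = [(0.0, 0.0)]
        tp = fp = 0
        i = 0
        while i < len(pairs):
            threshold = pairs[i][0]
            # Process all cases sharing this score as one threshold step.
            while i < len(pairs) and pairs[i][0] == threshold:
                if pairs[i][1]:
                    tp += 1
                else:
                    fp += 1
                i += 1
            points.append((fp / neg, tp / pos))
        return points


    def partial_auc(scores: Sequence[float], labels: Sequence[int],
                    min_specificity: float = 0.85) -> float:
        """Area under the ROC curve for specificity >= min_specificity.

        Assumption: the area is divided by the window width so that a
        perfect classifier scores 1.0. The study may scale it differently.
        """
        max_fpr = 1.0 - min_specificity
        pts = roc_points(scores, labels)
        area = 0.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 >= max_fpr:
                break
            if x1 > max_fpr:
                # Linearly interpolate the TPR at the specificity cutoff.
                y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
                x1 = max_fpr
            area += (x1 - x0) * (y0 + y1) / 2.0
        return area / max_fpr


    if __name__ == "__main__":
        # Invented toy values, NOT data from the study.
        rnfl_thickness = [95.0, 88.0, 70.0, 62.0, 101.0, 55.0]   # micrometers
        sap_mean_deviation = [-0.5, -1.2, -6.0, -9.5, 0.3, -14.0]  # dB
        has_field_defect = [0, 0, 1, 1, 0, 1]

        print("Spearman rho:", spearman_rho(rnfl_thickness, sap_mean_deviation))
        # Thinner RNFL suggests more damage, so negate it to use as a score.
        damage_score = [-t for t in rnfl_thickness]
        print("Partial AUC:", partial_auc(damage_score, has_field_defect))
    ```

    Because a thinner RNFL indicates more damage, the sketch negates thickness before treating it as a disease score; a grader's probability of GON could be passed in directly.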

    The algorithm-predicted RNFL thickness had a significantly stronger absolute correlation with SAP mean deviation (rho = 0.54) than the probability of GON given by human graders (rho = 0.48; p < .001). The partial AUC for the algorithm was significantly higher than that for the probability of GON by human graders (partial AUC = 0.529 vs. 0.411, respectively; p = .016).

    The researchers concluded that the algorithm outperformed human graders in detecting signs of glaucomatous visual field loss on fundus photographs. They pointed out that the algorithm provides an objective, quantitative assessment of neural damage that potentially could be used for glaucoma diagnosis and screening, thus avoiding the biases and labor of subjective human grading. However, they advised that further refinement is desirable before the algorithm can be applied in either clinical or screening settings.

    The original article can be found here.