Automated classification of chest X-rays as normal/abnormal using a high sensitivity deep learning algorithm

Posted: Jan 21, 2019
By: Caring-Research

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

The majority of chest X-rays (CXRs) performed globally are normal, and radiologists spend significant time ruling out abnormalities in these scans. We present a deep learning (DL) model trained specifically to classify CXRs as normal or abnormal, potentially reducing the time and cost associated with reporting normal studies.

Methods and Materials

A DL algorithm was developed and trained on 1,150,084 CXRs and their corresponding reports. A retrospectively acquired, independent test set of 430 CXRs (285 abnormal, 145 normal) was analysed by the algorithm, which classified each CXR as normal or abnormal. Ground truth for the test set was established by a sub-specialist chest radiologist with 8 years' experience, who reviewed every CXR image with reference to the existing report. Algorithm output was compared against ground truth and summary statistics were calculated.
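The comparison of algorithm output against ground truth amounts to tallying a 2x2 confusion matrix with "abnormal" as the positive class. The following is a minimal sketch of that tally, not the authors' tooling: the abstract does not describe the data format, so the string labels, the function name `tally_confusion`, and the toy label lists are all assumptions made for illustration.

```python
from collections import Counter


def tally_confusion(ground_truth, predictions, positive="abnormal"):
    """Count TP, FP, TN and FN, treating `positive` as the positive class."""
    if len(ground_truth) != len(predictions):
        raise ValueError("ground_truth and predictions must have equal length")
    counts = Counter()
    for truth, pred in zip(ground_truth, predictions):
        if truth == positive:
            # Truly abnormal: flagged abnormal is a TP, flagged normal is a FN.
            counts["TP" if pred == positive else "FN"] += 1
        else:
            # Truly normal: flagged abnormal is a FP, flagged normal is a TN.
            counts["FP" if pred == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}


# Hypothetical toy labels, not data from the study.
truth = ["abnormal", "abnormal", "normal", "normal", "abnormal"]
preds = ["abnormal", "normal", "abnormal", "normal", "abnormal"]
confusion = tally_confusion(truth, preds)
print(confusion)
```

Sensitivity, specificity and overall accuracy all follow from these four counts.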

Results

The algorithm correctly classified 376 (87.44%) CXRs, with a sensitivity of 97.19% (95% CI: 94.54% to 98.78%) and a specificity of 68.28% (95% CI: 60.04% to 75.75%). There were 46 (10.70%) false positives and 8 (1.86%) false negatives (FNs). Of the 8 FNs, 3 were designated clinically insignificant (mild, inactive fibrosis) and 5 clinically significant (rib fractures, pneumothorax).
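The reported figures can be checked from the counts alone: 285 abnormal studies with 8 FNs give 277 true positives, and 145 normal studies with 46 false positives give 99 true negatives. The sketch below recomputes the point estimates and an exact (Clopper-Pearson) binomial confidence interval using only the Python standard library. The abstract does not say which interval method was used, so the choice of Clopper-Pearson is an assumption, and the helper names `binom_cdf`, `_solve` and `clopper_pearson` are illustrative.

```python
import math

# Counts implied by the abstract.
TP, FN = 285 - 8, 8      # abnormal studies: flagged abnormal / missed
TN, FP = 145 - 46, 46    # normal studies: flagged normal / flagged abnormal


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def _solve(func, target):
    """Bisection on (0, 1) for a function that decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial interval for k successes in n trials."""
    lower = 0.0 if k == 0 else _solve(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


accuracy = (TP + TN) / (TP + TN + FP + FN)
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
sens_ci = clopper_pearson(TP, TP + FN)
spec_ci = clopper_pearson(TN, TN + FP)

print(f"accuracy    {accuracy:.2%}")
print(f"sensitivity {sensitivity:.2%}  95% CI {sens_ci[0]:.2%} to {sens_ci[1]:.2%}")
print(f"specificity {specificity:.2%}  95% CI {spec_ci[0]:.2%} to {spec_ci[1]:.2%}")
```

The point estimates reproduce the reported 87.44%, 97.19% and 68.28%. The intervals should land close to the reported ones if an exact binomial method was used. Other methods, such as the Wilson score interval, give slightly different bounds.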

Conclusion

High-sensitivity DL algorithms can potentially be deployed for the primary read of CXRs, enabling radiologists to spend appropriate time on abnormal cases and reducing the time, and thereby the cost, of reporting CXRs, especially in non-emergency situations. More in-depth prospective trials are required to ascertain the overall impact of such algorithms.
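How much reading time a primary-read deployment could save depends on how many studies the algorithm calls normal, which in turn depends on the proportion of abnormal studies in the population being read. The sketch below is a rough illustration and not a result from the study. It assumes the test-set sensitivity and specificity carry over unchanged to other settings, which the abstract does not establish, and the prevalence values 0.30 and 0.10 and the function name `triage_summary` are hypothetical.

```python
SENSITIVITY = 277 / 285   # from the reported test-set counts
SPECIFICITY = 99 / 145


def triage_summary(prevalence, n_studies=1000):
    """Expected outcomes if the algorithm read n_studies at a given prevalence."""
    abnormal = prevalence * n_studies
    normal = n_studies - abnormal
    true_neg = SPECIFICITY * normal            # normal studies called normal
    false_neg = (1 - SENSITIVITY) * abnormal   # abnormal studies called normal
    called_normal = true_neg + false_neg
    return {
        "called_normal_fraction": called_normal / n_studies,
        "missed_abnormal_per_1000": false_neg * 1000 / n_studies,
        "npv": true_neg / called_normal,
    }


# 285/430 is the test-set prevalence; 0.30 and 0.10 are hypothetical.
for prevalence in (285 / 430, 0.30, 0.10):
    summary = triage_summary(prevalence)
    print(f"prevalence {prevalence:.2f}: "
          f"{summary['called_normal_fraction']:.1%} called normal, "
          f"NPV {summary['npv']:.1%}, "
          f"{summary['missed_abnormal_per_1000']:.1f} missed per 1000")
```

At the test-set prevalence this corresponds to the 107 of 430 studies (99 true negatives plus 8 FNs) that the algorithm classified as normal. Whether the figures at lower prevalence hold in practice is a question for the prospective trials the conclusion calls for.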