An AI deep-learning tool that estimates the malignancy risk of lung nodules showed high cancer detection rates while also reducing false-positive results by almost 40%, according to a study published September 16 in Radiology.
The study findings "offer promising solutions, but robust validation is essential," said lead author and doctoral candidate Noa Antonissen, MD, of Radboud University Medical Center in Nijmegen, the Netherlands, in an RSNA statement.
"AI accounts for factors that we might not even see on the CT scan to further assess a nodule as likely to be malignant," she said.
Lung cancer is the leading cause of cancer-related death worldwide, the group explained, and screening those at high risk of the disease with low-dose chest CT (LDCT) has been shown to reduce lung cancer mortality. The problem is that some screening trials have reported high false-positive rates, which can lead to unnecessary follow-up procedures, patient anxiety, and higher healthcare costs, the group noted. To make things trickier, pulmonary nodules are common on LDCT, but determining which are malignant can be a challenge.
Current lung cancer screening protocols rely on nodule size, type, and growth to estimate malignancy risk, and models such as the Pan-Canadian Early Detection of Lung Cancer (PanCan) model refine that estimate by combining patient and nodule characteristics. But deep learning could further improve the identification of malignant nodules by using data-driven predictions, according to the investigators.
Antonissen and colleagues trained an in-house deep-learning algorithm to estimate the malignancy risk of lung nodules using data from the National Lung Screening Trial (NLST), which included 16,077 nodules (1,249 malignant). They conducted external testing using baseline CT scans from the Danish Lung Cancer Screening Trial, the Multicentric Italian Lung Detection trial, and the Dutch–Belgian NELSON trial, yielding a pooled cohort of 4,146 participants (median age 58 years, 78% male, median smoking history 38 pack-years) with 7,614 benign and 180 malignant nodules.
They then assessed the algorithm's performance in the pooled cohort and in two subsets: indeterminate nodules (5 mm to 15 mm) and malignant nodules size-matched to benign ones. The team chose the 5 mm to 15 mm size set "due to their diagnostic challenges and frequent need for short-term follow-up," Antonissen said. "Accurate risk classification of these nodules could reduce unnecessary procedures." The team also compared the algorithm's performance against the PanCan model at the nodule and participant levels using the area under the receiver operating characteristic curve (AUC).
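For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen malignant nodule receives a higher risk score than a randomly chosen benign one. The following is a minimal Python sketch of that pairwise definition, not the study's code; the function name `auc` and all scores and labels are hypothetical values made up for illustration.

```python
def auc(scores, labels):
    """Pairwise AUC: fraction of (malignant, benign) pairs in which the
    malignant nodule gets the higher score; ties count as half a win."""
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign nodule")
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


# Hypothetical example: 1 = malignant, 0 = benign (not data from the study).
labels = [1, 1, 0, 0, 0, 0]
hypothetical_model_a = [0.32, 0.16, 0.05, 0.20, 0.02, 0.01]
hypothetical_model_b = [0.03, 0.12, 0.05, 0.20, 0.02, 0.01]

print("model A AUC:", auc(hypothetical_model_a, labels))
print("model B AUC:", auc(hypothetical_model_b, labels))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of malignant from benign nodules.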
The investigators reported the following:
AUCs of the deep-learning algorithm compared with the PanCan model for predicting lung nodule malignancy

| Time frame | PanCan | Deep-learning AI algorithm |
| --- | --- | --- |
| Pooled cohort | | |
| 1 year | 0.98 | 0.98 |
| 2 years | 0.94 | 0.96 |
| Throughout screening | 0.93 | 0.94 |
| Indeterminate nodules | | |
| 1 year | 0.91 | 0.95 |
| 2 years | 0.88 | 0.94 |
| Throughout screening | 0.86 | 0.90 |
The group also found that for cancers size-matched to benign nodules, the deep-learning model's AUC was 0.79 versus 0.60 for PanCan. At 100% sensitivity for cancers diagnosed within one year, the deep-learning model classified 68.1% of benign cases as low risk compared with 47.4% using the PanCan model. Put another way, the share of benign cases not classified as low risk fell from 52.6% to 31.9%, a 39.4% relative reduction in false positives.
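The 39.4% figure can be reproduced from the two low-risk percentages reported above, treating every benign case not classified as low risk as a false positive. The Python sketch below shows that arithmetic, along with one plausible way to read "at 100% sensitivity": setting the threshold at the lowest score of any malignant nodule so that no cancer is missed. That threshold rule, the function names, and the example scores are assumptions for illustration, not the authors' code or data.

```python
def relative_fp_reduction(low_risk_new, low_risk_ref):
    """Relative drop in false positives, where a false positive is a
    benign case that was NOT classified as low risk."""
    fp_new = 1.0 - low_risk_new
    fp_ref = 1.0 - low_risk_ref
    return (fp_ref - fp_new) / fp_ref


def benign_low_risk_fraction(scores, labels):
    """Fraction of benign nodules scoring below the lowest malignant score,
    i.e. below a threshold that still flags every cancer (assumed rule)."""
    threshold = min(s for s, y in zip(scores, labels) if y == 1)
    benign = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s < threshold for s in benign) / len(benign)


# Percentages reported in the article: 68.1% (deep learning) vs. 47.4% (PanCan).
print(f"{relative_fp_reduction(0.681, 0.474):.1%}")  # prints 39.4%

# Hypothetical scores (1 = malignant, 0 = benign), not data from the study.
labels = [1, 1, 0, 0, 0, 0]
hypothetical_scores = [0.32, 0.16, 0.05, 0.20, 0.02, 0.01]
print(benign_low_risk_fraction(hypothetical_scores, labels))
```

Note that the reduction is relative: the absolute difference between the two models is 20.7 percentage points of benign cases.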
Low-dose CT images show examples of screen-detected pulmonary nodules (arrows) where the deep-learning algorithm provides a more accurate malignancy risk estimation than the Pan-Canadian Early Detection of Lung Cancer (PanCan) model on axial (top), coronal (middle), and sagittal (bottom) planes. (A) Image shows a 9.7-mm malignant nodule (arrows) with a high deep-learning risk score (32.3%) and low PanCan risk score (3.2%) in a 74-year-old male participant diagnosed with squamous cell carcinoma. (B) Image shows a 6.8-mm malignant nodule (arrows) with a high deep-learning risk score (15.9%) and low PanCan risk score (1.2%) in a 71-year-old male participant diagnosed with adenocarcinoma. (C) Image shows a 19-mm benign nodule (arrows) with a low deep-learning risk score (4.7%) and high PanCan risk score (32.7%) in a 50-year-old female participant. Additional PanCan input features used in the model were retrieved from original trial records, as follows: (A) negative for family history of lung cancer, negative for emphysema, negative for spiculation, negative for upper lobe location, nodule count: four; (B) negative for family history of lung cancer, negative for emphysema, negative for spiculation, negative for upper lobe location, nodule count: two; (C) negative for family history of lung cancer, positive for emphysema, negative for spiculation, positive for upper lobe location, nodule count: one. Images and caption courtesy of the RSNA.
"Deep-learning algorithms can assist radiologists in deciding whether follow-up imaging is needed, but prospective validation is required to determine the clinical applicability of these tools and to guide their implementation in practice," Antonissen said. "Reducing false positive results will make lung cancer screening more feasible."