SIIM: Deep-learning model aids lumbar spine MRI analysis

PORTLAND, OR – An AI algorithm can assist radiologists in comprehensive analysis of lumbar spine MRI exams, according to research presented at the Society for Imaging Informatics in Medicine (SIIM) annual meeting. 

Presenter Kay Wu, MD, a radiology resident at the University of Toronto, and colleagues from Massachusetts General Hospital, Harvard Medical School, and the Athinoula A. Martinos Center for Biomedical Imaging developed DeepSpine, a deep-learning model that provides level-by-level classification of degenerative disc disease.

“DeepSpine, overall, represents a step toward AI-assisted, standardized spine MRI interpretation and holds potential to enhance diagnostic accuracy, provide standardized interpretation for referring doctors and streamlining workflow for radiologists in order to improve patient care,” Wu said. 

Kay Wu, MD, presents the research group's results with DeepSpine at SIIM 2025.

Low back pain is a major global health issue that places a heavy burden on healthcare systems, productivity, and society, Wu said.

Although spine MRI is the preferred imaging modality for investigating causes of low back pain, this exam is time-consuming for radiologists and is subject to high interreader variability.  

“... manifestations of back pain are multifactorial and commonly occur at multiple levels,” Wu said. “So radiologists have to provide level-by-level finding by pathology and by severity.” 

That variability limits the potential value of these studies for disease prognostication and for informing patient management, according to Wu. 

“So this is where AI can step in,” she said. 

A few AI models are currently available for spine MRI that reduce interpretation time and improve interobserver agreement. However, they typically focus on a single task or a narrow set of tasks, while radiologists have to assess multiple degenerative changes comprehensively, Wu said. What’s more, these existing models tend to have been trained on smaller datasets, limiting their generalizability.

As a result, the team trained DeepSpine to provide a more comprehensive and more generalizable characterization of multiple degenerative spine conditions, with the aim of assisting radiologists in spine MRI interpretation and streamlining the reporting process, Wu said.

DeepSpine was developed using a dataset of 54,739 lumbar spine MRIs from the greater Boston area. Patients had an average age of 58.3 years.

“Overall, DeepSpine was able to automate segmentation and classification across multiple degenerative pathologies in a way that closely mirrors how radiologists assess MRI studies,” she said.

The algorithm demonstrated strong performance across multiple classification tasks, especially stenosis grading. In testing, DeepSpine achieved within-one-severity-class accuracy and Cohen’s kappa of 97.3% and 0.781, respectively, for left foraminal stenosis; 97.3% and 0.784 for right foraminal stenosis; and 97.6% and 0.797 for spinal canal stenosis.
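For readers unfamiliar with the two stenosis metrics, the following is a minimal sketch of how they are commonly computed, not the researchers' code. It assumes severity is encoded as ordinal integers (a hypothetical 0 to 3 scale here) and that kappa is the unweighted form; the presentation did not specify either detail, and the sample grades are invented purely for illustration.

```python
from collections import Counter


def within_one_accuracy(y_true, y_pred):
    """Fraction of predictions at most one ordinal grade from the reference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    if n != len(y_pred) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: share of cases where both raters give the same grade.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of each rater's marginal frequency per grade.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1 - p_e)


# Invented example grades (assumed scale: 0 = none ... 3 = severe).
reference = [0, 0, 1, 1, 2, 2, 3, 3]
predicted = [0, 1, 1, 1, 2, 3, 3, 1]

print(within_one_accuracy(reference, predicted))  # 0.875
print(cohens_kappa(reference, predicted))         # 0.5
```

Within-one accuracy forgives near misses between adjacent grades, while kappa credits only exact matches and discounts agreement expected by chance. That is why the same predictions can score very high on the first measure and lower on the second.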

It was also highly accurate for classifying disc bulging and disc osteophyte complex, and performed moderately well for the other pathologies.

DeepSpine classification performance of spine pathology

Pathology                       Accuracy    AUC score
Disc bulging                    88.9%       0.866
Disc osteophyte complex         97.4%       0.87
Epidural lipomatosis            93.7%       0.71
Left facet arthropathy          76.5%       0.638
Right facet arthropathy         76.5%       0.638
Ligamentum flavum thickening    78%         0.667

Going forward, the researchers plan to optimize model performance by exploring other model architectures. They also want to incorporate additional pathologies and validate the model's effectiveness in real-world clinical workflows.
