SIIM: Are foundation models the future for AI in radiology?

Will Morton, Associate Editor, AuntMinnie.com

PORTLAND, OR – AI foundation models are booming in health care and hold promise in radiology, yet it will take specific strategies to adapt them to the field, according to a May 20 presentation at the SIIM-ACR Data Science Summit.

Bernardo Bizzo, MD, PhD

“You’ve all heard a lot about this over the past couple of years, especially as foundational models ramp up, like the GPTs of the world,” said Bernardo Bizzo, MD, PhD, of Mass General Brigham in Boston.

Foundational models are large, general-purpose models most commonly based on transformer architectures and trained via self-supervised learning. They often exhibit probabilistic behavior and “emergent capabilities,” he explained.

There are three subtypes, distinguished by the data they have been trained on, Bizzo added:

  • Large language models (text only)
  • Vision language models (text and image)
  • Multimodal language models (text, image, audio, video)

When it comes to adapting the models to domain-specific areas like radiology, there are two further task categories: generative AI, in which researchers use a foundational model to create new content from learned patterns (such as radiology report generation), and discriminative AI, in which the models serve as a foundation for tasks such as segmenting tumors.

“Basically, that means that these models can be adapted to generate new data. That's the simplest way to put it,” he said.

Next, in terms of strategies for improving the performance of the models, researchers may employ what’s called “zero-shot learning,” in which the model is asked to perform a new task without being given any examples or additional training, much like a direct Q&A session. The advantage of this approach is that no training of the model is required; the drawback is that accuracy may suffer on unfamiliar input formats, Bizzo noted.

A “few-shot learning” approach takes this a step further and “guides the model” with a handful of worked examples included in the prompt, according to Bizzo. The advantages are that adaptation is fast and no retraining is required; the drawback is that the model's performance is sensitive to the quality of the examples provided, he said.
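The contrast between the two prompting strategies can be sketched as simple prompt construction. The model call itself is omitted here, and the task, report text, and labels are illustrative examples, not from the talk:

```python
# Sketch of zero-shot vs. few-shot prompting as prompt construction.
# Any text-completion API could consume these prompts; the strings
# below are hypothetical, for illustration only.

TASK = "Classify the radiology report impression as NORMAL or ABNORMAL."

def zero_shot_prompt(report: str) -> str:
    # Zero-shot: the task instruction alone, with no worked examples.
    return f"{TASK}\n\nReport: {report}\nAnswer:"

def few_shot_prompt(report: str, examples: list[tuple[str, str]]) -> str:
    # Few-shot: prepend a few (report, label) examples to guide the model.
    shots = "\n\n".join(f"Report: {r}\nAnswer: {label}" for r, label in examples)
    return f"{TASK}\n\n{shots}\n\nReport: {report}\nAnswer:"

examples = [
    ("No acute cardiopulmonary abnormality.", "NORMAL"),
    ("Right lower lobe opacity concerning for pneumonia.", "ABNORMAL"),
]
print(zero_shot_prompt("Heart size is normal."))
print(few_shot_prompt("Heart size is normal.", examples))
```

Because the examples live in the prompt rather than in the model's weights, swapping them out is fast, but a poorly chosen set degrades results, which is the sensitivity Bizzo described.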

A third technique, “fine-tuning,” uses task-specific labeled data to retrain a pretrained foundational model and adjust its weights. Results are usually highly accurate, Bizzo explained, but the approach requires compute and labeled data, and it carries a risk of “overfitting.”
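What distinguishes fine-tuning from the prompting strategies above is that the model's weights actually change. A toy illustration, using a one-parameter linear model standing in for a pretrained network and made-up numbers:

```python
# Minimal fine-tuning sketch: labeled data drives gradient-descent
# updates to a "pretrained" weight. Real fine-tuning adjusts millions
# of transformer weights the same basic way.

def fine_tune(w: float, data: list[tuple[float, float]],
              lr: float = 0.1, epochs: int = 50) -> float:
    for _ in range(epochs):
        for x, y in data:
            pred = w * x
            grad = 2 * (pred - y) * x   # gradient of squared error w.r.t. w
            w -= lr * grad              # weight update: this is the "tuning"
    return w

pretrained_w = 1.0                       # weight from generic pretraining
labeled = [(1.0, 2.0), (2.0, 4.0)]       # task-specific labeled data (y = 2x)
tuned_w = fine_tune(pretrained_w, labeled)
print(round(tuned_w, 3))                 # converges toward 2.0
```

The same mechanics are also where the listed cons come from: the updates need compute and labeled examples, and with too little data the weights can fit the training set rather than the task.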

Finally, Bizzo touched on considerations for using open-source foundation models versus proprietary models, and he provided examples of both types, specifically those that have been trained on medical data, research articles, and textbooks. Mainly, he noted that if you are thinking about building a product on top of a proprietary model, bear in mind that these models are “closed” – in other words, the training data used to develop them are not made publicly available.

Proprietary foundation models include those such as Med-Gemini (Google DeepMind), Rad1 (Harrison AI), MARIA 2 (Microsoft Research), Watson.ai (IBM), and Amazon Health AI Models (Amazon Web Services). Open-source models include LLaVA-Rad (UCLA and Stanford), Med-Flamingo (DeepMind), OpenEvidence (Stanford and OpenEvidence.org), and PubMedGPT (Microsoft Research).

“And this list keeps growing. The point here is there are an increasing number of models that take text and images, and from a radiology perspective, that's where we're going to see a lot more value in the interpretation tasks,” he concluded.
