How should performance validation be evaluated for rigor and transparency when it comes to U.S. Food and Drug Administration (FDA) 510(k) clearances for software as a medical device (SaMD)?
SaMD performance validation rigor (study quality) and transparency (data accessibility) are distinct, although they are often combined into one concept, according to Abdul Rahman Diab, MD, and William Lotter, PhD, both of the Dana-Farber Cancer Institute in Boston. Based on FDA guidance documents and their own experience developing FDA-cleared devices, Diab and Lotter clarify the current state of SaMD validation standards and map them to radiology AI device types in a paper published September 24 in Radiology: Artificial Intelligence.
Common radiology SaMDs include quantification (CADq), triage (CADt), detection (CADe), diagnosis (CADx), detection/diagnosis (CADe/x), and denoising/reconstruction in the context of acquisition/optimization (CADa/o). Overarching validation study designs have emerged for obtaining FDA clearance, but they can vary depending on the use case, Diab and Lotter found.
For these use cases, validation study designs -- from nonclinical validation to prospective validation -- land along a "rigor spectrum."
William Lotter, PhD, of the Dana-Farber Cancer Institute and Harvard Medical School in Boston, explains how to apply the rigor spectrum to the current state of radiology AI validation studies.
Answering the question of what evidence of validation is presented to the FDA in support of a device’s marketing authorization can be surprisingly difficult for two key reasons, the researchers explained.
"First, the FDA’s regulatory framework is intentionally flexible," they said. "This flexibility can foster innovation in medical device development, but it has also led to application-specific validation standards that are not explicitly codified, complicating efforts to assess the consistency and quality of the regulatory process from an external perspective."
The challenge is compounded by the data-driven nature of AI models and their potential for frequent updates, they added. Current radiology AI use cases that are designed to assist radiologists largely rely on retrospective validation, though prospective validation has become standard in other domains for devices that autonomously interpret medical images.
Recommendations to improve regulation often treat AI as a homogeneous technology, according to Diab and Lotter.
A key challenge arises when devices generate entirely new images for radiologist interpretation, they noted. Some metrics gathered using standalone testing do not fully capture the diagnostic quality of device outputs, especially for denoising/reconstruction devices in computer-assisted acquisition/optimization.
"Their generative outputs (i.e., new images) are inherently less straightforward to evaluate than the discriminative outputs of other CAD types," the two wrote.
Another issue involves quantitative imaging devices (CADq). FDA guidance implies that validation exclusively using synthetic “phantom” data may be sufficient for regulatory authorization in some circumstances, Lotter said.
"My sense is that the FDA is enforcing, real clinical testing for all these AI-enabled devices," he told AuntMinnie.com.
In the paper, the two wrote that "in practice, we suspect that nearly all AI-enabled CADq devices have used clinical data for validation regardless of subtype, even if this testing is not reported in the associated Summary," and that, in their experience, "the FDA expects validation data to be collected from multiple clinical sites that were not involved with AI model training, and to be representative of the intended clinical population in terms of demographics."
Furthermore, data transparency diminishes with device modifications and subsequent FDA 510(k)s, according to Lotter.
Ultimately, the researchers suggested five actionable steps to address "key gaps" in validation rigor and transparency for both SaMD and software in a medical device (SiMD) -- that is, cases where the validation covers an AI component embedded in an imaging system rather than the device as a whole -- primarily on the FDA 510(k) pathway.
For added rigor, they recommended the following:
- Remove the nonclinical validation option for AI-enabled devices from the FDA guidance document for quantitative imaging devices.
- Require reader studies for all devices performing denoising/reconstruction using AI.
- Require prospective clinical studies to validate devices that autonomously interpret radiologic images, while maintaining retrospective studies as an option for assistive devices.
For greater transparency, they recommended the following:
- Adopt mandatory checklists for performance validation reporting in 510(k) summaries.
- Create a third-party performance validation database.
Find the complete paper here.