ChatGPT shows potential in generating board-style practice questions for radiology resident education, according to research published July 7 in Academic Radiology.
The finding is from a study in which the large language model produced multiple-choice questions (MCQs) that matched the quality of questions written by attending radiologists, noted lead author Aaron Zheng, MD, of the University of Pittsburgh, and colleagues.
“We found no statistically significant difference in perceived MCQ quality as rated by radiology resident physicians between ChatGPT-generated MCQs and attending radiologist-written MCQs,” the group wrote.
At the University of Pittsburgh, after presenting a didactic resident lecture, attending radiologists are asked to submit two to three MCQs summarizing the key concepts of their presentation, the authors explained. At the end of every second month, a senior resident then presents these questions to the other radiology residents in a lecture format during noon conferences.
Historically, however, approximately 50% of faculty do not submit MCQs after giving a lecture, the authors noted. In this study, the researchers therefore assessed whether ChatGPT could accurately and adequately fill that gap by generating MCQs for resident education.
First, the group created a custom prompt for ChatGPT to generate non-image-based MCQs based on resident lecture transcripts. The chatbot generated 144 MCQs, of which 17 were selected by the chief resident for inclusion in the study and randomly combined with 11 attending radiologist-written MCQs.
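The article does not reproduce the study's prompt, but the overall workflow, feeding a lecture transcript to ChatGPT and asking for board-style questions, can be sketched with the OpenAI Python SDK. Everything below (the prompt wording, the model name, and the generate_mcqs helper) is an illustrative assumption, not the prompt or pipeline the Pittsburgh team actually used.

```python
# Illustrative sketch only: the study's actual prompt, model version, and
# post-processing are not described in this article. The prompt text,
# "gpt-4o", and generate_mcqs are assumptions for demonstration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = (
    "You are preparing board-style review questions for radiology residents. "
    "From the lecture transcript below, write {n} non-image-based multiple-choice "
    "questions. Each question should have one correct answer, four plausible "
    "distractors, and a one-sentence explanation tied to the lecture content.\n\n"
    "Transcript:\n{transcript}"
)

def generate_mcqs(transcript: str, n: int = 3) -> str:
    """Return draft MCQs for one lecture transcript (hypothetical helper)."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption; the article does not state the model version used
        messages=[
            {"role": "user", "content": PROMPT_TEMPLATE.format(n=n, transcript=transcript)}
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    with open("lecture_transcript.txt") as f:  # hypothetical transcript file
        print(generate_mcqs(f.read()))
```

In the study itself, the generated questions were not used as-is: a chief resident screened the 144 drafts and selected 17 for the comparison, a curation step any such sketch would still require.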
Next, 21 participating radiology residents answered each MCQ, rated it from 1 to 10 on its effectiveness in reinforcing lecture material, and indicated whether they thought it had been written by an attending radiologist at their institution or came from another source.
According to the analysis, perceived question quality was not significantly different between ChatGPT-generated (mean = 6.93) and attending radiologist-written questions (mean = 7.08). MCQ correct answer percentages did not significantly differ between ChatGPT-generated (mean = 57%) and attending radiologist-written questions (mean = 59%).
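The article does not specify which statistical tests the authors applied. As a rough illustration only, a comparison of two sets of 1-to-10 ratings could be run as an independent-samples t-test with SciPy; the arrays below are fabricated placeholders, not the study's data.

```python
# Placeholder illustration: the rating arrays are invented stand-ins, and the
# article does not state which test the authors actually used.
from scipy import stats

chatgpt_ratings = [7, 6, 8, 7, 6, 7, 8, 6, 7, 7]    # hypothetical 1-10 ratings
attending_ratings = [7, 8, 7, 6, 8, 7, 7, 7, 8, 6]  # hypothetical 1-10 ratings

t_stat, p_value = stats.ttest_ind(chatgpt_ratings, attending_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # p > 0.05 would mean no significant difference
```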
However, the percentage of MCQs thought to be written by an attending radiologist was significantly different between ChatGPT-generated (mean = 57%) and attending radiologist-written MCQs (mean = 71%).
“These results suggest that radiology chief resident-selected ChatGPT-generated MCQs may provide utility in reinforcing lecture material and covering clinical teaching points from previously given didactic lectures,” the researchers wrote.
Ultimately, the researchers noted that this is the first study to evaluate the use of ChatGPT for generating MCQs in radiology education, and they highlighted areas for future research. For instance, further work should compare image-based MCQs generated by ChatGPT with attending radiologist-written MCQs that include imaging.
Also, how residents were able to distinguish attending radiologist-written MCQs from ChatGPT-generated MCQs remains unclear, and a qualitative analysis would be needed to elucidate this, they added.
“[Large language models] such as ChatGPT demonstrate potential in generating and presenting educational material for radiology education, and their use should be explored further on a larger scale,” the group concluded.