2023
DOI: 10.1101/2023.10.26.23297629
Preprint

Performance of Multimodal GPT-4V on USMLE with Image: Potential for Imaging Diagnostic Support with Explanations

Zhichao Yang,
Zonghai Yao,
Mahbuba Tasmin
et al.

Abstract: Importance Using artificial intelligence (AI) to support clinical diagnosis has been an active research topic for more than six decades. Little research, however, has achieved the scale and accuracy required for clinical practice. The tide may be turning today with the power of large language models (LLMs). In this application, we evaluated accuracy on the medical licensing exam using the newly released Generative Pre-trained Transformer 4 with vision (GPT-4V), a large multimodal model trained to analyze image inputs w…


Cited by 6 publications (4 citation statements)
References 52 publications
“…The best performance under the open-book setting is achieved by human physicians (95%, CI: 91-99%), though not significantly different from GPT-4V. Our findings, therefore, align with the previous ones, which show the superior performance of GPT-4V in the closed-book setting 12,13 .…”
supporting
confidence: 89%
“…Wu et al examine GPT-4V's potential in multimodal medical diagnosis, demonstrating substantial promise yet revealing limitations in high-stakes domains [6]. Echoing this view, Yang et al assessed the performance of Multimodal GPT-4V in medical licensing exams, particularly in imaging diagnostics, offering a glimpse into future support systems for medical professionals [7]. The concept of "Socratic models" by Zeng et al brings forth the idea of zero-shot multimodal reasoning, allowing models to compose answers from disparate sources without explicit training [8].…”
Section: Novel Roles for Multimodal Large Language and Vision Models
mentioning
confidence: 99%
“…The creative maker spaces have become vibrant hubs of 21st-century innovation, merging the traditional tactile experience with digital fabrication and design. However, integrating new artificial intelligence (AI) tools and, in particular, the current generation of multimodal large language models (LLMs) [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17] into these environments has the potential to enhance human creativity and innovation [18][19][20][21][22][23][24][25][26][27][28][29][30][31][32]. In recent years, the intersection of AI and multimodal (MM) learning has spawned a generation of models that integrate and interpret information across various forms of data, including text, images, and speech.…”
Section: Introduction
mentioning
confidence: 99%
“…Recent studies have further explored the diagnostic application of multimodal LLMs (also called 'vision-language models') that are able to ingest not only text but also image data as input (12)(13)(14)(15)(16)(17)(18)(19)(20). However, several studies demonstrated low performance of Generative Pretrained Transformer 4 Vision (GPT-4V) by OpenAI in differential diagnosis based on various types of radiological images (12,16,18,20,21).…”
Section: Introduction
mentioning
confidence: 99%