Evaluation and Comparison of Ophthalmic Scientific Abstracts and References by Current Artificial Intelligence Chatbots

Hua, Hong-Uyen; Kaakour, Abdul-Hadi; Rachitskaya, Aleksandra; Srivastava, Sunil K.; Sharma, Sumit; Mammo, Danny A.

doi:10.1001/jamaophthalmol.2023.3119

Cited by 11 publications

(7 citation statements)

References 16 publications

“…Chatbots can generate average quality scientific abstracts (41.7% correct) but remain plagued by fake data and references, when not provided with a data set [85]. GPT-4 scores slightly better than GPT-3.5 with lower fake score and hallucination rates (Table 5). Chatbots can assist people with relatively weak writing or language skills to prepare written assignments both faster and of higher quality.…”

Section: Performance In Other Potential Applications (mentioning)

confidence: 93%

“…But there is a growing concern that AI chatbots are being abused in writing essays, scientific abstracts and even manuscripts [85]. With the number of factual errors these chatbots generate and their apparently comprehensive response, it is important for authors to know their limitations and pitfalls and for publishers/editors to identify AI-generated text in manuscripts [86]. GPT-4 can categorise refractive surgery candidates to their ideal procedures (68%-88% correct) with low to moderate agreement (0.399-0.610) with clinicians.…”

Section: Performance In Other Potential Applications (mentioning)

confidence: 99%

Utility of artificial intelligence‐based large language models in ophthalmic care

Biswas, Davies, Sheppard et al. 2024

Ophthalmic Physiologic Optic

Purpose: With the introduction of ChatGPT, artificial intelligence (AI)‐based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human‐like responses to queries. However, the application of LLMs and comparison of the abilities among different LLMs with their human counterparts in ophthalmic care remain under‐reported. Recent Findings: Hitherto, studies in eye care have demonstrated the utility of ChatGPT in generating patient information, clinical diagnosis and passing ophthalmology question‐based examinations, among others. LLMs' performance (median accuracy, %) is influenced by factors such as the iteration, prompts utilised and the domain. Human expert (86%) demonstrated the highest proficiency in disease diagnosis, while ChatGPT‐4 outperformed others in ophthalmology examinations (75.9%), symptom triaging (98%) and providing information and answering questions (84.6%). LLMs exhibited superior performance in general ophthalmology but reduced accuracy in ophthalmic subspecialties. Although AI‐based LLMs like ChatGPT are deemed more efficient than their human counterparts, these AIs are constrained by their nonspecific and outdated training, no access to current knowledge, generation of plausible‐sounding ‘fake’ responses or hallucinations, inability to process images, lack of critical literature analysis and ethical and copyright issues. A comprehensive evaluation of recently published studies is crucial to deepen understanding of LLMs and the potential of these AI‐based LLMs. Summary: Ophthalmic care professionals should undertake a conservative approach when using AI, as human judgement remains essential for clinical decision‐making and monitoring the accuracy of information. This review identified the ophthalmic applications and potential usages which need further exploration. With the advancement of LLMs, setting standards for benchmarking and promoting best practices is crucial. Potential clinical deployment requires the evaluation of these LLMs to move away from artificial settings, delve into clinical trials and determine their usefulness in the real world.

“…Studies have shown that abstracts automatically generated by ChatGPT contain natural language and content that causes scientists to perceive them as having been written by humans rather than artificial intelligence [35]. However, limitations of ChatGPT in generating biomedical abstracts of presenting 'hallucinated' or inaccurately cited information [36]. This suggests that LLMs could support researchers by generating initial outlines or introductory drafts.…”

Section: Scientific Writing (mentioning)

confidence: 99%

Large language models and the future of rheumatology: assessing impact and emerging opportunities

Mannstadt, Mehta 2023

Current Opinion in Rheumatology

Purpose of review: Large language models (LLMs) have grown rapidly in size and capabilities as more training data and compute power has become available. Since the release of ChatGPT in late 2022, there has been growing interest and exploration around potential applications of LLM technology. Numerous examples and pilot studies demonstrating the capabilities of these tools have emerged across several domains. For rheumatology professionals and patients, LLMs have the potential to transform current practices in medicine. Recent findings: Recent studies have begun exploring capabilities of LLMs that can assist rheumatologists in clinical practice, research, and medical education, though applications are still emerging. In clinical settings, LLMs have shown promise in assisting healthcare professionals, enabling more personalized medicine or generating routine documentation like notes and letters. Challenges remain around integrating LLMs into clinical workflows, accuracy of the LLMs and ensuring patient data confidentiality. In research, early experiments demonstrate LLMs can offer analysis of datasets, with quality control as a critical piece. Lastly, LLMs could supplement medical education by providing personalized learning experiences and integration into established curriculums. Summary: As these powerful tools continue evolving at a rapid pace, rheumatology professionals should stay informed on how they may impact the field.

“…The influence of ChatGPT is attributed to its conversational prowess and its performance, which approaches or matches human-level competence in cognitive tasks, spanning various domains including medicine [16]. ChatGPT has achieved commendable results in the United States Medical Licensing Examinations, leading to discussions about the readiness of LLM applications for integration into clinical [17][18][19], educational [20][21][22], and research [23] environments.…”

Section: Introduction (mentioning)

confidence: 99%

Performance of Multimodal GPT-4V on USMLE with Image: Potential for Imaging Diagnostic Support with Explanations

Yang, Yao, Tasmin et al. 2023

Preprint

Importance: Using artificial intelligence (AI) to help clinical diagnoses has been an active research topic for more than six decades. Few research however has the scale and accuracy that can be turned into clinical practice. The tide may be turned today with the power of large language models (LLMs). In this application, we evaluated the accuracy of medical license exam using the newly released Generative Pre-trained Transformer 4 with vision (GPT-4V), a large multimodal model trained to analyze image inputs with the text instructions from the user. This study is the first to evaluate GPTs for interpreting medical images. Objective: This study aimed to evaluate the performance of GPT-4V on medical licensing examination questions with images, as well as to analyze interpretability. Design, Setting, and Participants: We used 3 sets of multiple-choice questions with images to evaluate GPT-4V performance. The first set was the United States Medical Licensing Examination (USMLE) from the National Board of Medical Examiners (NBME) sample questions in step1, step2CK, and step3. The second set was derived from AMBOSS, a commonly used question bank for medical students, which also provides statistics on question difficulty and the performance on an exam relative to the user base. The third set was the Diagnostic Radiology Qualifying Core Exam (DRQCE) from the American Board of Radiology. The study (including data analysis) was conducted from September to October 2023. Main Outcomes and Measures: The choice accuracy of GPT-4V was compared to two other large language models, GPT-4 and ChatGPT. The GPT-4V explanation was evaluated across 4 qualitative metrics: image misunderstanding, text hallucination, reasoning error, and non-medical error. Results: Of the 3 exams with images, NBME, AMBOSS, and DRQCE, GPT-4V achieved accuracies of 86.2%, 62.0%, and 73.1%, respectively. GPT-4V outperformed ChatGPT and GPT-4 by 131.8% and 64.5% on average across various data sets. The model demonstrated a decreasing trend in performance as question difficulty increased in the AMBOSS dataset. GPT-4V achieves an accuracy of 90.7% in the full USMLE exam, outperforming the passing threshold of about 60% accuracy. Among the incorrect answers, 75.9% of responses included misinterpretation of the image. However, 39.0% of them could be easily solved with a short hint. Conclusion: In this cross-sectional study, GPT-4V achieved a high accuracy of USMLE that was in the 70th - 80th percentile with AMBOSS users preparing for the exam. The results suggest the potential of GPT-4V for clinical decision support. However, GPT-4V generated explanation revealed several issues. It needs to improve explanation quality for potential use in clinical decision support.
