OpenAI’s recent advancement involving the ChatGPT-4.0 model has shown promise in clinical neurology, achieving an 85% accuracy rate in a proof-of-concept neurology exam. Researchers at the University Hospital Heidelberg and the German Cancer Research Center conducted this experiment, published on Dec. 7, utilizing questions from the American and European Boards for Neurology.
The older ChatGPT-3.5 model answered 1,306 of 1,956 questions correctly, a score of 66.8%, while the latest iteration, ChatGPT-4.0, answered 1,662 correctly for an accuracy of 85%. The human average was 73.8%, putting ChatGPT-4.0 above the 70% threshold typically considered a passing mark in educational settings. However, both AI models performed worse on questions requiring higher-order cognitive skills than on those testing lower-order ones. The researchers highlighted the potential of large language models (LLMs) in clinical neurology but emphasized that refinement and fine-tuning are needed to bolster their applicability.
Dr. Varun Venkataramani, one of the study’s authors, clarified that while the study showcased the potential of LLMs, caution is necessary. Although these models show promise for documentation and decision-support systems, their reliability on complex cognitive tasks remains imperfect. Venkataramani described the study as a stepping stone, indicating that further development and tailored modifications will be needed before LLMs can serve as effective tools in clinical neurology.
AI’s involvement in healthcare is expanding rapidly, from aiding in cancer research for AstraZeneca to tackling antibiotic overprescription in Hong Kong. Yet, despite these advancements, the study underscores the need for cautious integration of AI in neurology, acknowledging the ongoing need for enhancement and specialization to ensure its reliability and effectiveness in clinical practice.
The success of OpenAI’s ChatGPT-4.0 on a neurology exam highlights its potential in healthcare. Yet even with an impressive 85% accuracy, its limitations on higher-order cognitive tasks persist. Further refinement is essential before AI can be seamlessly integrated into clinical neurology with the reliability and proficiency that practice demands.