Abstract Listings 2025 – European Society of Ophthalmic Plastic and Reconstructive Surgery

Thyroid Eye Disease and AI: A Comparative Study of ChatGPT-3.5, ChatGPT-4o, and Gemini in Patient Information Delivery

Author: Shirin Hamed Azzam
Base Hospital / Institution: Tzafon Medical Center

ePoster presentation

Abstract ID: 25-116

Purpose

This study aimed to compare the effectiveness of three artificial intelligence (AI) language models (GPT-3.5, GPT-4o, and Gemini) in delivering patient-centered information about Thyroid Eye Disease (TED). We evaluated their performance based on the accuracy and comprehensiveness of their responses to common patient inquiries regarding TED.

Methods

Five oculoplastic surgeons assessed the responses generated by the AI models to 12 key questions frequently asked by TED patients. These questions addressed TED pathophysiology, risk factors, clinical presentation, diagnostic testing, and treatment options. Each response was rated for correctness and reliability on a 7-point Likert scale. Correctness referred to factual accuracy, while reliability assessed trustworthiness for patient use.

Results

GPT-3.5 emerged as the top performer, achieving an average correctness score of 5.75 and a reliability score of 5.68, excelling in delivering detailed information on complex topics such as TED treatment and surgical interventions. GPT-4o followed with scores of 5.32 for correctness and 5.25 for reliability, generally providing accurate but less detailed information. Gemini trailed with scores of 5.10 for correctness and 4.70 for reliability, often providing sufficient responses for simpler questions but lacking detail in complex areas like second-line immunosuppressive treatments. Statistical analysis using the Friedman test showed significant differences between models (p < 0.05) for key topics, with GPT-3.5 consistently leading.
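For readers unfamiliar with the Friedman test, the following is a minimal sketch of how such a comparison could be computed for three models rated on the same questions. It is not the authors' analysis code. The ratings are hypothetical placeholders, not study data, and the layout (one row per question, one mean rating per model) is an assumption, since the abstract does not state how the ratings were grouped. The function names are illustrative. With three models the statistic has 2 degrees of freedom, so the chi-square approximation to the p-value reduces to exp(-Q/2).

```python
import math


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_three_models(rows):
    """Tie-corrected Friedman statistic and approximate p-value for 3 related groups.

    rows: one sequence of three scores per block (assumed here: one per question).
    """
    k = 3
    n = len(rows)
    if n == 0 or any(len(row) != k for row in rows):
        raise ValueError("expected a non-empty list of rows with exactly 3 scores")

    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        for col, rank in enumerate(average_ranks(list(row))):
            rank_sums[col] += rank
        # Accumulate t^3 - t for each group of t tied scores within the row.
        for value in set(row):
            t = list(row).count(value)
            tie_term += t ** 3 - t

    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every row is fully tied; the statistic is undefined")
    q /= correction

    # Chi-square survival function with 2 degrees of freedom is exp(-q / 2).
    p_value = math.exp(-q / 2.0)
    return q, p_value


# Hypothetical mean ratings per question: (model A, model B, model C).
hypothetical_ratings = [
    (6.0, 5.4, 5.0),
    (5.8, 5.6, 4.8),
    (5.6, 5.2, 5.2),
    (6.2, 5.0, 4.6),
    (5.4, 5.6, 4.4),
    (5.8, 5.2, 5.0),
]

q_stat, p_val = friedman_three_models(hypothetical_ratings)
print(f"Friedman Q = {q_stat:.3f}, approximate p = {p_val:.4f}")
```

The test ranks the models within each question rather than comparing raw scores, which suits ordinal Likert data. The chi-square approximation is rough when the number of blocks is small, so results from a sketch like this should be read with caution.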

Conclusion

GPT-3.5 was the most effective model for delivering reliable and comprehensive patient information, particularly for complex treatment and surgical topics. GPT-4o provided reliable general information but lacked the necessary depth for specialized topics, while Gemini was suitable for addressing basic patient inquiries. This study highlights the role of AI in patient education, suggesting that models like GPT-3.5 can be valuable tools for clinicians in enhancing patient understanding of TED.

Additional Authors

First name	Last name	Base Hospital / Institution
Morris	Hartstein	Shamir Medical Center
Ofira	Zloto	Sheba Medical Center
Cat	Burkat	University of Wisconsin USA
Jimmy	Uddin	Moorfields Eye Hospital
Daniel	Bahir	Tzafon Medical Center
