Bilingual evaluation of large language models for patient education in refractive surgery

doi:10.18240/ijo.2026.06.01

Corresponding Author: Hung-Chi Chen. Chang Gung Memorial Hospital, Linkou Main Branch, No.5, Fuxing Street, Guishan District, Taoyuan 333, Taiwan, China. mr3756@cgmh.org.tw

Abstract:

AIM: To evaluate the ability of six advanced large language models (LLMs) to provide accurate, comprehensive, and readable patient education on corneal refractive surgeries [laser in-situ keratomileusis (LASIK), keratorefractive lenticule extraction (KLEx), and photorefractive keratectomy (PRK)] in both English and Chinese.

METHODS: This was a cross-sectional, comparative study. Twenty-six questions, compiled from authoritative ophthalmologic sources and covering four domains (procedure basics and eligibility; safety, risks, and long-term stability; recovery and postoperative experience; and practical concerns), were posed to each LLM in both English and Chinese, using a fresh chat session each time. Five performance metrics (accuracy, comprehensiveness, word count, readability, and reproducibility) were evaluated using appropriate statistical tests.

RESULTS: OpenAI o1 and DeepSeek-R1 consistently achieved the highest accuracy and the most comprehensive responses, significantly outperforming ChatGPT-4o, Gemini Advanced, Claude Sonnet, and Tongyi Qwen (Friedman P<0.001). Although overall accuracy and comprehensiveness were similar across languages, Chinese responses were significantly longer. Readability varied among the models, with Claude Sonnet generally producing the most readable English texts. Reproducibility analysis revealed moderate consistency, reflecting inherent variability in outputs to identical prompts.

CONCLUSION: Reasoning-augmented LLMs, particularly OpenAI o1 and DeepSeek-R1, demonstrate superior performance in delivering bilingual patient education for corneal refractive surgery, with high accuracy and comprehensiveness. However, variations in response length, readability, and reproducibility indicate that further refinement is necessary before these tools can be reliably integrated into clinical practice.
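The abstract reports a Friedman test for the between-model comparison but does not describe the computation. As a rough illustration only, the sketch below computes the Friedman chi-square statistic for a table in which each row is one question and each column is one model. The function names and the toy scores are hypothetical and are not data from the study; the sketch uses average ranks for ties without the tie correction that statistical packages usually apply, and it does not compute a P value.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic (no tie correction).

    scores[q][m] is the score given to model m on question q.
    """
    n = len(scores)       # number of questions (blocks)
    k = len(scores[0])    # number of models (treatments)
    rank_sums = [0.0] * k
    for row in scores:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    sum_sq = sum(s * s for s in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Hypothetical toy scores (3 questions x 3 models), NOT data from the study.
toy_scores = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3],
]
statistic = friedman_statistic(toy_scores)
```

Under the null hypothesis the statistic is approximately chi-square distributed with k - 1 degrees of freedom, so a study with six models would refer it to 5 degrees of freedom. In practice an established statistics library would be the safer choice, since it also handles tie correction and the P value.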

Tsung-Hsien Tsai, Chin-Ling Tsai, Jui-Hung Hsu, et al. Bilingual evaluation of large language models for patient education in refractive surgery. Int J Ophthalmol, 2026,(6):1019-1027

Publication History

Received: November 15, 2025
Revised: January 28, 2026
Online: May 18, 2026