Bilingual evaluation of large language models for patient education in refractive surgery
Corresponding Author:

Hung-Chi Chen. Chang Gung Memorial Hospital, Linkou Main Branch, No.5, Fuxing Street, Guishan District, Taoyuan 333, Taiwan, China. mr3756@cgmh.org.tw

    Abstract:

    AIM: To evaluate the ability of six advanced large language models (LLMs) to provide accurate, comprehensive, and readable patient education on corneal refractive surgeries [laser in-situ keratomileusis (LASIK), keratorefractive lenticule extraction (KLEx), and photorefractive keratectomy (PRK)] in both English and Chinese.

    METHODS: This was a cross-sectional, comparative study. Twenty-six questions, compiled from authoritative ophthalmologic sources and covering four domains (procedure basics and eligibility; safety, risks, and long-term stability; recovery and postoperative experience; and practical concerns), were posed to each LLM in both English and Chinese, using a fresh chat session for each query. Five performance metrics were evaluated using appropriate statistical tests: accuracy, comprehensiveness, word count, readability, and reproducibility.

    RESULTS: OpenAI o1 and DeepSeek-R1 consistently produced the most accurate and most comprehensive responses, significantly outperforming ChatGPT-4o, Gemini Advanced, Claude Sonnet, and Tongyi Qwen (Friedman P<0.001). Although overall accuracy and comprehensiveness were similar across languages, Chinese responses were significantly longer. Readability varied among the models, with Claude Sonnet generally producing the most readable English texts. Reproducibility analysis revealed moderate consistency, reflecting inherent variability in outputs to identical prompts.

    CONCLUSION: Reasoning-augmented LLMs, particularly OpenAI o1 and DeepSeek-R1, demonstrate superior performance in delivering bilingual patient education for corneal refractive surgery, with high accuracy and comprehensiveness. However, variations in response length, readability, and reproducibility indicate that further refinement is necessary before these tools can be reliably integrated into clinical practice.

Get Citation

Tsung-Hsien Tsai, Chin-Ling Tsai, Jui-Hung Hsu, et al. Bilingual evaluation of large language models for patient education in refractive surgery. Int J Ophthalmol, 2026,(6):1019-1027

Publication History
  • Received: November 15, 2025
  • Revised: January 28, 2026
  • Online: May 18, 2026