News in Review

    ChatGPT Takes the OKAP


Interest in ChatGPT is growing rapidly. To assess its limitations, potential biases, and clinical usefulness in ophthalmology practice, a team at the Université de Montréal in Quebec obtained permission from the Academy to use its materials to evaluate ChatGPT's performance on simulated Ophthalmic Knowledge Assessment Program (OKAP) exams. They gathered benchmark data on two versions of ChatGPT and their ability to answer ophthalmology test questions accurately. Findings from this type of research could eventually be applied to clinical practice, helping, for example, to interpret data for diagnosis and disease management decisions.1

An earlier study assessed the ability of ChatGPT and another large language model (LLM), called PaLM, to answer multiple-choice questions from the United States Medical Licensing Examination correctly. The authors of the current study focused more narrowly on the knowledge that ophthalmology residents need for their yearly exams. The primary aim was to better understand how ChatGPT performs relative to humans within this specialty context, said study coauthor Fares Antaki, MDCM, FRCSC.

Test time. Simulated exams were created by randomly selecting 260 questions each from the Basic and Clinical Science Course (BCSC) Self-Assessment2 and OphthoQuestions question banks, for a combined pool of 520 test questions. Each question was individually assigned a difficulty index and cognitive level, and each exam included 20 questions from each of the 13 topic sections of the official OKAP exam. Two versions of ChatGPT were tested on these simulated exams: the original version, referred to as the legacy model, was used in an initial experimental run, and an advanced model, ChatGPT Plus, was used for three subsequent test runs.
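For illustration only, the stratified draw described above can be sketched in a few lines of Python. This is not the study authors' code; the question-bank structure and field names are assumptions.

    import random
    from collections import defaultdict

    # Sketch of the sampling described above: build one 260-question simulated
    # exam by drawing 20 questions at random from each of the 13 OKAP sections.
    # The question-bank record shape ({"section": ..., "text": ...}) is assumed.
    OKAP_SECTIONS = 13
    QUESTIONS_PER_SECTION = 20

    def build_simulated_exam(question_bank, seed=None):
        """question_bank: list of dicts, each with at least a "section" key."""
        rng = random.Random(seed)
        by_section = defaultdict(list)
        for q in question_bank:
            by_section[q["section"]].append(q)
        if len(by_section) != OKAP_SECTIONS:
            raise ValueError(f"expected {OKAP_SECTIONS} sections, got {len(by_section)}")
        exam = []
        for questions in by_section.values():
            # rng.sample raises ValueError if a section has fewer than 20 questions.
            exam.extend(rng.sample(questions, QUESTIONS_PER_SECTION))
        rng.shuffle(exam)
        return exam  # 13 sections x 20 questions = 260 items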

Exam scores. The legacy model scored 55.8% and 42.7% on the BCSC- and OphthoQuestions-based exams, respectively. Pooled across the three runs, ChatGPT Plus scored 59.4% ± 0.6% and 49.2% ± 1.0%, respectively.

Exam section and question difficulty were most predictive of answer accuracy. On OKAP section–specific questions, both versions of ChatGPT scored higher in general medicine (75%) than in subspecialty domains like neuro-ophthalmology (25%). “Both versions also performed well in the fundamentals (60%) and cornea (60%) sections,” said Dr. Antaki.

Although ChatGPT did not test as well overall on the exam questions as its human counterparts—who scored 74% on the BCSC- and 63% on the OphthoQuestions-based exams—the study authors were encouraged by the technology’s consistent and repeatable performance. ChatGPT’s performance aligned to an extent with the general knowledge of ophthalmology trainees, and the model tended to do better on the questions that humans also answered more accurately, said Dr. Antaki.

Future work. LLM technologies may one day be incorporated into practical clinical applications, such as patient triage, said Dr. Antaki. But more research is needed to better understand subspecialty domain training with emerging versions of ChatGPT and how artificial intelligence–driven imaging diagnostics can best be used.

    —Julie Monroe

    ___________________________

1 Antaki F et al. Ophthalmol Sci. Published online May 4, 2023.

2 The questions from the BCSC Self-Assessment Program were generated through a personal subscription. Permission was obtained from the American Academy of Ophthalmology for use of the underlying materials.

    ___________________________

    Relevant financial disclosures: Dr. Antaki—None.

    For full disclosures and the disclosure key, see below.

    Full Financial Disclosures

    Dr. Yam—Research Grants Council, Hong Kong: S; Collaborative Research Fund: S; Innovation and Technology Fund: S; UBS Optimus Foundation: S; Centaline Myopia Fund: S; National Natural Science Foundation of China: S; CUHK: S; CUHK Jockey Club Children’s Eye Care Programme: S; CUHK Jockey Club Myopia Prevention Programme: S.

    Dr. Antaki—None.

    Dr. Kandarakis—None.

    Mr. Dicko—None.

    Dr. Williams—None.

Disclosure Category (Code): Description

Consultant/Advisor (C): Consultant fee, paid advisory boards, or fees for attending a meeting.

Employee (E): Hired to work for compensation or received a W2 from a company.

Employee, executive role (EE): Hired to work in an executive role for compensation or received a W2 from a company.

Owner of company (EO): Ownership or controlling interest in a company, other than stock.

Independent contractor (I): Contracted work, including contracted research.

Lecture fees/Speakers bureau (L): Lecture fees or honoraria, travel fees or reimbursements when speaking at the invitation of a commercial company.

Patents/Royalty (P): Beneficiary of patents and/or royalties for intellectual property.

Equity/Stock/Stock options holder, private corporation (PS): Equity ownership, stock and/or stock options in privately owned firms, excluding mutual funds.

Grant support (S): Grant support or other financial support from all sources, including research support from government agencies (e.g., NIH), foundations, device manufacturers, and/or pharmaceutical companies. Research funding should be disclosed by the principal or named investigator even if your institution receives the grant and manages the funds.

Stock options, public or private corporation (SO): Stock options in a public or private company.

Equity/Stock holder, public corporation (US): Equity ownership or stock in publicly traded firms (listed on the stock exchange), excluding mutual funds.
