Roche asks:

Which organization's large language model (LLM) will be ranked first as of 29 May 2026, according to MedHELM's medical domain LLM leaderboard?

Started Feb 11, 2026 02:30PM UTC
Closing May 29, 2026 07:01AM UTC

MedHELM is "a comprehensive healthcare benchmark to evaluate language models on real-world clinical tasks" created by a collaboration between, among others, Stanford's Center for Research on Foundation Models (CRFM) (Stanford - MedHELM). The question will be suspended on 28 May 2026 and the outcome determined using the ranks as reported by MedHELM at approximately 5:00 p.m. ET on 29 May 2026 (Stanford - MedHELM Leaderboard, see "Mean win rate" on the "Accuracy" sheet tabbed near the top of the page; this page will be used for resolution). As of 5 February 2026, OpenAI's "GPT-5 (2025-08-07)" was ranked first with a score of 0.703, followed by OpenAI's "o4-mini (2025-04-16)" with a score of 0.697. In the event of a tie for first place by LLMs of different organizations, the LLM with the higher "Mean win rate" on the "Efficiency" sheet tabbed near the top of the page will be considered first. If the named source changes the way it presents the data, further instructions will be provided.
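The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the stated logic, not anything published by MedHELM or Good Judgment: the `Entry` class, the `resolve` function, and all "Efficiency" values are assumptions made up for the example. Only the two "Accuracy" scores from 5 February 2026 come from the question text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    organization: str
    accuracy_win_rate: float    # "Mean win rate" on the "Accuracy" sheet
    efficiency_win_rate: float  # "Mean win rate" on the "Efficiency" sheet


def resolve(entries):
    """Return the organization whose model is considered first.

    Rank by the Accuracy mean win rate; if several models share the top
    score, the one with the higher Efficiency mean win rate is first.
    """
    top_score = max(e.accuracy_win_rate for e in entries)
    tied = [e for e in entries if e.accuracy_win_rate == top_score]
    winner = max(tied, key=lambda e: e.efficiency_win_rate)
    return winner.organization


# Accuracy scores as of 5 February 2026 are from the question text.
# The efficiency values are placeholders, NOT real leaderboard data.
snapshot = [
    Entry("GPT-5 (2025-08-07)", "OpenAI", 0.703, 0.0),
    Entry("o4-mini (2025-04-16)", "OpenAI", 0.697, 0.0),
]

# Entirely hypothetical tie between two organizations.
tied_example = [
    Entry("model-a", "Org A", 0.700, 0.40),
    Entry("model-b", "Org B", 0.700, 0.55),
]

print(resolve(snapshot))
print(resolve(tied_example))
```

If every model tied for first belongs to the same organization, the tie-break does not change the outcome, which the sketch handles without a special case.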

Possible Answer | Crowd Forecast | Change in last 24 hours
Anthropic (e.g., "Claude") | 5.00% | -15.00%
DeepSeek | 5.67% | -14.33%
Google (e.g., "Gemini") | 5.33% | -14.67%
OpenAI (e.g., "o3-mini," "GPT") | 82.33% | +62.33%
Another organization | 1.67% | -18.33%
