Roche asks:

Which organization's large language model (LLM) will be ranked first as of 30 September 2025, according to LMArena's "Text Arena?"

Started Jun 04, 2025 09:00PM UTC
Closed Sep 30, 2025 07:01AM UTC
Challenges
Tags

LMArena is an open-source platform for crowdsourced AI benchmarking, created by researchers from UC Berkeley SkyLab (SkyLab). For more information on how the leaderboard is constructed, see LMArena - Blog. The question will be suspended on 29 September 2025 and the outcome determined using the ranks as reported by LM Arena at approximately 5:00p.m. ET on 30 September 2025 (LMArena - Text Arena Leaderboard, see "Rank (UB)"). As of 27 May 2025, Google was ranked first, with its "gemini-2.5-pro-preview-05-06" scoring 1446, followed by OpenAI's "o3-2025-04-16" scoring 1435. In the event of a tie for first place by LLMs of different organizations, the LLM with the higher "Score" will be considered first, followed by the "Votes" total. If the named source changes the way it presents the data, further instructions will be provided.

Confused? Check our FAQ or ask us for help. To learn more about Good Judgment and Superforecasting, click here.

To learn more about how you can become a Superforecaster, see hereFor other posts from our Insights blog, click here.


The question closed "Google" with a closing date of 30 September 2025.

See our FAQ to learn about how we resolve questions and how scores are calculated.

Possible Answer Correct? Final Crowd Forecast
Anthropic 4%
Google 84%
OpenAI 10%
xAI 1%
Another organization 1%

Crowd Forecast Profile

Participation Level
Number of Forecasters 128
Average for questions older than 6 months: 160
Number of Forecasts 431
Average for questions older than 6 months: 475
Accuracy
Participants in this question vs. all forecasters average

Most Accurate

Relative Brier Score

1.
-0.462538
2.
-0.455958
3.
-0.449338
4.
-0.442538
5.
-0.402087

Recent Consensus, Probability Over Time

Files
Tip: Mention someone by typing @username