Claude Beat GPT-5 (But There's a Catch)

October 6, 2025
Claude Sonnet 4.5 ranks higher than GPT-5 on LMArena, but benchmark rankings can be misleading because models are trained to game them. The scoring system penalizes models for saying "I don't know," so they hallucinate instead of admitting uncertainty....
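
To make the scoring point concrete, here is a minimal sketch of why a grader that gives no credit for "I don't know" rewards guessing. The grading rule is an assumption for illustration (1 point for a correct answer, 0 for abstaining, and an optional penalty for a wrong answer); it is not the actual scoring of LMArena or any specific benchmark, and `expected_score` and `wrong_penalty` are hypothetical names.

```python
def expected_score(p_correct: float, abstain: bool, wrong_penalty: float = 0.0) -> float:
    """Expected score on one question under a hypothetical grading rule:
    +1 for a correct answer, -wrong_penalty for a wrong one, 0 for "I don't know"."""
    if abstain:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


# Assumed scenario: the model is only 20% sure of its answer.
p = 0.2

# Binary grading (wrong answers cost nothing): guessing beats abstaining.
print("guess, no penalty:", expected_score(p, abstain=False))
print("abstain:          ", expected_score(p, abstain=True))

# If wrong answers cost a point, the same low-confidence guess loses to abstaining.
print("guess, penalty 1: ", expected_score(p, abstain=False, wrong_penalty=1.0))
```

Under the assumed binary rule, any nonzero chance of being right makes a guess score higher in expectation than admitting uncertainty, which is the incentive the description refers to. Adding a penalty for wrong answers is one way a grader could reverse that incentive for low-confidence answers.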