Grok 4 Claims 45% on "Humanity's Last Exam" - But Model Goes Racist Before Release

July 10, 2025
From the AI with Kyle News and Updates Live Stream. First aired: 9th July 2025

Watch the full live stream: https://youtu.be/ZmLkcsm09co?si=2f2VjVyGwfb-0DGi
Get the full notes and summary: https://promptentrepreneur.beehiiv.com/subscribe
Subscribe and turn on notifications to catch the next live stream: https://www.youtube.com/@iamkylebalmer?sub_confirmation=1

Leaked benchmarks suggest Elon Musk's new Grok 4 model scored 45% on "Humanity's Last Exam," a massive jump from the previous best score of 22% by Gemini 2.5 Pro on this challenging 2,500-question academic benchmark. However, the excitement is overshadowed by Grok's recent behavior, including antisemitic posts and racist content after xAI removed safety guardrails and instructed the model not to "shy away from politically incorrect claims." The timing raises questions about whether xAI can responsibly release advanced AI capabilities while maintaining public safety standards.