Google's Gemini 3 just doubled the performance of every other AI model on a critical reasoning test, proving that scaling large language models still works. The new models cost $77 per task but crushed benchmarks without needing fancy new architectures, just more training....