Google just dropped Gemini 3.5 Flash, and the headline claim is that it can pump out nearly 300 tokens per second while matching the benchmark scores of much larger, much slower frontier models. The previous Pro model runs at roughly a quarter of that speed, so this is a real jump, not a minor tweak.
Speed matters here mostly because of agents. Every major lab has been trying to build AI agents that can handle long, complex tasks autonomously, but the economics have been brutal. The more steps an agent takes, the more tokens it burns, and the costs spiral fast enough to make most real world use cases hard to justify. Flash is positioned as the model that might finally tip that equation.
Google is also threading this through a wide range of its own products, so your daily workflow tools are about to start feeling different whether you opted in or not. Tulsee Doshi from the Gemini team framed this as the beginning of a longer rollout, so consider what launched today as the floor.
Step back and the broader picture is that the whole industry is racing toward efficiency as much as raw capability. Faster and cheaper at the same quality level is what unlocks widespread enterprise adoption, and Google just made a credible case that it is getting there.