Google was caught flat-footed by the sudden skyrocketing interest in generative AI despite its role in developing the underlying technology. This prompted the company to refocus its considerable resources on catching up to OpenAI. Since then, we've seen the detail-flubbing Bard and numerous versions of the multimodal Gemini models. While Gemini has struggled to make progress in benchmarks and user experience, that could be changing with the new 2.5 Pro (Experimental) release. With big gains in benchmarks and vibes, this might be the first Google model that can make a dent in ChatGPT's dominance.
We recently spoke to Google's Tulsee Doshi, director of product management for Gemini, about the process of releasing Gemini 2.5, as well as where Google's AI models are going in the future.
Welcome to the vibes era
Google may have had a slow start in building generative AI products, but the Gemini team has picked up the pace in recent months. The company released Gemini 2.0 in December, showing a modest improvement over the 1.5 branch. It only took three months to reach 2.5, meaning Gemini 2.0 Pro wasn't even out of the experimental stage yet. To hear Doshi tell it, this was the result of Google's long-term investments in Gemini.
"A big part of it is honestly that a lot of the pieces and the fundamentals we've been building are now coming together in really awesome ways, " Doshi said. "And so we feel like we're able to pick up the pace here."
The process of releasing a new model involves testing a lot of candidates. According to Doshi, Google takes a multilayered approach to inspecting those models, starting with benchmarks. "We have a set of evals, both external academic benchmarks as well as internal evals that we created for use cases that we care about," she said.
The project involved using MonoGame, a fairly niche game creation platform, and some pretty complex 3D math problems. It wasn't perfect, and I had to occasionally steer it in the right direction when there were issues with what it produced, but overall it was frighteningly impressive. It genuinely saved me days of work and is the first time one of these things actually felt like an assistant.