No matter the model, Google's AI Overview/Results are just hallucinated nonsense; when they provide roughly the same information as the linked sources, it is coincidence rather than the result of actually relying on them.
Whether it's DuckDuckGo, Kagi, Ecosia or anything else, all of them are objectively and verifiably better search engines than Google as of today.
This isn't new either, nor has it gotten better. AI Overview has been, and continues to be, a mess that makes it very clear to me that anyone claiming Google still has the "best" search results is lying to themselves. Anyone saying Google Search in 2025 is good, or even usable, is objectively and verifiably wrong, and claiming DDG or Kagi offer less usable results is equally unfounded.
Either finally fix your models so they adhere to and properly quote sources, as your competitors somehow manage to, or, preferably, stop forcing this into search.
The iPad nowadays lacking one proper, sane default for window management is a nightmare for so many reasons, and it incidentally also disproved, in one fell swoop, everyone who argued that macOS-level functionality was not coming to iPadOS in order to retain cohesiveness around the concept / the one true iPad way of doing things.

Interestingly, if one wants to see cohesiveness and one clear concept pushed through, even where it limits certain use cases, GNOME is the perfect example. Agree or disagree with their vision, I have always appreciated that GNOME, more than any other desktop environment, whether made by a trillion-dollar company or any other FOSS team, is willing to enforce its vision, and for those whom it suits, it is better for it. For those it doesn't suit, there are still alternatives, which is why I never understood the significant anger GNOME's position on this front attracted; not every project needs to share the same belief in forgoing strict, consistent defaults in favor of more user freedom.
Liquid Glass is another clear (pardon the pun) showcase of a failure to understand that this field entails far more than mere visual appeal; he did very well with his fair critique of Figma designers. Even more so now that more of the key personnel's history has come to light, which does explain the clear deficiencies in usability and accessibility that even the untrained eye quickly noticed after release.
Also agreed with the measured perspective on "AI"/LLM usage. Beyond local models having the potential to enable new paradigms, I have found LLMs somewhat helpful for prototyping and testing usage loops/concepts more quickly, and for iterating on them beyond existing solutions, for what it's worth.
Occasionally, I think back to Unity, which did such a great job of rethinking existing concepts while not throwing the baby out with the bathwater. Some features, such as the HUD, we are barely catching up to even today, and Unity felt like a project where people fully immersed in the user's perspective were actually given both the freedom and the resources to push innovative concepts forward.
Peeked at Ink & Switch's output too and am finding a lot of incredibly valuable information to learn from. Truly a treasure trove: some of it I had never thought about, other things I have been experimenting with for a while now as part of a project I want to finally get off the ground. Even when it's something I have already dabbled with, their writing is so incredibly expansive that they cover a lot of perspectives I'd never considered.
Overall, great presentation.
Model output seems very similar to GLM-4.6 in my purely subjective and very limited testing. Also disappointed to see the "Thinking" tokens fully hidden.
Would greatly appreciate a deep dive into what led to the choice of training their own model, how from-the-ground-up that training was (or whether it is based on an existing model, as Composer 1 was on Qwen), and why they decided against serving existing models on their hardware, as Cerebras and Groq do.
I have a hard time understanding the significant positive sentiment considering how strongly the performance I am seeing deviates from the published benchmark results. 3 Flash is almost Grok-level in this regard, which is very disappointing for Google. Speed and cost are not an edge either, seeing as, e.g., Kimi K2, by not overly abusing the reasoning budget, comes out cheaper in real-world testing and reliably hits the same or higher throughput depending on the provider. Maybe I am underestimating how many users' real-life use cases consist of solving ARC-AGI games or publicly accessible question databases that are impossible to keep out of the training data...
Scroll down to "Cost to Run Artificial Analysis Intelligence Index" for a per-run cost comparison between 3 Flash, Kimi K2 Thinking, and Haiku 4.5, with 3 Flash coming out almost twice as expensive as Haiku 4.5: https://artificialanalysis.ai/?models=gemini-3-flash-reasoni...