Readit News
Topfi commented on Gemini 3 Flash – Everything you need to know   artificialanalysis.ai/art... · Posted by u/Topfi
Topfi · 20 hours ago
That worst-in-class hallucination rate, coupled with a massive output token count that ends up making the benchmark run more expensive than models such as Haiku 4.5 despite a cheaper per-million-token price, is really disappointing and aligns with some personal testing of mine, not to mention the initial experience I commented on yesterday in the announcement thread.

I have a hard time understanding the significant positive sentiment, considering how strongly the performance I am seeing deviates from the published benchmark results. 3 Flash is almost Grok level in this regard, which is very disappointing for Google. Speed and cost are not an edge either, seeing as e.g. Kimi K2, by not overly abusing the reasoning budget, comes out cheaper in real-world testing and reliably hits the same or higher throughput, depending on the provider. Maybe I am underestimating how many users' real-life use cases cover solving ARC-AGI games or answering questions from publicly accessible databases that are impossible to keep out of the training data...

Scroll down to "Cost to Run Artificial Analysis Intelligence Index" for a per run cost comparison between 3 Flash, Kimi K2 Thinking and Haiku 4.5 with 3 Flash being almost twice as expensive as Haiku 4.5: https://artificialanalysis.ai/?models=gemini-3-flash-reasoni...
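
To make the arithmetic behind that concrete, here is a minimal sketch of how a lower per-token price can still produce a higher per-run bill. The token counts and prices are made up for illustration and are not the published Artificial Analysis figures; `run_cost_usd` is a hypothetical helper that only counts output tokens.

```python
def run_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of one benchmark run, counting output tokens only (simplifying assumption)."""
    return output_tokens / 1_000_000 * price_per_million_usd


# Hypothetical numbers, NOT taken from any benchmark or price list.
# "verbose" model: cheaper per token, but emits far more output tokens.
verbose_cost = run_cost_usd(output_tokens=150_000_000, price_per_million_usd=3.0)
# "concise" model: pricier per token, but emits far fewer output tokens.
concise_cost = run_cost_usd(output_tokens=40_000_000, price_per_million_usd=5.0)

print(f"verbose model: ${verbose_cost:.2f}")
print(f"concise model: ${concise_cost:.2f}")
```

With these invented inputs the cheaper-per-token model ends up costing more than twice as much per run, which is why the per-run cost chart is the number worth looking at rather than the list price.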

Topfi commented on Gemini 3 Flash: Frontier intelligence built for speed   blog.google/products/gemi... · Posted by u/meetpateltech
Topfi · a day ago
Because it is part of Google's results, AI Search makes Google the least reliable search engine of all. To give an example from a search I ran organically with Kagi today and then repeated with Google as a quick real-world test: looking for the exact 0-100 km/h time of the Honda Pan European ST1100, I got a result of 12-13 seconds, which isn't even in the right ballpark (the correct figure is roughly 4 seconds), nor anywhere in the linked sources the model claims to rely on: https://share.google/aimode/Ui8yap74zlHzmBL5W

No matter the model, AI Overview/Results in Google are just hallucinated nonsense, only providing roughly equivalent information to what is in the linked sources as a coincidence, rather than due to actually relying on them.

Whether DuckDuckGo, Kagi, Ecosia or anything else, they are all objectively and verifiably better search engines than Google as of today.

This isn't new either, nor has it gotten better. AI Overview has been and continues to be a mess that makes it very clear to me that anyone claiming Google is still the "best" search engine results-wise is lying to themselves. Anyone saying Google search in 2025 is good or even usable is objectively and verifiably wrong, and claiming DDG or Kagi offer less usable results is equally unfounded.

Either finally fix your models so they adhere to and properly quote sources, like your competitors somehow manage, or, preferably, stop forcing this into search.

Topfi commented on Are we stuck with the same Desktop UX forever? [video]   youtube.com/watch?v=1fZTO... · Posted by u/todsacerdoti
Topfi · 7 days ago
This was an incredibly informative presentation and one I found myself nodding along to quite a lot as a person with a decade+ old folder collecting every desktop environment concept I could get my hands on.

The iPad nowadays not having one proper, sane default for window management is a nightmare for so many reasons, and incidentally it also, in one fell swoop, disproved everyone who argued that macOS-level functionality on iPadOS was not happening in order to retain cohesiveness around the concept/the one true iPad way of doing things. Interestingly, if one wants to see cohesiveness and one clear concept being pushed through, even if it may limit certain use cases, GNOME is the perfect example of that. Agree or disagree with their vision, I always appreciated that, more than any other desktop environment, whether by a trillion-dollar company or any other FOSS team, GNOME is willing to enforce their vision and, for those it suits, is better for it. For those it doesn't suit, there are still alternatives, which is why I never understood the significant amount of anger that GNOME's position on this front attracted. Not every project needs to adhere to the same belief in forgoing strict, consistent defaults for more user freedom.

Liquid Glass is another clear (pardon the pun) showcase of a lack of understanding that this field entails a lot more than mere visual appeal; he did very well with the fair critique of Figma designers. Even more so now that more of the key personnel's history has come to light, which does explain the clear deficiencies in usability and accessibility that even the untrained eye quickly noticed post-release.

Also agreed with the measured perspective on "AI"/LLM usage. Beyond local models having potential to enable new paradigms, I have found LLMs to be somewhat helpful in more quickly prototyping and testing usage loops/concepts and iterating on them over existing solutions for what it's worth.

Occasionally, I think back to Unity, which did such a great job of rethinking existing concepts whilst not throwing the baby out with the bathwater. Some features, such as the HUD, we are barely catching up to even today, and Unity just felt like people fully immersed in the user's perspective were actually given both the freedom and the resources to push innovative concepts forward.

Peeked at Ink & Switch's output too and am finding a lot of incredibly valuable information to learn from. Truly a treasure trove of information: some things I had never thought about, others I have been experimenting with for a while now as part of a project I want to finally get off the ground. Even when it's something I have already dabbled with, their writing is so incredibly expansive that they cover a lot of perspectives I'd never considered.

Overall, great presentation.

Topfi commented on Tensor 1.5 is out and it's matching Claude 4.5 Opus   movementlabs.ai... · Posted by u/movementlabs-AI
Topfi · 11 days ago
I very much like the direct demo, similar to how Groq showcased their offerings initially. Would love it if they shared more about their custom hardware and model training; as it stands, this is far less information than Cerebras and Groq offered upon launch.

Model output seems very similar to GLM-4.6 in my purely subjective and very limited testing. Also disappointed to see the "Thinking" tokens fully hidden.

Would greatly appreciate a deep dive into what led to the choice of training their own model, how far from the ground up their model was trained (or whether this is based on an existing model, like Composer 1 with Qwen) and why they decided not to offer existing models on their hardware like Cerebras and Groq do.

Topfi commented on Clip of a Tesla Optimus teleoperator taking his headset off   bsky.app/profile/jjvincen... · Posted by u/doener
Topfi · 11 days ago
The name of the event where this took place was "Autonomy Visualized". Can we make securities fraud properly illegal again or how many more years of this will we have to endure?

u/Topfi

Karma: 2175 · Cake day: June 5, 2023