Readit News
rplnt commented on Gemini 2.5 Flash Image   developers.googleblog.com... · Posted by u/meetpateltech
fariszr · a day ago
This is the GPT-4 moment for image editing models. Nano banana, aka Gemini 2.5 Flash, is insanely good. It made a 171-point Elo jump on LMArena!

Just search "nano banana" on Twitter to see the crazy results. An example: https://x.com/D_studioproject/status/1958019251178267111

rplnt · a day ago
Oh no, even more mis-scaled product images.
rplnt commented on Researcher Exposes 0-Day Clickjacking Vulnerabilities in Major Password Managers   socket.dev/blog/password-... · Posted by u/gpi
kcrwfrd_ · 7 days ago
Interesting

When they say iCloud passwords, do they mean

* iCloud Passwords extension in Chrome?

* Safari?

* iOS Safari?

iOS Safari in particular seems to use native OS UI, separate from the web page, for password form autocompletion, so I would think it wouldn’t be susceptible?

And what about Google Chrome’s built-in password manager?

rplnt · 3 days ago
> I want to mention that iCloud Passwords was tested only as a browser extension (Google Chrome, Firefox, etc.) and not as a system application with Safari integration.
rplnt commented on Comet AI browser can get prompt injected from any site, drain your bank account   twitter.com/zack_overflow... · Posted by u/helloplanets
_fat_santa · 3 days ago
IMO the only place you should use agentic AI is where you can easily roll back the changes it makes. The best example is asking AI to build/update/debug some code. You can ask it to make changes, but all those changes are relatively safe since you can easily roll back with git.

Using agentic AI for web browsing, where you can't easily roll back an action, is just wild to me.

rplnt · 3 days ago
Updating and building/running code is too powerful. So I guess in a VM?
rplnt commented on Evaluating LLMs for my personal use case   darkcoding.net/software/p... · Posted by u/goranmoomin
Workaccount2 · 3 days ago
Would you be willing to share some of those chats?
rplnt · 3 days ago
The most recent one I have was not in English. It was a translation question of a slang word between two non-English languages. It failed miserably (just made up some complete nonsense). Google had no trouble finding relevant pages or images for that word (without any extra prompt), so it was rather unique and not that obscure. Disclaimer: I'm not using any extra prompts like "don't make shit up and just tell me you don't know".

The most recent technical one I can remember (and now would be a good time to have the actual prompt) was when I asked whether MySQL has a way to run UPDATE without waiting for locks, basically ignoring rows that are locked. It (Sonnet 4 IIRC) answered "of course" and gave me an invalid query in the form of `UPDATE ... SKIP LOCKED`.
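
For context, MySQL 8.0 does support `SKIP LOCKED`, but only as a modifier on locking reads such as `SELECT ... FOR UPDATE`, not on `UPDATE` itself. A minimal Python sketch of the usual two-step workaround might look like the following. The function name and the `id` primary-key column are hypothetical, and it assumes a DB-API cursor that uses `%s` placeholders inside an open InnoDB transaction.

```python
from typing import Any


def update_unlocked_rows(cursor: Any, table: str, set_clause: str, where_clause: str) -> int:
    """Update only the matching rows that no other transaction has locked.

    Assumptions (not from the original comment): `cursor` is a DB-API 2.0
    cursor on a MySQL 8.0+ connection with an open transaction, the table has
    a primary key column named `id`, and all string arguments are trusted
    (they are interpolated directly, so never pass user input here).
    """
    # Step 1: lock whatever rows are free. SKIP LOCKED omits rows that
    # another transaction already holds a lock on instead of waiting.
    cursor.execute(
        f"SELECT id FROM {table} WHERE {where_clause} FOR UPDATE SKIP LOCKED"
    )
    ids = [row[0] for row in cursor.fetchall()]
    if not ids:
        return 0

    # Step 2: update only the rows locked in step 1, one placeholder per id.
    placeholders = ", ".join(["%s"] * len(ids))
    cursor.execute(
        f"UPDATE {table} SET {set_clause} WHERE id IN ({placeholders})",
        ids,
    )
    return len(ids)
```

The caller is expected to commit afterwards. Until then, the selected rows stay locked by this transaction.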

I can't imagine what damage this does if people are using it for questions they don't/can't verify. Programming is relatively safe in this regard.

But as I noted in my other reply, there will be a bias on my side, as I probably disregard questions that I know how to easily find answers to. That's not something I'd applaud AI for.

rplnt commented on Evaluating LLMs for my personal use case   darkcoding.net/software/p... · Posted by u/goranmoomin
simonw · 4 days ago
It's useful to build up an intuition for what kind of questions LLMs can answer and what kind of questions they can't.

Once you've done that your success rate goes way up.

rplnt · 3 days ago
Oftentimes I ask simple factual questions that I don't know the answer to. This is something it should excel at, yet it usually fails, at least on the first try. I guess I subconsciously ignore questions that are extremely easy to google (if you ignore the worst AI in existence) or can be answered by opening the [insert keyword] Wikipedia article. You don't need AI for those.
rplnt commented on Evaluating LLMs for my personal use case   darkcoding.net/software/p... · Posted by u/goranmoomin
rplnt · 4 days ago
> Almost all models got almost all my evaluations correct

I find this the most surprising. I have yet to cross the 50% threshold of bullshit to possible truth, on any kind of topic I use LLMs for.

rplnt commented on GPT-5: It just does stuff   oneusefulthing.org/p/gpt-... · Posted by u/paulpauper
gubicle · 17 days ago
How many of these 'this new LLM version is super amazing' stories are paid for?
rplnt · 17 days ago
Do you count personal stakes? Financial or reputational.
rplnt commented on AI is a floor raiser, not a ceiling raiser   elroy.bot/blog/2025/07/29... · Posted by u/jjfoooo4
smiley1437 · a month ago
> people aren't aware of how wrong they can be, and the errors take effort and knowledge to notice.

I have friends who are highly educated professionals (PhDs, MDs) who just assume that AI/LLMs make no mistakes.

They were shocked that it's possible for hallucinations to occur. I wonder if there's a halo effect where the perfect grammar, structure, and confidence of LLM output causes some users to assume expertise?

rplnt · a month ago
Have they never used it? The majority of the responses that I can verify are wrong. Sometimes outright nonsense, sometimes believable. Be it general knowledge or something where deeper expertise is required.
rplnt commented on Sumo – Simulation of Urban Mobility   eclipse.dev/sumo/... · Posted by u/Stevvo
aidenn0 · a month ago
Since it's almost on-topic, anyone know if/how these tools emulate sustained irrational behavior?

Example:

For over a decade, the freeway on-ramp nearest my work had two main ways of getting to it from downtown. One of them involved a stop sign at a crossing with a road that had the right-of-way (i.e. a two-way stop). The other had timed traffic signals. Every evening around 5pm, the traffic would back up from the stop sign for multiple blocks. Meanwhile the route with lights was completely smooth.

Eventually the stop-sign was replaced with a signal, but I marveled at how many people persisted in making their daily commute much worse than it needed to be.

rplnt · a month ago
If they didn't simulate sustained irrational behavior, there wouldn't be people driving cars in cities.

u/rplnt

Karma: 6431 · Cake day: July 18, 2011
About
@rplnt