Readit News
segh commented on Can LLMs do randomness?   rnikhil.com/2025/04/26/ll... · Posted by u/whoami_nr
whoami_nr · 8 months ago
Author here. I know the range 0-10 contains one extra even number (six even values, five odd). I also did this just for fun, so don't take the statistical significance aspect of it very seriously. To do this more rigorously, you would also need to run it multiple times with multiple temperature and top_p values.
segh · 8 months ago
Cool experiment! My intuition suggests you would get a better result if you let the LLM generate tokens for a while before giving you an answer. Could be another experiment idea to see what kind of instructions lead to better randomness. (And to extend this, whether these instructions help humans better generate random numbers too.)
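A rough sketch of how the parity caveat and a basic uniformity check could be made concrete. The function names are hypothetical and not taken from the linked post, and the samples would have to come from your own model calls, so none are shown here.

```python
from collections import Counter


def parity_baseline(lo: int, hi: int) -> dict[str, float]:
    """Expected share of even and odd values for a uniform draw from lo..hi inclusive."""
    values = range(lo, hi + 1)
    evens = sum(1 for v in values if v % 2 == 0)
    total = len(values)
    return {"even": evens / total, "odd": (total - evens) / total}


def chi_square_uniform(samples: list[int], lo: int, hi: int) -> float:
    """Pearson chi-square statistic of samples against a uniform distribution on lo..hi."""
    counts = Counter(samples)
    k = hi - lo + 1                # number of possible values
    expected = len(samples) / k    # expected count per value if draws were uniform
    return sum(
        (counts.get(v, 0) - expected) ** 2 / expected
        for v in range(lo, hi + 1)
    )
```

For 0-10 the even baseline is 6/11 rather than 1/2, so a small surplus of even answers is not bias by itself. A larger chi-square statistic means the samples sit further from uniform; repeating the run across temperature and top_p settings would give one statistic per setting to compare.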
segh commented on Widespread power outage in Spain and Portugal   bbc.com/news/live/c9wpq8x... · Posted by u/lleims
oilman · 8 months ago
In another life I worked as an engineer commissioning oil rigs and I’ve seen how tricky even a small-scale black start can be. On a rig, we simulate total power loss and have to hand-crank a tiny air compressor just to start a small emergency generator, which then powers the compressors needed to fire up the big ~7MW main generators. It's a delicate chain reaction — and that's just for one isolated platform.

A full grid black start is orders of magnitude more complex. You’re not just reviving one machine — you’re trying to bring back entire islands of infrastructure, synchronize them perfectly, and pray nothing trips out along the way. Watching a rig wake up is impressive. Restarting a whole country’s grid is heroic.

segh · 8 months ago
ChatGPT's tone is slowly taking over the entire internet
segh commented on AI Horseless Carriages   koomen.dev/essays/horsele... · Posted by u/petekoomen
ai_ · 8 months ago
No? Not everyone's dream is being a manager. I like writing code, it's fun! Telling someone else to go write code for me so that I can read it later? Not fun, avoid it if possible (sometimes it's unavoidable, we don't have unlimited time).
segh · 8 months ago
People still play chess, even though now AI is far superior to any human. In the future you will still be able to hand-write code for fun, but you might not be able to earn a living by doing it.
segh commented on AI Horseless Carriages   koomen.dev/essays/horsele... · Posted by u/petekoomen
__float · 8 months ago
The live demos were neat! I was playing around with "The Pete System Prompt", and one of the times, it signed the email literally "Thanks, [Your Name]" (even though Pete was still right there in the prompt).

Just a reminder that these things still need significant oversight or very targeted applications, I suppose.

segh · 8 months ago
The live demos are using a very cheap and not very smart model. Do not update your opinion on AI capabilities based on the poor performance of gpt-4o-mini
segh commented on AI Horseless Carriages   koomen.dev/essays/horsele... · Posted by u/petekoomen
petekoomen · 8 months ago
honestly you could try this yourself today. Grab a few emails, paste them into chatgpt, and ask it to write a system prompt that will write emails that mimic your style. Might be fun to see how it describes your style.

to address your larger point, I think AI-generated drafts written in my voice will be helpful for mundane, transactional emails, but not for important messages. Even simple questions like "what do you feel like doing for dinner tonight" could only be answered by me, and that's fine. If an AI can manage my inbox while I focus on the handful of messages that really need my time and attention, that would be a huge win in my book.

segh · 8 months ago
The system prompt can include examples. That is often a good idea.
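A minimal sketch of what that could look like, assuming you already have a few sample emails as plain strings. `build_system_prompt` is a hypothetical helper and does not come from the essay's demo or from any library.

```python
def build_system_prompt(instructions: str, examples: list[str]) -> str:
    """Append numbered sample emails to the base instructions."""
    parts = [instructions.strip()]
    if examples:
        parts.append("Here are emails I have written. Match their tone and length:")
        for i, example in enumerate(examples, start=1):
            # Each sample gets its own labelled block so the model can tell them apart.
            parts.append(f"Example {i}:\n{example.strip()}")
    return "\n\n".join(parts)
```

The resulting string would be passed as the system message to whatever chat API you use. With an empty list, the instructions come back unchanged.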
segh commented on AI agents: Less capability, more reliability, please   sergey.fyi/articles/relia... · Posted by u/serjester
postexitus · 9 months ago
And where is the product that was developed on the edge of current AI capabilities and, with the latest AI model plugged in, is now suddenly working consistently? All I am seeing is models getting better and better at generating videos of spaghetti-eating movie stars.
segh · 9 months ago
For me, they have come from the AI labs themselves. I have been impressed with Claude Code and OpenAI's Deep Research.
segh commented on AI agents: Less capability, more reliability, please   sergey.fyi/articles/relia... · Posted by u/serjester
segh · 9 months ago
Lots of people are building on the edge of current AI capabilities, where things don't quite work, because in 6 months when the AI labs release a more capable model, you will just be able to plug it in and have it work consistently.
segh commented on Why Anthropic's Claude still hasn't beaten Pokémon   arstechnica.com/ai/2025/0... · Posted by u/Workaccount2
cratermoon · 9 months ago
> My guess is that image understanding will improve.

The "it will get better" assertion fails on what's known as the "first step fallacy". The analogy I like to use is the idea of a ladder to the moon. We can build tall self-supporting ladders, and ladder technology has advanced considerably since they were first used at least 10,000 years ago. More recently, in 1862, John H. Balsley made the step ladder safer by changing the rounded rungs to flat steps. Henry Quackenbush patented the extension ladder in 1867. Ladder technology continues to progress. Will we someday build a ladder tall enough to climb to the moon?

Do you see the fallacy?

https://thebullshitmachines.com/lesson-16-the-first-step-fal...

segh · 9 months ago
This is the crux of the issue: whether you think this is like extending a ladder to the moon, or more like we figured out how to get to the moon and are now aiming at Jupiter.
segh commented on Why Anthropic's Claude still hasn't beaten Pokémon   arstechnica.com/ai/2025/0... · Posted by u/Workaccount2
disambiguation · 9 months ago
That's great that someone out there is solving this stuff, but it begs the question why we're watching Claude fumbling around and not seeing this service in action?
segh · 9 months ago
Claude Plays Pokemon is one person's side project to see how well Sonnet can play Pokemon. It is a neat LLM benchmark; it's not a serious attempt at making a Pokemon-playing AI.
segh commented on Why Anthropic's Claude still hasn't beaten Pokémon   arstechnica.com/ai/2025/0... · Posted by u/Workaccount2
rsynnott · 9 months ago
> OpenAI is quietly seeding expectations for a "PhD-level" AI agent that could operate autonomously at the level of a "high-income knowledge worker" in the near future. Elon Musk says that "we'll have AI smarter than any one human probably" by the end of 2025. Anthropic CEO Dario Amodei thinks it might take a bit longer but similarly says it's plausible that AI will be "better than humans at almost everything" by the end of 2027.

I am baffled that anyone is still buying this complete nonsense. Like, come on.

segh · 9 months ago
Do you disagree that AI will ever reach the level of a "high-income knowledge worker", or do you disagree that it will happen in a year or two?

u/segh

Karma: 88 · Cake day: July 3, 2018