Readit News
logicprog commented on In a first, Google has released data on how much energy an AI prompt uses   technologyreview.com/2025... · Posted by u/jeffbee
Capricorn2481 · 2 days ago
How in the world do they claim 0.24 Wh per query? MIT estimates the Llama 3.1 405B model uses 1.86 Wh.

GPT-4 has over a trillion parameters. Is there any reason to think they have 2.5x more parameters but somehow use nearly 8x less energy?

logicprog · a day ago
Mixture of Experts. Llama 3.1 405B is a dense model, so to evaluate the next token, the context has to go through literally every parameter in its neural network. With a mixture of experts, usually only a sixth to a tenth, or even less, of the parameters actually get evaluated for each token. Also, they don't serve GPT-4 or 4.5 anymore iirc, which may have been dense (and why they were so expensive); 4.1 and 4o are much different models.
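To make the arithmetic concrete, here's a back-of-the-envelope sketch. Every number below is hypothetical, just to show the shape of the argument, not any vendor's real config:

```python
# All numbers hypothetical -- illustrating MoE active vs. total parameters.
dense_params = 405e9     # Llama 3.1 405B: every parameter runs on every token

total_params = 1.0e12    # a hypothetical ~1T-parameter MoE model
num_experts = 32         # experts per MoE layer
active_experts = 2       # top-k experts actually evaluated per token
shared_fraction = 0.1    # attention, embeddings, etc. that always run

expert_params = total_params * (1 - shared_fraction)
active_params = (total_params * shared_fraction
                 + expert_params * (active_experts / num_experts))

print(f"MoE active params/token: {active_params / 1e9:.0f}B "
      f"({active_params / total_params:.0%} of total)")
print(f"dense 405B evaluates {dense_params / active_params:.1f}x more per token")
```

That alone doesn't close the whole gap; batching, quantization, and newer hardware presumably account for the rest.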
logicprog commented on Measuring the environmental impact of AI inference   arstechnica.com/ai/2025/0... · Posted by u/ksec
esperent · 2 days ago
> Figure 4: Median Gemini Apps text prompt emissions over time—broken down by Scope 2 MB emissions (top) and Scope 1+3 emissions (bottom). Over 12 months, we see that AI model efficiency efforts have led to a 47x reduction in the Scope 2 MB emissions per prompt, and 36x reduction in the Scope 1+3 emissions per user prompt—equivalent to a 44x reduction in total emissions per prompt.

Again, it's talking about "median Gemini" while being very careful not to name any specific numbers for any specific models.
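For what it's worth, the three multipliers are at least internally consistent: assuming the per-prompt reductions combine linearly, 47x and 36x only net out to 44x if Scope 2 MB was roughly three quarters of the baseline. A quick check (my own back-of-envelope, not from the paper):

```python
# Solve for the Scope 2 MB share 'a' of baseline emissions such that
# a 47x and a 36x reduction combine into a 44x total reduction:
#   a/47 + (1 - a)/36 = 1/44
r2, r13, rtot = 47.0, 36.0, 44.0
a = (1 / r13 - 1 / rtot) / (1 / r13 - 1 / r2)
print(f"implied Scope 2 MB share of baseline: {a:.0%}")  # -> 78%
```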

logicprog · 2 days ago
You're grouping those words wrong. As another commenter pointed out to you, which you ignored, it's median (Gemini Apps) not (median Gemini) Apps. Gemini Apps is a highly specific thing — with a legal definition even iirc — that does not include search, and encompasses a list of models you can actually see and know.
logicprog commented on AI doesn't lighten the burden of mastery   playtechnique.io/blog/ai-... · Posted by u/gwynforthewyn
aeonik · 7 days ago
I understand they don't have a logic engine built into them, i.e. no deduction, but I do think their inference is a weak form of reasoning, and I'm not sure about the world model part.

I suppose it depends on the definition of model.

I currently do consider the transformer weights to be a world model, but having a rigid one based on statistical distributions tends to create pretty wonky behavior at times.

That's why I do agree: relying on your own understanding of the code is the best way.

It's amazing seeing these things produce some beautiful functions and designs, then promptly forget they exist and begin writing incompatible, half-re-implemented, non-idiomatic code.

If you're blind to what they are doing, it's just going to be layers upon layers of absolute dreck.

I don't think they will get out of cul-de-sacs without a true deductive engine and a core of hard, testable facts to build on. (I'm honestly a bit surprised this behavior didn't emerge early in training.)

Though I think human minds are the same way in this respect, and fall for the same sorts of traps. At least our neurons can rewire themselves on the fly.

I know a LOT of people who use their more advanced reasoning faculties sparingly, and instead rely primarily on vibes or pre-trained biases, even though I KNOW they are capable of better.

logicprog · 6 days ago
Good comment. I'm pretty much on the same page; my only disagreement is that transformers, if they are a world model, are a model of some sort of semiotic shadow world, not an experiential, physically consistent world like ours, so they're not equipped to handle modelling our world.
logicprog commented on Linux is about to lose a feature – over a personality clash   theregister.com/2025/08/1... · Posted by u/asimops
logicprog · 7 days ago
The thing this article seems to be studiously avoiding is Kent Overstreet's repeated refusal to adhere to the kernel's testing, code quality, patch, and release standards (such that Linus has had to call him out before), and his dispute with the Linux kernel Code of Conduct Committee. That omission lets the article paint Kent as a reasonable, rational person who's just being unwarrantedly attacked left and right by everyone on the mailing list. But that's really bullshit. Honestly, I wish people would stop sharing stuff from The Register, because it's a hack news blog.
logicprog commented on AI doesn't lighten the burden of mastery   playtechnique.io/blog/ai-... · Posted by u/gwynforthewyn
aeonik · 7 days ago
Something I've noticed recently is that the new Opus 4.1 model seems to be incredibly good at getting out of these cul-de-sacs.

I've always had a subscription to both ChatGPT and Claude, but Claude has recently almost one-shotted cleanups of major toxic waste dumps left by the previous models.

I'll still use ChatGPT; it seems to be pretty good at algorithms and at bouncing ideas back and forth. But when things go off the rails, Opus 4.1 bails me out.

logicprog · 7 days ago
The thing is that since these models aren't actually doing reasoning and don't possess internal world models, you're always going to end up having to rely on your own understanding at some point. They can fill in more of the map with the things they can do, but they can't ever make it complete. There will always be cul-de-sacs they get stuck in, messes they make, or mistakes they keep making, whether consistently or stochastically. So, although that's rather neat, it doesn't really change my point, I don't think.
logicprog commented on AI doesn't lighten the burden of mastery   playtechnique.io/blog/ai-... · Posted by u/gwynforthewyn
sitkack · 7 days ago
You can use Claude Code against Kimi K2, DeepSeek, Qwen, etc. The $20 a month plan gets you access to a token amount of Sonnet for coding, but that wouldn't be indicative of how people are using it.
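This works because those providers expose Anthropic-compatible endpoints, so anything that speaks the Anthropic API (Claude Code included, via its ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN environment variables) can be pointed at them. A minimal sketch with the Python SDK; the base URL and model id below are illustrative assumptions, so check your provider's docs:

```python
import anthropic

# Point the Anthropic client at a third-party Anthropic-compatible endpoint.
# Base URL and model id are examples only; consult your provider's docs.
client = anthropic.Anthropic(
    base_url="https://api.moonshot.ai/anthropic",  # e.g. a Kimi K2 provider
    api_key="sk-...",  # that provider's key, not an Anthropic key
)

message = client.messages.create(
    model="kimi-k2-0711-preview",  # provider-specific model id
    max_tokens=512,
    messages=[{"role": "user", "content": "Refactor this function to be pure."}],
)
print(message.content[0].text)
```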

https://gist.github.com/WolframRavenwolf/0ee85a65b10e1a442e4...

We gave Gemini CLI a spin; it is kinda unhinged, and I am impressed you were able to get your results. After reading through the Gemini CLI codebase, it appears to be a shallow photocopy knockoff of Claude Code, with no built-in feedback loops or development guides beyond "you are an excellent senior programmer ..."; the built-in prompts are embarrassingly naive.

Qwen has its own agent, which I haven't used: https://github.com/QwenLM/qwen-code

Another is https://github.com/sst/opencode

logicprog · 7 days ago
> You can use Claude Code against Kimi K2, DeepSeek, Qwen, etc.

Yeah but I wouldn't get a generous free tier, and I am Poor lmao.

> I am impressed you were able to get your results

Compared to my brief stint with OpenCode and Claude Code with Claude Code Router, qwen-code (which is basically a carbon copy of Gemini CLI) is indeed unhinged, and worse than the other options, but if you baby it just right you can get stuff done lol

logicprog commented on AI doesn't lighten the burden of mastery   playtechnique.io/blog/ai-... · Posted by u/gwynforthewyn
sitkack · 7 days ago
Jon Gjengset (jonhoo), who is famously fastidious, did a live-coding stream where he did something similar in terms of control. Worth a watch if that's a style you want to explore.

https://www.youtube.com/watch?v=EL7Au1tzNxE

I don't have the energy to do that for most things I'm writing these days, which are small PoCs where the vibe is fine.

I suspect as you do more, you will create dev guides and testing guides that can encapsulate more of that direction so you won't need to micromanage it.

If you used Gemini CLI, you picked the coding agent with the worst output. So if you got something that worked to your liking, you should try Claude.

logicprog · 7 days ago
> I suspect as you do more, you will create dev guides and testing guides that can encapsulate more of that direction so you won't need to micromanage it.

Definitely. Prompt adherence to stuff that's in an AGENTS/QWEN/CLAUDE/GEMINI.md is not perfect ime though.

>If you used Gemini CLI, you picked the coding agent with the worst output. So if you got something that worked to your liking, you should try Claude.

I'm aware actually lol! I started with OpenCode + GLM 4.5 (via OpenRouter), but I was burning through cash extremely quickly, and I can't remotely afford Claude Code, so I was using qwen-code mostly just for the 2,000 free requests a day and prompt caching abilities, and because I prefer Qwen 3 Coder to Gemini... anything for agentic coding.

logicprog commented on AI doesn't lighten the burden of mastery   playtechnique.io/blog/ai-... · Posted by u/gwynforthewyn
rafterydj · 7 days ago
Interesting. Would you mind elaborating a bit on your workflow? In my work I go back and forth between the "stock" GUIs and copy-pasting into a separate terminal for model prompts. I hate the vibe-code-y agent menu in things like Cursor; I'm always afraid integrated models will make changes that I miss, because it really only works if you check "allow all changes" fairly quickly.
logicprog · 7 days ago
Ah, yeah. Some agentic coding systems try to force you really heavily into clicking allow. I don't think it's intentional, but like, I don't think they're really thinking through the workflow of someone who's picky and wants to be involved as much as I am. So they make it so that, you know, canceling things is really disruptive to the agent, or difficult or annoying to do, or something. And so it kind of railroads you into letting the agent do whatever it wants and then trying to clean up after, which is a mess.

Typically, I just use something like QwenCode. One of the things I like about it, and I assume this is true of Gemini CLI as well, is that it's explicitly designed to make it as easy as possible to interrupt an agent in the middle of its thought or execution process and redirect it, or to reject its code changes and then directly iterate on them without having to recapitulate everything from the start. It's as easy as hitting Escape at any time. So I tell it what I want, usually in a little markdown-formatted paragraph or so, with some bullet points or numbered steps and maybe a heading or two, explaining the exact architecture and logic I want for a feature, not just the general feature. Then I let it get started and see where it's going. If I generally agree with the approach it's taking, I let it turn out a diff. If I like the diff after reading through it fully, I accept it. And if there's anything I don't like about it at all, I hit Escape and tell it what to change about the diff before it even gets merged in.
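For example, a feature prompt might look something like this (entirely made up, just to show the shape):

```
## Add retry logic to the page fetcher

Wrap fetchPage() in a small retry helper:

1. Exponential backoff starting at 250ms, max 3 attempts.
2. Retry only on network errors and 5xx responses, never 4xx.

- Keep the helper in its own module; don't inline it.
- No new dependencies.
```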

There are three advantages to this workflow over the ChatGPT copy-paste workflow.

One is that the agent can automatically use grep and find and read source files, which makes it much easier and more convenient to load it up with all the context it needs to understand the existing style, architecture, and purpose of your codebase. Thus, it typically generates code that I'm willing to accept more often, without me doing a ton of legwork.

The second is that it allows the agent, of its own accord, to run things like linters, type checkers, compilers, and tests, and automatically try to fix any warnings or errors that result, so that it's more likely to produce correct code that adheres to whatever style guide I've provided. Of course, I could run those tools myself and copy-paste the output into a chat window, but that's just enough extra friction, once I've got something ostensibly working, that I know I'd be lazy and skip it at some point. This way it's always done. Some tools like OpenCode even automatically run LSPs and linters after the diff is applied and feed the results back into the model, letting it correct things automatically.
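A minimal sketch of that kind of loop; this is not OpenCode's actual implementation, and the tool commands are just examples:

```python
import subprocess

def check(cmd: list[str]) -> str:
    """Run a lint/type/test command; return combined output if it fails, else ''."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return "" if result.returncode == 0 else result.stdout + result.stderr

# After the agent's diff is applied, collect failures to feed back to the model.
feedback = []
for cmd in (["ruff", "check", "."], ["mypy", "."], ["pytest", "-q"]):
    output = check(cmd)
    if output:
        feedback.append(f"$ {' '.join(cmd)}\n{output}")

if feedback:
    # In a real agent, this becomes the next tool/user message in the loop.
    print("feed back to model:\n\n" + "\n\n".join(feedback))
```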

Third, this workflow forces the AI to generate code as small, localized diffs, instead of regenerating whole files or just autoregressively completing or filling in the middle, which makes it way easier to keep up with what it's doing and make sure you know everything that's going on. It can't slip subtle modifications past you, and it doesn't tend to generate 400 lines of nonsense.

u/logicprog

Karma: 1224 · Cake day: April 23, 2019
About
Don't assume what I believe beyond what I tell you, you'll be wrong.

I dislike this website and the capitalist lickspittles, liberal technocrats, and cryptofascist pseudointellectuals that inhabit it. Yet I'm still here, for some reason.

Staunchly against the UNIX cargo-cultists.

Names name me not
