Readit News
valine commented on Horses: AI progress is steady. Human equivalence is sudden   andyljones.com/posts/hors... · Posted by u/pbui
socketcluster · 10 days ago
Totally agree with the first observation. The default human state seems to be confusion. I've seen this over and over in junior coders.

It's often very creative how junior devs approach problems. It's like they don't fully understand what they're doing, and the code itself is part of the exploration and brainstorming process, trying to find the solution as they write... Very different from how senior engineers approach coding, where you don't even write your first line until you have a clear high-level picture of all the parts and how they will fit together.

About the second point, I've been under the impression that because LLMs are trained on average code, they infer that the bugs and architectural flaws are desirable... So if an LLM sees that your code is poorly architected, it will generate more of that poorly architected code on top. If it sees hacks in your codebase, it will assume hacks are OK and give you more hacks.

When I use an LLM on a poorly written codebase, it does very poorly: it's hard to solve any problem or implement any feature, and it keeps trying to come up with nasty hacks... A very frustrating trial-and-error process that eats up so many tokens.

But when I use the same LLM on one of my carefully architected side projects, it usually works extremely well and never tries to hack around a problem. It's like having good code lets you tap into a different part of its training set. It's not just that your architecture is easier to build on top of; it also follows existing coding conventions better and always addresses root causes, no hacks. Its code style looks more like that of a senior dev. You need to keep the feature requests specific and short, though.

valine · 9 days ago
> About the second point, I've been under the impression that because LLMs are trained on average code, they infer that the bugs and architectural flaws are desirable

This is really only true of base models that haven't undergone post-training. The big difference between ChatGPT and GPT-3 was OpenAI's instruct fine-tuning. Out of the box, language models behave how you describe: ask them a question and half the time they generate a list of questions instead of an answer. The primary goal of post-training is to coerce the model into a state in which it's more likely to output things as if it were a helpful assistant. The simplest version is text at the start of your context window like: "the following code was written by a meticulous senior engineer". After a prompt like that, the most likely next tokens will never be the model's imitation of sloppy code. Instruct fine-tuning does the same thing, but as permanent modifications to the weights of the model.
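
A rough sketch of what that kind of prefix conditioning looks like in practice, assuming a generic Hugging Face base checkpoint (the model name and prompt here are just placeholders, not anything OpenAI-specific):

    # Minimal sketch: steering a *base* (non-instruct) model with a prefix.
    # Checkpoint and prompt are illustrative placeholders.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # any base checkpoint works; gpt2 is just small
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    # The "simplest version": a prefix that shifts which part of the
    # training distribution the continuation is drawn from.
    prefix = "# The following code was written by a meticulous senior engineer.\n"
    task = "def parse_config(path):"

    inputs = tokenizer(prefix + task, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=80, do_sample=True, top_p=0.9)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

Instruct fine-tuning bakes the same shift into the weights, so you get the assistant-like behavior without needing the prefix.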

valine commented on Horses: AI progress is steady. Human equivalence is sudden   andyljones.com/posts/hors... · Posted by u/pbui
socketcluster · 10 days ago
I think my software engineering job will be safe so long as big companies keep using average code as their training set, because the average developer creates unnecessary complexity, which means more work for me.

The way the average dev structures their code requires like 10x the number of lines mine does and at least 10x the amount of time to maintain... The interest on technical debt compounds like interest on normal debt.

Whenever I join a new project, within 6 months I control/maintain all the core modules of the system, and everything ends up hooked into my config files, running according to the architecture I designed. It's happened at multiple companies. Code finds the shortest path to production, and that creates a moat around engineers who can make their team members' jobs easier.

IMO, it's not so different from how entrepreneurship works, but with code and processes instead of money and people as your moat. I think once AI can replace top software engineers, it will be able to replace top entrepreneurs. Scary combination. We'll probably have different things to worry about then.

valine · 10 days ago
Humans don't learn to write messy, complex code. Messy, complex code is the default; writing clean code takes skill.

You’re assuming the LLM produces extra complexity because it’s mimicking human code. I think it’s more likely that LLMs output complex code because it requires less thought and planning, and LLMs are still bad at planning.

valine commented on Karpathy on DeepSeek-OCR paper: Are pixels better inputs to LLMs than text?   twitter.com/karpathy/stat... · Posted by u/JnBrymn
daxfohl · 2 months ago
It seems like we're still pretty far away from that being viable, if ChatGPT is any indication. Whenever it suggests "should I generate an image of that <class design, timeline, data model, etc.>? It really helps visualize it!", the result is full of hallucinations.
valine · 2 months ago
Image generation and image input are two totally different things. This is about feeding text into LLMs as images; it has nothing to do with image generation.
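
To make the distinction concrete, here's a minimal sketch of the input side: rasterizing existing text into pixels for a vision model to consume, with no image generation anywhere (the image size and output path are arbitrary placeholders):

    # Sketch: turn text into pixels so a vision-language model can take it
    # as *image input*. Nothing here generates an image from a description;
    # it only rasterizes text that already exists.
    from PIL import Image, ImageDraw

    text = "def add(a, b):\n    return a + b"

    img = Image.new("RGB", (400, 100), "white")
    ImageDraw.Draw(img).multiline_text((10, 10), text, fill="black")
    img.save("text_as_pixels.png")

    # text_as_pixels.png would then go into the VLM's image pathway
    # instead of through the tokenizer.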
valine commented on Show HN: I created a small 2D game about an ant   aanthonymax.github.io/ant... · Posted by u/aanthonymax
valine · 3 months ago
Here's my version, took about 5 minutes to create inside the ChatGPT web interface. https://valine.github.io/vibe-coded-ant-game/

I don't know if this game was vibe-coded, but it certainly could have been. The most notable thing about this post is probably that vibe-coded games are now good enough to fool HN.

valine commented on Show HN: I created a small 2D game about an ant   aanthonymax.github.io/ant... · Posted by u/aanthonymax
krapp · 3 months ago
It's a Show HN that doesn't appear to be vibe-coded AI slop. I'll take it any day of the week.
valine · 3 months ago
Both Claude and GPT-5 can single-shot this type of game. The score counter looks exactly like the type of thing Claude spits out.
valine commented on iPhone Air   apple.com/newsroom/2025/0... · Posted by u/excerionsforte
hx8 · 3 months ago
I specifically want an iPhone with less mass.

I view my phone primarily as something I'm obligated to carry on me at all times to function in modern society. The easier it is to carry, the better. When I need to upgrade my phone, I'll always choose the lightest iPhone.

valine · 3 months ago
Same. There are really only two features I care about in a phone: a high refresh rate and low weight. At 165 grams, the iPhone Air is by far the lightest 120Hz phone Apple has ever made. Second place is the iPhone 15 Pro at 187 grams. Getting ready to ditch my 15 Pro.
valine commented on From tokens to thoughts: How LLMs and humans trade compression for meaning   arxiv.org/abs/2505.17117... · Posted by u/ggirelli
blackbear_ · 6 months ago
Note that the token embeddings are also trained, therefore their values do give some hints on how a model is organizing information.

They used token embeddings directly, and not intermediate representations, because the latter depend on the specific sentence the model is processing. Data on human judgment was, however, collected without any context surrounding each word, so using the token embeddings seems to be the fairest comparison.

Otherwise, what sentence(s) would you have used to compute the intermediate representations? And how would you make sure that the results aren't biased by these sentences?

valine · 6 months ago
Embedding layers are not always trained with the rest of the model; that's the whole idea behind VLLMs. First-layer embeddings are so interchangeable you can literally feed in the output of other models through linear projection layers.

And like the other commenter said, you can absolutely feed single tokens through the model. Regardless, your point doesn't make sense: how about priming the model with "You're a helpful assistant", just like everyone else does?
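
As a concrete sketch of that interchangeability (the dimensions are made-up placeholders), this is the LLaVA-style trick: a single linear layer maps another model's outputs into the LLM's input embedding space:

    # Sketch: bridging two embedding spaces with one linear projection,
    # the way vision-language models feed an image encoder into an LLM.
    # Dimensions are illustrative placeholders.
    import torch
    import torch.nn as nn

    encoder_dim, llm_dim = 1024, 4096      # hypothetical widths
    projector = nn.Linear(encoder_dim, llm_dim)

    patch_embeddings = torch.randn(1, 256, encoder_dim)  # e.g. 256 image patches
    soft_tokens = projector(patch_embeddings)            # now shaped like LLM inputs

    # soft_tokens can be concatenated with ordinary token embeddings and
    # run through the transformer as if they were first-layer embeddings.
    print(soft_tokens.shape)  # torch.Size([1, 256, 4096])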

valine commented on From tokens to thoughts: How LLMs and humans trade compression for meaning   arxiv.org/abs/2505.17117... · Posted by u/ggirelli
valine · 6 months ago
>> For each LLM, we extract static, token-level embeddings from its input embedding layer (the 'E' matrix). This choice aligns our analysis with the context-free nature of stimuli typical in human categorization experiments, ensuring a comparable representational basis.

They're analyzing input embedding layers, not LLMs. I'm not sure how the authors justify making claims about the inner workings of LLMs when they haven't actually computed a forward pass. The E matrix is not an LLM; it's a lookup table.

Just to highlight the ridiculousness of this research, no attention was computed! Not a single dot product between keys and queries. All of their conclusions are drawn from the output of an embedding lookup table.

The figure showing their alignment score correlated with model size is particularly egregious. Model size is meaningless when you never activate any model parameters. If BERT is outperforming Qwen and Gemma, something is wrong with your methodology.
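
For anyone unsure what the distinction looks like in code, here's a sketch (BERT is just a convenient placeholder checkpoint). The paper's "representations" come from step 1, never step 2:

    # Sketch: static embedding-table lookup vs. an actual forward pass.
    # Checkpoint is an illustrative placeholder.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    ids = tokenizer("bird", return_tensors="pt").input_ids

    # 1) What the paper analyzed: rows of the E matrix. No attention, no
    #    key/query dot products -- just a lookup table.
    static = model.get_input_embeddings()(ids)

    # 2) A forward pass: attention and MLP layers actually run, producing
    #    context-dependent representations.
    with torch.no_grad():
        contextual = model(input_ids=ids).last_hidden_state

    print(static.shape, contextual.shape)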

valine commented on Outcome-Based Reinforcement Learning to Predict the Future   arxiv.org/abs/2505.17989... · Posted by u/bturtel
valine · 7 months ago
So instead of next-token prediction it's next-event prediction. At some point this just loops around and we're back to teaching models to predict the next token in the sequence.
valine commented on Beyond Semantics: Unreasonable Effectiveness of Reasonless Intermediate Tokens   arxiv.org/abs/2505.13775... · Posted by u/nyrikki
x_flynn · 7 months ago
What the model is doing in latent space is auxiliary to anthropomorphic interpretations of the tokens, though. And if the latent reasoning matches a ground-truth procedure (A*), then we'd expect it to be projectable to semantic tokens, but it isn't. So it seems the model has learned an alternative method for solving these problems.
valine · 7 months ago
You're thinking about this like the final layer of the model is all that exists. It's highly likely that reasoning is happening at a lower layer, in a different latent space that can't natively be projected into logits.
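
A sketch of what "a lower layer, in a different latent space" means concretely (GPT-2 is a placeholder checkpoint; the projection at the end is the known "logit lens" diagnostic trick, not a faithful readout):

    # Sketch: intermediate hidden states live in spaces the final logit
    # projection was never trained to decode. Checkpoint is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    ids = tok("2 + 2 =", return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(input_ids=ids, output_hidden_states=True)

    # hidden_states: the embeddings plus one tensor per transformer layer.
    mid = out.hidden_states[len(out.hidden_states) // 2]

    # "Logit lens": force a middle layer through the unembedding anyway.
    # The result is approximate at best -- that space isn't what ln_f and
    # lm_head were trained against, which is the point above.
    approx_logits = model.lm_head(model.transformer.ln_f(mid))
    print(approx_logits.shape)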

u/valine

Karma: 5125 · Cake day: July 26, 2014
About
email: lvaline@attentio.ai · twitter: @lukasvaline · github: https://github.com/valine

https://attentio.ai
