> Taking longer than usual. Trying again shortly (attempt 1 of 10)
> ...
> Taking longer than usual. Trying again shortly (attempt 10 of 10)
> Due to unexpected capacity constraints, Claude is unable to respond to your message. Please try again soon.
I guess I'll have to wait until later to feel the fear...
Because LLMs make it that much faster to develop software, any potential advantage you may get from adopting a very niche language is overshadowed by the fact that you can't use it with an LLM. This makes it that much harder for your new language to gain traction. If your new language doesn't gain enough traction, it'll never end up in LLM datasets, so programmers are never going to pick it up.
I feel as though "facts" such as this are presented to me all the time on HN, but in my every day job I encounter devs creating piles of slop that even the most die-hard AI enthusiasts in my office can't stand and have started to push against.
I know, I know "they just don't know how to use LLMs the right way!!!", but all of the better engineers I know, the ones capable of quickly assessing the output of an LLM, tend to use LLMs much more sparingly in their code. Meanwhile the ones that never really understood software that well in the first place are the ones building agent-based Rube Goldberg machines that ultimately slow everyone down
If we keep living in this AI hallucination for 5 more years, I think the only people capable of producing anything of use or value will be devs who continued to devote some of their free time to coding in languages like Gleam, and continued to maintain and sharpen their ability to understand and reason about code.
I pointed out that it would be far more cost-effective to simply let us request hard copies of whatever books we wanted, and then they would just stay in the library. No one worked remotely at the time.
We ended up getting Safari subscriptions for everyone.
And, as far as expenses go for a research institute, $4k/mo is very inexpensive.
LLMs are designed for text completion, and the chat interface is basically a fine-tuning hack that recasts prompting as a natural form of text completion so the average user gets a more "intuitive" interface (I don't even want to think about how many AI "enthusiasts" don't really understand this).
But with open/local models in particular: each instruct/chat interface is slightly different. There are tools that help mitigate this, but the more closely you're working with the model, the more likely you are to make a stupid mistake because you didn't understand some detail of how the instruct interface was fine-tuned.
Once you accept that LLMs are "auto-complete on steroids", you can get much better results by programming the way they were naturally designed to work. It also helps a lot with prompt engineering, because you can more easily understand what the model's natural tendency is and work with that to generally get better results.
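To make the "fine-tuning hack" point concrete, here's a minimal sketch using the Hugging Face `transformers` tokenizer API (the model name is just an illustrative local instruct model, and the exact template text differs per model family):

```python
# Sketch: the same request as raw completion vs. wrapped in the chat template
# the model was actually fine-tuned on. The model name below is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# 1) Raw completion: the base behavior -- the model simply continues the string.
raw_prompt = "The capital of France is"

# 2) Chat: the identical question wrapped in the instruct template.
#    Getting this wrapping wrong is exactly the "stupid mistake" described above.
messages = [{"role": "user", "content": "What is the capital of France?"}]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

print(repr(raw_prompt))
print(repr(chat_prompt))  # e.g. '<s>[INST] What is the capital of France? [/INST]' for this model family
```

Both strings go through the same next-token machinery; the chat template is just the text the fine-tune taught the model to expect.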
It's funny, because a good chunk of my comments on HN these days are combating AI hype, but man, LLMs really are fascinating to work with if you approach them with a somewhat clearer-headed perspective.
I don't see the equivalence to MCMC. It's not like we have a complex probability function that we are trying to sample from using a chain.
It's just multinomial logistic regression (a softmax over the vocabulary) at each step.
Every response from an LLM is essentially the sampling of a Markov chain.
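To spell out what "sampling of a Markov chain" means here, a toy sketch (`fake_logits` is a made-up stand-in for a real model's forward pass, not anything from the comment): the state is the entire token prefix, and each decoding step is one transition drawn from a softmax over logits.

```python
# Toy sketch: autoregressive decoding as a Markov chain whose state is the
# full prefix. `fake_logits` is a hypothetical stand-in for an LLM forward pass.
import numpy as np

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]
rng = np.random.default_rng(0)

def fake_logits(prefix):
    # A real model computes these from the whole prefix; here we just nudge
    # the chain toward <eos> once the prefix gets long.
    logits = rng.normal(size=len(VOCAB))
    if len(prefix) > 4:
        logits[-1] += 3.0  # make "<eos>" more likely
    return logits

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

state = ["the"]                                    # current state of the chain
while state[-1] != "<eos>" and len(state) < 12:
    probs = softmax(fake_logits(state))            # P(next token | entire prefix)
    state = state + [rng.choice(VOCAB, p=probs)]   # one Markov transition
print(" ".join(state))
```

The "Markov" part only holds because the state is the whole prefix; conditioned on that, the next token depends on nothing else.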
But my opinions about what LLMs can do are based on... what LLMs can do. What I can see them doing. With my eyes.
The right answer to the question "What can LLMs do?" is... looking... at what LLMs can do.
You should be doubly skeptical ever since RLHF became standard, as the model has literally been optimized to give you the answers you find most pleasing.
The best way to measure of course is with evaluations, and I have done professional LLM model evaluation work for about 2 years. I've seen (and written) tons of evals and they both impress me and inform my skepticism about the limitations of LLMs. I've also seen countless times where people are convinced "with their eyes" they've found a prompt trick that improves the results, only to be shown that this doesn't pan out when run on a full eval suite.
As an aside: what's fascinating is that our visual system seems to be much more skeptical. A slightly-off eyeball created by a diffusion model will immediately set off alarms, whereas enough clever wordplay from an LLM will make us drop our guard.
If I'm using an MCMC algorithm to sample a probability distribution, I need to wait for my Markov chain to converge to a stationary distribution before sampling, sure.
But in no way is 'a good answer' a stationary state in the LLM Markov chain. If I continue running next-token prediction, I'm not going to start looping.
So for language, when I say "Bob has three apples, Jane gives him four and Judy takes two, how many apples does Bob have?", we're actually pretty far from the part of the linguistic manifold where the correct answer is likely to be. As the chain wanders this space it gets closer, until it finally, statistically, follows the path "the answer is..." and, when it's sampling from this path, it's in a much more likely neighborhood of the correct answer. That is, after wandering a bit, more and more of the possible paths are closer to where the actual answer lies than they would be if we had just forced the model to choose early.
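For readers who haven't met the term: the "typical set" being invoked is the standard information-theoretic one (this formalization is my gloss, not the commenter's). For a sequence model with conditional factorization p, the ε-typical sequences of length n are those whose average per-token surprisal is close to the entropy rate H:

```latex
A_\epsilon^{(n)} \;=\; \Big\{ x_{1:n} \;:\; \Big| -\tfrac{1}{n}\,\log p(x_{1:n}) - H \Big| \le \epsilon \Big\},
\qquad
p(x_{1:n}) \;=\; \prod_{t=1}^{n} p\!\left(x_t \mid x_{<t}\right).
```

The warm-up analogy is that the early generated text plays the role of burn-in: it moves the conditional distribution toward this set before the answer tokens are sampled.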
edit: Michael Betancourt has a great introduction to HMC which covers warm-up and the typical set: https://arxiv.org/pdf/1701.02434 (he has a ton more content that dives much more deeply into the specifics)
Example: many people created websites without a clue how they really work. And got millions of people onto them. Or had crazy ideas for things to do with them.
At the same time, there are devs who know how the internals work but can't get a single user.
PC manufacturers were never able to even imagine what random people would end up doing with their PCs.
This is to say that even if you know the internals, you can claim you know better, but that doesn't mean it's absolute.
Sometimes knowing the fundamentals is a limitation. It will limit your imagination.
> “In the beginner’s mind there are many possibilities, but in the expert’s there are few”
Experts do tend to be limited in what they see as possible. But I don't think that allows carte blanche belief that a fancy Markov chain will let you transcend humanity. I would argue one of the key concepts of "beginner's mind" is not radical assurance of what's possible but unbounded curiosity and willingness to explore with an open mind. Right now we see this in the Stable Diffusion community: there are tons of people who also don't understand matrix multiplication doing incredible work through pure experimentation. There's a huge gap between "I wonder what will happen if I just mix these models together" and "we're just a few years from surrendering our will to AI". None of the people I'm concerned about have what I would consider an "open mind" about the topic of AI. They are sure of what they know, and to disagree is to invite complete rejection. Hardly a principle of beginner's mind.
Additionally:
> pc manufacturers never were able to even imagine what random people were able to do with their pc.
Belies a deep ignorance of the history of personal computing. Honestly, I don't think modern computing has, even now, returned to the ambition of what was being dreamt up, by experts, at Xerox PARC. The demos on the Xerox Alto in the early 1970s are still ambitious in some senses. And, as much as I'm not a huge fan of them, Gates and Jobs absolutely had grand visions for what the PC would be.
I have spent a lot of time experimenting with Chain of Thought professionally, and I have yet to see any evidence to suggest that what's happening with CoT is any more (or less) than this. If you let the model run a bit longer, it enters a region close to the typical set, and when it's ready to answer you have a high probability of getting a good answer.
There's absolutely no "reasoning" going on here, except that sometimes sampling from the typical set near the region of your answer is going to look very similar to how humans reason before coming up with an answer.
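As an illustration of that claim (the prompt wording below is mine, not the commenter's): the only mechanical difference between direct answering and CoT is whether the chain is allowed to emit tokens before committing to the answer. Feed either string to any completion-style LLM client.

```python
# Illustration only: the same question with and without room to "wander"
# before answering.
question = (
    "Bob has three apples, Jane gives him four and Judy takes two. "
    "How many apples does Bob have?"
)

direct_prompt = question + "\nThe answer is:"           # forced to commit immediately
cot_prompt = question + "\nLet's think step by step."   # allowed a warm-up before answering

print(direct_prompt)
print(cot_prompt)
```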