Things we learned about LLMs in 2024

About "people still thinking LLMs are quite useless", I still believe that the problem is that most people are exposed to ChatGPT 4o that at this point for my use case (programming / design partner) is basically a useless toy. And I guess that in tech many folks try LLMs for the same use cases. Try Claude Sonnet 3.5 (not Haiku!) and tell me if, while still flawed, is not helpful.

But there is more: a key thing with LLMs is that their ability to help, as a tool, changes vastly based on your communication ability. The prompt is the king to make those models 10x better than they are with the lazy one-liner question. Drop your files in the context window; ask very precise questions explaining the background. They work great to explore what is at the borders of your knowledge. They are also great at doing boring tasks for which you can provide perfect guidance (but that still would take you hours). The best LLMs (in my case just Claude Sonnet 3.5, I must admit) out there are able to accelerate you.

mvkel · 8 months ago

I'm surprised at the description that it's "useless" as a programming / design partner. Even if it doesn't make "elegant" code (whatever that means), it's the difference between an app existing at all, or not.

I built and shipped a Swift app to the App Store, currently generating $10,200 in MRR, exclusively using LLMs.

I wouldn't describe myself as a programmer, and didn't plan to ever build an app, mostly because in the attempts I made, I'd get stuck and couldn't google my way out.

LLMs are the great un-stickers. For that reason per se, they are incredibly useful.

theptip · 8 months ago

The context here is super-important - the commenter is the author of Redis. So, a super-experienced and productive low-level programmer. It’s not surprising that Staff-plus experts find LLMs much less useful.

Though I’d be interested if this was an opinion on “help me write this gnarly C algorithm” or “help me to be productive in <new language>” as I find a big productivity increase from the latter.

egometry · 8 months ago

To the un-sticking point: it's also great at letting people ask questions without being perceived as dumb

Tragically - admitting ignorance, even with the desire to learn, often has negative social reprocussions

archagon · 8 months ago

Off topic, but I'm a bit confused. Your iOS apps as listed on your website are CarPrep and Brocly, neither of which appear to have notable review activity or buzz in the media. If the app you're referring to is one of these, the more interesting question (to me) is: how on Earth are you generating $10,200 MRR from it? Or is there another app that I'm missing?

(In my experience as an app developer, getting any traction and/or money from your app can be much more difficult than actually building it.)

mvdtnz · 8 months ago

> I built and shipped a Swift app to the App Store, currently generating $10,200 in MRR, exclusively using LLMs.

My experience is that people who claim they build worthwhile software "exclusively" using LLMs are lying. I don't know you and I don't know if you are lying, but I would be willing to bet my paycheck you are.

yogrish · 8 months ago

May I know what is the name of app that is built using LLM? 10k MRR is highly successful app.

oblio · 8 months ago

> I built and shipped a Swift app to the App Store, currently generating $10,200 in MRR, exclusively using LLMs.

That's great, but professional programmers are afraid of the future maintenance burden.

mellosouls · 8 months ago

I interpreted it as saying that ymmv wrt the models you try and how you use them, and sole exposure to one that doesn't work for you can put you off the whole lot - in this case antirez finds Claude sonnet (with good prompting) very helpful, but gpt 4o (by far the best known due to ChatGPT), not so much and if the latter is representative of others experience it may be why many are still sceptical.

Ruxbin · 8 months ago

May you expand how you did this? I'm seeing a number of apps that claim to do just this and there are number that are becoming super popular.

Not just the development of the code but the entire the thing from the code, infra, auth, cc payments, etc.

fijiaarone · 8 months ago

Strange that you don’t mention your product. Making too much money already?

Tomte · 8 months ago

I tried exactly that, a simple Todo-like app, without SwiftUI or Swift knowledge, and Sonnet 3.5 only gave me one syntax error after another. Now I‘m watching Paul Hudson‘s intro videos.

chairmansteve · 8 months ago

"I built and shipped a Swift app to the App Store, currently generating $10,200 in MRR, exclusively using LLMs".

What's the app?!!

s1mplicissimus · 8 months ago

Would be very interesting to have a look at this app that you wrote using only LLMs. Mind sharing the name?

raydev · 8 months ago

Which service/LLM performed the best for you?

HarHarVeryFunny · 8 months ago

Did you need a Mac for that, or is it possible to use Linux to develop a Swift app targeting iOS?

Would you mind sharing which app you released?

ninth_ant · 8 months ago

I think a lot of the confusion is in how we approach LLMs. Perhaps stemming from the over-broad term “AI”.

There are certain classes of problems that LLMs are good at. Accurately regurgitating all accumulated world knowledge ever is not one, so don’t ask a language model to diagnose your medical condition or choose a political candidate.

But do ask them to perform suitable tasks for a language model! Every day by automation I feed in the hourly weather forecast my home ollama server and it builds me a nice readable concise weather report. It’s super cool!

There are lots of cases like this where you can give an LLM reliable data and ask it to do a language related task and it will do an excellent job of it.

If nothing else it’s an extremely useful computer-human interface.

rrix2 · 8 months ago

> Every day by automation I feed in the hourly weather forecast my home ollama server and it builds me a nice readable concise weather report.

not to dissuade you from a thing you find useful but are you aware that the national weather service produces an Area Forecast Discussion product in each local NWS office daily or more often that accomplishes this with human meteorologists and clickable jargon glossary?

https://forecast.weather.gov/product.php?site=SEW&issuedby=S...

pella · 8 months ago

> so don’t ask a language model to diagnose your medical condition

(o1-preview) LLMs show promise in clinical reasoning but fall short in probabilistic tasks, underscoring why AI shouldn't replace doctors for diagnosis just yet.

"Superhuman performance of a large language model on the reasoning tasks of a physician" https://arxiv.org/abs/2412.10849 [14 Dec 2024]

dinosaurdynasty · 8 months ago

> choose a political candidate

I actually found 4o+search to be really good at this... Admittedly what I did was more "research these candidates, tell me anything newsworthy, pros/cons, etc" (much longer prompt) and well, it was way faster/patient at finding sources than I ever would've been, telling me things I never would've figured out with <5 minutes of googling each set of candidates (which is what I've done before).

Honestly my big rule for what LLMs are good at is stuff like "hard/tedious/annoying to do, easy to verify" and maybe a little more than that. (I think after using a model for a while you can get a "feel" for when it's likely BSing.)

pixl97 · 8 months ago

>don’t ask a language model to diagnose your medical condition

Honestly they are very decent at it if you give them accurate information in which to make the diagnosis. The typical problem people have is being unable to feed accurate information to the model. They'll cut out parts they don't want to think about or not put full test results in for consideration.

mvdtnz · 8 months ago

> Every day by automation I feed in the hourly weather forecast my home ollama server and it builds me a nice readable concise weather report. It’s super cool!

You feed it a weather report and it responds with a weather report? How is that useful?

Deleted Comment

sdesol · 8 months ago

> Perhaps stemming from the over-broad term “AI”.

No, I think if we follow the money, we will find the problem.

uludag · 8 months ago

I don't think people finding LLMs useless is a good representation of the general sentiment though. I feel that more than anything, people are annoyed at LLM slop. Someone uses an LLM too much to write code, they create "slop," which ends up making things worse.

antirez · 8 months ago

Unfortunately complex tools will be misused by part of the population. There is no easy escape from that in the modernity of possibilities. Look at the Internet itself.

gre · 8 months ago

Yes but then they can prompt it to golf the code and most of the slop goes away. This sometimes breaks the code.

miki123211 · 8 months ago

> But there is more: a key thing with LLMs is that their ability to help, as a tool, changes vastly based on your communication ability. The prompt is the king to make those models 10x better than they are with the lazy one-liner question.

People keep saying this, and there are use cases for which this is definitely the case, but I find the opposite to be just as true in some circumstances.

I'm surprised at how good LLMs are at answering "me be monkey, me have big problem with code" questions. For simple one-offs like "how to do x in Pandas" (a frequent one for me), I often just give Claude a mish-mash of keywords, and it usually figures out what I want.

An example prompt of mine from yesterday, which Claude successfully answered, was "python sha256 of file contents base64 safe for fs path."

With a system prompt to make Claude's output super brief and a command to execute queries from the terminal via Simon Willison's LLM tool, this is extremely useful.

mewpmewp2 · 8 months ago

Using the correct keywords like you did is part of communication though.

Good communication with LLMs is the least keywords used to make it deducible for LLM what you exactly want.

mikehollinger · 8 months ago

and

> a key thing with LLMs is that their ability to help, as a tool, changes vastly based on your communication ability.

I still hold that the innovations we've seen as an industry with text transfer to the data from other domains. And there's an odd misbehavior with people that I've now seen play out twice -- back in 2017 with vision models (please don't shove a picture of a spectrogram into an object detector), and today. People are trying to coerce text models to do stuff with data series, or (again!) pictures of charts, rather than paying attention to timeseries foundation models which directly can work on the data.[1]

Further, the tricks we're seeing with encoder / decoder pipelines should work for other domains. And we're not yet recognizing that as an industry. For example, whisper or the emerging video models are getting there, but think about multi-spectral satellite data, fraud detection (a type graph problem).

There's lots of value to unlock from coding models. They're just text models. So what if you were to shove an abstract syntax tree in as the data representation, or the intermediate code from LLVM or a JVM or whatever runtime and interact with that?

[1] https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1 - shout-out to some former colleagues!

simonw · 8 months ago

Andrej Karpathy: https://twitter.com/karpathy/status/1835024197506187617

> It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.

> They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it".

monero-xmr · 8 months ago

The environmental arguments are hilarious to me as a diehard crypto guy. The ultimate answer to “waste” of electricity arguments is that energy is a free market and people pay the price if it’s useful for them. As long as the activity isn’t illegal then training LLMs or mining bitcoins, it doesn’t matter. I pay for the electricity I use.

swalsh · 8 months ago

I'm a big believer in Claude. I've accomplished some huge productivity gains by leveraging it. That said, I can see places where the models are strong and weak. If you're doing react, or python. These models are incredible. C#, C++ they're not terrible. Rust though, it's not great. If your experience is exclusively trying to use it to write Rust, it doesn't matter if you're using o1, Claude or anything else. It's just not great at it yet.

duped · 8 months ago

> Try Claude Sonnet 3.5 (not Haiku!) and tell me if, while still flawed, is not helpful.

It's not as helpful as Google was ten years ago. It's more helpful than Google today, because Google search has slowly been corrupted by garbage SEO and other LLM spam, including their own suggestions.

ChicagoDave · 8 months ago

Claude Sonnet 3.5 can write whole React applications with proper contextual clues and some minor iterations. Google has never coded for you.

I’ve written two large applications and about a dozen smaller ones using Claude as an assistant.

I’m a terrible front-end developer and almost none of that work was possible without Claude. The API and AWS deployment were sped up tremendously.

I’ve created unit tests and I’ve read through the resulting code and it’s very clean. One of my core pre-prompt requirements has always been to follow domain-driven design principles, something a novice would never understand.

I also start with design principles and a checklist that Claude is excellent at providing.

My only complaint is you only have a 3-4 hour window before you’re cutoff for a few hours.

And needing an enterprise agreement to have a walled garden for proprietary purposes.

I was not a fan in Q1. Q2 improved. Q3 was a massive leap forward.

Deleted Comment

bdangubic · 8 months ago

comparing google to claude 3.5 is like comparing tesla s plaid with a horse

jimt1234 · 8 months ago

Google Search has been corrupted by...Google.

Dead Comment

emptiestplace · 8 months ago

What a hilariously absurd statement. You might want to actually try it.

abhijeetpbodas · 8 months ago

> ask very precise questions explaining the background

IME, being forced to write about something or verbally explaining/enumerating things in detail _by itself_ leads to a lot of clarity in the writer's thoughts, irrespective of if there's an LLM answering back.

People have been doing rubber-duck-debugging since long. The metaphorical duck (LLMs in our context), if explained to well, has now started answering back with useful stuff!

danielbln · 8 months ago

One thing LLMs have been incredibly strong even since gpt-3.5 is being the most advanced non-human rubber duck, and while they can do plenty more, that alone provides (me at least) with tremendous utility.

aleph_minus_one · 8 months ago

> About "people still thinking LLMs are quite useless", I still believe that the problem is that most people are exposed to ChatGPT 4o that at this point for my use case (programming / design partner) is basically a useless toy. And I guess that in tech many folks try LLMs for the same use cases. Try Claude Sonnet 3.5 (not Haiku!) and tell me if, while still flawed, is not helpful.

I see much deeper problems. Just to give two examples:

- I asked various AIs concerning explanations of proofs of some deep (established) mathematical theorems: the explanations were to my understanding very hallucinated, and thus worse than "obviously wrong". I also asked for literature references for some deep mathematical theory frameworks: bascially all of the references were again hallucinated.

- I asked lots of AIs on https://lmarena.ai/ to write a suitably long text about some political topic that is quite controversial in my country (but does have lots proponents even in a very radical formulation, even though most people would not use such a radical formulation in public). All of the LLMs that I checked refused or tried to indoctrinate me that this thesis is wrong. I did not ask the LLM to lecture me, but I gave it a concrete task! Society is deeply divided, so if the LLM only spreads propaganda of its political teaching, it will be useless for many tasks for a very significant share of the society.

kromem · 8 months ago

Both new Sonnet and Haiku have a masking overhead.

Using a few messages to get them out of "I aim to be direct" AI assistant mode gets much better overall results for the rest of the chat.

Haiku is actually incredibly good at high level systems thinking. Somehow when they moved to a smaller model the "human-like" parts fell away but the logical parts remained at a similar level.

Like if you were taking meeting notes from a business strategy meeting and wanted insights, use Haiku over Sonnet, and thank me later.

cruffle_duffle · 8 months ago

To get the most out of them you have to provide context. Treat these models like some kind of eager beaver junior engineer who wants to jump in and write code without asking questions. Force it to ask questions (eg: “do not write code yet, please restate my requirements to make sure we are in alignment. Are there any extra bits of context or information that would help? I will tell you when to write code”)

If your model / chat app has the ability to always inject some kind of pre-prompt make sure to add something like “please do not jump to writing code. If this was a coding interview and you jumped to writing code without asking questions and clarifying requirements you’d fail”.

At the top of all your source files include a comment with the file name and path. If you have a project on one of these services add an artifact that is the directory tree (“tree —-gitignore” is my goto). This helps “unaided” chats get a sense of what documents they are looking at.

And also, it’s a professional bullshitter so don’t trust it with large scale code changes that rely on some language / library feature you don’t have personal experience with. It can send you down a path where the entire assumption that something was possible turns out to be false.

Does it seek like a lot of work? Yes. Am I actually more productive with the tool than without? Probably. But it sure as shit isn’t “free” in terms of time spent providing context. I think the more I use these models, the more I get a sense of what it is good at and what is going to be a waste of time.

Long story short, prompting is everything. These things aren’t mind readers (and worse they forget everything in each new session)

layer8 · 8 months ago

You are right, but doing all that is incredibly cumbersome, at least to some people, which is why they don’t like working with LLMs.

isoprophlex · 8 months ago

Super interesting that my experience mirrors exactly what you are writing... except for me finding Claude to be almost useless (often misunderstands me, gives answers that are plain wrong) and 4o to be a very helpful, if not somewhat dull, jack-of-all trades in helping me be a cruise control for the mind.

I could only ever really jam with 4o.

Makes me wonder if there's personal communication preferences at play here.

hdjjhhvvhga · 8 months ago

While Claude Sonnet is superior than 4o for most my use cases, there are still occasionally some specific tasks where it performs slightly better.

antirez · 8 months ago

Probably. But statistically to work with 4o is a lose of time for me. LLMs is like an investment: you write the prompts, you "work" with them. If the LLM is too weak, this is a lose of time. You need to have a return on the investment that is positive. With ChatGPT 4o / o1 most of the times for me the investment of time has almost zero return. Before Claude Sonnet 3.5 I already had a ChatGPT PRO account but never used it for coding since it was most of the times useless if not for throw away scripts that I didn't want to do myself or as a stack overflow replacement for trivial stuff. Now it's different.

tootie · 8 months ago

Like what? Claude has become my go-to, but I find that it's wrong enough often enough that I really can't trust it for anything. If it says something I have to go dig through it's citations very carefully.

minimaxir · 8 months ago

> Claude Sonnet 3.5 (not Haiku!)

A very big surprise is just how much better Sonnet 3.5 is than Haiku. Even the confusingly-more-expensive-Haiku-variant Haiku 3.5 that's more recent than Sonnet 3.5 is still much worse.

worldsayshi · 8 months ago

I ponder if LLM:s are very useful but at a quite narrower set of tasks than we expect. Like fuzzy manipulation of logical specifications.

I.e. over time it constitute a fundamental shift in how we interact with abstractions in computers. The current fundamentals will still remain but they will become increasingly malleable. Details in code will become less important. Architecture will become increasingly important. But at the same time the cost of refactoring or changing architecture will quickly drop.

Any details that are easily lost when passing through an LLM will be details that have the highest maintenance cost. Any important details that can be retained by an LLM can move up and down the ladder of abstraction at will.

Can an LLM based solution maintain software architectures without introducing noise? The answer to that is the difference between somewhat useful and game changing.

vasco · 8 months ago

Most people consider their own brain useless and don't use it, so it's not strange that they do the same with AI. How many people just refuse to learn how to parallel park, a new language, calculus or even basic arithmetic, "because they aren't good at it".

qwertox · 8 months ago

LLMs have given computers the ability to communicate with us in natural language, we didn't have that before at this level. In order to do this, they've been fed with a lot of coherent stuff and give the impression of being coherent, but we know they're just statistical machines. But at least they can now communicate naturally with us, so now we have that infrastructure available, as we do have TTS or ASR or monitors and keyboards available. It's still up to us to now make proper agents out of them. Agents for the software we've been using for decades. They can take over a lot of tedious work for us.

salawat · 8 months ago

Why are you pasting huge chunks of potentially crown jewels code into a 3rd party service where prompts are going to most likely be turned into training/surveillance material?

simonw · 8 months ago

A lot of vendors promise not to train on input to their models. I choose to believe those promises.

zahlman · 8 months ago

>They are also great at doing boring tasks for which you can provide perfect guidance (but that still would take you hours)

All the tasks I can think of dealing with on my own computer that would take hours, a) are actually pretty interesting to me and b) would equally well take hours to "provide perfect guidance". The drudge work of programming that I notice comes in blocks of seconds at a time, and the mental context switch to using an LLM would be costlier.

jsheard · 8 months ago

I swear these goalposts keep getting moved, I remember being told that GPT3.5 is a useless toy but the paid GPT4 is lifechanging, and now that GPT4 is free I'm told that it's a useless toy but paid o1 or paid Sonnet are lifechanging. Looking forward to o1 and Sonnet becoming useless toys, unlike the lifechanging o3.

raincole · 8 months ago

Except GPT4 isn't free.

The GP is claiming GPT4o is bad but Sonnet is good. GPT4o is about only 20% cheaper than Sonnet.

aetherson · 8 months ago

You will also be dismayed to hear that a 2011 iPhone is no longer state-of-the-art, and indeed can't run most modern apps.

FooBarWidget · 8 months ago

Why do people have such narrow views on what makes LLMs useful? I use them for basically everything.

My son throwing an irrational tantrum at the amusement park and I can't figure out why he's like that (he won't tell me or he doesn't know himself either) or what I should do? I feed Claude all the facts of what happened that day and ask for advice. Even if I don't agree with the advice, at the very least the analysis helps me understand/hypothesize what's going on with him. Sure beats having to wait until Monday to call up professionals. And in my experience, those professionals don't do a better job of giving me advice than Claude does.

It's weekend, my wife is sick, the general practitioner is closed, the emergency weekend line has 35 people in the queue, and I want some quick half-assed medical guidance that while I know might not be 100% reliable, is still better than nothing for the next 2 hours? Feed all the symptoms and facts to Claude/ChatGPT and it does an okay job a lot of the time.

I've been visiting Traditional Chinese Medicine (TCM) practitioner for a week now and my symptoms are indeed reducing. But TCM paradigm and concepts are so different from western medicine paradigms and concepts that I can't understand the doctor's explanation at all. Again, Claude does a reasonable job of explaining to me what's going on or why it works from a western medicine point of view.

Want to write a novel? Brainstorm ideas with GPT-4o.

I had a debate with a friend's child over the correct spelling of a Dutch word ("instabiel" vs "onstabiel"). Google results were not very clear. ChatGPT explained it clearly.

Just where is this "useless" idea coming from? Do people not have a life outside of coding?

krapp · 8 months ago

Yes people have lives outside of coding, but most people are able to manage without having AI software intercede in as much of their lives as possible.

It seems like you trust AI more than people and prefer it to direct human interaction. That seems to be satisfying a need for you that most people don't have.

jiggawatts · 8 months ago

At the risk of sounding impolite or critical of your personal choices: this, right here, is the problem!

You don’t understand how medicine works, at any level.

Yet you turn to a machine for advice, and take it at face value.

I say these things confidently, because I do understand medicine well enough to not to seek my own answers. Recently I went to a doctor for a serious condition and every notion I had was wrong. Provably wrong!

I see the same behaviour in junior developers that simply copy-paste in whatever they see in StackOverflow or whatever they got out of ChatGPT with a terrible prompt, no context, and no understanding on their part of the suitability of the answer.

This is why I and many others still consider AIs mostly useless. The human in the loop is still the critical element. Replace the human with someone that thinks that powdered rhino horn will give them erections, and the utility of the AI drops to near zero. Worse, it can multiply bad tendencies and bad ideas.

I’m sure someone somewhere is asking DeepSeek how best to get endangered animals parts on the black market.

tokai · 8 months ago

This reads like satire to me. Scarry that it isn't.

CRConrad · 8 months ago

Scary. Reads rather like you're well on your way to replace basic life skills with reliance on LLMs.

qsort · 8 months ago

I believe it's more frustration directed at the mismatch between marketing and reality, combined with the general well deserved growing hatred for SV culture, and, more broadly, software engineers. The sentiment would be completely different if the entire industry marketed themselves like the helpful tools they are rather than the second coming of Christ they aren't. This distinction is hard to make on "fast food" forums like this one.

If you aren't a coder, it's hard to find much utility in "Google, but it burns a tree whenever you make an API call, and everything it tells you might be wrong". I for one have never used it for anything else. It just hasn't ever come up.

It's great at cheating on homework, kids love GPTs. It's great at cheating in general, in interviews for instance. Or at ruining Christmas, after this year's LLM debacle it's unclear if we'll have another edition of Advent of Code. None of this is the technology's fault, of course, you could say the same about the Internet, phones or what have you, but it's hardly a point in favor either.

And if you are a coder, models like Claude actually do help you, but you have to monitor their output and thoroughly test whatever comes out of them, a far cry from the promises of complete automation and insane productivity gains.

If you are only a consumer of this technology, like the vast majority of us here, there isn't that much of an upside in being an early adopter. I'll sit and wait, slowly integrating new technology in my workflow if and when it makes sense to do so.

Happy new year, I guess.

fragmede · 8 months ago

> there isn't that much of an upside in being an early adopter.

Other than, y'know, using the new tools. As a programmer heavy forum, we focus a lot on LLMs' (lack of) correctness. There's more than a little bit of annoyance when things are wrong, like being asked to grab the red blanket and then getting into an argument over it being orange instead of what was important, someone needed the blanket because they were cold.

Most of the non-tech people who use ChatGPT that I've talked to absolutely love it because they don't feel it judges them for asking stupid questions and they have conversations about absolutely everything in their lives with it down to which outfit to wear to the party. There are wrong answers to that question as well, but they're far more subjective and just having another opinion in the room is invaluable. It's just a computer and won't get hurt if you totally ignore it's recommendations, and even better, it won't gloat (unless you ask it to) if you tell it later that it was right and you were wrong.

Some people have found upsides for themselves in their lives, even at this nascent stage. No one's forcing you to use one, but your job isn't going to be taken by AI, it's going to be taken by someone else who can outperform you that's using AI.

weMadeThat · 8 months ago

> They work great to explore what is at the borders of your knowledge.

But not at exploring what is at the border of knowledge itself. And by converging on the conventional, LLMs actually lead you away from anything that actually extends.

> doing boring tasks for which you can provide perfect guidance

That's true but you never need an LLM for that. There are wonderful scripts written by wonderful people and provided for free almost all the time and for those who search in the right places. LLM companies benefit/profit of these without providing anything in return.

They are worse than people who grab FOSS and turn it into overpriced and aggressively marketed business models and services or people who threaten and sue FOSS for being better and free alternatives to their bloated and often "illegally telemetric" services.

> able to accelerate you

True, but you leave too much for data brokers and companies like Meta to abuse and exploit in the future. All that additional "interactional data" will do so much worse to humanity than all those previous data sets did in elections, for example, or pretty much all consumer markets. They will mostly accelerate all these dimwitted Fortune 5000 companies that have sabotaged consumers into way too much dumb shit - way more than is reasonable or "ok". And educated, wealthy and or tech-savvy people won't be able to avoid/evade any of that. Especially when it's paired with meds, drugs, foods, biases, fallacies, priming and so on and all the knowledge we will gain on bio-chemical pathways and human liability to sabotage.

They are great for coders, of course, everyone can be an army of clone-warriors with auto-complete on steroids now and nobody can tell you what to do with all that time that you now have and all that money, which, thanks to all of us but mostly our ancestors, is the default. The problem is the resulting hyper-amplified, augmented financial imbalance. It's gonna fuck our species if all the technical people don't restore some of that balance, and everybody knows what that means and what must be done.

atombender · 8 months ago

Is there a way to use this in Jetbrains IDEs? (I've not been impressed with their AI Assistant.) There are a few plugins, but from the reviews they all seem kind of mediocre.

cube2222 · 8 months ago

I personally use the Zed editor AI assistant integration with Sonnet for anything AI-related, while using a JetBrains IDE for coding / code reading, side-by-side.

I haven’t found anything comparably good for JetBrains IDEs yet, but I’m also not switching to something else as my main editor.

Too · 8 months ago

Github copilot plugin is decent. It's not going to write a whole app for you, but it accelerates repetitive stuff, can give suggestions you didn't think of or save you a trip to the documentation.

Sn0wCoder · 8 months ago

I use IntelliJ as my main coding tool but also use VSCode and Sublime text. If you have access to local LLMs or have an API key for some the Continue Plugin (basically Cursor but can use in IntelliJ) is the Best of the Best for IntelliJ (IMO). I have a box running some local models including Phind and StarCoder (plus some small embeddings) and have been super happy with the end product. The next up is Google Gemini Code Assist has been the best of the IntelliJ (non-configured) AI tools I have tried. There are better ones out there but IMO not for IntelliJ. It's still free for a few more weeks and I have been using it since the free release, fun to use. Can pre-prompt, say you are an expert XXX, please be funny, fill in the rest of your regular prompts. The Co-Pilot I use for work is very limited and will only answer coding questions. I tried to tell it that it was my coding buddy, and its name was Phil and told me it cannot have a personality or be funny. I believe the paid personal Co-Pilot allows you to choose which LLM it uses (I cannot confirm). The Phind VSCode plugin works really well. Also, the Phind coding models are on par with some of the other big ones and free if you have a subscription (or run locally). Sublime is around to open those GIG+ files as VSCode chocks and not worth the RAM of opening another IntelliJ.

Each task / programming language / query requires trying different LLM models and novel ways of prompting. If it's not work-related (or work pays for the one you use) sending as much of the code as relevant also helps the answers be more useful.

Most of the people I meet that say LLMs are not useful have only tried one (flavor / plugin), do not know how to pre-prompt or prompt, and do not give the tools a chance. Try one or two things, say yep, it's not good and give up.

Still hard for me to admit that Prompt Engineering is a profession, but it's the same as Google Fu. Once you learn it you can become an LLM Ninja!

I do not believe LLMs are coming for my job (just yet) but do believe they are going to be able to replace some people, are useful and those that do not use them will be at a disadvantage.

cpursley · 8 months ago

Try Cursor. I’m serious.

wslh · 8 months ago

Right, in simpler terms: The measure of LLMs success is how effectively they help you achieve your goal faster.

antirez · 8 months ago

Exactly, and right now the LLMs acceleration effect is a tool, not "give me the final solution". Even people that can't code, using LLMs to build applications from scratch, still have this tool mindset. This is why they can use them effectively: they don't stop at the first failed solution; they provide hints to the LLM, test the code, try to figure what's the problem (also with the LLM help), and so forth. It's a matter of mindset.

Deleted Comment

brookst · 8 months ago

I’m surprised you only have one use case. I use LLMs to research travel, adjust recipes, check biographies and book reviews, and many many more things.

mhh__ · 8 months ago

Hopefully things have narrowed but you can see from the trends data just how few people (API may be a different story) use claude relative to chatgpt.

minimaxir · 8 months ago

Brand awareness is a hell of a drug.

ddgflorida · 8 months ago

Definitely not a "useless toy" with the right use case. It's great at code snippets, scripts, etc. It's an assistant.

Deleted Comment

ta12653421 · 8 months ago

ClaudeAI ++1000

1oooqooq · 8 months ago

yeah, they save as much time as finding a template with a good old search and using it.

Dead Comment

dxbydt · 8 months ago

> best LLMs are able to accelerate you

https://www2.math.upenn.edu/~ghrist/preprints/LAEF.pdf - this math textbook was written in just 55 days!

Paraphrasing the acknowledgements -

...Begun November 4, 2024, published December 28, 2024.

...assisted by Claude 3.5 sonnet, trained on my previous books...

...puzzles co-created by the author and Claude

...GPT-4o and -o1 were useful in latex configurations...doing proof-reading.

...Gemini Experimental 1206 was an especially good proof-reader

...Exercises were generated with the help of Claude and may have errors.

...project was impossible without the creative labors of Claude

The obvious comparison is to the classic Strang https://math.mit.edu/~gs/everyone/ which took several *years* to conceptualize, write, peer review, revise and publish.

Ok maybe Strang isn't your cup of tea, :%s/Strang/Halmos/g , :%s/Strang/Lipschutz/g, :%s/Strang/Hefferon/g, :%s/Strang/Larson/g ...

Working through the exercises in this new LLMbook, I'm thinking...maybe this isn't going to stand the test of time. Maybe acceleration is not so hot after all.

pton_xd · 8 months ago

"The story of linear algebra begins with systems of equations, each line describing a constraint or boundary traced upon abstract space. These simplest mathematical models of limitation — each equation binding variables in measured proportion — conjoin to shape the realm of possible solutions. When several such constraints act in concert, their collaboration yields three possible fates: no solution survives their collective force; exactly one point satisfies all bounds; or infinite possibilities trace curves and planes through the space of satisfaction. This trichotomy — of emptiness, uniqueness, and infinity — echoes through all of linear algebra, appearing in increasingly sophisticated forms as our understanding deepens."

Maybe I'm not the target audience, but... that really doesn't make me interested in continuing to read.

datadrivenangel · 8 months ago

Going faster isn't good if the quality drops enough that overall productivity decreases... Infinite slop is only a good thing for pigs.

kianN · 8 months ago

^ This perfectly encapsulates the story I see every time someone digs into the details of any llm generated or assisted content that has any level of complexity.

Great on the surface but lacks any depth, cohesive, or substance

mooreds · 8 months ago

I started a book about CIAM (customer identity and access management) using Claude to help outline a chapter. I'd edit and refine the outline to make sure it covered everything.

Then I'd have Claude create text. I'd then edit/refine each chapter's text.

Wow, was it unpleasant. It was kinda cool to see all the words put together, but editing the output was a slog.

It's bad enough editing your own writing, but for some reason this was even worse.

dxbydt · 8 months ago

just to clarify - I have nothing to do with this book. I was just forwarded a copy and I thought its relevant to the topic at hand. from the wild swings in karma, looks like people are annoyed with the message and shooting down the messenger.

karmakaze · 8 months ago

We're at the "computers play chess badly" stage. Then we'll hit the Deep Thought (1988) and Deep Blue (1995-1997) stages, but still saying that solving Go won't happen for 50+ years and that humans will continue to be better than computers.

The date/time that divides my world into before/after is AlphaGo v Lee Sedol game 3 (2016). From that time forward, I don't dismiss out of hand speculations of how soon we can have intelligent machines. Ray Kurzweil date of 2045 is as good as any (and better than most) for an estimate. Like Moore's (and related) Laws, it's not about how but the historical pace of advancements crossing a fairly static point of human capability.

Application coding, requires much less intelligence than playing Go at these high levels. The main differences are concise representation and clear final outcome scoring. LLMs deal quite well with the fuzziness of human communications. There may be a few more pegs to place but when seems predictably unknown.

> There’s a flipside to this too: a lot of better informed people have sworn off LLMs entirely because they can’t see how anyone could benefit from a tool with so many flaws. The key skill in getting the most out of LLMs is learning to work with tech that is both inherently unreliable and incredibly powerful at the same time. This is a decidedly non-obvious skill to acquire!

I wish the author qualified this more. How does one develop that skill?

What makes LLMs so powerful on a day to day basis without a large RAG system around it?

Personally, I try LLMs every now and then, but haven’t seen any indication of their usefulness for my day to day outside of being a smarter auto complete.

lumost · 8 months ago

When I started my career in 2010, google was a semi-serious skill. All of the little things that we know how to do now such as ignoring certain sites, lingering on others, and iteratively refining our search queries were not universally known at the time. Experienced engineers often relied on encyclopedic knowledge of their environment or by "reading the manual".

In my experience, LLM tools are the same, you ask for something basic initially and then iteratively refine the query either via dialog or a new prompt until you get what you are looking for or hit the end of the LLM's capability. Knowing when you've reached the latter is critically important.

layer8 · 8 months ago

One difference is that skillful googling still only involved typing a few keywords or a short phrase and some syntax, and then knowing how to skim the results and iterate, and how to operate your browser efficiently. With LLMs, you have to type a lot more (and/or use voice input), and often also read more, it’s also not stateless/repeatable like following a web link, and most output looks the same (as opposed to the variations in web sites). I pride(d) myself on my Google foo, it was fun, but I find using LLMs to be quite exhausting in comparison.

o11c · 8 months ago

The problems with that skill is that:

* Most existing LLM interfaces are very bad at editing history, instead focusing entirely on appending to history. You can sort of ignore this for one-shot, and this can be properly fixed with additional custom tools, but ...

* By the time you refine your input enough to patch over all the errors in the LLM's output for your sensible input, you're bigger than the LLM can actually handle (much smaller than the alleged context window), so it starts randomly ignoring significant chunks of what you wrote (unlike context-window problems, the ignored parts can be anywhere in the input).

Obscurity4340 · 8 months ago

Googlefu is how its usually called. It would be fantastic if there was a general course to teach it

simonw · 8 months ago

One of the things I find most frustrating about LLMs is how resistant they are to teaching other people how to use them!

I'd love to figure this out. I've written more about them than most people at this point, and my goal has always been to help people learn what they can and cannot do - but distilling that down to a concise set of lessons continues to defeat me.

The only way to really get to grips with them is to use them, a lot. You need to try things that fail, and other things that work, and build up an intuition about their strengths and weaknesses.

The problem with intuition is it's really hard to download that into someone else's head.

I share a ton of chat conversations to show how I use them - https://simonwillison.net/tags/tools/ and https://simonwillison.net/tags/ai-assisted-programming/ have a bunch of links to my exported Claude transcripts.

bjt · 8 months ago

Thank you for doing this work, though.

My first stab at trying ChatGPT last year was asking it to write some Rust code to do audio processing. It was not a happy experience. I stepped back and didn't play with LLMs at all for a while after that. Reading your posts has helped me keep tabs on the state of the art and decide to jump back in (though with different/easier problems this time).

Deleted Comment

mvdtnz · 8 months ago

It's really important to go and read the code that the author of this article actually produces with LLMs. He posted on hacker news a few months ago, a post called something like "everything I've made with ChatGPT in the month of September" or something. He's producing little toy applications that don't even begin to resemble real production code. He thinks these "tools" are useful because they help him write pointless slop.

simonw · 8 months ago

Here's that post: https://simonwillison.net/2024/Oct/21/claude-artifacts/

You're misrepresenting it here.

The point of that post isn't "look at these incredible projects I've built (proceeds to show simple projects)."

It's "I built 14 small and useful tools in a single week, each taking between 2 and 10 minutes".

The thing that's interesting here is that I can have an LLM kick out a working prototype of a small, useful tool in only a little more time than it takes to run a Google search.

That post isn't meant to be about writing "real production code". I don't know why people are confused over that.

Philpax · 8 months ago

Do you know who Simon is?

BeetleB · 8 months ago

I think most tech folks struggle with it because they treat LLMs as computer programs, and their experience is that SW should be extremely reliable - imagine using a calculator that was wrong 5% of the time - no one would accept that!

Instead, think of an LLM as the equivalent of giving a human a menial task. You know that they're not 100% reliable, and so you give them only tasks that you can quickly verify and correct.

Abstract that out a bit further, and realize that most managers don't expect their reports to be 100% reliable.

Don't use LLMs where accuracy is paramount. Use it to automate away tedious stuff. Examples for me:

Cleaning up speech recognition. I use a traditional voice recognition tool to transcribe, and then have GPT clean it up. I've tried voice recognition tools for dictation on and off for over a decade, and always gave up because even a 95% accuracy is a pain to clean up. But now, I route the output to GPT automatically. It still has issues, but I now often go paragraphs before I have to correct anything. For personal notes, I mostly don't even bother checking its accuracy - I do it only when dictating things others will look at.

And then add embellishments to that. I was dictating out a recipe I needed to send to someone. I told GPT up front to write any number that appears next to an ingredient as a numeral (i.e. 3 instead of "three"). Did a great job - didn't need to correct anything.

And then there are always the "I could do this myself but I didn't have time so I gave it to GPT" category. I was giving a presentation that involved graphs (nodes, edges, etc). I was on a tight deadline and didn't want to figure out how to draw graphs. So I made a tabular representation of my graph, gave it to GPT, and asked it to write graphviz code to make that graph. It did it perfectly (correct nodes and edges, too!)

Sure, if I had time, I'd go learn graphviz myself. But I wouldn't have. The chances I'll need graphviz again in the next few years is virtually 0.

I've actually used LLMs to do quick reformatting of data a few times. You just have to be careful that you can verify the output quickly. If it's a long table, then don't use LLMs for this.

Another example: I have a custom note taking tool. It's just for me. For convenience, I also made an HTML export. Wouldn't it be great if it automatically made alt text for each image I have in my notes? I would just need to send it to the LLM and get the text. It's fractions of a cent per image! The current services are a lot more accurate at image recognition than I need them to be for this purpose!

Oh, and then of course, having it write Bash scripts and CSS for me :-) (not a frontend developer - I've learned CSS in the past, but it's quicker to verify whatever it throws at me than Google it).

Any time you have a task and lament "Oh, this is likely easy, but I just don't have the time" consider how you could make an LLM do it.

dartos · 8 months ago

> Don't use LLMs where accuracy is paramount.

Then why do people keep pushing it for code related tasks?

Accuracy and precision is paramount with code. It needs to express exactly what needs to be done and how.

jaredsohn · 8 months ago

A similar use case for me - I wrote some technical documentation for our wiki about a somewhat complicated relationship between ids in some database tables. I copied my text explanation into an LLM and asked it to make a diagram and it did so. Took very little time from me and it was fast/easy to verify that the quality was good.

layer8 · 8 months ago

I think there’s the added reason that a lot of folks went into tech because (consciously or unconsciously) they prefer dealing with predictable machines than with unreliable humans. And now that career choice begins to look like a bait and switch. ;)

aleph_minus_one · 8 months ago

> Instead, think of an LLM as the equivalent of giving a human a menial task. You know that they're not 100% reliable, and so you give them only tasks that you can quickly verify and correct.

The problem is: for the tasks that I can give the LLM (or human) that I can easily verify and correct, the LLM fails with the majority of them, for example

- programming tasks of my area of expertise (which is more "mathematical" than what is common in SV startups), where I know how a high-level solution has to look like, and where I can ask the LLM to explain the gory details to me. Yes, these gory details are subtle (which is why the task can be menial), but the code has to be right. I can verify this, and the code is not correct.

- getting literature references about more obscure scientific (in particular mathematical) topics. I can easily check whether these literature references (or summaries of these references) are hallucinations - they typically are.

zahlman · 8 months ago

> Don't use LLMs where accuracy is paramount. Use it to automate away tedious stuff.

My programmer mind tells me that "tedious stuff" is where accuracy is the most important.

wodderam · 8 months ago

My experience is that for certain tasks LLMs are great, for certain tasks LLMS are basically useless.

The best prompts though are always written in a separate text file for me and pasted in. Follow up questions are never as good as a detailed initial prompt.

I would imagine well formulated questions to solve the problem at hand is a skill but beyond that I don't think there is anything special about how to ask LLMs a question.

In areas the LLM is rather useless, no amount of variation in prompting can solve that problem IMO. Just like if the tasks is something the LLM is good at, the prompt can be pretty sloppy and seem like magic with how it can understand what you want.

simonw · 8 months ago

I think one of the most important skills is being able to predict which tasks an LLM is a good fit for and which aren't.

perrygeo · 8 months ago

There's a similar dynamic in building reliable distributed systems on top of an unreliable network. The parts are prone to failure but the system can keep on working.

The tricky problem with LLMs is identifying failures - if you're asking the question, it's implied that you don't have enough context to assess whether it's a hallucination or a good recommendation! One approach is to build ensembles of agents that can check each other's work, but that's a resource-intensive solution.

swalsh · 8 months ago

It's amazing this is still an opinion in 2025. I now ask devs how they use AI as part of their workflows when I interview. It's a standard skill I expect my guys to have.

dartos · 8 months ago

I feel bad for your team.

Let people work how they want. I wouldn’t not hire someone on the basis of them not using a language server.

The creator of the Odin language famously doesn’t use one. He’s says that he, specifically, is faster without one.

sramam · 8 months ago

I concur that asking devs how they use AI is a great idea.

Recently, I shared a code base with a junior dev and she was surprised with the speed and sophistication of the code. The LLM did 80+% of the "coding".

What was telling was as she was grokking the code (for helping the ~20%), she was surprised at the quality of the code - her use of the LLM did not yield code of similar quality.

I find that the more domain awareness one brings to the table, the better the output is. Basically the clearer one's vision of the end-state, the better the output.

One other positive side-effect of using "LLMs as a junior-dev" for me has been that my ambitions are greater. I want it all - better code, more sophisticated capabilities even for relatively not-important projects, documentation, tests, debug-ability. And once the basic structure is in place, many a time it is trivial to get the rest.

It's never 100%, but even with 80+%, I am faster than ever before, deliver better quality code, and can switch domains multiple times a week and never feel drained.

Sharing best AI hacks within a team will have the same effect as code-reviews do in ensuring consistency. Perhaps an "LLM chat review", especially when something particularly novel was accomplished!

layer8 · 8 months ago

Using cloud-based AI is a no-go where I work, for IP and contractual reasons. And on-premises AI is not as capable and more difficult to integrate.

BeetleB · 8 months ago

Just curious, but what AI related skills do you expect them to have?

dogcomplex · 8 months ago

I would characterize good prompting as: write out your whole problem you're trying to solve, then think to yourself what the clarifying questions would be if you were a junior trying to solve it. Better yet - ask the LLM to ask you challenging clarifying questions for several rounds. Then, take all that information and re-compile it back into a list of all the important components of the project, and re-read it to make sure there's no particular ambiguous part or weird part that would be over-emphasized by the language you used. Then, emphasize the core concerns again, and tell it how you'd like it to output the response (keeping in mind that it will always do best with a conversation-style format with loose restrictions). Never let a conversation stray too long from the original goals lest it start forgetting.

Once that's all done, you basically have a well-structured question you could pass to an underling and have them completely independently work on the project without bugging you. That's the goal. Now, pass that to o1 or Claude, depending on whether it's a general-purpose task (o1) or a code-specific task (Claude), and wait for response. From there, have a conversation or test-and-followup of whatever it spits out, this time with you asking questions. If good enough, done. If not, wrap up whatever useful insights from that line of questioning and put it back into the initial prompt and either re-post it at the end of the conversation or start a fresh conversation.

I find 90% of the time this gets exactly what I'm after eventually. The few other cases are usually because we hit some cycle where the AI doesn't fully know what to change/respond, and it keeps repeating itself when I ask. The trick then is to ask things a different way or emphasize something new. This is usually just a code-specific issue, for general problems it's much better. One other trick is to ask it to take a step back and just tackle the problem in a theoretical/philosophical way first before trying to do any coding or practical solving, and then do that in a second phase (asking o1 to architect code structure and then Claude to implement it is a great combo too). Also if there is any way to break up the problem into smaller pieces which can be tackled one conversation at a time - much better. Just remember to include all relevant context it needs to interface with the overall problem too.

That sounds like a lot, but it's essentially just project management and delegation to somewhat-flawed underlings. The upside is instead of waiting a workweek for them to get back to you, you just have to wait 20 seconds. But it does mean a ton of reading and writing. There are certainly already some meta-prompts where you can get the AI to essentially do this whole process for you and assess itself, but like all automation that means extra ways for things to break too. Let the AI devs cook though and those will be a lot more commonplace soon enough...

[Edit: o1 mostly agrees lol. Some good additional suggestions for systematizing this: https://chatgpt.com/share/6775b85c-97c4-8003-bd31-ee288396ab... ]

Dead Comment