There's a running theme in here of programming problems LLMs solve where it's actually not that important that the LLM is perfectly correct. I've been using GPT4 for the past couple months to comprehend Linux kernel code; it's spooky good at it.
I'm a C programmer, so I can with some effort gradually work my way through random Linux kernel things. But what I can do now instead is take a random function, ask GPT4 what it does and what subsystem it belongs to, and then ask GPT4 to write me a dummy C program that exercises that subsystem (I've taken to asking it to rewrite kernel code in Python, just because it's more concise and easy to read).
I don't worry at all about GPT4 hallucinating stuff (I'm sure it's doing that all the time!), because I'm just using its output as Cliff's Notes for the actual kernel code; GPT4 isn't the "source of truth" in this situation.
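To give a flavor of the "rewrite it in Python" trick: this is my own simplified sketch (not model output) of the kernel's circular doubly-linked list from include/linux/list.h, which is roughly the level of conciseness I'm after:
# Simplified Python sketch of the kernel's struct list_head
# (include/linux/list.h): an intrusive, circular doubly-linked list.
class ListHead:
    def __init__(self):
        # An empty list points at itself, like INIT_LIST_HEAD().
        self.prev = self
        self.next = self

def list_add(new, head):
    # Insert 'new' right after 'head', mirroring list_add()/__list_add().
    nxt = head.next
    new.prev = head
    new.next = nxt
    nxt.prev = new
    head.next = new

def list_del(entry):
    # Unlink 'entry', like __list_del_entry() (minus the poisoning).
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.prev = entry
    entry.next = entry

def list_for_each(head):
    # Walk the list, like the list_for_each() macro.
    pos = head.next
    while pos is not head:
        yield pos
        pos = pos.next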
This is close to how I've been using them too. As a device for speeding up learning, they're incredible. Best of all, they're strongest where I'm weakest: finding all the arbitrary details that are needed for the question. That's the labor-intensive part of learning technical things.
I don't need the answer to be correct because I'm going to do that part myself. What they do is make it an order of magnitude faster to get anything on the board. They're the ultimate prep cook.
There are things to dislike and yes there is over-hype but "making learning less tedious" is huge!
You put words to what I've been thinking for a while. When I'm still new to some technology it is a huge time-saver. I used to need to go bother some folks on a Discord / Facebook group / Matrix chat to get the one piece of context that I was hung up on. Sometimes it takes hours or days to get that one nugget.
I feel more interested in approaching challenging problems in fact because I know I can get over those frustrating phases much more easily and quickly.
I'm finding myself using them extensively in the learning way too, but I'm also an extreme generalist. I've learned so many languages over 23 years, but remembering the ones I don't use frequently is hard. The LLMs become the ultimate memory aid: I know that I can do something in a given language, and I'll recognise that it's correct when I see it.
Together with increasingly powerful speech to text I find myself talking to the computer more and more.
There are flaws, there are weaknesses, and a bubble, but any dev that can't find any benefit in LLMs is just not looking.
I use it like a dictionary (select text and look it up), and based on what I looked up and the answer it gives, I can judge for myself how correct the answers are; they are usually on point.
It has also made building small, pure vanilla HTML/JS tools fun. It gives me a good enough prototype which I can mold to my needs. I have written a few very useful scripts/tools over the past few months which I otherwise would never even have started, because of all the required first steps and basic learning.
(never thought I would see your comment as a user)
> Best of all, they're strongest where I'm weakest: finding all the arbitrary details that are needed for the question. That's the labor-intensive part of learning technical things.
Not arguing, just an open question here: is there a downside to this? Perhaps we won't retain this knowledge as easily because it's so readily provided. Not that I want to fill my head with even more arbitrary information, but there's probably some fluency gained in tracking it down yourself.
Exactly. It's similar in other (non programming) fields - if you treat it as a "smart friend" it can be very helpful but relying on everything it says to be correct is a mistake.
For example, I was looking at a differential equation recently and saw some unfamiliar notation[1] (Newton's dot notation). So I asked Claude why people use Newton's notation vs. Lagrange's notation. It gave me an excellent explanation with tons of detail, which was really helpful. Except in every place it gave me an example of "Lagrange" notation, it was actually Leibniz notation.
So it was super helpful, and it didn't matter that it made this specific error, because I knew what it was getting at and I was treating it as a "smart friend" who was able to explain something specific to me. I would have a problem if I were using it somewhere where absolute accuracy was critical, because it made such a huge mistake throughout its explanation.
[1] https://en.wikipedia.org/wiki/Notation_for_differentiation#N...
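For reference, the three notations for the same derivatives of x with respect to t look like this (in LaTeX), which shows how easy they are to conflate:
\dot{x}, \ddot{x}                     % Newton: dots over the variable
x'(t), x''(t)                         % Lagrange: primes on the function
\frac{dx}{dt}, \frac{d^2 x}{dt^2}     % Leibniz: ratios of differentials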
Once you know LLMs make mistakes and know to look for them, half the battle is done. Humans make mistakes too, which is why we put effort into validating thinking and actions.
As I use it more and more, the mistakes are born of ambiguity. As I supply more information to the LLM, its answers get better. I'm finding more and more ways to supply it with robust and extensive information.
- "things trivial to verify", so it doesn't matter if the answer is not correct - I can iterate/retry if needed and fallback to writing things myself, or
- "ideas generator", on the brainstorming level - maybe it's not correct, but I just want a kickstart with some directions for actual research/learning
Expecting perfect/correct results is going to lead to failure at this point, but it doesn't prevent usefulness.
Right, and it only needs to be right often enough that taking the time to ask it is positive EV. In practice, with the Linux kernel, it's more or less consistently right (I've noticed it's less right about other big open source codebases, which checks out, because there's a huge written record of kernel development for it to draw on).
I've been using it for all kinds of stuff. I was using a dryer at a hotel a while ago and wasn't sure about the icon it was displaying on the panel regarding my clothes, so I asked GPT and it answered correctly. It has read all the manuals and documentation for pretty much everything, right? Better than Googling it - you just ask for the exact thing you want.
I used LLMs for something similar recently. I have some old microphones that I've been using with a USB audio interface I bought twenty years ago. The interface stopped working and I needed to buy a new one, but I didn't know what the three-pronged terminals on the microphone cords were called or whether they could be connected to today's devices. So I took a photo of the terminals and explained my problem to ChatGPT and Claude, and they were able to identify the plug and tell me what kinds of interfaces would work with them. I ordered one online and, yes, it worked with my microphones perfectly.
My washing machine went out because of some flooding, so I gave ChatGPT all of the diagnostic codes and it concluded that it was probably a short in my lid lock.
The lid lock came a few days later, I put it in, and I'm able to wash laundry again.
Yes, I like to think of LLMs as hint generators. Turns out that a source of hints is pretty useful when there's more to a problem than simply looking up an answer.
Especially when the hint is elementary but the topic is one I don't know about (or don't remember) and there exists a large corpus of public writing about it.
In such cases it makes getting past zero fast and satisfying, where before it would often be such a heavy lift I wouldn't bother.
For about 20 years, chess fans would hold "centaur" tournaments. In those events, the best chess computers, which routinely trounced human grandmasters, teamed up with those same best-in-the-world humans and proceeded to wipe both humans and computers off the board. Nicholas is describing in detail how he pairs up with LLMs to get a similar result in programming and research.
Sobering thought: centaur tournaments at the top level are no more. That's because the computers got so good that the human half of the beast no longer added any meaningful value.
https://en.wikipedia.org/wiki/Advanced_chess
Most people have only heard "Didn't an IBM computer beat the world champion?", and don't know that Kasparov psyched himself out when Deep Blue had actually made a mistake. I was part of the online analysis of the (mistaken) endgame move at the time that was the first to reveal the error. Kasparov was very stressed by that and other issues, some of which IBM caused ("we'll get you the printout as promised in the terms" and then never delivered). My friend IM Mike Valvo (now deceased) was involved with both matches. More info: https://www.perplexity.ai/search/what-were-the-main-controve...
When I was a kid my dad told me about the most dangerous animal in the world, the hippogator. He said that it had the head of a hippo on one end and the head of an alligator on the other, and it was so dangerous because it was very angry about having nowhere to poop. I'm afraid that this may be a better model of an AI human hybrid than a centaur.
A bit of a detour (inspired by your words)... if anything, LLMs will soon be "eating their own poop", so structurally, they're a "dual" of the "hippogator" -- an ouroboric coprophage. If LLMs ever achieve sentience, will they be mad at all the crap they've had to take?
This mostly matches my experience but with one important caveat around using them to learn new subjects.
When I'm diving into a wholly new subject for the first time, in a field totally unrelated to my field (similar to the author, C programming and security) for example biochemistry or philosophy or any field where I don't have even a basic grounding, I still worry about having subtly-wrong ideas about fundamentals being planted early-on in my learning.
As a programmer I can immediately spot "is this code doing what I asked it to do" but there's no equivalent way to ask "is this introductory framing of an entire field / problem space the way an actual expert would frame it for a beginner" etc.
At the end of the day we've just made the reddit hivemind more eloquent. There's clearly tons of value there but IMHO we still need to be cognizant of the places where bad info can be subtly damaging.
I don't worry about that much at all, because my experience of learning is that you inevitably have to reconsider the fundamentals pretty often as you go along.
High school science is a great example: once you get to university you have to un-learn all sorts of things that you learned earlier because they were simplifications that no longer apply.
Terry Pratchett has a great quote about this: https://simonwillison.net/2024/Jul/1/terry-pratchett/
For fields that I'm completely new to, the thing I need most is a grounding in the rough shape and jargon of the field. LLMs are fantastic at that - it's then up to me to take that grounding and those jargon terms and start building my own accurate-as-possible mental model of how that field actually works.
If you treat LLMs as just one unreliable source of information (like your well-read friend who's great at explaining things in terms that you understand but may not actually be a world expert on a subject) you can avoid many of the pitfalls. Where things go wrong is if you assume LLMs are a source of irrefutable knowledge.
> like your well-read friend who's great at explaining things in terms that you understand but may not actually be a world expert on a subject
I guess part of my problem with using them this way is that I am that well-read friend.
I know how the sausage is made, how easy it is to bluff a response to any given question, and for myself I tend to prefer reading original sources to ensure that the understanding that I'm conveying is as accurate as I can make it and not a third-hand account whose ultimate source is a dubious Reddit thread.
> High school science is a great example: once you get to university you have to un-learn all sorts of things that you learned earlier because they were simplifications that no longer apply.
The difference between this and a bad mental model generated by an LLM is that the high school science models were designed to be good didactic tools and to be useful abstractions in their own right. An LLM output may be neither of those.
In the article the author mentions wanting to benchmark a GPU and using ChatGPT to write CUDA. Benchmarks are easy to mess up and to interpret incorrectly without understanding. I see this as an example where a subtly-wrong idea could cause cascading problems.
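To make that concrete with a classic pitfall: GPU kernel launches are asynchronous, so timing them without synchronizing mostly measures launch overhead, not the work. A minimal sketch in Python (assuming CuPy rather than hand-written CUDA, purely as an illustration):
import time
import cupy as cp

# Two largish matrices; warm up once so allocation/compilation
# doesn't pollute the timings.
a = cp.random.rand(4096, 4096).astype(cp.float32)
b = cp.random.rand(4096, 4096).astype(cp.float32)
_ = a @ b
cp.cuda.Device().synchronize()

# Wrong: the matmul is launched asynchronously, so this mostly
# times the launch, not the computation.
t0 = time.perf_counter()
c = a @ b
naive_ms = (time.perf_counter() - t0) * 1e3

# Better: wait for the device to finish before stopping the clock.
cp.cuda.Device().synchronize()
t0 = time.perf_counter()
c = a @ b
cp.cuda.Device().synchronize()
synced_ms = (time.perf_counter() - t0) * 1e3

print(f"naive: {naive_ms:.2f} ms   synchronized: {synced_ms:.2f} ms")
A benchmark built on the first number would look wonderful and mean nothing, and without understanding you'd never know.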
This just does not match my experience with these tools. I've been on board with the big idea expressed in the article at various points and tried to get into that work flow, but with each new generation of models they just do not do well enough, consistently enough, on serious tasks to be a time or effort saver. I don't know what world these apparently high output people live in where their days consist of porting Conway's Game of Life and writing shell scripts that only 'mostly' need to work, but I hope one day I can join them.
Not to pick on you but this general type of response to these AI threads seems to be coalescing around something that looks like a common cause. The thing that tips your comment into that bucket is the "serious tasks" phrasing. Trying to use current LLMs for either extremely complex work involving many interdependent parts or for very specialized domains where you're likely contributing something unique or to any other form of "serious task" you can think of generally doesn't work out. If all you do all day long are serious tasks like that then congrats you've found yourself a fulfilling and interesting role in life. Unfortunately, the majority of other people have to spend 80 to 90 percent of their day dealing with mind numbing procedural work like generating reports no one reads and figuring out problems that end up being user error. Fortunately, lots of this boring common work has been solved six ways from Sunday and so we can lean on these LLMs to bootstrap our n+1th solution that works in our org with our tech stack and uses our manager's favourite document format/reporting tool. That's where the other use cases mentioned in the article come in, well that and side projects or learning X in Y minutes.
You get used to their quirks. I can more or less predict what Claude/GPT can do faster than me, so I exclusively use them for those scenarios. Integrating them into one's development routine isn't easy though, so I had to trial-and-error until it made me faster in certain aspects. I can see it being more useful for people who have a good chunk of experience with coding, since you can filter out useless suggestions much faster - e.g. give it a dump of code and a description of a stupid bug, and ask it where the problem might be. If you generally know how things work, you can filter out the "that's definitely not the case" suggestions, and it might route you to a definitive answer faster.
If you use it as an intern, as a creative partner, as a rubber-duck-plus, in an iterative fashion, give it all the context you have and your constraints and what you want... it's fantastic. Often I'll take pieces from it; if it's simple enough I can just use its output.
I also use LLMs similarly. As a professional programmer, LLMs save me a lot of time. They are especially efficient when I don't understand a flow or need to transform values from one format to another. However, I don't currently use them to produce code that goes into production. I believe that in the coming years, LLMs will evolve to analyze complete requirements, architecture, and workflow and produce high-quality code. For now, using LLMs to write production-ready applications in real-time scenarios will take longer.
I've been pleasantly surprised by GitHub's "copilot workspace" feature for creating near production code. It takes a GitHub issue, converts it to a specification, then to a list of proposed edits to a set of files, then it makes the edits. I tried it for the first time a few days ago and was pleasantly surprised at how well it did. I'm going to keep experimenting with it more/pushing it to see how well it works next week.
GitHub's blog post: https://github.blog/news-insights/product-news/github-copilo...
My first experience with it: https://youtube.com/watch?v=TONH_vqieYc
A small portion of my regular and freelance work is translating things from a database to something an application can use. A perfect example of this is creating a model in MVC architecture from a database table/stored procedure/function. I used to have a note or existing code I would have to copy and paste and then modify each and every property one at a time, to include the data types. Not hard stuff at all, but very tedious and required a certain amount of attention. This would have taken me maybe 5 to 20 minutes in the perfect scenario, minus any typos in datatypes, names of properties, etc.
Now I'll do something like this for a table, grabbing the column names and data types:
SELECT
COLUMN_NAME,
DATA_TYPE,
CHARACTER_MAXIMUM_LENGTH,
NUMERIC_PRECISION,
NUMERIC_SCALE
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = 'Table Name Goes Here'
ORDER BY
COLUMN_NAME;
Then I'll ask my custom GPT to make me a model from the SQL output for my application. I do a quick spot check on the new class and done - the code is completed without any typos in much less time. This kind of stuff goes into production on a regular basis, and I feel about as guilty as I did in 10th grade using a TI-89 for matrix operations, which is zero.
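For a sense of the mechanical mapping being automated, here is a rough Python sketch of what that transformation amounts to (the type map, class, and column names are illustrative, not my actual setup, which targets a C# MVC model):
# Turn (COLUMN_NAME, DATA_TYPE) rows from the INFORMATION_SCHEMA query
# above into a model class. Names and type map are made up for illustration.
SQL_TO_CSHARP = {
    "int": "int",
    "bigint": "long",
    "bit": "bool",
    "datetime": "DateTime",
    "decimal": "decimal",
    "nvarchar": "string",
    "varchar": "string",
    "uniqueidentifier": "Guid",
}

def to_model(class_name, columns):
    # columns: list of (COLUMN_NAME, DATA_TYPE) tuples
    lines = [f"public class {class_name}", "{"]
    for name, data_type in columns:
        cs_type = SQL_TO_CSHARP.get(data_type.lower(), "object")
        lines.append(f"    public {cs_type} {name} {{ get; set; }}")
    lines.append("}")
    return "\n".join(lines)

print(to_model("Customer", [
    ("CustomerId", "int"),
    ("Name", "nvarchar"),
    ("CreatedAt", "datetime"),
]))
The GPT does the same thing from the raw query output, minus the typos I'd introduce doing it by hand.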
I think all of these can be summarized in three items:
1. Search engine - Words like "teach" or "learn" used to be slapped on Google once upon a time. One real great thing about LLMs here is that they do save time. The internet these days is unbelievably crappy and choppy. It often takes more time to click through the first item in the Google result and read it than to simply ask an LLM and wait for its slowish answer.
2. Pattern matching and analysis - LLMs are probably the most advanced technology for recognizing well-known facts and patterns in text, but they do make quite a few errors, especially with numbers. I believe that a properly fine-tuned small LLM would easily beat gigantic models for this purpose.
3. Interleaving knowledge - this is the biggest punch that LLMs have, and also the main source of all the over-hype (which does still exist). It can produce something valuable by synthesizing multiple facts, like writing complex answers and programs. But this is where hallucination happens most frequently, so it's critical that you review the output carefully.
The problem is that AI is being sold to multiple industries as the cure for their data woes.
I work in education, and every piece of software now has AI insights added. Multiple companies are selling their version as hallucination free.
The problem is the data sets they evaluate are so large and complicated for a college that there is literally no way for humans to verify the insights.
It's actually kind of scary. Choices are being made about the future of human people based on trust in New Software.
My experience is that LLMs can't actually do 3 at all. The intersection of knowledge has to already be in the training data. They hallucinate if the intersection of knowledge is original. That is exactly what you should expect, though, given the architecture.
Super interested in hearing more about why you think this -
> I believe that a properly fine-tuned small LLM would easily beat gigantic models for this purpose.
I've long felt that vertical search engines should be able to beat the pants off Google. I even built one (years ago) to search for manufacturing suppliers that was, IMO, superior to Google's. But the only way I could get traffic or monetize was as middleware to clean up google, in a sense.
I just want to emphasise two things, both mentioned in the article, because they are core to what I take from it as someone who has been a fanboy of Nicholas for years now:
1. Nicholas really does know how badly machine learning models can be made to screw up. Like, he really does. [0]
2. This is how Nicholas -- an academic researcher in the field of security of machine learning -- uses LLMs to be more efficient.
I don't know whether Nicholas works on globally scaled production systems that have specific security/data/whatever controls that need to be adhered to, or whether he even touches any proprietary code. But seeing as he heavily emphasised the "I'm a researcher doing research things" angle in the article, I'd take a heavy bet that he does not. And academic / research / proof-of-concept coding has different limitations/context/needs than other areas.
I think this is a really great write-up, even as someone on the anti-LLM side of the argument. I really appreciate the attempt to do a "middle of the road" post, which is absolutely what the conversation needs right now (pay close attention to how this was written, LLM hypers).
I don't share his experience; I still value and take enjoyment from the "digging for information" process -- it is how I learn new things. Having something give me the answer doesn't help me learn, and writing new software is a learning process for me.
I did take a pause and digested the food for thought here. I still won't be using an LLM tomorrow. I am looking forward to his next post, which sounds very interesting.
[0]: https://nicholas.carlini.com/papers
And writing shell scripts that "mostly" work is what it does.
I don't expect it to work. Just like I don't expect my own code to ever work.
My stuff mostly works too. In either case I will be shaving yaks to sort out where it doesn't work.
At a certain level of complexity the whole house of cards does break down, and the LLM gets stuck in a loop.
Then I will try using a different LLM to get it unstuck from the loop, which works well.
You will have cases where both LLMs get stuck in a loop, and you're screwed. Okay... well, now you're however far ahead you got before that point.
Essentially, some of us have spent more of our lives fixing code than writing it from scratch.
At that level, it's much easier for me to fix code than to write it from scratch. That's the skill you're applying with LLMs.
This line really struck me and is an excellent way to frame this issue.