I get so confused on this. I play around, test, and mess with LLMs all the time and they are miraculous. Just amazing, doing things we dreamed about for decades. I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out. It talks to me like a person. It generates really cool images. It helps me write code. And just tons of other stuff that astounds me.
And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly? This is the most amazing technology I've experienced as a 50+ year old nerd who has been sitting deep in tech for basically my whole life. This is the stuff of science fiction, and while there totally are limitations, the speed at which it is progressing is insane. And people are like, "Wah, it can't write code like a senior engineer with 20 years of experience!"
The technology is not just less than superintelligence, for many applications it is less than prior forms of intelligence like traditional search and Stack Exchange, which were easily accessible 3 years ago and are in the process of being displaced by LLMs. I find that outcome unimpressive.
And this Tweeter's complaints do not sound like a demand for superintelligence. They sound like a demand for something far more basic than the hype has been promising for years now.
- "They continue to fabricate links, references, and quotes, like they did from day one."
- "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error." (Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.)
- "They reference a scientific publication, I look it up, it doesn't exist."
- "I have tried Gemini, and actually it was even worse in that it frequently refuses to even search for a source and instead gives me instructions for how to do it myself."
- "I also use them for quick estimates for orders of magnitude and they get them wrong all the time. "
- "Yesterday I uploaded a paper to GPT to ask it to write a summary and it told me the paper is from 2023, when the header of the PDF clearly says it's from 2025. "
A municipality in Norway used an LLM to create a report about the school structure in the municipality (how many schools there are, how many there should be, where they should be, how big they should be, pros and cons of different school and class sizes, etc.). Turns out the LLM invented scientific papers to use as references, and the whole report is complete and utter garbage based on hallucinations.
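A minimal link-verification pass is indeed easy to sketch. This is a hypothetical post-processing step, not anything these products actually do; the regex, function names, and defaults are mine, and it uses only the Python standard library:

```python
# Hedged sketch: verify that URLs an LLM emitted actually resolve,
# so dead links can be flagged or the model re-prompted.
import re
import urllib.request
from urllib.error import HTTPError, URLError

URL_RE = re.compile(r"https?://[^\s)\"']+")

def extract_urls(text: str) -> list[str]:
    """Pull candidate URLs out of model output."""
    return URL_RE.findall(text)

def link_is_live(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (HTTPError, URLError, TimeoutError):
        return False

def filter_dead_links(text: str) -> list[str]:
    """Return the URLs in `text` that do not resolve."""
    return [u for u in extract_urls(text) if not link_is_live(u)]
```

A provider could loop this over the output and re-prompt until `filter_dead_links` comes back empty, though HEAD requests aren't free at chat scale and some sites answer 403/404 to bots, so it's cruder than it looks.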
There are no fabricated links, references, or quotes in OpenAI's GPT 4.5 + Deep Research.
It's unfortunate the cost of a Deep Research bespoke white paper is so high. That mode is phenomenal for pre-work domain research. You get an analyst's two-week writeup in under 20 minutes, for the low cost of $200/month (though I've seen estimates that a white paper costs OpenAI over USD 3,000 to produce for you, which explains the monthly limits).
You still need to be a domain expert to make use of this, just as you need to be to make use of an analyst. Both the analyst and Deep Research can generate flawed writeups with similar failure modes: mis-synthesis, misapplication, or omission of something essential.
Neither analyst nor LLM is a substitute for mastery.
This assessment is incomplete. Large language models are both less and more than these traditional tools. They have not subsumed them; all can sit together in separate tabs of a single browser window. They are another resource, and when the conditions are right, which is often the case in my experience, they are a startlingly effective tool for navigating the information landscape. The criticism of Gemini is a fair one, and I encountered it yesterday, though perhaps with 50% less entitlement. But Gemini also helped me translate obscure termios APIs to Python from C source code I provided. The equivalent using search and/or Stack Overflow would have required multiple piecemeal searches without guarantees, and would definitely have taken much more time.
The 404 links are hilarious, like you can't even parse the output and retry until it returns a link that doesn't 404? Even ignoring the billions in valuation, this is so bad for a $20 sub.
The tweeter's complaints sound like a user problem. LLMs are tools. How you use them, when you use them, and what you expect out of them should be based on the fact that they are tools.
I’m sorry but the experience of coding with an LLM is about ten billion times better than googling and stack overflowing every single problem I come across. I’ve stack overflowed maybe like two things in the past half year and I’m so glad to not have to routinely use what is now a very broken search engine and web ecosystem.
The whole point is that an LLM is not a search engine and obviously anyone who treats it as one is going to be unsatisfied. It's just not a sensible comparison. You should compare working with an LLM to working with an old "state of the art" language tool like Python NLTK -- or, indeed, specifying a problem in Python versus specifying it in the form of a prompt -- to understand the unbridgeable gap between what we have today and what seemed to be the best even a few years ago. I understand when a popular science author or my relatives haven't understood this several years after mass access to LLMs, but I admit to being surprised when software developers have not.
Hosted, free or subscription-based Deep Research-like tools that integrate LLMs with search functionality (the whole domain of "RAG", or Retrieval-Augmented Generation) will be elementary for a long time yet, simply because the cost of the average query starts to go up exponentially and there isn't that much money in it yet. Many people have built, and will continue to build, their own research tools where they can decide how much compute time and API-access cost they're willing to spend on a given query. OCR remains a hard problem, let alone appropriately chunking potentially hundreds of long documents into the context length and synthesizing potentially thousands of LLM outputs into a single response.
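The chunking step described above can be sketched in a few lines. This is a deliberately crude, word-based stand-in (a real pipeline would count tokens with the model's tokenizer and split on document structure); the parameter values are illustrative:

```python
# Split a long document into overlapping word windows that fit a context
# budget. Overlap keeps sentences that straddle a boundary visible in two
# chunks. "Tokens" here are just whitespace-separated words.

def chunk_document(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Return overlapping chunks of roughly max_tokens words each."""
    words = text.split()
    if not words:
        return []
    chunks = []
    step = max_tokens - overlap          # how far each window advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break                        # last window reached the end
    return chunks
```

Each chunk would then be embedded or summarized separately and the per-chunk outputs synthesized in a final pass; that synthesis step is where most of the cost and quality problems live.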
It's mostly because of how they were initially marketed. In an effort to drive hype 'we' were promised the world. Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence? In reality Bard, let alone whatever early version he was using, is about as sentient as my left asscheek.
OpenAI did similar things by focusing to the point of absurdity on 'safety' for what was basically a natural language search engine with a habit of inventing nonsensical stuff. But on that same note (and as you alluded to), I do agree that LLMs have a lot of use as natural language search engines in spite of their proclivity to hallucinate. Being able to describe e.g. a function call (or some esoteric piece of history) and then often get the precise term/event that I'm looking for is just incredibly useful.
But LLMs obviously are not sentient, are not setting us on the path to AGI, or any other such nonsense. They're arguably what search engines should have been 10 or 15 years ago, but anti-competitive monopolization of the industry meant that search engine technology progress basically stalled out, if not regressed for the sake of ads (and individual 'entrepreneurs' becoming better at SEO), about the time Google fully established itself.
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
I presume you are referring to this Google engineer, who was sacked for making the claim. Hardly an example of AI companies overhyping the tech; precisely the opposite, in fact.
https://www.bbc.co.uk/news/technology-62275326
It seems to be a common human hallucination to imagine that large organisations are conspiring against us.
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
That's not what happened. Google stomped hard on Lemoine, saying clearly that he was wrong about LaMDA being sentient ... and then they fired him for leaking the transcripts.
Your whole argument here is based on false information and faulty logic.
> OpenAI did similar things by focusing to the point of absurdity on 'safety' for what was basically a natural language search engine that has a habit of inventing nonsensical stuff.
The focus on safety, and the concept of "AI", preexisted the product. An LLM was just the thing they eventually made; it wasn't the thing they were hoping to make. They applied their existing beliefs to it anyway.
I am worried about them as a substitute for search engines.
My reasoning is that classic Google web-scraping and SEO, as shitty as it may be, is 'open-source' (or at least 'open-citation') in nature - you can 'inspect the sh*t it's built from'.
Whereas LLMs, to me, seem like a Chinese - or Western - totalitarian political system's wet dream: 'we can set up an inscrutable source of "truth" for the people to use, with the _truths_ we intend them to receive'.
We already saw how weird and unsane this was when they were configured to be woke under the previous regime. Imagining it configured for 'the other post-truth' is a nightmare.
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
No, first time I hear about it. I guess the secret to happiness is not following leaks. I had very low expectations before trying LLMs and I’m extremely impressed now.
They have their value in analyzing huge amounts of data, for example scientific papers or raw observations, but the popular public ones are mostly trained on stolen/pirated texts off the internet and from the social media clouds the companies control. So this means: bullshit in -> bullshit out. I don't need machines for that; the regular human bullshitters do this job just fine.
Nobody promised the world. The marketing underpromised and LLMs overdelivered. Safety worries didn't come from marketing; they came from people who were studying this as a mostly theoretical worry for the next 50+ years, only to see major milestones crossed a decade or more before they expected.
Did many people overhype LLMs? Yes, like with everything else (transhumanist ideas, quantum physics). It helps being more picky who one listens to, and whether they're just painting pretty pictures with words, or actually have something resembling a rational argument in there.
Folks really over-index when an LLM is very good for their use case. And most of the folks here are coders, at which they're already good and getting better.
For some tasks they're still next to useless, and people who do those tasks understandably don't get the hype.
Tell a lab biologist or chemist to use an LLM to help them with their work and they'll get very little useful out of it.
Ask an attorney to use it and it's going to miss things that are blindingly obvious to the attorney.
Ask a professional researcher to use it and it won't come up with good sources.
For me, I've had a lot of those really frustrating experiences where I'm having difficulty on a topic and it gives me utter incorrect junk because there just isn't a lot already published about that data.
I've fed it tricky programming tasks and gotten back code that doesn't work, and that I can't debug because I have no idea what it's trying to do, or I'm not familiar with the libraries it used.
It sounds like you're trying to use these LLMs as oracles, which is going to cause you a lot of frustration. I've found almost all of them now excel at imitating a junior dev or a drunk PhD student. For example, the other day I was looking at acoustic sensor data and ran it down the trail of "what are some ways to look for repeating patterns like xyz"; 10 minutes later I had a mostly working proof of concept for a 2nd-order spectrogram that reasonably dealt with spectral leakage, plus a half-working mel-spectrum fingerprint idea. Those are all things I was thinking about myself, so I was able to guide it to a mostly working prototype in very little time. But doing it myself from zero would've taken at least a couple of hours.
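For the curious, the "2nd-order spectrogram" idea can be sketched with plain NumPy: take a spectrogram, then FFT each frequency bin along the time axis, so a pattern repeating at, say, 5 Hz shows up as a peak at 5 Hz in the second transform. This is my reconstruction of the general technique (sometimes called a modulation spectrum), not the commenter's actual code; frame sizes and names are illustrative:

```python
import numpy as np

def spectrogram(x: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram via framed, windowed FFTs (freq x time)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape (n_fft//2 + 1, n_frames)

def modulation_spectrum(x: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """FFT each spectrogram row over time: a 'spectrogram of the spectrogram'
    in which periodic repetitions appear as peaks at their repetition rate."""
    S = spectrogram(x, n_fft, hop)
    # Subtract each row's mean so the DC term doesn't swamp the repetition peaks.
    return np.abs(np.fft.rfft(S - S.mean(axis=1, keepdims=True), axis=1))
```

The windowing and mean subtraction are what keep spectral leakage and the DC component from burying the structure you're looking for.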
But truthfully, 90% of work-related programming is not problem solving; it's implementing business logic and dealing with poor, ever-changing customer specs. Which an LLM will not help with.
Strongly in agreement. I've tried them and mostly come away unimpressed. If you work in a field where you have to get things right, and it's more work to double check and then fix everything done by the LLM, they're worse than useless. Sure, I've seen a few cases where they have value, but they're not much of my job. Cool is not the same as valuable.
If you think "it can't quite do what I need, I'll wait a little longer until it can" you may still be waiting 50 years from now.
My view is that it will be some time before they can as well, precisely because of the success in the software domain - not because LLMs aren't capable as a tech, but because data owners and practitioners in other domains will resist the change. From the SWE experience, news reports, financial magazines, etc., many are preparing accordingly, even if it is a subconscious thing. People don't like change and don't want to be threatened when it is them at risk - no one wants what happened to artists, and now SWEs, to happen to their profession. They are happy for other professions to "democratize/commoditize" as long as it isn't them - after all, this increases their purchasing power. Don't open source knowledge/products, don't let AI near your vertical domain, continue to command a premium for as long as you can - I've heard variations of this in many AI conversations. It's much easier in oligopoly- and monopoly-like domains, and/or domains where knowledge was known to be a moat even when mixed with software, as you have more trust that competitors won't do the same.
For many industries/people, work is a means to earn, not something to be passionate about for its own sake. It's a means to provide for the other things in life you are actually passionate about (e.g. family, lifestyle, etc.). In the end AI may get your job eventually, but if it gets yours much later than other industries/domains, you win from a capital perspective, as other goods get cheaper while you still command your pre-AI scarcity premium. This makes it easier for them to acquire more assets from the early-disrupted industries and shields them from AI eventually taking over.
I'm seeing this directly in software: fewer new frameworks/libraries/etc. outside the AI domain being published, IMO, and more apprehension from companies about open sourcing their work and/or exposing what they do. Attracting talent is also no longer as strong a reason to showcase what you do to prospective employees - economic conditions and/or AI make that less necessary as well.
Honestly it's worse than this. A good lab biologist/chemist will try to use it, understand that it's useless, and stop using it. A bad lab biologist/chemist will try to use it, think that it's useful, and then it will make them useless by giving them wrong information. So it's not just that people over-index when it is useful, they also over-index when it's actively harmful but they think it's useful.
The problem Sabine tries to communicate is that reality is different from what the cash-heads behind main commercial models are trying to portray. They push the narrative that they’ve created something akin to human cognition, when in reality, they’ve just optimised prediction algorithms on an unprecedented scale. They are trying to say that they created Intelligence, which is the ability to acquire and apply knowledge and skills, but we all know the only real Intelligence they are creating is the collection of information of military or political value.
The technology is indeed amazing and very amusing, but like all the good things in the hands of corporate overlords, it will be slowly turning into profit-milking abomination.
> They push the narrative that they’ve created something akin to human cognition
This is your interpretation of what these companies are saying. I'd love to see whether any company has specifically said anything like that.
Out of the last 100 years, how many inventions have been made that could make any human feel the awe LLMs do right now? How many things from today, when brought back to 2010, would make the person using them feel like they're being tricked or pranked? We already take them for granted even though they've been around for less than half a decade.
LLMs aren't a catch-all solution to the world's problems, or something that is going to help us in every facet of our lives, or an accelerator for every industry out there. But at no point in history could you talk to your phone about general topics, get information, practice language skills, build an assistant that teaches your kid the basics of science, use something to accelerate your work in many different ways, etc...
Looking at LLMs shouldn't be boolean; it shouldn't be a choice between "best thing ever invented" and "useless". But it seems like everyone presents the issue in this manner, and Sabine is part of that problem.
Much as I agree with the point about overhyping from companies, I'd be more sympathetic to this point of view if she acknowledged the merits of the technology.
Yes, it hallucinates and if you replace your brain with one of these things, you won't last too long. However, it can do things which, in the hands of someone experienced, are very empowering. And it doesn't take an expert to see the potential.
As it stands, it sounds like a case of "it's great in practice but the important question is how good it is in theory."
I hate to bring an ad hominem into this, but Sabine is a YouTube influencer now. That's her current career. So I'd assume this Tweet storm is also pushing a narrative on its own, because that's part of doing the work she chose to do to earn a living.
LLMs seem akin to parts of human cognition, maybe the initial fast-thinking bit when ideas pop up in a second or two. But any human writing a review with links to sources would look them up and check that they are the right ones, matching the initial idea. Current LLMs don't seem to do that, at least the ones Sabine complains about.
Akin to human cognition but still a few bricks short of a load, as it were.
You lay the rhetoric on so thick (“cash heads”, “pushing the narrative”, “corporate overlords”, “profit-making abomination”) that it’s hard to understand your actual claim.
Are you trying to say that LLMs are useful now but you think that will stop being the case at some point in the future?
Look, man (and I'm saying this not to you but to everyone who is in this boat): you've got to understand that after a while, the novelty wears off. We get it. It's miraculous that some gigabytes of matrices can possibly interpret and generate text, images, and sound. It's fascinating, it really is. Sometimes, it's borderline terrifying.
But, if you spend too much time fawning over how impressive these things are, you might forget that something being impressive doesn't translate into something being useful.
Well, are they useful? ... Yeah, of course LLMs are useful, but we need to remain somewhat grounded in reality. How useful are LLMs? Well, they can dump out a boilerplate React frontend to a CRUD API, so I can imagine it could very well be harmful to a lot of software jobs, but I hope it doesn't bruise too many egos to point out that dumping out yet another UI that does the same thing we've done 1,000,000 times before isn't exactly novel. So it's useful for some software engineering tasks. Can it debug a complex crash? So far I'm around zero for ten and believe me, I'm trying. From Claude 3.7 to Gemini 2.5, Cursor to Claude Code, it's really hard to get these things to work through a problem the way anyone above the junior dev level can. Almost universally, they just keep digging themselves deeper until they eventually give up and try to null out the code so that the buggy code path doesn't execute.
So when Sabine says they're useless for interpreting scientific publications, I have zero trouble believing that. Scoring high on some shitty benchmarks whose solutions are in the training set is not akin to generalized knowledge. And these huge context windows sound impressive, but dump a moderately large document into them and it's often a challenge to get them to actually pay attention to the details that matter. The best shot you have by far is if the document you need it to reference definitely was already in the training data.
It is very cool and even useful to some degree what LLMs can do, but just scoring a few more points on some benchmarks is simply not going to fix the problems current AI architecture has. There is only one Internet, and we literally lit it on fire to try to make these models score a few more points. The sooner the market catches up to the fact that they ran out of Internet to scrape and we're still nowhere near the singularity, the better.
100% this. I think we should start producing independent evaluations of these tools for their usefulness, not for whatever made-up or convoluted evaluation index OpenAI, Google, or Anthropic throw at us.
Hardly. I've pretty much been using LLMs at least weekly (most of the time daily) since GPT-3.5. I am still amazed. It's really, really hard for me not to be bullish.
It kinda reminds me of the days I learned the Unix-like command line. At least once a week, I shouted to myself: "What? There is a one-liner that does that? People use awk/sed/xargs this way??" That's how I feel about LLMs so far.
> Well, are they useful? ... Yeah, of course LLMs are useful, but we need to remain somewhat grounded in reality. How useful are LLMs?
They are useful enough that they can passably replace (much more expensive) humans in a lot of noncritical jobs, thus being a tangible tool for securing enterprise bottom lines.
> they can dump out a boilerplate React frontend to a CRUD API
This is so clearly biased that it borders on parody. You can only get out what you put in.
The real use case of current LLMs is that any project that would previously have required collaboration can now be done solo with much faster turnaround. Of course, in 20 years when compute finally catches up, they will just be superintelligent AGI.
I see a difference between seeing them as valuable in their current state vs being "bullish about LLMs" in the stock market sense.
The big problem with being bullish in the stock market sense is that OpenAI isn't selling the LLMs that currently exist to their investors, they're selling AGI. Their pitch to investors is more or less this:
> If we accomplish our goal we (and you) will have infinite money. So the expected value of any investment in our technology is infinite dollars. No, you don't need to ask what the odds are of us accomplishing our goal, because any percent times infinity is infinity.
Since OpenAI and all the founders riding on their coattails are selling AGI, you see a natural backlash against LLMs that points out that they are not AGI and show no signs of asymptotically approaching AGI—they're asymptotically approaching something that will be amazing and transformative in ways that are not immediately clear, but what is clear to those who are watching closely is that they're not approaching Altman's promises.
The AI bubble will burst, and it's going to be painful. I agree with the author that that is inevitable, and it's shocking how few people see it. But also, we're getting a lot of cool tech out of it and plenty of it is being released into the open and heavily commoditized, so that's great!
I think that people who don't believe LLMs to be AGI are not very good at Venn diagrams. Because they certainly are artificial, general, and intelligent according to any dictionary.
I feel like LLMs are the same as the leap from "world before web search" to "world after web search." Yeah, in Google, you get crap links for sure, and you have to wade through salesy links and random blogs. But in the pre-web-search world, your options were generally "ask a friend who seems smart" or "go to the library for quite a while," AND BOTH OF THOSE OPTIONS HAD PLENTY OF ISSUES. I found a random part in an old Arduino kit I bought years ago, and GPT-4o correctly identified it and explained exactly how to hook it up and code for it. That is frickin awesome, and it saves me a ton of time and leads me to reuse the part. I used Deep Research to research car options that fit my exact needs, and it was 100% spot on - multiple people have suggested models that Deep Research did not identify that would be a fit, but every time I dig in, I find that Deep Research was right and the alternative actually had some dealbreaker I had specified. Etc., etc.
In the 90s, Robert Metcalfe infamously wrote "Almost all of the many predictions now being made about 1996 hinge on the Internet’s continuing exponential growth. But I predict the Internet, which only just recently got this section here in InfoWorld, will soon go spectacularly supernova and in 1996 catastrophically collapse." I feel like we are just hearing LLM versions of this quote over and over now, but they will prove to be equally accurate.
Generic. For the Internet, the more complex questions would have been "What are the potential benefits, what are the potential risks, what will grow faster," etc. The problem is not the growth but what that growth means. For LLMs, the big clear question is "will they stop being just LLMs, and when?" Progress is visible, but we seek a revolution.
It would be fine if it were sold that way, but there is so much hype. We're told that it's going to replace all of us and put us all out of our jobs. They set the expectations so high. Like, remember OpenAI showing a video of it doing your taxes for you? Predictions that super-intelligent AI is going to radically transform society faster than we can keep up? I think that's where most of the backlash is coming from.
> We're told that it's going to replace all of us and put us all out of our jobs.
I think this is the source of a lot of the hype. There are people salivating at the thought of no longer needing to employ the peasant class. They want it so badly that they'll say anything to get more investment in LLMs even if it might only ever allow them to fire a fraction of their workers, and even if their products and services suffer because the output they get with "AI" is worse than what the humans they throw away were providing.
They know they're overselling it, but they're also still on their knees praying that by some miracle their LLMs trained on the collective wisdom of facebook and youtube comments will one day gain actual intelligence and they can stop paying human workers.
In the meantime, they'll shove "AI" into everything they can think of for testing and refinement. They'll make us beta test it for them. They don't really care if their AI makes your customer service experience go to shit. They don't care if their AI screws up your bill. They don't care if their AI rejects your claims or you get denied services you've been paying for and are entitled to. They don't care if their AI unfairly denies you parole or mistakenly makes you the suspect of a crime. They don't care if Dr. Sbaitso 2.0 misdiagnoses you. Your suffering is worth it to them as long as they can cut their headcount by any amount and can keep feeding the AI more and more information because just maybe with enough data one day their greatest dream will become reality, and even if that never happens a lot of people are currently making massive amounts of money selling that lie.
The problem is that the bubble will burst eventually. The more time goes by and AI doesn't live up to the hype the harder that hype becomes to sell. Especially when by shoving AI into everything they're exposing a lot of hugely embarrassing shortcomings. Repeating "AI will happen in just 10 more years" gives people a lot of time to make money and cash out though.
On the plus side, we do get some cool toys to play with and the dream of replacing humans has sparked more interest in robotics so it's not all bad.
Yeah, it won't do your taxes for you, but it can sure help you do them yourself. Probably won't put you out of your job either, but it might help you accomplish more. Of course, one result of people accomplishing more in less time is that you need fewer people to do the same amount of work - so some jobs could be lost. But it's also possible that for the most part instead, more will be accomplished overall.
Forget OpenAI ChatGPT doing your taxes for you. Now Gemini will write up your sales slides about Gouda cheese, stating wrongly in the process that gouda makes up about 50% of all cheese consumption worldwide :) These use-cases are getting more useful by the day ;)
I mean, it's been like 3 years. 3 years after the web came out was barely anything. 3 years after the first GPU was cool, but not that cool. The past three years in LLMs? Insane.
Things could stall out and we'll have bumps and delays ... I hope. If this thing progresses at the same pace, or speeds up, well ... reality will change.
Or not. Even as they are, we can build some cool stuff with them.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly?
The trouble is that, while it is incredibly amazing, mind-blowing technology, it falls flat often enough that it is a big gamble to use. It is never clear, at least to me, what it is good at and what it isn't. Many things I assume it will struggle with, it handles with ease, and vice versa.
As the failures mount, I admittedly do find it becoming harder and harder to compel myself to see if it will work for my next task. It very well might succeed, but by the time I go to all the trouble to find out it often feels that I may as well just do it the old fashioned way.
If I'm not alone, that could be a big challenge in seeing long-term commercial success. Especially given that commercial success for LLMs is currently defined as 'take over the world' and not 'sustain mom and pop'.
> the speed at which it is progressing is insane.
But same goes for the users! As a result the failure rate appears to be closer to a constant. Until we reach the end of human achievement, where the humans can no longer think of new ways to use LLMs, that is unlikely to change.
It's becoming clear to me that some people just have vastly different uses and use cases than I do. Summarizing a deep, cutting-edge physics paper is, I'm sure, vastly different from summarizing a web page while I'm browsing HN, or writing a Python plugin for Icinga to monitor a web endpoint that spits out JSON.
The author says they use several LLMs every day and they always produce incorrect results. That "feels" weird, because it seems like you'd develop an intuition fairly quickly for the kinds of questions you'd ask that LLMs can and can't answer. If I want something with links to back up what is being said, I know I should ask Perplexity or maybe just ask a long-form prompt-like question of Google or Kagi. If I want a Python or bash program I'm probably going to ask ChatGPT or Gemini. If I want to work on some code I want to be in Cursor and am probably using Claude. For general life questions, I've been asking Claude and ChatGPT.
Running into the same issue with LLMs over and over for years, with all due respect, seems like the "doing the same thing and expecting different results" situation.
This is so true. I really hope she joins this conversation so we can have a productive discussion and understand what she's actually hoping to achieve.
The two sides are never going to understand each other because I suspect we work on entirely different things and have radically different workflows. I suspect that hackernews gets more use out of LLMs in general than the average programmer because they are far more likely to be at a web startup and more likely to actually be bottlenecked on how fast you can physically put more code in the file and ship sooner.
If you work on stuff that is at all niche (as in, stack overflow was probably not going to have the answer you needed even before LLMs became popular), then it's not surprising when LLMs can't help because they've not been trained.
For people that were already going fast and needed or wanted to put out more code more quickly, I'm sure LLMs will speed them up even more.
For those of us working on niche stuff, we weren't going fast in the first place or being judged on how quickly we ship in all likelihood. So LLMs (even if they were trained on our stuff) aren't going to be able to speed us up because the bottleneck has never been about not being able to write enough code fast enough. There are architectural and environmental and testing related bottlenecks that LLMs don't get rid of.
That's a good point, I've personally not got much use out of LLMs (I use them to generate fantasy names for my D&D campaign, but find they fall down for anything complex) - but I've also never got much use out of StackOverflow either.
I don't think I'm working on anything particularly niche, but nor is it cookie-cutter generic either, and that could be enough to drastically reduce their utility.
Two things can be true: e.g., that LLMs are incredible tech we only dreamed of having, and that they’re so flawed that they’re hard to put to productive use.
I just tried to use the latest Gemini release to help me figure out how to do some very basic Google Cloud setup. I thought my own ignorance in this area was to blame for the 30 minutes I spent trying to follow its instructions - only to discover that Gemini had wildly hallucinated key parts of the plan. And that’s Google’s own flagship model!
I think it’s pretty telling that companies are still struggling to find product-market fit in most fields outside of code completion.
It really depends on the task. Like Sabine, I’m operating on the very frontier of a scientific domain that is extremely niche. Every single LLM out there is worse than useless in this domain. It spits out incomprehensible garbage.
But ask it to solve some leet code and it’s brilliant.
The question I ask afterwards, then, is: is solving some leetcode brilliant? Is designing a simple inventory system brilliant if they've all been accomplished already? My answer tends towards no, since they still make mistakes in the process, and it harms newer developers' learning.
I should start collecting examples, if only for threads like this. Recently I tried to get an LLM to write a tsserver plugin that treats lines ending with "//del" as empty. You can only imagine all the sneaky failures in the chat and the total uselessness of the results.
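For scale, the core transform I wanted is trivial; here it is as a plain Python sketch (just the line filter, none of the actual tsserver plugin wiring, which is the part the models fell apart on):

```python
def blank_del_lines(source: str) -> str:
    """Replace every line ending in '//del' with an empty line,
    preserving line count so positions still map correctly."""
    return "\n".join(
        "" if line.rstrip().endswith("//del") else line
        for line in source.split("\n")
    )
```

The hard part is hooking a transform like this into the language service, not the string manipulation itself.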
Anything that is not literally millions (billions?) of times in the training set is doomed to be fantasized about by an LLM, in various ways, tones, etc. After many such threads I came to the conclusion that the people who find it mostly useful are simply treading water, as they probably have done most of their career. Their average product is a React form with a CRUD endpoint, and excitement about it. I can't explain their success reports otherwise, because it rarely works on anything beyond that.
Sabine is Lex Fridman for women. Stay in your lane of quantum physics and stop trying to opine on LLMs. I'm tired of seeing the huge amount of FUD from her.
Because it has a sample size of our collective human knowledge and language big enough to trick our brains into believing that.
As a parallel thought, it reminds me of a trick Derren Brown did. He picked every horse correctly across six races. The person he was picking for was obviously stunned, as was the audience watching it.
The reality of course is just that people couldn't comprehend that he just had to go to extreme and tedious lengths to make this happen. They started with 7000 people and filmed every one like it was going to be the "one" and then the probability pyramid just dropped people out. It was such a vast undertaking of time and effort that we're biased towards believing there must be something really happening here.
LLMs currently are a natural language interface to a Microsoft Encarta like system that is so unbelievably detailed and all encompassing that we risk accepting that there's something more going on there. There isn't.
> Wah, it can't write code like a Senior engineer with 20 years of experience!
No, that's not my problem with it. My problem with it is that inbuilt into the models of all LLMs is that they'll fabricate a lot. What's worse, people are treating them as authoritative.
Sure, sometimes it produces useful code. And often, it'll simply call the "doTheHardPart()" method. I've even caught it literally writing the wrong algorithm when asked to implement a specific and well known algorithm. For example, asking it "write a selection sort" and watching it write a bubble sort instead. No amount of re-prompts pushes it to the right algorithm in those cases either, instead it'll regenerate the same wrong algorithm over and over.
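For reference, the selection sort I was asking for is only a few lines; this is a textbook version, just to show how little it takes (the distinguishing feature is finding the minimum of the unsorted suffix and doing at most one swap per pass, unlike bubble sort's repeated adjacent swaps):

```python
def selection_sort(items):
    """Selection sort: for each position, find the smallest element
    in the remaining unsorted suffix and swap it into place.
    O(n^2) comparisons but only O(n) swaps."""
    a = list(items)  # work on a copy
    for i in range(len(a) - 1):
        smallest = i
        for j in range(i + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        if smallest != i:
            a[i], a[smallest] = a[smallest], a[i]
    return a
```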
Outside of programming, this is much worse. I've both seen online and heard in person people quoting LLM output as if it were authoritative. That, to me, is the bigger danger of LLMs to society. People just don't understand that LLMs aren't high-powered attorneys or world-renowned doctors. And, unfortunately, the incorrect perception of LLMs is being hyped both by LLM companies and by "journalists" who are all too ready to simply run with and discuss the press releases from said LLM companies.
> What's worse, people are treating them as authoritative. … I've both seen online and heard people quote LLM output as if it were authoritative.
That's not an LLM problem. But it is indeed quite bothersome. Don't tell me what ChatGPT told you. Tell me what you know. Maybe you got it from ChatGPT and verified it. Great. But my jaw kind of drops when people cite an LLM and just assume it's correct.
3rd Order Ignorance (3OI)—Lack of Process.
I have 3OI when I don't know a suitably efficient way to find out I don't know that I don't know something. This is lack of process, and it presents me with a major problem: If I have 3OI, I don't know of a way to find out there are things I don't know that I don't know.
— not from an LLM
My process: use LLMs and see what I can do with them while taking their output with a grain of salt.
My company just broadly adopted AI. It’s not a tech company and usually late to the game when it comes to tech adoption.
I'm counting down the days until some AI hallucination makes its way all the way to the C-suite. People will get way too comfortable with AI without understanding just how wrong it can be.
Some assumption will come from AI, no one will check it, and it'll become a basic business input. Then suddenly one day someone smart will say "that's not true" and someone will trace it back to AI. I know it.
I assume at that point in time there will be some general directive on using AI and not assuming it’s correct. And then AI will slowly go out of favor.
People fabricate a lot too. Yesterday I spent far less time fixing issues in the far more complex and larger changes Claude Code managed to churn out than in what the junior developer I worked with needed. Sometimes it's the reverse. But with my time factored in, working with Claude Code is generally more productive for me than working with a junior. The only reason I still work with a junior dev is as an investment in teaching him.
But this was true before LLMs. People would, and still do, take any old thing from an internet search and treat it as true. There is a known, difficult-to-remedy failure to properly adjudicate information and source quality, and you can find it discussed in research prior to the computer age. It is a user problem more than a system problem.

In my experience, with the way I interact with LLMs, they are more likely to give me useful output than not, and this is borne out by mainstream, non-edge-case, academic peer-reviewed work. Useful does not necessarily equal 100% correct, just as a Google search does not. I judge and vet all information, whether from an LLM, search, book, paper, or wherever.

We can build a straw person who "always" takes LLM output as true and uses it as-is, but those are the same people who use most information tools poorly, be they internet search, dictionaries, or even looking in their own files for their own work or sent mail (I say this as an IT professional who has seen all the worker types from the pre-internet days through now). In any case, we use automobiles despite others misusing them. But only the foolish among us completely take our hands off the wheel for any supposed "self-driving" features. While we must prevent and decry misuse by fools, we cannot let their ignorance hold us back. If anything, their misuse helps improve the tools, since it identifies more undesirable scenarios.
> My problem with it is that inbuilt into the models of all LLMs is that they'll fabricate a lot. What's worse, people are treating them as authoritative.
The same is true about the internet, and people even used to use these arguments to try to dissuade people from getting their information online (back when Wikipedia was considered a running joke, and journalists mocked blogs). But today it would be considered silly to dissuade someone from using the internet just because the information there is extremely unreliable.
Many programmers will say Stack Overflow is invaluable, but it's also unreliable. The answer is to use it as a tool and a jumping-off point to help you solve your problem, not to assume that it's authoritative.
The strange thing to me these days is the number of people who will talk about the problems with misinformation coming from LLMs, but then who seem to uncritically believe all sorts of other misinformation they encounter online, in the media, or through friends.
Yes, you need to verify the information you're getting, and this applies to far more than just LLMs.
> I've even caught it literally writing the wrong algorithm when asked to implement a specific and well known algorithm
Happened to me as well. I wanted it to quickly write an algorithm for the standard deviation over a stream of data, which is a textbook algorithm. It got it almost right, but messed up the final formula, and the code gave wrong answers. Weird, considering correct code for that problem exists on Wikipedia.
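For reference, the standard single-pass version (Welford's online algorithm, one of the variants on Wikipedia) is short in Python, and the easy-to-botch part is exactly that final formula at the end:

```python
import math

def streaming_stddev(stream):
    """Welford's online algorithm: one pass over the data,
    numerically stable, no need to store the stream."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # note: uses the *updated* mean
    if n < 2:
        return 0.0
    return math.sqrt(m2 / (n - 1))  # sample standard deviation
```

The two classic mistakes are dividing by n instead of n - 1 (population vs. sample), and using the pre-update mean in the m2 accumulation.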
I was part of preparing an offer a few weeks ago. The customer had prepared a lot of documents for us, maybe 100 pages in total. The boss insisted on using ChatGPT to summarize this stuff and reading only the summary. I did a longer, slower reading and caught some topics ChatGPT outright dropped. Our offer was based on the summary, and it fell through because we missed these nuances.
But hey, boss did not read as much as previously...
> What's worse, people are treating them as authoritative.
So what? People are wrong all the time. What happens when people are wrong? Things go wrong. What happens then? People learn that the way they got their information wasn't robust enough and they'll adapt to be more careful in the future.
This is the way it has always worked. But people are "worried" about LLMs... Because they're new. Don't worry, it's just another tool in the box, people are perfectly capable of being wrong without LLMs.
Humans bullshit and hallucinate and claim authority without citation or knowledge. They will believe all manner of things. They frequently misunderstand.
The LLM doesn’t need to be perfect. Just needs to beat a typical human.
LLM opponents aren’t wrong about the limits of LLMs. They vastly overestimate humans.
> I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out.
If you're lucky, it figures it out. If you aren't, it makes stuff up in a way that seems almost purposefully calculated to fool you into assuming it's figured everything out. That's the real problem with LLMs: they fundamentally cannot be trusted because they're just a glorified autocomplete; they don't come with any inbuilt sense of when they might be getting things wrong.
I see this complaint a lot, and frankly, it just doesn't matter.
What matters is speeding up how fast I can find information. Not only will LLMs sometimes answer my obscure questions perfectly themselves, they also help to point me to the jargon I need to use to find that information online. In many areas this has been hugely valuable to me.
Sometimes you do just have to cut your losses. I've given up on asking LLMs for help with Zig, for example. It is just too obscure a language I guess, because the hallucination rate is too high to be useful. But for webdev, Python, matplotlib, or bash help? It is invaluable to me, even though it makes mistakes every now and then.
I am so confused too. I hold these beliefs at the same time, and I don't feel they contradict each other, but apparently for many people some of them do:
- LLMs are a miraculous technology that are capable of tasks far beyond what we believed would be achievable with AI/ML in the near future. Playing with them makes me constantly feel like "this is like sci-fi, this shouldn't be possible with 2025's technology".
- LLMs are fairly clueless at many tasks that are easy enough for humans, and they are nowhere near AGI. It's also unclear whether they scale up towards that goal. They are also worse programmers than people make them out to be. (At least I'm not happy with their results.)
- Achieving AGI no longer seems impossibly unlikely, and doing so is likely to be an existentially disastrous event for humanity, and the worst fodder of my nightmares. (Also in the sense of an existential doomsday scenario, but even just the thought of becoming... irrelevant is depressing.)
Having one of these beliefs makes me the "AI hyper" stereotype, another makes me the "AI naysayer" stereotype and yet another makes me the "AI doomer" stereotype. So I guess I'm all of those!
> but even just the thought of becoming... irrelevant is depressing
In my opinion, there can exist no AI, person, tool, ultra-sentient omniscient being, etc. that would ever render you irrelevant. Your existence, experiences, and perception of reality are all literally irreplaceable, and (again, just my opinion) inherently meaningful. I don't think anyone's value comes from their ability to perform any particular feat to any particular degree of skill. I only say this because I had similar feelings of anxiety when considering the idea of becoming "irrelevant", and I've seen many others say similar things, but I think that fear is largely a product of misunderstanding what makes our lives meaningful.
I guess Sabine's beef with LLMs is that they are hyped as a legit "human-level assistant" kind of thing by the business people, which they clearly aren't yet. Maybe I've just managed to... manage my expectations?
Back when handwriting recognition was a new thing, I was greatly impressed by how good it was. This was primarily because, being an engineer, I knew how difficult the problem is to solve. 90% recognition seemed really good to me.
When I tried to use the technology, that 90% meant 1 out of every 10 things I wrote was incorrect. If it had been a keyboard, I would have thrown it in the trash. That is where my Palm ended up.
People expect their technology to do things better than a human, not almost as well. Waymo, with LIDAR, hasn't killed people. Tesla, with cameras only, has done so multiple times. I will ride in a Waymo, never in a Tesla self-driving car.
Anyone who doesn't understand this either doesn't need the utility it provides or has no idea how to prompt it correctly. My wife is a bookkeeper. There are some tasks that are a pain in the ass without writing some custom code. In her case, we just saved her about 2 hours by asking Claude to do it. It wrote the code, applied the code to a CSV we uploaded, and gave us exactly what we needed in 2 minutes.
>Anyone who doesn't understand this either doesn't need the utility it provides or has no idea how to prompt it correctly.
Almost every counter-criticism of LLMs boils down to:
1. you're holding it wrong
2. Well, I use it $DAYJOB and it works great for me! (And $DAYJOB is software engineering).
I'm glad your wife was able to save 2 hours of work, but forgive me if that doesn't translate to the trillion dollar valuation OpenAI is claiming. It's strange you don't see the inherent irony in your post. Instead of your wife just directly uploading the dataset and a prompt, she first has to prompt it to write code. There are clear limitations and it looks like LLMs are stuck at some sort of wall.
It's definitely a tech that's here to stay, unlike block chain/nfts
But I mirror the confusion why people are still bullish on it.
The current valuation for it is because the market thinks that it's able to write code like a senior engineer and have AGI, because that's how they're marketed by the LLM providers.
I'm not even certain if they'll be ubiquitous after the venture capital investments are gone and the service needs to actually be priced without losing money, because they're (at least currently) mostly pretty expensive to run.
There seems to be a widely held misconception that company valuations have any basis in the underlying fundamentals of what the companies do. This is not and has not been the case for several years. The US stock market’s darlings are Kardashians, they are valuable for being valuable the way the Kardashians are famous for being famous.
In markets, perception is reality, and the perception is that these companies are innovative. That’s it.
NFT is still a great tool if you want a bunch of unique tokens as part of a blockchain app. ERC-721 was proven a capable protocol in a variety of projects. What it isn't, and never will be, is an amazing investment opportunity, or a method to collect cool rare apes and go to yacht parties.
LLMs will settle in and have their place too, just not in the forefront of every investors mind.
I am more than happy to pay for access to LLMs, and models continue to get smaller and cheaper. I would be very surprised if they are not far more widely used in 5 or 10 years time than they are today.
Even if the VC-backed companies jacked up their prices, the models that I can run on my own laptop for "free" now are magical compared to the state of the art from 2 years ago. Ubiquity may come from everyone running these on their own hardware.
Takes like yours seem crazy to me given the pace of things. We can argue all day about whether people are "too bullish" on the market size of enterprise AI, but truly, absolutely no one knows how good these things will get or which problems they'll overcome in the next 5 years. Saying "I am confused why people are still bullish" implicitly builds in some huge assumptions about the near future.
Most “AI” companies are simply wrapping the ChatGPT API in some form. You can tell from the job posts.
They aren't building anything themselves. I find this disingenuous at best, and a sign to me of a bubble.
I also think that re-branding Machine Learning as AI to also be disingenuous.
These technologies of course have their use cases and excel at some things, but this isn't the ushering in of actual, sapient intelligence, which for the majority of the term's existence was the de facto agreed standard for "AI". This technology lacks the actual markers of what is generally accepted as intelligence to begin with.
Remember the quote that IBM thought there would be a total market for maybe 10 or 15 computers in the entire world? They were giant, and expensive, and very limited in application.
Tesla is valued based on the hope that it'll be the first to fully self-driving cars. I don't think stock markets need to make sense: you invest in things that, if they pan out, could have huge growth. That's why LLMs are being invested in. Alternatives will make you some ROI, but if LLMs do break through to major disruption in even a handful of large markets, your ROI will be huge.
That's not really true. Just the entertainment value alone is already causing OpenAI to rate limit its systems, and they're buying up significant amounts of NVIDIA's capacity, and NVIDIA itself is buying up significant portions of the entire world's chip-making budget. Even if just limited to entertainment, the value is immense, apparently.
That's a funny comparison. I can and do use cryptocurrency to pay for web hosting, a VPN, and a few other things, as it's become the native currency of the internet. I love LLMs too, but agree with the parent comment that it's inevitable they'll be replaced with something better, while Bitcoin seems to be sticking around for the long term.
> The current valuation for it is because the market thinks that it's able to write code like a senior engineer and have AGI, because that's how they're marketed by the LLM providers.
No it's not. If it was valued for that it'd be at least 10X what it is now.
Blockchain is here to stay; this is way past the point of "believing in the tech". Recently a wss:// order-book exchange (Hyperliquid) crossed $1T in volume traded, and they started in 2023.
Blockchains are becoming real-time data structures where everyone has admin-level read-only access to everything.
It's more like duct-taping a VR headset to your head, calibrating your environment to a bunch of cardboard boxes and walls, and calling it a holodeck. It actually kinda works until you push at it too hard.
It reminds me a lot of when I first started playing No Man's Sky (the video game). Billions of galaxies! Exotic, one of a kind life forms on every planet! Endless possibilities! I poured hundreds of hours into the game! But, despite all the variety and possibilities, the patterns emerge, and every 'new' planet just feels like a first-person fractal viewer. Pretty, sometimes kinda nifty, but eventually very boring and repetitive. The illusion wore off, and I couldn't really enjoy it anymore.
I have played with a LOT of models over the years. They can be neat, interesting, and kinda cool at times, but the patterns of output and mistakes shatters the illusion that I'm talking to anything but a rather expensive auto-complete.
I'm in the same boat, and I think it boils down to this: some people are actually quite passive, while others are more active in their use of technology.
It'd take more time for me to flesh this out than I want to give, but the basic idea is that I am not just sitting there "expecting things". I've been puzzled too at why so many people don't seem to get it or are as frustrated as this lady, and in my observation this is their common element. It just looks very passive to me, the way they seem to use the machines and expect a result to be "given" to them.
PS. It reminds me very strongly of how our parents' generation uses computers. The whole way of thinking is different; I cannot even understand why they would act certain ways or be afraid of acting in other ways. It's like they use a different compass, or have a very different (and wrong) model in their head of how this thing in front of them works.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly?
IMO there are two distinct reasons for this:
1. You've got the Sam Altmans of the world claiming that LLMs are, or nearly are, AGI and that ASI is right around the corner. It's obvious this isn't true even if LLMs are still incredibly powerful and useful. But Sam doing the whole "is it AGI?" dance gets old really quick.
2. LLMs are an existential threat to basically every knowledge worker job on the planet. Peoples' natural response to threats is to become defensive.
I’m not sure how anyone can claim number 2 is true, unless it’s someone who is a programmer doing mostly grunt code and thinks every knowledge worker job is similar.
Just off the top of my head there are plenty of knowledge worker jobs where the knowledge isn’t public, nor really in written form anywhere. There just simply wouldn’t be anything for AI to train on.
> LLMs are an existential threat to basically every knowledge worker job on the planet.
Given the typical problems of LLMs, they are not. You still need them to check the results. It's like FSD: impressive when it works, bad when it doesn't, and scary because you never know beforehand when it's failing.
How much time do I need to devote to see anything but garbage?
For reference, I program systems code in C/C++ in a large, proprietary codebase.
My experiences with OpenAI (a year ago or more) and, more recently, Cursor, Grok-v3, and Deepseek-r1 were all failures. The latter two started out OK and got worse over time.
What I haven't done is ask "AI" to whip up a more standard application. I have some ideas (an ncurses frontend to p4 written in Python, similar to tig, for instance), but haven't gotten around to it.
I want this stuff to work, but so far it hasn't. Now, I don't think "programming" a computer in English is a very good idea anyway, but I want a competent AI assistant to pair program with. To the degree that people are getting results, it seems to me they are leveraging very high-level APIs/libraries of code which are not written by AI, and solving well-solved, "common" problems (simple games, simple web or phone apps). Sort of like how people gloss over the heavy lifting done by language itself when they praise the results from LLMs in other fields.
I know it eventually will work. I just don't know when. I also get annoyed by the hype of folks who think they can become software engineers because they can talk to an LLM. Most of my job isn't programming. Most of my job is thinking about what the solution should be, talking to other people like me in meetings, understanding what customers really want beyond what they are saying, and tracking what I'm doing in various forms(which is something I really do want AI to help me with).
Vibe coding is aptly named because it's sort of the VB6 of the modern era. Holy cow! I wrote a Windows GUI App!!! It's letting non-programmers and semi-programmers (the "I write glue code in Python to munge data and API ins/outs" crowd) create usable things. Cool! So did spreadsheets. So did Hypercard. Andrej tweeting that he made a phone app was kinda cool but also kinda sad. If this is what the hundreds of billions spent on AI (and my bank account thanks you for that) delivers, then the bubble is going to pop soon.
Far from just programming too. They're useful for so many things. I use it for quickly coming up with shell scripts (or even complex piped commands (or if I'm being honest even simple commands since it's easier than skimming the man page)). But I also use it to bounce ideas off of when negotiating contracts. Or to give me a spoiler-free reminder of a plot point I'd forgotten in a book or TV series. Or to explain legal or taxation issues (which I of course verify, but it points me in the right direction). Or any number of other things.
As the parent says, while far from perfect, they're an incredible aid in so many areas. When used well, they help you produce not just faster but also better results. The only trick really is that you need to treat it as a (very knowledgeable but overconfident) collaborator rather than an oracle.
I love using it to boilerplate code for a new API I want to integrate. Much better than having to manually search. In the near future, not knowing how to effectively use AI to enhance productivity will be a disadvantage to potential employers.
I use ChatGPT all the time. I really like it. It's not perfect; how I've described it (and I doubt that I'm unique in this): it's like having a really smart and eager intern at your disposal.
I say "intern" in the sense that its error-prone and kind of inexperienced, but also generally useful. I can ask it to automatically create a lot of the bootstrapping or tedious code that I always dread writing so that I can focus on the fun stuff, which is often the stuff that's pawned off onto interns and junior-level engineers. I think for the most part, when you treat it like that, it lives up to and sometimes even surpasses expectations.
I mean, I can't speak for everyone, but whenever I begin a new project, a large percentage of the first ~3 hours is simply copying and pasting and editing from documentation, either an API I have to call or some bootstrapping code from a framework or just some cruft to make built-in libraries work how you want. I hate doing all that, it actively makes me not want to start a new project. Being able to get ChatGPT to give me stuff that I need to actually get started on my project has made coding a lot more fun for me again. At this point, you can take my LLM from my cold dead hands.
I do think it will keep getting better, but I'm also at a point where even if it never improves I will still keep using it.
I guess it totally depends on the specific use cases.
Just yesterday, I asked Gemini 2.5 how to test a Laravel artisan command that internally calls another artisan command, using the Pest testing framework.
Gemini showed me a method that looked perfect.
But when I tried to execute it, I got an error that the method doesn't exist.
So I asked Gemini about this, and it told me: "Ah, apologies! That's a classic sign that you might be using a version of Laravel older than 5.7."
So I told it that I am actually using the latest release (12.3), to which it responded with: "You're right to question things when you get an error! However, let's clarify the Laravel version first.
As of today, March 27, 2025, the latest stable version of Laravel is Laravel 11, which was released in March 2024. Laravel 12 has not been released yet (it's expected roughly in Q1 2026 based on the usual schedule).
Could you please double-check the exact Laravel version you are using?"
So it did not believe me and I had to convince it first that I was using a real version. This went on for a while, with Gemini not only hallucinating stuff, but also being very persistent and difficult to convince of anything else.
Well, in the end it was still certain that this method should exist, even though it could not provide any evidence for it, and my searching through the internet and the Git history of the related packages also turned up nothing.
So I gave up and tried it with Claude 3.7 which could also not provide any working solution.
In the end, I found an entirely different solution for my problem, but that wasn't based on anything the AIs told me, but just my own thinking and talking to other software developers.
I would not go so far as to call these AIs useless. In software development they can help with simple stuff and boilerplate code, and I found them a lot more helpful in creative work. This is basically the opposite of what I would have expected 5 years ago ^^
But for any important tasks, these LLMs are still far too unreliable.
They often feel like they have a lot of knowledge, but no wisdom.
They don't know how to apply their knowledge ideally, and they often basically brute-force it with a mix of strange creativity and statistical models apparently trained on a vast amount of internet content, big parts of which are troll content and satire.
My issue with higher ups pushing LLMs is that what slows me down at work is not having to write the code. I can write the code. If all I had to do was sit down and write code, then I would be incredibly productive because I'm a good programmer.
But instead, my productivity is hampered by issues with org communication, structure, siloed knowledge, lack of documentation, tech debt, and stale repos.
I have for years tried to provide feedback and get leadership to do something about these issues, but they do nothing and instead ask "How have you used AI to improve your productivity?"
I've had the same experience as you, and also rather recently. I had to learn two lessons: first, what I could trust it with (as with Wikipedia when it was new), and second, what makes sense to ask it (as with YouTube when it was new). Once I got that down, it is one fabulous tool to have on my belt, among many other tools.
Thing is, the LLMs that I use are all freeware, and they run on my gaming PC. Two to six tokens per second are alright honestly. I have enough other things to take care of in the meantime. Other tools to work with.
I don't see the billion dollar business. And even if that existed, the means of production would be firmly in the hands of the people, as long as they play video games. So, have we all tripled our salaries?
If we haven't, is that because knowledge work is a limited space that we are competing in, and LLMs are an equalizer because we all have them? Because I was taught that knowledge work was infinite. And the new tool should allow us to create more, and better, and more thoroughly. And that should get us all paid better.
Depends on your use case. If you don't need them to be the source of truth, then they work great, but if you do, the experience sucks because they're so unreliable.
The problems start when people start hyperventilating, thinking that because LLMs can generate tests for a function for you, they'll be replacing engineers soon. They're only suitable for generating output that you can easily verify to be correct.
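A minimal sketch of that "trust but verify" workflow: before using a generated function, run it against cases whose answers you already know. `generated_slug` here is a made-up stand-in for LLM-produced code.

```python
# "Trust but verify": check generated code against known answers
# before relying on it. generated_slug stands in for LLM output.
def generated_slug(title: str) -> str:
    return "-".join(title.lower().split())

known_cases = {
    "Hello World": "hello-world",
    "  Spaced   Out  ": "spaced-out",
}
for inp, expected in known_cases.items():
    assert generated_slug(inp) == expected, (inp, generated_slug(inp))
print("all known cases pass")
```

If the generated function fails a case you already know the answer to, you've caught the problem before it ships; if it passes, you've at least bounded the risk.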
LLM training is designed to distill a massive corpus of facts, in the form of token sequences, into a much, much smaller bundle of information that encodes (somehow!) the deep structure of those facts minus their particulars.
They’re not search engines, they’re abstract pattern matchers.
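A back-of-envelope illustration of that distillation, using made-up but plausible numbers (none of these figures describe any specific model):

```python
# Rough compression ratio of training corpus to model weights.
# All numbers below are illustrative assumptions, not real figures.
corpus_tokens = 10e12        # assume ~10 trillion training tokens
bytes_per_token = 4          # assume ~4 bytes of UTF-8 text per token
params = 70e9                # assume a 70B-parameter model
bytes_per_param = 2          # fp16 weights

corpus_bytes = corpus_tokens * bytes_per_token   # ~40 TB of text
model_bytes = params * bytes_per_param           # ~140 GB of weights
print(f"corpus / model ≈ {corpus_bytes / model_bytes:.0f}x")
```

Under these assumptions the weights are hundreds of times smaller than the text they were trained on, which is why particulars (exact quotes, URLs, citations) are exactly what gets lost.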
I asked Grok to describe a picture I took of me and my kid at Hilton Head island. Based on the plant life it guesses it was a southeast barrier island in Georgia or the Carolinas. It guessed my age and my son’s age. LLMs are completely insane tech for a 90s kid. The first fundamental advance in tech I’ve seen in my lifetime—like what it must’ve been like for people who used a telephone for the first time, or watched a television.
Flat TVs, digital audio players (the iPod), the smartphone, laptops, smartwatches... You have a very selective definition of an advance in tech. Compare today (minus LLMs) with any movie depicting life in the nineties and you can see how much tech has developed.
There are basically 3 categories of LLM users (very roughly).
1. People creating or dealing with imprecise information. People doing SEO spam, people dealing with SEO spam, almost all creative arts people, people writing corporatese- or legalese- documents or mails, etc. For these tasks LLMs are god-like.
2. People dealing with precise information and/or facts. For these people an LLM is no better than a parrot.
3. A subset of 2: programmers. Because of the huge amount of stolen training data, plus almost perfect proofing software in the form of compilers, static analyzers, etc., for this case LLMs are more or less usable; the more data was used, the better (JS is the best, as I understand it).
This is why people's reactions are so polarized. Their results differ.
The crisis in programming hasn’t been writing code. It has been developing languages and tools so that we can write less of it that is easy to verify as correct. These tools generate more code. More than you can read and more than you will want to before you get bored and decide to trust the output. It is trained on the most average code available that could be sucked up and ripped off the Internet. It will regurgitate the most subtle errors that humans are not good at finding. It only saves you time if you don’t bother reading and understanding what it outputs.
I don’t want to think about the potential. It may never materialize. And much of what was promised even a few years ago hasn’t come to fruition. It’s always a few years away. Always another funding round.
Instead we have massive amounts of new demand for liquid methane, infrastructure struggling to keep up, billions of gallons of fresh water wasted, all so that rich kids can vibe code their way to easy money and realize three months later they’ve been hacked and they don’t know what to do. The context window has been lost and they ran out of API credits. Welcome to the future.
Yeah, basically this. If I look at how it helps me as an individual, I can totally see how AI can sometimes be useful. If I take a look at the societal effects of AI, it becomes apparent that AI is just a net negative. Some examples:
- AI is great for disinformation
- AI is great at generating porn of women without their consent.
- Open source projects massively struggle as AI scrapers DDOS them.
- AI uses massive amounts of energy and water; most importantly, the expectation is that energy usage will rise drastically in a world where we need to lower it. If Sam Altman gets his way, we're toast.
- AI makes us intellectually lazy and worse thinkers. We were already learning less and less in school because of our impoverished attention span. This is even worse now with AI.
- AI makes us even more dependent on cloud vendors and third-parties, further creating a fragile supply chain.
Like AI ostensibly empowers us as individuals, but in reality I think it's a disservice, and the ones it truly empowers are the tech giants, as citizens become dumber and even more dependent on them and tech giants amass more and more power.
I can't believe I had to dig this deep to find this comment.
I have yet to see an AI-generated image that was "really cool".
AI images and videos strike me as the coffee pods of the digital world -- we're just absolutely littering the internet with garbage. And as a bonus, it's also environmentally devastating to the real world!
I live nearby a landfill, and go there often to get rid of yard waste, construction materials, etc. The sheer volume of perfectly serviceable stuff people are throwing out in my relatively small city (<200k) is infuriating and depressing. I think if more people visited their local landfills, they might get a better sense for just how much stuff humans consume and dispose. I hope people are noticing just how much more full of trash the internet has become in the last few years. It seems like it, but then I read this thread full of people that are still hyped about it all and I wonder.
This isn't even to mention the generated text... it's all just so inane and I just don't get it. I've tried a few times to ask for relatively simple code and the results have been laughable.
If you ask for obscure things, how do you know you are getting right answers? In my experience, unless the thing you are looking for is easily found with a google search, LLMs have no hope of getting it correct. For me this is mostly trying to code against an obscure API that isn't well documented, where the little documentation there is is spread across multiple wikis. And the LLMs keep hallucinating functions that simply do not exist.
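One cheap defense against hallucinated functions is to check whether a suggested symbol actually exists in the installed library before trusting the generated code. A Python sketch (`json.smart_parse` is a deliberately fake name used as the negative example):

```python
import importlib

def symbol_exists(module_name: str, attr: str) -> bool:
    """Return True only if the named attribute really exists in the
    installed module -- a quick sanity check for LLM-suggested APIs."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

print(symbol_exists("json", "dumps"))        # real function
print(symbol_exists("json", "smart_parse"))  # plausible-sounding, made up
```

It won't catch a real function called with the wrong semantics, but it filters out the flat-out invented ones in seconds.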
It is an amazing technology and like crypto/blockchain it is nerdy to understand how it works and play with it. I think there are two things at stake here:
1. Some people are just uncomfortable with it because it “could” replace their jobs.
2. Some people are warning that the ecosystem bubble is significantly out of proportion. They are right, and having the whole stock market, companies, and US economy attached to LLMs is just downright irresponsible.
> Some people are just uncomfortable with it because it “could” replace their jobs.
What jobs are seriously at risk of being totally replaced by LLMs? Even in things like copywriting and natural language translation, which are somewhat of a natural "best case" for the underlying tech, their output is quite subpar compared to the average human's.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly
Hossenfelder is a scientist. There's a certain level of rigour that she needs to do her job, which is where current LLMs often fall down. Arguably it's not accelerating her work to have to check every single thing the LLM says.
I use them everyday and they save me so much time and enable me to do things that I wouldn't be able to do otherwise just due to the amount of time it would take.
I think some people just aren't using them correctly or don't understand their limitations.
They are especially helpful for getting me over thought paralysis when starting a new project.
The frustration of using an LLM is greater than the frustration of doing it myself. If it's going to be a tool, it needs to work. Otherwise, it's just a research toy.
They can do fun and interesting stuff, but we keep hearing how they’re going to replace human workers, and too many people in positions of power not only believe they are capable of this, but are taking steps to replace people with LLMs.
But while they are fun to play with, anything that requires a real answer, but can’t be directly and immediately checked, like customer support, scientific research, teaching, legal advice, identifying humans, correctly summarizing text - LLMs are very bad at these things, make up answers, mix contexts inappropriately, and more.
I’m not sure how you can have played with LLMs so much and missed this. I hope you don’t trust what they say about recipes or how to handle legal problems or how to clean things or how to treat disease or any fact-checking whatsoever.
>I’m not sure how you can have played with LLMs so much and missed this. I hope you don’t trust what they say about recipes or how to handle legal problems or how to clean things or how to treat disease or any fact-checking whatsoever.
This is like a GPT3.5 level criticism. o1-pro is probably better at pure fact retrieval than most PhDs in any given field. I challenge you to try it.
The main issue is that if you ask most LLMs to do something they aren't good at, they don't say "Sorry, I'm not sure how to do that yet"; they say "Sure, absolutely! Here you go:" and proceed to make things up, provide numbers or code that don't actually add up, and make up references and sources.
To someone who doesn't actually check or have the knowledge or experience to check the output, it sounds like they've been given a real, useful answer.
When you tell the LLM that the API it tried to call doesn't exist it says "Oh, you're right, sorry about that! Here's a corrected version that should work!" and of course that one probably doesn't work either.
Yes. One of my early observations about LLMs was that we've now produced software that regularly lies to us. It seems to be a quite intractable problem. Also, since there's no real visibility as to how an LLM reaches a conclusion, there's no way to validate anything.
One takeaway from this is that labelling LLMs as "intelligent" is a total misnomer. They're more like super parrots.
For software development, there's also the problem of how up to date they are. If they could learn on the fly (or be constantly updated) that would help.
They are amazing in some ways, but they've been over-hyped tremendously.
I agree, they are an amazing piece of technology, but the investment and hype don't match the reality. This might age like milk, but I don't think OpenAI is going to make it. They burnt $9B to lose $5B in 2024, trying to raise money like their life depends on it... because their life depends on it. From what I can tell, none of the AI model producers are profiting from their model usage at this point, except maybe Deepseek. This will be a market, they are useful, astonishingly impressive even, but IMO they are either going to become waaayy more expensive to use, and/or the market/investment will greatly shrink to be sustainable.
When I saw GPT-3 in action in 2023, I couldn’t believe my eyes. I thought I was being tricked somehow. I’d seen ads for “AI-powered” services and it was always the same unimpressive stuff. Then I saw GPT-3 and within minutes I knew it was completely different. It was the first thing I’d ever seen that felt like AI.
That was only a few years ago. Now I can run something on my 8GB MacBook Air that blows GPT-3 out of the water. It’s just baffling to me when people say LLM’s are useless or unimpressive. I use them constantly and I can still hardly believe they exist!!
LLMs are better at formally verifiable tasks like coding, also coding makes more money on a pure demand basis so development for it gets more resources. In descriptive science fields, it's not great because science fields don't generate a lot of text compared to other things, so the training data is dwarfed by the huge corpus of general internet text. The software industry created the internet and loves using it, so they have published a lot more text in comparison. It can be really bad in bio for example.
Is your testing adversarial or merely anecdotal curiosity? If you don't actively look for it why would you expect to find it?
It's bad technology because it wastes a lot of labor, electricity, and bandwidth in a struggle to achieve what most human beings can with minimal effort. It's also a blatant thief of copyrighted materials.
If you want to like it, guess what, you'll find a way to like it. If you try to view it from another person's use case, you might see why they don't like it.
> can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out. It talks to me like a person. It generates really cool images. It helps me write code. And just tons of other stuff that astounds me.
It is an impressive technology, but is it US$244.22bn [1] impressive (I know this stat is supposed to account for computer vision as well, but seeing as LLMs are now a big chunk of that, I think it's a safe assumption)? It's projected to grow to over US$1tr by 2031. That's higher than the market size of commercial aviation at its peak [2]. I'm sorry, but I don't think a cool chatbot is approximately as important as flying.
You no longer have the console as the primary interface, but a GUI, which 99.9+% of computer users control via a mouse.
You no longer have the screen as the primary interface, but an AUI (audio user interface), which 99.9+% of computer users control via a headset, earbuds, or a microphone and speaker pair.
You mostly speak and listen to other humans, and if you're not reading something they've written, you could have it read to you in order to detach from the screen or paper.
You'll talk with your computer while in the car, while walking, or while sitting in the office.
An LLM makes the computer understand you, and it allows you to understand the computer.
Even if you use smart glasses, you'll mostly talk to the computer generating the displayed results, and it will probably also talk to you, adding information to the displayed results. It's LLMs that enable this.
Just don't focus too much on whether the LLM knows how high Mount Kilimanjaro is; its knowledge of that fact is simply a hint that it can properly handle language.
Still, it's remarkable how useful they are at analyzing things.
LLMs have a bright future ahead, or whatever technology succeeds them.
I don’t even argue that they might get useful at some point, but when I point a mouse at a button and press the button it usually results in a reliable action.
When I use the LLM (I have so far tried: Claude, ChatGPT, DeepSeek, Mistral) it does something but that something usually isn’t what I want (~the linked tweet).
Prompting, studying and understanding the result and then cleaning up the mess for the low price of an expensive monthly sub leaves me with worse results than if I did the thing myself, usually takes longer and often leaves me with subtle bugs I’m genuinely afraid of growing into exploitable vulnerabilities.
Using it strictly as a rubber duck is neat but also largely pointless.
Since other people are getting something out of the tech, I’ll just assume that the hammer doesn’t fit my nails.
It's like a mouse that some variable proportion of the time pretends it's moved the cursor and clicked a button, but actually it hasn't and you have to put a lot of work in to find out whether it did or didn't do what you expected.
It used to be annoying enough just having to clean the trackball, but at least you knew when it wasn't working.
I think it’s more that the people who are boosting LLMs are claiming that perfect super intelligence is right around the corner.
Personally, I look back at how many years ago it was that we were seeing claims that truck drivers were all going to lose their jobs and society would tear itself apart over it within the next few years… and yet here we still are.
I'm completely with you. The technology is absolutely fascinating in its own right.
That said, I do experience frustrations:
- Getting enraged when it messes up perfectly good code it wrote just 10 minutes ago
- Constantly reminding it we're NOT using jest to write tests
- Discovering it's created duplicate utilities in different folders
There's definitely a lot of hand-holding required, and I've encountered limitations I initially overlooked in my optimism.
But here's what makes it worthwhile: LLMs have significantly eased my imposter syndrome when it comes to coding. I feel much more confident tackling tasks that would have filled me with dread a year ago.
I honestly don't understand how everyone isn't completely blown away by how cool this technology is. I haven't felt this level of excitement about a new technology since I discovered I could build my own Flash movies.
It depends. For small tasks like summarization or self-contained code snippets, it’s really good—like figuring out how to inspect a binary executable on Linux, or designing a ranking algorithm for different search patterns. If you only want average performance or don’t care much about the details, it can produce reasonable results without much oversight.
But for larger tasks—say, around 2,000 lines of code—it often fails in a lot of small ways. It tends to generate a lot of dead code after multiple iterations, and might repeatedly fail on issues you thought were easy to fix. Mentally, it can get exhausting, and you might end up rewriting most of it yourself. I think people are just tired of how much we expect LLMs to deliver, only for them to fail us in unexpected ways. The LLM is good, but we really need to push to understand its limitations.
This is true. But it needs to be more than a toy if it is to be economically viable.
So far the industrial applications haven't been that promising, code writing and documentation is probably the most promising but even there it's not like it can replace a human or even substantially increase their productivity.
I think the perception of their usefulness depends on how often you ask/google questions. If you are constantly wondering about one thing or another, LLMs are amazing - especially compared to previous alternatives like googling or asking on Reddit.
If you don’t constantly look for information, they might be less useful.
I'm a senior engineer with 20 years of experience and mostly find all of the AI bs of the last couple of years to be occasionally helpful for general stuff but absolutely incompetent when I need help with mildly complicated tasks.
I did have a eureka moment the other day with deepseek and a very obscure bug I was trying to tackle. One api query was having a very weird, unrelated side effect. I loaded up cursor with a very extensive prompt and it actually figured out the call path I hadn't been able to track down.
Today, I had a very simple task that eventually only took me half an hour to manually track. But I started with cursor using very similar context as the first example. It just kept repeatedly dreaming up non-existent files in the PR and making suggestions to fix code that doesn't exist.
So what's the worth to my company of my very expensive time? Should I spend 10,20,50 percent of my time trying to get answers from a chatbot, or should I just use my 20 years of experience to get the job done?
I’ve been playing with Gemini 2.5 Pro, throwing all kinds of personal-productivity problems at it, and it’s mostly one-shotting them. I’m still in disbelief tbh.
A lot of people who don’t understand how to use LLM effectively will be at an economic disadvantage.
Can you give some examples? Do you mean things like "How do I control my crippling anxiety", things like "What highways would be best to take to Chicago", things like "Write me a Python library to parse the file format in this hex dump", or things like "What should I make for dinner"?
“The growth of the Internet will slow drastically, as the flaw in ‘Metcalfe’s law' becomes apparent: most people have nothing to say to each other! By 2005, it will become clear that the Internet’s impact on the economy has been no greater than the fax machine’s”
Same as reading books, Internet, Wikipedia, working towards/keeping your health and fitness, etc...
The quote about books being a mirror reflecting genius or idiocy seems to apply.
I see LLMs as a kind of hyper-keyboard: speeding up typing AND structuring content, completing thoughts, and inspiring ideas.
Unlike a regular keyboard, an LLM transforms input contextually. One no longer merely types but orchestrates concepts and modulates language, almost like music.
Yet mastery is key. Just as a pianist turns keystrokes into a symphony through skill, a true virtuoso wields LLMs not as a crutch but as an amplifier of thought.
As a 50+ nerd, for decades I carried the idea: can't we just build a sufficiently large neural net, throw some data at it, and have it somehow be usefully intelligent? So this is showing strong signs of something I've been waiting for.
In the 70's I read in some science book for kids about how one day we will likely be able to use light emitting diodes for illumination instead of light bulbs, and this "cold light" will save us lots of energy. Waited out that one too; it turned out so.
I’m reminded of how I always think current cutting edge good examples of CG in movies looks so real and then, consistently, when I watch it again in 10 years it always looks distractingly shitty.
Perhaps you have already paid off your mortgage and saved up a million dollars for retirement? And you're not threatened by dismissal or salary reduction because supposedly "AI will replace everyone."
By the way, you don't need to be a 50+ year old nerd. Nerds are a special culture-pen where smart straight-A students from schools are placed so they can work, increase stakeholder revenues, and not even accidentally be able to do anything truly worthwhile that could redistribute wealth in society.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly
More like we note the frequency with which these tools produce shallow, bordering on useless, responses, note the frequency with which they produce outright bullshit, and conclude their output should not be taken seriously. This smells like the fervor around ELIZA, but with several multinational marketing campaigns pushing behind it.
Yeah, like I. I. Rabi said in regard to people no longer being amazed by the achievements of physics, "What more do you want, mermaids?"
Anyone who remembers further back than a decade or so remembers when the height of AI research was chess programs that could beat grandmasters. Yes, LLMs aren't C3PO or the like, but they are certainly more like that than anything we could imagine just a few years ago.
The speed at which anything progresses is impressive if you're not paying attention while other people toil away on it for decades, until one day you finally look up and say, "Wow, the speed at which this thing progressed is insane!"
I remember seeing an AI lab in the late 1980's and thinking "that's never going to work" but here we are, 40 years later. It's finally working.
I'm glad I'm not the only person in awe with LLMs. It feels like it came straight out of science fiction novel. What does it take to impress people nowadays?
I feel like if teleportation was invented tomorrow, people would complain that it can't transport large objects so it's useless.
I often ask "So you say LLMs are worthless because you can't blindly trust the first thing they say? Do you blindly trust the first google search result? Do you trust every single thing your family members tell you?" It reminds me of my high school teachers saying Wikipedia can't be trusted.
Yeah the amount of "piffle work" that LLMs save me is astounding. Sure, I can look up fifty different numbers and copy them into excel. Or I can just tell an LLM "make a chart comparing measurements XYZ across devices ABC" and I'm looking at the info right there.
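The collation described above is simple but tedious by hand, which is exactly why delegating it is attractive. A sketch of the kind of table meant here; the device names and measurements are placeholders, not real data:

```python
# The kind of "piffle work" comparison table meant above,
# with placeholder device names and made-up measurements.
measurements = {
    "Device A": {"weight_g": 180, "battery_mAh": 4000, "screen_in": 6.1},
    "Device B": {"weight_g": 210, "battery_mAh": 5000, "screen_in": 6.7},
    "Device C": {"weight_g": 195, "battery_mAh": 4500, "screen_in": 6.4},
}
cols = ["weight_g", "battery_mAh", "screen_in"]
header = f"{'device':<10}" + "".join(f"{c:>14}" for c in cols)
print(header)
for name, m in measurements.items():
    print(f"{name:<10}" + "".join(f"{m[c]:>14}" for c in cols))
```

The numbers are also easy to spot-check against the sources, which is what keeps this in the "easily verified" category of LLM tasks.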
Probably because you don't have the same use case as them... doing "code" is an "easy" use case, but pondering a humanities subject is much harder... you cannot "learn the structure" of humanities, you have to know the facts... and LLMs are bad at that.
Because we're being told it is a perfect superintelligence, that it is going to replace senior engineers. The hype cycle is real, and worse than blockchain's ever was. I'm sure LLMs will be able to code a full enterprise app about the same time moon coin replaces $USD.
I wholeheartedly agree with you and it’s funny reading the replies to your comment.
Basically people just doubling down on everything you just described. I can't quite put a finger on it, but it has a tinge of insecurity or something like that; I hope that's not the case and I'm just misinterpreting.
It's like computer graphics and VR: Amazing advances over the years, very impressive, fun, cool, and by no means a temporary fad...
... But I do not believe we're on the cusp of a Lawnmower-Man future where someone's Metaverse eats all the retail-conference-halls and movie-theaters and retail-stores across the entire globe in an unbridled orgy of mind-shattering investor returns.
Similarly, LLMs are neat and have some sane uses, but the fervor about how we're about to invent the Omnimind and usher in the singularity and take over the (economic) world? Nah.
Today's models are far from autonomous thinking machines, and it is a cognitive bias among the masses to believe they are. It is just a giant calculator: it predicts "the most probable next word" from a sea of all combinations of next words.
I don't see it as a bigger leap than the internet itself. I recall needing books on my desk or a road trip to the local bookshop to find out coding answers. Stack Overflow beats AI most days, but the LLMs are another nice tool.
Exploring topics in a shallow fashion is fine with LLMs; doing anything deep is just too unreliable due to hallucination. All models I’ve talked to desperately want to give a positive answer, and thus will often just lie.
Indeed, it is the stuff of science fiction, and then you get an "akshually, it's just statistics" comment. I feel people are projecting their fears, because deep down, they're simply afraid.
I like LLMs for what they are. Classifiers. I don’t trust them as search engines because of hallucinations. I use them to get a bearing on a subject but then I’ll turn to Google to do the real research.
I go back and forth. I share your amazement. I used Gemini Deep Research the other day and was blown away. It claimed to go read 20 websites, and it showed its "thinking" and steps, its conclusions at each step. Then it wrote a large summary (several pages).
On the other hand, I saw github recently added Copilot as a code reviewer. For fun I let it review my latest pull request. I hated its suggestions but could imagine a not too distant future where I'm required by upper management to satisfy the LLM before I'm allowed to commit. Similarly, I've asked ChatGPT questions and it's been programmed to only give answers that Silicon Valley workers have declared "correct".
The thing I always find frustrating about the naysayers is that they seem to think how it works today is the end of it. Like, I recently listened to an episode of EconTalk interviewing someone on AI and education. She lives in the UK and used Tesla FSD as an example of how bad AI is. Yet I live in California and see Waymo mostly working today, with lots of people using it. I believe she wouldn't have used the Tesla FSD example, and would possibly have changed her worldview at least a little, if she'd updated on seeing self-driving work.
What you're impressed with is 40% the human skill in creating an LLM, 0.5% value created by the model, and 59.5% the skills of all the people it ate and whose livelihoods it is now trying to destroy.
As others have pointed out already, the hype about writing code like a senior engineer, or in general acting as a competent assistant, is what created the expectation in the first place. They keep over-promising but under-delivering. Who is the guy talking about AGI most of the time? Could it be the top executive of one of the largest gen AI companies, do you think? I won't deny it occasionally has a certain 'star-trek-computer' flair to it, but most of the time it feels like having a heavily degraded version of "Rain Man". He may count your cards perfectly one moment, then get stuck trying to untie his shoes. I stopped counting how many times it produced just outright wrong outputs, to the point of suggesting literally the opposite of what one is asking of it. I would not mind it so much if they were being advertised for what they are, not for what they could potentially be if only another half a trillion dollars were invested in data centers. It is not going to happen with this technology; the issue is structural, not resource-related.
Really? I just get garbage. Both Claude and CoPilot kept insisting that it was ok to use react hooks outside of function components. There have been many other situations where it gave me some code and even after refining the prompt it just gave me wrong or non working code. I’m not expecting perfection, but at least don’t waste my time with hallucinations or just flat out stuff that doesn’t work.
> And people are like, "Wah, it can't write code like a Senior engineer with 20 years of experience!"
Except this isn't true. The code quality varies dramatically depending on what you're doing, the length of the chat/context, etc. It's an incredible productivity booster, but even earlier today, I wasted time debugging hallucinated code because the LLM mixed up methods in a library.
The problem isn't so much that it's not an amazing technology, it's how it's being sold. The people who stand to benefit are speaking as though they've invented a god and are scaring the crap out of people making them think everyone will be techno-serfs in a few years. That's incredibly careless, especially when as a technical person, you understand how the underlying system works and know, definitively, that these things aren't "intelligent" the way they're being sold.
Like the startups of the 2010s, everyone is rushing, lying, and huffing hopium deluding themselves that we're minutes away from the singularity.
You forget the large group of people who proudly declare they'll invent AGI and that they can make everyone lose their jobs and starve. The complaints are for them, not for you.
Keep in mind it understands nothing. The notion that LLMs understand anything is fundamentally flawed, as they do not demonstrate any markers of understanding.
The fact that you don't know what "Markov chain" means and get angry at others over it pisses me off.
Both are Markov chains; that you used to erroneously think a Markov chain is a way to make a chatbot, rather than a general mathematical process, is on you, not them.
Not one of them has managed to generate a successful promise-based implementation of reCAPTCHA v2 in JavaScript from scratch (https://developers.google.com/recaptcha/docs/loading), and they have a million-plus references for this.
Because the marketers oversold it. That is why you are seeing a pushback. I also outright rejected them because 1) they were sold and marketed as end all be all replacements for human thought, and 2) they promised to replace only the parts of my job that I enjoy. Billboards were up in San Francisco telling my "bosses" that I was soon to be replaced, and the loudest and earliest voices told me that the craft I love is dead. Imagine Nascar drivers excitedly discussing how cool it was they wouldn't have to turn left anymore - made me wonder why everyone else was here.
It was, more or less, the same narrative arc as Bitcoin, and was (is) headed for a crash.
That said, I've spent a few weeks with augment, and it is revelatory, certainly. All the marketing - aimed at a suite I have no interest in - managed to convince me it was something it wasn't. It isn't a replacement, any more than a power drill is a replacement for a carpenter.
What it is, is very helpful. "The world's most fully functioning scaffolding script", an upgrade from copilot's "the world's most fully functioning tab-completer". I appreciate its usefulness as a force multiplier, but I am already finding corners and places where I'd just prefer to do it myself. And this is before we get into the craft of it all - I am not excited by the pitch "worse code, faster", but the utility is undeniable on this capitalistic hell planet, and I'm not a huge fan of writing SQL queries anyway, so here we are!
For me, LLMs are a bit like being shown a talking dog with the education and knowledge of a first grader: a talking dog is amazing in itself, and a truly impressive technical feat. That said, you wouldn't make the dog file your taxes or represent you in court.
To quote Joel Spolsky, "When you’re working on a really, really good team with great programmers, everybody else’s code, frankly, is bug-infested garbage, and nobody else knows how to ship on time.", and that's the state we end up if we believe in the hype and use LLMs willy-nilly.
That's why people are annoyed, not because LLMs cannot code like a senior engineer, but because lots of content marketing a company valuation is dependent on making people believe it's the case.
I mean. How would you feel if you coded a menu in Python with certain choices but when you used it the choices were never the same or in the same order, sometimes there were fake choices, sometimes they are improperly labelled and sometimes the menu just completely fails to open. And you as a coder and you as a user have absolutely no control over any of those issues. Then, when you go online to complain people say useful stuff like "Isn't it amazing that it does anything at all!? Give us a break, we're working on it bro."
That's how I see LLMs and the hype surrounding them.
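The menu analogy can even be written down. Here's a toy sketch (purely illustrative - the choices and the "Frobnicate" entry are made up) of a Python menu that misbehaves exactly the way described: different choices each run, occasional fake or mislabelled entries, occasional outright failure.

```python
import random

CHOICES = ["Open", "Save", "Export", "Quit"]

def llm_style_menu():
    """The menu from the analogy: same code, different menu every call."""
    # Choices are never the same or in the same order.
    opts = random.sample(CHOICES, k=random.randint(0, len(CHOICES)))
    if random.random() < 0.3:
        opts.append("Frobnicate")      # sometimes there are fake choices
    if random.random() < 0.2:
        opts = [o[::-1] for o in opts] # sometimes improperly labelled
    if random.random() < 0.1:
        raise RuntimeError("menu failed to open")
    return opts
```

A deterministic menu is a few lines of boring code; the maddening part of the analogy is that no amount of fixing your own code makes this one behave.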
a lot of it is just plain denial. a certain subgenre of person will forever attack everything AI does because they feel threatened by it and a certain other subgenre of person will just copy this behaviour and parrot opinions for upvotes/likes/retweets.
I'll keep bringing up this example whenever people dismiss LLMs.
I can ask Claude the most inane programming question and got an answer. If I were to do that on StackOverflow, I'd get downvoted, rude comments, and my question closed for being off-topic. I don't have to be super knowledgeable about the thing I'm asking about with Claude (or any LLM for that matter).
Even if you ignore the rudeness and elitism of power-users of certain platforms, there's no more waiting for someone to respond to your esoteric questions. Even if the LLM spews bullshit, you can ask it clarifying questions or rephrase until you see something that makes sense.
I love LLMs, I don't care what people say. Even when I'm just spitballing ideas[1], the output is great.
For me, I think they're valuable but also overhyped. They're not at the point they're replacing entire dev teams like some articles point out. In addition, they are amazingly accurate sometimes and amazingly misleading other times. I've noticed some ardent advocates ignore the latter.
It's incredibly frustrating when people think they're a miracle tool and blindly copy/paste output without doing any kind of verification. This is especially frustrating when someone who's supposed to be a professional in the field is doing it (copy-pasting non-working AI-generated code and putting it up for review).
On one hand, they multiply productivity and useful information. On the other hand, they kill productivity and spread misinformation. So I still see them as useful, but not a miracle.
I blame overpromised expectations from startups and public companies, screaming about AGI and superintelligence.
Truly amazing technology which is very good at generating and correcting texts is marketed as senior developer, talented artist, and black box that has solution to all your problems. This impression shatters on the first blatant mistake, e.g. counting elephant legs: https://news.ycombinator.com/item?id=38766512
It's the classic HN-like anti-anything bubble we see with Javascript frameworks. Hundreds of thousands of people are productive with them and enjoy them. They created entire industries and job fields. The same is happening with LLMs, but the usual counter-culture dev crowd is denying it while it's happening right before their eyes. I too use LLMs every day. I never click a link it gives me and find it doesn't exist. When I want to take my mind off of things, I just talk with GPT.
You're being disingenuous. The tweet was talking about asserting the existence of fake articles, claiming that a paper was written in one year while summarizing a paper that explicitly says it was written in another, and severe hallucinations. Nowhere does she even imply that she's looking for superintelligence.
What I find interesting is that my experience has been 100% the opposite. I've been using ChatGPT, Claude, and Gemini for almost a year (well, only ChatGPT for a year, since the rest are more recent). I've been using them to help build circuits and write code. They are almost always wrong with circuit design, and create code that doesn't work north of 80% of the time. My patience has dropped off to the point where I only experiment with LLMs a few times a week because they are so bad. Yes, it is miraculous that we can have a conversation, but it means nothing if the output is always wrong.
But I will admit the dora muckbang feet shit is fucking insane. And that just flat out scares the pants off me.
>They are almost always wrong with circuit design, and create code that doesn’t work north of 80% of the time.
Sorry but this is a total skill issue lol. 80% code failure rate is just total nonsense. I don't think 1% of the code I've gotten from LLMs has failed to execute correctly.
LLMs can't be trusted. They are like an overconfident idiot who pretends quite impressively, but if you check the result, there's just a bit too much bullshit in it. So there's practically zero gain in using LLMs except when you actually need text that's nice, eloquent bullshit.
Almost every time I've tried using LLMs, I've fallen into the pattern of calling out, correcting, and arguing with them, which is of course silly in itself, because they don't learn; they don't really "get it" when they are wrong. Unlike arguing with a human, there's no benefit to it.
This is the place where tech shiny meets actual use cases, and users aren’t really good at articulating their problems.
It's also a slow-burn issue: you have to use it for a while for what is obvious to users to become obvious to people who are tech-first.
The primary issue is the hype and forecasted capabilities vs actual use cases. People want something they can trust as much as an authority, not as much as a consultant.
If I were to put it in a single sentence? These are primarily narrative tools, being sold as factual/scientific tools.
When this is pointed out, the conversation often shifts to “well people aren’t that great either”. This takes us back to how these tools are positioned and sold. They are being touted as replacements to people in the future. When this claim is pressed, we get to the start of this conversation.
Frankly, people on HN aren’t pessimistic enough about what is coming down the pipe. I’ve started looking at how to work in 0 Truth scenarios, not even 0 trust. This is a view held by everyone I have spoken to in fraud, misinformation, online safety.
There’s a recent paper which showed that GAI tools improved the profitability of Phishing attempts by something like 50x in some categories, and made previously loss making (in $/hour terms) targets, profitable. Schneier was one of the authors.
A few days ago I found out someone I know who works in finance, had been deepfaked and their voice/image used to hawk stock tips. People were coming to their office to sue them.
I love tech, but this is the dystopia part of cyberpunk being built. These are narrative tools, good enough to make people think they are experts.
The thing LLMs are really really good at, is sounding authoritative.
If you ask it random things the output looks amazing, yes. At least at first glance. That's what they do. It's indeed magical, a true marvel that should make you go: Woooow, this is amazing tech: Coming across as convincing, even if based on hallucinations, is in itself a neat trick!
But is it actually useful? The things they come up with are untrustworthy and on the whole far less good than previously available systems. In many ways, insidiously worse: It's much harder to identify bad information than it was before.
It's almost like we designed a system to pass Turing tests with flying colours while forgetting that usefulness is what we actually wanted, not authoritative, human-sounding bullshit.
I don't think the LLM naysayers are 'unimpressed', or that they demand perfection. I think they are trying to make statements aimed at balancing things:
Both the LLMs themselves, and the humans parroting the hype, are severely overstating the quality of what such systems produce. Hence, and this is a natural phenomenon you can observe in all walks of life, the more skeptical folks tend to swing the pendulum the other way, and thus it may come across to you as them being overly skeptical instead.
I totally agree, and this community is far from the worst. In trans communities there's incredible hostility towards LLMs - even local ones. "You're ripping off artists", "A pissing contest for tech bros", etc.
I'm trans, and I don't disagree that this technology has aspects that are problematic. But for me at least, LLMs have been a massive equalizer in the context of a highly contentious divorce where the reality is that my lawyer will not move a finger to defend me. And he's lawyer #5 - the others were some combination of worse, less empathetic, and more expensive. I have to follow up a query several times to get a minimally helpful answer - it feels like constant friction.
ChatGPT was a total game-changer for me. I told it my ex was using our children to create pressure, feeding it snippets of chat transcripts. ChatGPT suggested this might be indicative of coercive control abuse. It sounded very relevant (my ex even admitted one time, in a rare, candid moment, that she feels a need to control everyone around her), so I googled the term - essentially all the components were there except physical violence (with two notable exceptions).
Once I figured that out, I asked it to tell me about laws related to controlling relationships - and it suggested laws directly addressing it (in the UK and Australia) and the closest laws in Germany (Nötigung, Nachstellung, violations of dignity, etc.), translating them to English - my best language. Once you name specific laws broken and provide a rationale for why there's a Tatbestand (i.e. the criterion for a violation is fulfilled), your lawyer has no option but to take you more seriously. Otherwise he could face a malpractice suit.
Sadly, even after naming specific law violations and pointing to email and chat evidence, my lawyer persists in dragging his feet - so much so that the last legal letter he sent wasn't drafted by him - it was ChatGPT. I told my lawyer: read, correct, and send to X. All he did was to delete a paragraph and alter one or two words. And the letter worked.
Without ChatGPT, I would be even more helpless and screwed than I am. It's far from clear I will get justice in a German court, but at least ChatGPT gives me hope, a legal strategy. Lastly - and this is a godsend for a victim of coercive control - it doesn't degrade you. Lawyers do. It completely changed the dynamics of my divorce (4 years - still no end in sight, lost my custody rights, then visitation rights, was subjected to confrontational and gaslighting tactics by around a dozen social workers - my ex is a social worker -, and then I literally lost my hair: telogen effluvium, tinea capitis, alopecia areata... if it's stress-related, I've had it), it gave me confidence when confronting my father and brother about their family violence.
It's been the ONLY reliable help, frankly, so much so I'm crying as I write this. For minorities that face discrimination, ChatGPT is literally a lifeline - and that's more true the more vulnerable you are.
I agree. I recently asked if a certain GPU would fit in a certain computer... And it understood that "fit" could mean physically inside but could also mean that the interface is compatible, and answered both.
TBH, they produce trash results for almost any question I might want to ask them. This is consistently the case. I must use them differently than other people.
LLMs produce midwit answers. If you are an expert in your domain, the results are kind of what you would expect for someone who isn’t an expert. That is occasionally useful but if I wanted a mediocre solution in software I’d use the average library. No LLM I have ever used has delivered an expert answer in software. And that is where all the value is.
I worked in AI for a long time, I like the idea. But LLMs are seemingly incapable of replacing anything of value currently.
The elephant in the room is that there is no training data for the valuable skills. If you have to rely on training data to be useful, LLMs will be of limited use.
Here’s when we can start getting excited about LLMs: when they start making new and valid scientific discoveries that can radically change our world.
When an AI can say “Here’s how you make better, smaller, more powerful batteries, follow these plans”, then we will have a reason to worship AI.
When AI can bring us wonders like room-temperature superconductors, fast interstellar travel, anti-gravity tech, and solutions to world hunger and energy consumption, then it will have fulfilled the promise of what AI could do for humanity.
Until then, LLMs are just fancy search and natural language processors. Puppets with strings. It’s about as impressive as Google was when it first came out.
My experience (almost exclusively Claude), has just been so different that I don't know what to say. Some of the examples are the kinds of things I explicitly wouldn't expect LLMs to be particularly good at so I wouldn't use them for, and others, she says that it just doesn't work for her, and that experience is just so different than mine that I don't know how to respond.
I think that there are two kinds of people who use AI: people who are looking for the ways in which AIs fail (of which there are still many) and people who are looking for the ways in which AIs succeed (of which there are also many).
A lot of what I do is relatively simple one off scripting. Code that doesn't need to deal with edge cases, won't be widely deployed, and whose outputs are very quickly and easily verifiable.
LLMs are almost perfect for this. It's generally faster than me looking up syntax/documentation, when it's wrong it's easy to tell and correct.
Look for the ways that AI works, and it can be a powerful tool. Try and figure out where it still fails, and you will see nothing but hype and hot air.
Not every use case is like this, but there are many.
-edit- Also, when she says "none of my students has ever invented references that just don't exist"...all I can say is "press X to doubt"
> Look for the ways that AI works, and it can be a powerful tool. Try and figure out where it still fails, and you will see nothing but hype and hot air. Not every use case is like this, but there are many.
The problem is that I feel I am constantly being bombarded by people bullish on AI saying "look how great this is" but when I try to do the exact same things they are doing, it doesn't work very well for me
Of course I am skeptical of positive claims as a result.
I don't know what you are doing or why it's failed. Maybe my primary use cases really are in the top whatever percentile for AI usefulness, but it doesn't feel like it. All I know is that frontier models have already been good enough for more than a year to increase my productivity by a fair bit.
I literally had a developer of an open source package I’m working with tell me “yeah that’s a known problem, I gave up on trying to fix it. You should just ask ChatGPT to fix it, I bet it will immediately know the answer.”
Annoying response of course. But I’d never used an LLM to debug before, so I figured I’d give it a try.
First: it regurgitated a bunch of documentation and basic debugging tips, which might have actually been helpful if I had just encountered this problem and had put no thought into debugging it yet. In reality, I had already spent hours on the problem. So not helpful
Second: I provided some further info on environment variables I thought might be the problem. It latched on to that. “Yes that’s your problem! These environment variables are (causing the problem) because (reasons that don’t make sense). Delete them and that should fix things.” I deleted them. It changed nothing.
Third: It hallucinated a magic numpy function that would solve my problem. I informed it this function did not exist, and it wrote me a flowery apology.
Clearly AI coding works great for some people, but this was purely an infuriating distraction. Not only did it not solve my problem, it wasted my time and energy, and threw tons of useless and irrelevant information at me. Bad experience.
I see people say, "Look how great this is," and show me an example, and the example they show me is just not great. We're literally looking at the same thing, and they're excited that this LLM can do a college grad's job to the level of a third grader, and I'm just not excited about that.
What changed my point of view regarding LLMs was when I realized how crucial context is in increasing output quality.
Treat the AI as a freelancer working on your project. How would you ask a freelancer to create a Kanban system for you? By simply asking "Create a Kanban system", or by providing them a 2-3 pages document describing features, guidelines, restrictions, requirements, dependencies, design ethos, etc?
Which approach will get you closer to your objective?
The same applies to LLM (when it comes to code generation). When well instructed, it can quickly generate a lot of working code, and apply the necessary fixes/changes you request inside that same context window.
It still can't generate senior-level code, but it saves hours when doing grunt work or prototyping ideas.
"Oh, but the code isn't perfect".
Nor is the code of the average junior dev, but their code still makes it to production in thousands of companies around the world.
They're sophisticated tools, as much as any other software.
About 2 weeks ago I started on a streaming markdown parser for the terminal, because none really existed. I've switched to hand-coding now, but the first version was basically all LLM prompting, and a bunch of the code is still LLM-generated (maybe 80%). It's a parser; those are hard. There are stacks, states, lookaheads, lookbehinds, feature flags, color spaces, support for things like links and syntax highlighting... all forward-streaming. Not easy.
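For a sense of what "forward streaming" means here, a toy sketch (my own illustration, handling only `**bold**` and nothing like the real feature set) of the buffering/lookahead problem such a parser deals with:

```python
def stream_bold(chunks):
    """Stream markdown text chunk by chunk, rendering **bold** as ANSI.
    A lone '*' can't be classified until the next character arrives,
    so it is buffered in `pending` (the one-character lookahead)."""
    BOLD_ON, BOLD_OFF = "\x1b[1m", "\x1b[22m"
    out = []
    in_bold = False
    pending = ""  # a '*' we can't classify yet
    for chunk in chunks:
        for ch in chunk:
            if pending == "*":
                if ch == "*":  # '**' toggles bold state
                    out.append(BOLD_OFF if in_bold else BOLD_ON)
                    in_bold = not in_bold
                else:          # lone '*' was literal text after all
                    out.append("*" + ch)
                pending = ""
            elif ch == "*":
                pending = "*"  # wait for the next char to decide
            else:
                out.append(ch)
    out.append(pending)  # flush any trailing unclassified '*'
    return "".join(out)
```

Even this trivial case needs a pending-state buffer because the stream can split mid-token; multiply that by links, code spans, and color handling and the difficulty is clear.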
> LLMs are almost perfect for this. It's generally faster than me looking up syntax/documentation, when it's wrong it's easy to tell and correct.
Exactly this.
I once had a function that would generate several .csv reports. I wanted these reports to then be uploaded to s3://my_bucket/reports/{timestamp}/*.csv
I asked ChatGPT: "Write a function that moves all .csv files in the current directory to an old_reports directory, calls a create_reports function, then uploads all the csv files in the current directory to s3://my_bucket/reports/{timestamp}/*.csv with the timestamp in YYYY-MM-DD format"
And it created the code perfectly. I knew what the correct code would look like, I just couldn't be fucked to look up the exact calls to boto3, whether moving files was os.move or os.rename or something from shutil, and the exact way to format a datetime object.
It created the code far faster than I would have.
Like, I certainly wouldn't use it to write a whole app, or even a whole class, but individual blocks like this, it's great.
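The function described above might look something like the following sketch. The names are hypothetical, and `upload` is a stand-in callable for the real boto3 call (e.g. `s3.upload_file(path, "my_bucket", key)`), passed in so the sketch works without AWS credentials:

```python
import os
import shutil
from datetime import datetime, timezone

def archive_and_upload(create_reports, upload, directory="."):
    """Move existing .csv reports aside, regenerate them, then hand each
    new .csv to `upload` with a YYYY-MM-DD-stamped key under reports/."""
    old_dir = os.path.join(directory, "old_reports")
    os.makedirs(old_dir, exist_ok=True)
    # 1. Move yesterday's reports out of the way (it's shutil.move, not
    #    os.move, which doesn't exist -- exactly the detail not worth memorizing).
    for name in os.listdir(directory):
        if name.endswith(".csv"):
            shutil.move(os.path.join(directory, name),
                        os.path.join(old_dir, name))
    # 2. Regenerate the reports.
    create_reports()
    # 3. Upload each fresh .csv under a timestamped prefix.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    keys = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".csv"):
            key = f"reports/{stamp}/{name}"
            upload(os.path.join(directory, name), key)
            keys.append(key)
    return keys
```

Exactly the kind of glue code where the structure is obvious and only the library incantations need looking up.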
I have been saying this about llms for a while - if you know what you want, how to ask for it, and what the correct output will look like, LLMs are fantastic (at least Claude Sonnet is). And I mean that seriously, they are a highly effective tool for productive development for senior developers.
I use it to produce whole classes, large sql queries, terraform scripts, etc etc. I then look over that output, iterate on it, adjust it to my needs. It's never exactly right at first, but that's fine - neither is code I write from scratch. It's still a massive time saver.
I've had so many cases exactly like your example here. If you build up an intuition that knows that e.g. Claude 3.7 Sonnet can write code that uses boto3, and boto3 hasn't had any breaking changes that would affect S3 usage in the past ~24 months, you can jump straight into a prompt for this kind of task.
It doesn't just save me a ton of time, it results in me building automations that I normally wouldn't have taken on at all because the time spent fiddling with os.move/boto3/etc wouldn't have been worthwhile compared to other things on my plate.
I think you have an interesting point of view and I enjoy reading your comments, but it sounds a little absurd and circular to discount people's negativity about LLMs simply because it's their fault for using an LLM for something it's not good at. I don't believe in the strawman characterization of people giving LLMs incredibly complex problems and being unreasonably judgemental about the unsatisfactory results. I work with LLMs every day. Companies pay me good money to implement reliable solutions that use these models and it's a struggle. Currently I'm working with Claude 3.5 to analyze customer support chats. Just as many times as it makes impressive, nuanced judgments it fails to correctly make simple trivial judgements. Just as many times as it follows my prompt to a tee, it also forgets or ignores important parts of my prompt. So the problem for me is it's incredibly difficult to know when it'll succeed and when it'll fail for a given input. Am I unreasonable for having these frustrations? Am I unreasonable for doubting the efficacy of LLMs to address problems that many believe are already solved? Can you understand my frustration to see people characterize me as such because ChatGPT made a really cool image for them once?
It's a weird circle with these things. If you _can't_ do the task you are using the LLM for, you probably shouldn't.
But if you can do the task well enough to at least recognize likely-to-be-correct output, then you can get a lot done in less time than you would do it without their assistance.
Is that worth the second order effects we're seeing? I'm not convinced, but it's definitely changed the way we do work.
I think this points to much of the disagreement over LLMs. They can be great at one-off scripts and other similar tasks like prototypes. Some folks who do a lot of that kind of work find the tools genuinely amazing. Other software engineers do almost none of that and instead spend their coding time immersed in large messy code bases, with convoluted business logic. Looping an LLM into that kind of work can easily be net negative.
Maybe they are just lazy around tooling. Cursor with Claude works well for project sizes much larger than I expected but it takes a little set up. There is a chasm between engineers who use tools well and who do not.
I'm tired of people bashing LLMs. AI is so useful in my daily work that I can't understand where these people are coming from. Well, whatever...
As you said, examples where I wouldn't expect LLMs to be good at from people who dismiss the scenarios where LLMs are great at. I don't want to convince anyone, to be honest - I just want to say they are incredibly useful for me and a huge time saver. If people don't want to use LLMs, it's fine for me as I'll have an edge over them in the market. Thanks for the cash, I guess.
Every time someone brings up "code that doesn't need to deal with edge cases", I like to point out that such code is not likely to be used for anything that matters.
Oh, but it is. I can have code that does something nice-to-have, that needs not be 100% correct, etc. For example, I want a background for my playful webpage. Maybe a WebGL shader. It might not be exactly what I asked for, but I can have it up and running in a few minutes. Or some non-critical internal tools - like a scraper for lunch menus from the restaurants around the office. Or a simple parking-spot-sharing app. Or any kind of prototype, which in some companies are being created all the time. There are so many use cases that are forgiving regarding correctness and much more sensitive to development effort.
I’m always amazed in these discussions how many people apparently have jobs doing a bunch of stuff that either doesn’t need to be correct or is simple enough that it doesn’t require any significant amount of external context.
Automating the easy 80% sounds useful, but in practice I'm not convinced that's all that helpful. Reading and putting together code you didn't write is hard enough to begin with.
I write code like that all the time. It's used for very specific use cases, only by myself or something I've also written. It's not exposed to random end users or inputs.
> Also, when she says "none of my students has ever invented references that just don't exist"...all I can say is "press X to doubt"
I’ve never seen it from my students. Why do you think this? It’s trivial to pick a real book/article. No student is generating fake material whole cloth and fake references to match. Even if they could, why would they risk it?
Look for the ways that AI works, and it can be a powerful tool. Try and figure out where it still fails, and you will see nothing but hype and hot air.
Perfectly put, IMO.
I know arguments from authority aren't primary, but I think this point highlights some important context: Dr. Hossenfelder has gained international renown by publishing clickbait-y YouTube videos that ostensibly debunk scientific and technological advances of all kinds. She's clearly educated and thoughtful (not to mention otherwise gainfully employed), but her whole public persona kinda relies on assuming the exclusively-critical standpoint you mention.
I doubt she necessarily feels indebted to her large audience expecting this take (it's not new...), but that certainly does seem like a hard cognitive habit to break.
More often than not, when I inquire deeper, I find their prompting isn't very good at all.
"Garbage in, garbage out" as the law says.
Of course, it took a lot of trial and error for me to get to my current level of effectiveness with LLMs. It's probably our responsibility to teach these who are willing.
It seems hard to be bullish on LLMs as a generally useful tool if the solution to problems people have is "use trial and error to improve how you write your prompts, no, it's not obvious how to do so, yes, it depends heavily on the exact model you use."
Agreed. If one compares ChatGPT to, say, the Cline IDE plugin backed by Claude 3.7, they might well be blown away by how far behind ChatGPT seems. A lot of the difference has to do with prompting, for sure -- Cline helps there by generating prompts from your IDE and project context automatically.
Every once in a while I send a query off to ChatGPT and I'm often disappointed and jam on the "this was hallucinated" feedback button (or whatever it is called). I have better luck with Claude's chat interface but nowhere near the quality of response that I get with Cline driving.
I want to sit next to you and stop you every time you use your LLM and say, “Let me just carefully check this output.” I bet you wouldn’t like that. But when I want to do high quality work, I MUST take that time and carefully review and test.
What I am seeing is fanboys who offer me examples of things working well that fail any close scrutiny— with the occasional example that comes out actually working well.
I agree that for prototyping unimportant code LLMs do work well. I definitely get to unimportant point B from point A much more quickly when trying to write something unfamiliar.
What's also scary is that we know LLMs fail, but nobody (even the people who built the LLM) can tell you how often one will fail at any particular task. Not even to an order of magnitude. Will it fail 0.2%, 2%, or 20% of the time? Nobody knows! A computer that randomly produces an incorrect result for my calculation is useless to me, because now I have to separately validate the correctness of every result. If I ask an LLM to explain some fact to me, how do I know whether this time it's hallucinating? There is no "LLM just guessed" flag in the output. It might seem "miraculous" that it will summarize a random scientific paper down to 5 bullet points, but how do you know its output is correct? No LLM proponent seems to want to answer this question.
I think it’s very odd that you think that people using LLMs regularly aren’t carefully checking the outputs. Why do you think that people using LLMs don’t care about their work?
> invented references that just don't exist"...all I can say is "press X to doubt
This doesn’t include lying and cheating which LLMs can’t.
On the other hand, AI is used to solve problems that are already solved. I recently got an ad for process-modeling software where one claim was that you don't always need to start from the ground up: you can tell the AI "give me the customer order process" and start from that point. That is basically what templates are for, with much less energy consumption.
I've noticed there seems to be a gatekeeping archetype that operates as a hard cynic to nearly everything, so that when they finally judge something positively they get heaps of attention.
It doesn't always correlate with narcissism, but it happens much more than chance.
>A lot of what I do is relatively simple one off scripting. Code that doesn't need to deal with edge cases, won't be widely deployed, and whose outputs are very quickly and easily verifiable.
Yes, somewhat. It's good for PowerShell/bash/cmd scripts and configs, but early models especially would hallucinate PowerShell cmdlets.
One thing I think is clear is society is now using a lot of words to describe things when the words being used are completely devoid of the necessary context. It's like calling a powder you've added to water "juice" and also freshly-squeezed fruit just picked perfectly ripe off a tree "juice". A word stretched like that becomes nearly devoid of meaning.
"I write code all day with LLMs, it's amazing!" is in the exact same category. The code you (general you, I'm not picking on you in particular) write using LLMs, and the code I write apart from LLMs: they are not the same. They are categorically different artifacts.
All fun and games until your AI-generated script deletes the production database. I think that's the point: the cost of faults in academic and financial settings is too high for LLMs to be useful.
The point is that given the current valuations, being good at a bunch of narrow use cases is just not good enough. It needs to be able to replace humans in every role where the primary output is text or speech to meet expectations.
I don't think that "replacing humans in every role" is the line for "being bullish on AI models". I think they could stop development exactly where they are, and they would still make pretty dramatic improvements to productivity in a lot of places. For me at least, their value already exceeds the $20/month I'm paying, and I'm pretty sure that way more than covers inference costs.
The most interesting thing about this post is how it reinforces how terrible the usability of LLMs still is today:
"I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error. I Google for the alleged quote, it doesn't exist. They reference a scientific publication, I look it up, it doesn't exist."
To experienced LLM users that's not surprising at all: providing citations, sources for quotes, and useful URLs are all things they are demonstrably terrible at.
But it's a computer! Telling people "this advanced computer system cannot reliably look up facts" goes against everything computers have been good at for the last 40+ years.
One of the things that’s hard about these discussions is that behind them is an obscene amount of money and hype. She’s not responding to realists like you. She’s responding to the bulls. The people saying these tools will be able to run the world by the end of this year, maybe next.
And that’s honestly unfair to you, since you do awesome, realistic, and level-headed work with LLMs.
But I think it’s important when having discussions to understand the context within which they are occurring.
Without the bulls she might very well be saying what you are in your last paragraph. But because of the bulls the conversation becomes this insane stratified nonsense.
Possibly a reaction to Bill Gates's recent statements that AI will begin replacing doctors and teachers. It's not ridiculous to say LLMs are incredibly useful and valuable, but it's highly dubious to think they can be trusted with actual critical tasks without careful supervision.
This isn't really a problem in tool-assisted LLMs.
Use Google AI Studio with search grounding; it provides correct links and citations every time. Other companies have similar search modes, but you have to enable those settings if you want good results.
You're using them wrong. Everyone is, though, so I can't fault you specifically. A chatbot is about the worst possible application of these technologies.
Of late, deaf tech forums have been taken over by language-model debates over which works best for speech transcription. (Multimodal language models are the state of the art in machine transcription. Everyone seems to forget that when complaining they can't cite sources for scientific papers yet.) The debates have gotten to the point where it's become annoying how they have taken over so much space, just like they have here on HN.
But then I remember, oh yeah, there was no such thing as live machine transcription ten years ago. And now there is. And it's going to continue to get better. It's already good enough to be very useful in many situations. I have elsewhere complained about the faults of AI models for machine transcription - in particular when they make mistakes they tend to hallucinate something that is superficially grammatical and coherent instead - but for a single phrase in an audio transcription sporadically that's sometimes tolerable. In many cases you still want a human transcriber but the cost of that means that the amount of transcription needed can never be satisfied.
It's a revolutionary technology. I think in a few years I'm going to have glasses that continuously narrate the sounds around me and transcribe speech, and it's going to be so good I can probably "pass" as a hearing person in some contexts. It's hard not to get a bit giddy and carried away sometimes.
> You're using them wrong. Everyone is though I can't fault you specifically.
If everyone is using them wrong, I would argue that says something more about them than the users. Chat-based interfaces are the thing that kicked LLMs into the mainstream consciousness and started the cycle/trajectory we’re on now. If this is the wrong use case, everything the author said is still true.
There are still applications made better by LLMs, but they are a far cry from AGI/ASI in terms of being all-knowing problem solvers that don’t make mistakes. Language tasks like transcription and translation are valuable, but by no stretch do they account for the billions of dollars of spend on these platforms, I would argue.
LLM providers actually have an incentive not to write literature on how to use LLMs optimally, as that causes friction, which means less engagement/money spent with the provider. There's also the typical tin-foil-hat explanation: "it's bad so you'll keep retrying to get the LLM to work, which means more money for us."
Isn't this more a product of the hype though? At worst you're describing a product marketing mistake, not some fundamental shortcoming of the tech. As you say "chat" isn't a use case, it's a language-based interface. The use case is language prediction, not an encyclopedic storage and recall of facts and specific quotes. If you are trying to get specific facts out of an LLM, you'd better be using it as an interface that accesses some other persistent knowledge store, which has been incorporated into all the major 'chat' products by now.
Surely you're not saying everyone is using them wrong. Let's say only 99% of them are using LLMs wrong, and the remaining 1% creates $100B of economic value. That's $100B of upside.
Yes the costs of training AI models these days are really high too, but now we're just making a quantitative argument, not a qualitative one.
The fact that we've discovered a near-magical tech that everyone wants to experiment with in various contexts, is evidence that the tech is probably going somewhere.
Historically speaking, I don't think any scientific invention or technology has been adopted and experimented with so quickly and on such a massive scale as LLMs.
It's crazy that people like you dismiss the tech simply because people want to experiment with it. It's like some of you are against scientific experimentation for some reason.
I think all the technology is already in place. There are already smart glasses with tiny text displays. Also smartphones have more than enough processing capacity to handle live speech transcription.
Thru the 90s and 00s and well into the 10s I generally dismissed speech recognition as useless to me, personally.
I have a minor speech impediment because of the hearing loss. They never worked for me very well. I don't speak like a standard American - I have a regional accent and I have a speech impediment. Modern speech recognition doesn't seem to have a problem with that anymore.
IBM's ViaVoice from 1997 in particular was a major step. It was really impressive in a lot of ways but the accuracy rate was like 90 - 95% which in practice means editing major errors with almost every sentence. And that was for people who could speak clearly. It never worked for me very well.
You also needed to speak in an unnatural way [pause] comma [pause] and it would not be fair to say that it transcribed truly natural speech [pause] full stop
Such voice recognition systems before about 2016 also required training on the specific speaker. You would read many pages of text to the recognition engine to tune it to you specifically.
It could not just be pointed at the soundtrack to an old 1980s TV show then produce a time-sync'd set of captions accurate enough to enjoy the show. But that can be done now.
I become more and more convinced with each of these tweets/blogs/threads that using LLMs well is a skill set akin to using Search well.
It’s been a common mantra - at least in my bubble of technologists - that a good majority of the software engineering skill set is knowing how to search well. Knowing when search is the right tool, how to format a query, how to peruse the results and find the useful ones, what results indicate a bad query you should adjust… these all sort of become second nature the longer you’ve been using Search, but I also have noticed them as an obvious difference between people that are tech-adept vs not.
LLMs seem to have a very similar usability pattern. They’re not always the right tool, and are crippled by bad prompting. Even with good prompting, you need to know how to notice good results vs bad, how to cherry-pick and refine the useful bits, and have a sense for when to start over with a fresh prompt. And none of this is really _hard_ - just like Search, none of us need to go take a course on prompting - IMO folks just need to engage with LLMs as a non-perfect tool they are learning how to wield.
The fact that we have to learn a tool doesn’t make it a bad one. The fact that a tool doesn’t always get it 100% on the first try doesn’t make it useless. I strip a lot of screws with my screwdriver, but I don’t blame the screwdriver.
I don't know if she is a fraud, but she has definitely leaned hard into rage-bait farming, talking about things far outside her domain of expertise as if she were an expert.
In no way am I credentialing her, lots of people can make astute observations about things they weren't trained in, but she both mastered sounding authoritative and at the same time, presenting things to go the most engagement possible.
Thanks for sharing this. I was heavily involved in graduate physics when I was in school, and was very worried about what direction she'd take after the first big viral vid "telling her story." I wasn't sure it was well understood, or even understood at all, how blinkered her...viewpoint?...was.
LLMs function as a new kind of search engine, one that is amazingly useful because it can surface things that traditional search could never dream of. Don't know the name of a concept, just describe it vaguely and the LLM will pull out the term. Are you not sure what kind of information even goes into a cover letter or what's customary to talk about? Ask an LLM to write you one, it will be bland and generic sure but that's not the point because you now know the "shape" of what they're supposed to look like and that's great for getting unblocked. Have you stumbled across a passage of text that's almost English but you're not really sure what to look up to decipher it? Paste it into the LLM and it will tell you that it's "Early Modern English" which you can look up to confirm and get a dictionary for.
Broader than that, it’s critical thinking skills. Using search and LLMs requires analyzing the results and being able to separate what is accurate and useful from what isn’t.
From my experience this is less an application of critical skills and more a domain knowledge check. If you know enough about the subject to have accumulated heuristics for correctness and intuition for "lgtm" in the specific context, then it's not very difficult or intellectually demanding to apply them.
If you don't have that experience in this domain, you will spend approximately as much effort validating output as you would have creating it yourself, but the process is less demanding of your critical skills.
I'm not so sure about that. I was really anti-LLM in the previous generation (GPT-3.5/4), but never stopped trying them out. I just found the results to be subpar.
Since reasoning models came about, I've been significantly more bullish on them, purely because they are less bad. They are still not amazing, but they are at a point where I feel like including them in my workflow isn't an impediment.
They can now reliably complete a subset of tasks without me needing to rewrite large chunks of the output myself.
They are still pretty terrible at edge cases (uncommon patterns/libraries, etc.), but when on the beaten path they can actually pretty decently improve productivity. I still don't think it's 10x (well, today was the first time I felt a 10x improvement, but I was moving frontend code from a custom framework to React; more tedium than anything else, and the AI did a spectacular job).
If there's one common thread across LLM criticisms, it's that they're not perfect.
These critics don't seem to have learned the lesson that the perfect is the enemy of the good.
I use ChatGPT all the time for academic research. Does it fabricate references? Absolutely, maybe about a third of the time. But has it pointed me to important research papers I might never have found otherwise? Absolutely.
The rate of inaccuracies and falsehoods doesn't matter. What matters is, is it saving you time and increasing your productivity. Verifying the accuracy of its statements is easy. While finding the knowledge it spits out in the first place is hard. The net balance is a huge positive.
People are bullish on LLMs because they can save you days' worth of work, like every day. My research productivity has gone way up with ChatGPT: asking it to explain ideas, related concepts, relevant papers, and so forth. It's amazing.
> Verifying the accuracy of its statements is easy.
For single statements, sometimes, but not always. For all of the many statements, no. Having the human attention and discipline to mindfully verify every single one without fail? Impossible.
Every software product/process that assumes the user has superhuman vigilance is doomed to fail badly.
> Automation centaurs are great: they relieve humans of drudgework and let them focus on the creative and satisfying parts of their jobs. That's how AI-assisted coding is pitched [...]
> But a hallucinating AI is a terrible co-pilot. It's just good enough to get the job done much of the time, but it also sneakily inserts booby-traps that are statistically guaranteed to look as plausible as the good code (that's what a next-word-guessing program does: guesses the statistically most likely word).
> This turns AI-"assisted" coders into reverse centaurs. The AI can churn out code at superhuman speed, and you, the human in the loop, must maintain perfect vigilance and attention as you review that code, spotting the cleverly disguised hooks for malicious code that the AI can't be prevented from inserting into its code. As qntm writes, "code review [is] difficult relative to writing new code":
> Having the human attention and discipline to mindfully verify every single one without fail? Impossible.
I mean, how do you live life?
The people you talk to in your life say factually wrong things all the time.
How do you deal with it?
With common sense, a decent bullshit detector, and a healthy level of skepticism.
LLMs aren't calculators. You're not supposed to rely on them to give perfect answers. That would be crazy.
And I don't need to verify "every single statement". I just need to verify whichever part I need to use for something else. I can run the code it produces to see if it works. I can look up the reference to see if it exists. I can Google the particular fact to see if it's real. It's really very little effort. And the verification is orders of magnitude easier and faster than coming up with the information in the first place. Which is what makes LLMs so incredibly helpful.
> Does it fabricate references? Absolutely, maybe about a third of the time
And you don't have concerns about that? What kind of damage is that doing to our society, long term, if we have a system that _everyone_ uses and it's just accepted that a third of the time it is just making shit up?
No, I don't. Because I know it does and it's incredibly easy to type something into Google Scholar and see if a reference exists.
Like, I can ask a friend and they'll mistakenly make up a reference. "Yeah, didn't so-and-so write a paper on that? Oh they didn't? Oh never mind, I must have been thinking of something else." Does that mean I should never ask my friend about anything ever again?
Nobody should be using these as sources of infallible truth. That's a bonkers attitude. We should be using them as insanely knowledgeable tutors who are sometimes wrong. Ask and then verify.
> And you don't have concerns about that? What kind of damage is that doing to our society, long term, if we have a system that _everyone_ uses and it's just accepted that a third of the time it is just making shit up?
Main problem with our society is that two thirds of what _everyone_ says is made up shit / motivated reasoning. The random errors LLMs make are relatively benign, because there is no motivation behind them. They are just noise. Look through them.
Could it end up being a net benefit? Will the realistic-sounding but incorrect facts generated by A.I. make people engage with arguments more critically, and be less likely to believe random statements they're given?
Now, I don't know, or even think it is likely that this will happen, but I find it an interesting thought experiment.
That's hilarious; I had no idea it was that bad. And for every conscientious researcher who actually runs down all the references to separate the 2/3 good from the 1/3 bad, how many will just paste them in, adding to the already sky-high pile of garbage out there?
LLMs will spit out responses with zero backing and 100% conviction. People see citations and assume they're correct. We're conditioned for it thanks to... everything ever in history. Rarely do I need to check a Wikipedia entry's source.
So why do people not understand this: it is absolutely going to pour jet fuel on the misinformation in the world. And we as a society are allowed to hold a higher bar for what we'll accept being shoved down our throats by corporate overlords who want their VC payout.
I think many people are just not really good at dealing with "imperfect" tools. Different tools have different success probabilities; let's call that probability p here. People typically use tools that have p=100%, or at least very close to it. But an LLM is a tool that is far from that, so making use of it takes a different approach.
Imagine there is a probabilistic oracle that can answer any yes/no question with success probability p. If p=100% or p=0% then it is obviously very useful (at p=0% you simply invert every answer). If p=50% then it is absolutely worthless, since it's indistinguishable from a coin flip. In other cases, such an oracle can be queried in cleverer ways to get the answer we want, and it is still a useful thing.
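That last point is a real technique: when the oracle is right more than half the time and its errors are independent, majority voting over repeated queries drives the error rate down. A toy simulation of that amplification (the independence assumption is the catch: LLM errors are often correlated, so this works less well in practice):

```python
import random

def oracle(p, truth=True):
    """A noisy yes/no oracle: returns the true answer with probability p."""
    return truth if random.random() < p else not truth

def majority_vote(p, n_queries, truth=True):
    """Query the oracle n_queries times and take the majority answer."""
    votes = sum(oracle(p, truth) for _ in range(n_queries))
    return votes > n_queries / 2

random.seed(0)
trials = 2000
raw = sum(oracle(0.6) for _ in range(trials)) / trials
amplified = sum(majority_vote(0.6, 51) for _ in range(trials)) / trials
print(raw, amplified)  # amplified accuracy ends up well above the raw ~60%
```

With p=0.6 and 51 votes, the majority is correct over 90% of the time in theory; the cost is 51x the queries, which is exactly the kind of tradeoff an "imperfect tool" forces on you.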
One of the magic things about engineering is that I can make usefulness out of unreliability. Voltage can fluctuate and I can transmit 1s and 0s, lines can fizz, machines can die, and I can reliably send video from one end to the other.
Unreliability is something we live in. It is the world. Controlling error, increasing signal over noise, extracting energy from the fluctuations. This is life, man. This is what we are.
I can use LLMs very effectively. I can use search engines very effectively. I can use computers.
Many others can’t. Imagine the sheer fortune to be born in the era where I was meant to be: tools transformative and powerful in my hands; useless in others’.
Your point reminded me of Terence Tao’s point that AI has a “plausibility problem”. When it can’t be accurate, it still disguises itself as accurate.
Its true success rate is by no means 100%, and sometimes is 0%, but it always tries to make you feel confident.
I’ve had to catch myself surrendering too much judgment to it. I worry a high school kid learning to write will have fewer qualms surrendering judgment
A scientific instrument that is unreliably accurate is useless. Imagine a kitchen scale that was off by +/- 50% every 3rd time you used it. Or maybe every 5th time. Or every 2nd.
So we're trying to use tools like this to help solve deeper problems, and they aren't up to the task. This is the point where we need to start over and get better tools. Sharpening a bronze knife will never make it as sharp, or hold an edge as long, as a steel knife. Same basic elements, very different material.
A bad analogy doesn't make a good argument. The best analogy for LLMs is probably a librarian on LSD in a giant library. They will point you in a direction if you have a question. Sometimes they will pull up the exact page you need, sometimes they will lead you somewhere completely wrong and confidently hand you a fantasy novel, trying to convince you it's a real science book.
It's completely up to your ability to both find what you need without them and verify the information they give you to evaluate their usefulness. If you put that on a matrix, this makes them useful in the quadrant of information that is both hard to find, but very easy to verify. Which at least in my daily work is a reasonable amount.
I think people confuse the power of the technology with the very real bubble we’re living in.
There’s no question that we’re in a bubble which will eventually subside, probably in a “dot com” bust kind of way.
But let me tell you…last month I sent several hundred million requests to AI, as a single developer, and got exactly what I needed.
Three things are happening at once in this industry…
(1) executives are over-promising a literal unicorn with AGI, which is totally unnecessary for the ongoing viability of LLMs and is pumping the bubble.
(2) the technology is improving and delivery costs are changing as we figure out what works and who will pay.
(3) the industry’s instincts are developing, so it’s common for people to think “AI” can do something it absolutely cannot do today.
But again…as one guy, for a few thousand dollars, I sent hundreds of millions of requests to AI that are generating a lot of value for me and my team.
Our instincts have a long way to go before we’ve collectively internalized the fact that one person can do that.
That's exactly what happened – I called the OpenAI API, using custom application code running on a server, a few hundred million times.
It is trivial for a server to send/receive 150 requests per second to the API.
This is what I mean by instincts...we're used to thinking of developers-pressing-keys as a fundamental bottleneck, and it still is to a point. But as soon as the tracks are laid for the AI to "work", things go from speed-of-human-thought to speed-of-light.
A lot of people are feeding all the email and slack messages for entire companies through AI to classify sentiment (positive, negative, neutral etc), or summarize it for natural language search using a specific dictionary. You can process each message multiple ways for all sorts of things, or classify images. There's a lot of uses for the smaller cheaper faster llms
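For scale, the fan-out itself is mundane engineering: a bounded pool of concurrent requests. A hedged sketch in Python, where `classify_sentiment` is a hypothetical stand-in for a real LLM API call (here it just simulates latency and returns a dummy label):

```python
import asyncio

async def classify_sentiment(message: str) -> str:
    """Stand-in for a real LLM API call (e.g. a chat-completions request).
    Simulates network latency and returns a dummy sentiment label."""
    await asyncio.sleep(0.01)
    return "positive" if "thanks" in message.lower() else "neutral"

async def classify_all(messages, max_concurrency=150):
    # A semaphore caps in-flight requests, mirroring provider rate limits.
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(msg):
        async with sem:
            return await classify_sentiment(msg)

    return await asyncio.gather(*(worker(m) for m in messages))

labels = asyncio.run(classify_all(["thanks for the fix!", "ticket opened"] * 500))
print(len(labels), labels[0])  # 1000 positive
```

At 150 requests/second sustained, a few hundred million calls is a matter of weeks of wall-clock time on one server, which is the "speed-of-light" point being made above.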
And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly? This is the most amazing technology I've experienced as a 50+ year old nerd that has been sitting deep in tech for basically my whole life. This is the stuff of science fiction, and while there totally are limitations, the speed at which it is progressing is insane. And people are like, "Wah, it can't write code like a Senior engineer with 20 years of experience!"
Crazy.
And this Tweeter's complaints do not sound like a demand for superintelligence. They sound like a demand for something far more basic than the hype has been promising for years now.
- "They continue to fabricate links, references, and quotes, like they did from day one."
- "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error." (Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.)
- "They reference a scientific publication, I look it up, it doesn't exist."
- "I have tried Gemini, and actually it was even worse in that it frequently refuses to even search for a source and instead gives me instructions for how to do it myself."
- "I also use them for quick estimates for orders of magnitude and they get them wrong all the time."
- "Yesterday I uploaded a paper to GPT to ask it to write a summary and it told me the paper is from 2023, when the header of the PDF clearly says it's from 2025."
> Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.
There are no fabricated links, references, or quotes in OpenAI's GPT-4.5 + Deep Research.
It's unfortunate that the cost of a Deep Research bespoke white paper is so high. That mode is phenomenal for pre-work domain research: you get an analyst's two-week writeup in under 20 minutes, for the low cost of $200/month (though I've seen estimates that such a white paper costs OpenAI over USD 3,000 to produce, which explains the monthly limits).
You still need to be a domain expert to make use of this, just as you need to be to make use of an analyst. Both the analyst and Deep Research can generate flawed writeups with similar misunderstandings: mis-synthesizing, misapplication, or omitting something essential.
Neither analyst nor LLM is a substitute for mastery.
Hosted, free or subscription-based Deep Research-like tools that integrate LLMs with search functionality (the whole domain of "RAG", or Retrieval-Augmented Generation) will remain elementary for a long time, simply because the cost of the average query climbs steeply and there isn't that much money in it yet. Many people have built, and will continue to build, their own research tools, where they can decide how much compute time and API-access cost they're willing to spend on a given query. OCR remains a hard problem, let alone appropriately chunking potentially hundreds of long documents into context-length pieces and synthesizing potentially thousands of LLM outputs into a single response.
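The chunking step mentioned above is simple in the naive case. A minimal sketch (whitespace "tokens" and a fixed overlap are simplifying assumptions; real pipelines count tokens with the model's tokenizer and prefer sentence or section boundaries):

```python
def chunk_text(text, max_tokens=400, overlap=50):
    """Split a long document into overlapping windows that each fit a
    model's context budget, approximating tokens by whitespace words."""
    words = text.split()
    chunks = []
    step = max_tokens - overlap  # overlap keeps context across boundaries
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_text(doc)
print(len(chunks), len(chunks[0].split()))  # 3 400
```

Each chunk then gets its own LLM call, and the per-chunk outputs are synthesized in a second pass, which is where the query costs multiply.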
OpenAI did similar things by focusing to the point of absurdity on 'safety' for what was basically a natural language search engine with a habit of inventing nonsensical stuff. But on that same note (and also as you alluded to), I do agree that LLMs have a lot of use as natural language search engines in spite of their proclivity to hallucinate. Being able to describe, e.g., a function call (or some esoteric piece of history) and then often get the precise term/event that I'm looking for is just incredibly useful.
But LLMs obviously are not sentient, are not setting us on the path to AGI, or any other such nonsense. They're arguably what search engines should have been 10 or 15 years ago, but anti-competitive monopolization of the industry meant that search engine technology progress basically stalled out, if not regressed for the sake of ads (and individual 'entrepreneurs' becoming better at SEO), about the time Google fully established itself.
I presume you are referring to this Google engineer, who was sacked for making the claim. Hardly an example of AI companies overhyping the tech; precisely the opposite, in fact. https://www.bbc.co.uk/news/technology-62275326
It seems to be a common human hallucination to imagine that large organisations are conspiring against us.
That's not what happened. Google stomped hard on Lemoine, saying clearly that he was wrong about LaMDA being sentient ... and then they fired him for leaking the transcripts.
Your whole argument here is based on false information and faulty logic.
The focus on safety, and the concept of "AI", preexisted the product. An LLM was just the thing they eventually made; it wasn't the thing they were hoping to make. They applied their existing beliefs to it anyway.
No, first time I hear about it. I guess the secret to happiness is not following leaks. I had very low expectations before trying LLMs and I’m extremely impressed now.
Did many people overhype LLMs? Yes, like with everything else (transhumanist ideas, quantum physics). It helps being more picky who one listens to, and whether they're just painting pretty pictures with words, or actually have something resembling a rational argument in there.
We are now at Artificial SUPER Intelligence.
I’m waiting for Artificial Pro Max Super Duper Intelligence.
For some tasks they're still next to useless, and people who do those tasks understandably don't get the hype.
Tell a lab biologist or chemist to use an LLM to help them with their work and they'll get very little useful out of it.
Ask an attorney to use it and it's going to miss things that are blindingly obvious to the attorney.
Ask a professional researcher to use it and it won't come up with good sources.
For me, I've had a lot of those really frustrating experiences where I'm having difficulty with a topic and it gives me utterly incorrect junk, because there just isn't a lot already published about that topic.
I've fed it tricky programming tasks and gotten back code that doesn't work, and that I can't debug because I have no idea what it's trying to do, or I'm not familiar with the libraries it used.
But truthfully, 90% of work-related programming is not problem solving; it's implementing business logic, and dealing with poor, ever-changing customer specs. Which an LLM will not help with.
If you think "it can't quite do what I need, I'll wait a little longer until it can" you may still be waiting 50 years from now.
For many industries/people, work is a means to earn, not something to be passionate about for its own sake. It's a means to provide for the things in life you are actually passionate about (e.g. family, lifestyle, etc.). AI may get your job eventually, but if it gets yours much later than other industries/domains, you win from a capital perspective: other goods get cheaper while you still command your pre-AI scarcity premium. That makes it easier to acquire assets from the early-disrupted industries and shields you from AI's eventual takeover.
I'm seeing this directly in software: fewer new frameworks/libraries/etc. outside the AI domain being published, IMO, and more apprehension from companies about open-sourcing their work and/or exposing what they do. Attracting talent is also no longer as strong a reason to showcase what you do to prospective employees; economic conditions and/or AI make that less necessary as well.
As with all LLM usage right now, it's a tool and not fit for every purpose. But it has legit uses for some attorney tasks.
This is because programmers talk on the forums that programmers scrape to get data to train the models.
The technology is indeed amazing and very amusing, but like all the good things in the hands of corporate overlords, it will be slowly turning into profit-milking abomination.
This is your interpretation of what these companies are saying. I'd love to see a source: has some company specifically claimed anything like that?
Out of the last 100 years, how many inventions have been made that could make any human awe the way LLMs do right now? How many things from today, when brought back to 2010, would make the person using them feel like they're being tricked or pranked? We already take them for granted even though they've been around for less than half a decade.
LLMs aren't a catch-all solution to the world's problems, or something that is going to help us in every facet of our lives, or an accelerator for every industry that exists out there. But at no point in history could you talk to your phone about general topics, get information, practice language skills, build an assistant that teaches your kid the basics of science, use something to accelerate your work in many different ways, etc.
Looking at LLMs shouldn't be boolean; it shouldn't be a choice between "they're the best thing ever invented" and "they're useless". But it seems like everyone presents the issue in this manner, and Sabine is part of that problem.
Yes, it hallucinates and if you replace your brain with one of these things, you won't last too long. However, it can do things which, in the hands of someone experienced, are very empowering. And it doesn't take an expert to see the potential.
As it stands, it sounds like a case of "it's great in practice but the important question is how good it is in theory."
Pinch of salt & all.
Akin to human cognition but still a few bricks short of a load, as it were.
Are you trying to say that LLMs are useful now but you think that will stop being the case at some point in the future?
I mean fine, argue that they're mistaken to be concerned, if that's your belief. But dismissing it all as obvious shilling is not that argument.
But, if you spend too much time fawning over how impressive these things are, you might forget that something being impressive doesn't translate into something being useful.
Well, are they useful? ... Yeah, of course LLMs are useful, but we need to remain somewhat grounded in reality. How useful are LLMs? Well, they can dump out a boilerplate React frontend to a CRUD API, so I can imagine it could very well be harmful to a lot of software jobs, but I hope it doesn't bruise too many egos to point out that dumping out yet another UI that does the same thing we've done 1,000,000 times before isn't exactly novel. So it's useful for some software engineering tasks. Can it debug a complex crash? So far I'm around zero for ten, and believe me, I'm trying. From Claude 3.7 to Gemini 2.5, Cursor to Claude Code, it's really hard to get these things to work through a problem the way anyone above the junior dev level can. Almost invariably, they just keep digging themselves deeper until they eventually give up and try to null out the code so that the buggy code path doesn't execute.
So when Sabine says they're useless for interpreting scientific publications, I have zero trouble believing that. Scoring high on some shitty benchmarks whose solutions are in the training set is not akin to generalized knowledge. And these huge context windows sound impressive, but dump a moderately large document into them and it's often a challenge to get them to actually pay attention to the details that matter. The best shot you have by far is if the document you need it to reference definitely was already in the training data.
It is very cool and even useful to some degree what LLMs can do, but just scoring a few more points on some benchmarks is simply not going to fix the problems current AI architecture has. There is only one Internet, and we literally lit it on fire to try to make these models score a few more points. The sooner the market catches up to the fact that they ran out of Internet to scrape and we're still nowhere near the singularity, the better.
Hardly. I pretty much have been using LLM at least weekly (most of the time daily) since GPT3.5. I am still amazed. It's really, really hard to not be bullish for me.
It kinda reminds me of the days when I learned the Unix-like command line. At least once a week, I shouted to myself: "What? There is a one-liner that does that? People use awk/sed/xargs this way??" That's how I feel about LLMs so far.
And yet that's exactly what people get paid to do every day. And if it saves them time, they won't exactly get bored of that feature.
They are useful enough that they can passably replace (much more expensive) humans in a lot of noncritical jobs, thus being a tangible tool for securing enterprise bottom lines.
This is so clearly biased that it borders on parody. You can only get out what you put in. The real use case of current LLMs is that any project that would previously require collaboration can now be done solo with a much faster turnaround. Of course, in 20 years when compute finally catches up, they will just be superintelligent AGI.
The big problem with being bullish in the stock market sense is that OpenAI isn't selling the LLMs that currently exist to their investors, they're selling AGI. Their pitch to investors is more or less this:
> If we accomplish our goal we (and you) will have infinite money. So the expected value of any investment in our technology is infinite dollars. No, you don't need to ask what the odds are of us accomplishing our goal, because any percent times infinity is infinity.
Since OpenAI and all the founders riding on their coat tails are selling AGI, you see a natural backlash against LLMs that points out that they are not AGI and show no signs of asymptotically approaching AGI—they're asymptotically approaching something that will be amazing and transformative in ways that are not immediately clear, but what is clear to those who are watching closely is that they're not approaching Altman's promises.
The AI bubble will burst, and it's going to be painful. I agree with the author that that is inevitable, and it's shocking how few people see it. But also, we're getting a lot of cool tech out of it and plenty of it is being released into the open and heavily commoditized, so that's great!
In the 90s, Robert Metcalfe infamously wrote "Almost all of the many predictions now being made about 1996 hinge on the Internet’s continuing exponential growth. But I predict the Internet, which only just recently got this section here in InfoWorld, will soon go spectacularly supernova and in 1996 catastrophically collapse." I feel like we are just hearing LLM versions of this quote over and over now, but they will prove to be equally accurate.
Generic. For the Internet, more complex questions would have been "What are the potential benefits, what the potential risks, what will grow faster" etc. The problem is not the growth but what that growth means. For LLMs, the big clear question is "will they stop just being LLMs, and when will they". Progress is seen, but we seek a revolution.
I think this is the source of a lot of the hype. There are people salivating at the thought of no longer needing to employ the peasant class. They want it so badly that they'll say anything to get more investment in LLMs even if it might only ever allow them to fire a fraction of their workers, and even if their products and services suffer because the output they get with "AI" is worse than what the humans they throw away were providing.
They know they're overselling it, but they're also still on their knees praying that by some miracle their LLMs trained on the collective wisdom of facebook and youtube comments will one day gain actual intelligence and they can stop paying human workers.
In the meantime, they'll shove "AI" into everything they can think of for testing and refinement. They'll make us beta test it for them. They don't really care if their AI makes your customer service experience go to shit. They don't care if their AI screws up your bill. They don't care if their AI rejects your claims or you get denied services you've been paying for and are entitled to. They don't care if their AI unfairly denies you parole or mistakenly makes you the suspect of a crime. They don't care if Dr. Sbaitso 2.0 misdiagnoses you. Your suffering is worth it to them as long as they can cut their headcount by any amount and can keep feeding the AI more and more information because just maybe with enough data one day their greatest dream will become reality, and even if that never happens a lot of people are currently making massive amounts of money selling that lie.
The problem is that the bubble will burst eventually. The more time goes by and AI doesn't live up to the hype the harder that hype becomes to sell. Especially when by shoving AI into everything they're exposing a lot of hugely embarrassing shortcomings. Repeating "AI will happen in just 10 more years" gives people a lot of time to make money and cash out though.
On the plus side, we do get some cool toys to play with and the dream of replacing humans has sparked more interest in robotics so it's not all bad.
Things could stall out and we'll have bumps and delays ... I hope. If this thing progresses at the same pace, or speeds up, well ... reality will change.
Or not. Even as they are, we can build some cool stuff with them.
The trouble is that, while incredibly amazing, mind blowing technology, it falls down flat often enough that it is a big gamble to use. It is never clear, at least to me, what it is good at and what it isn't good at. Many things I assume it will struggle with, it jumps in with ease, and vice versa.
As the failures mount, I admittedly do find it becoming harder and harder to compel myself to see if it will work for my next task. It very well might succeed, but by the time I go to all the trouble to find out it often feels that I may as well just do it the old fashioned way.
If I'm not alone, that could be a big challenge in seeing long-term commercial success. Especially given that commercial success for LLMs is currently defined as 'take over the world' and not 'sustain mom and pop'.
> the speed at which it is progressing is insane.
But same goes for the users! As a result the failure rate appears to be closer to a constant. Until we reach the end of human achievement, where the humans can no longer think of new ways to use LLMs, that is unlikely to change.
The author says they use several LLMs every day and they always produce incorrect results. That "feels" weird, because it seems like you'd develop an intuition fairly quickly for the kinds of questions you'd ask that LLMs can and can't answer. If I want something with links to back up what is being said, I know I should ask Perplexity or maybe just ask a long-form prompt-like question of Google or Kagi. If I want a Python or bash program I'm probably going to ask ChatGPT or Gemini. If I want to work on some code I want to be in Cursor and am probably using Claude. For general life questions, I've been asking Claude and ChatGPT.
Running into the same issue with LLMs over and over for years, with all due respect, seems like the "doing the same thing and expecting different results" situation.
If you work on stuff that is at all niche (as in, stack overflow was probably not going to have the answer you needed even before LLMs became popular), then it's not surprising when LLMs can't help because they've not been trained.
For people that were already going fast and needed or wanted to put out more code more quickly, I'm sure LLMs will speed them up even more.
For those of us working on niche stuff, we weren't going fast in the first place or being judged on how quickly we ship in all likelihood. So LLMs (even if they were trained on our stuff) aren't going to be able to speed us up because the bottleneck has never been about not being able to write enough code fast enough. There are architectural and environmental and testing related bottlenecks that LLMs don't get rid of.
I don't think I'm working on anything particularly niche, but nor is it cookie-cutter generic either, and that could be enough to drastically reduce their utility.
I just tried to use the latest Gemini release to help me figure out how to do some very basic Google Cloud setup. I thought my own ignorance in this area was to blame for the 30 minutes I spent trying to follow its instructions - only to discover that Gemini had wildly hallucinated key parts of the plan. And that’s Google’s own flagship model!
I think it’s pretty telling that companies are still struggling to find product-market fit in most fields outside of code completion.
But ask it to solve some LeetCode and it's brilliant.
I should start collecting examples, if only for threads like this. Recently I tried to get an LLM to write a tsserver plugin that treats lines ending with "//del" as empty. You can only imagine all the sneaky failures in the chat and the total uselessness of the results.
Anything that is not literally millions (billions?) of times in the training set is doomed to be fantasized about by an LLM. In various ways, tones, etc. After many such threads I came to the conclusion that people who find it mostly useful are simply treading water, as they probably have done most of their career. Their average product is a React form with a CRUD endpoint, and excitement about it. I can't explain their success reports otherwise, because it rarely works on anything beyond that.
And how many actual humans, with a fair bit of training, can become a little bit less than useless?
I mean, my parents used to have this dog that would just look at you like "go get your own damn ball, stupid human" if you threw a ball around him.
--edit--
and, yes, the dog also made grammatical mistakes.
Because it has a sample size of our collective human knowledge and language big enough to trick our brains into believing that.
As a parallel thought, it reminds me of a trick Derren Brown did. He picked every horse correctly across 6 races. The person he was picking for was obviously stunned, as were the audience watching it.
The reality of course is just that people couldn't comprehend that he just had to go to extreme and tedious lengths to make this happen. They started with 7000 people and filmed every one like it was going to be the "one" and then the probability pyramid just dropped people out. It was such a vast undertaking of time and effort that we're biased towards believing there must be something really happening here.
LLMs currently are a natural-language interface to a Microsoft Encarta-like system that is so unbelievably detailed and all-encompassing that we risk accepting that there's something more going on there. There isn't.
Yes, it's artificial intelligence. It's not the real thing, it's artificial.
No, that's not my problem with it. My problem with it is that inbuilt into the models of all LLMs is that they'll fabricate a lot. What's worse, people are treating them as authoritative.
Sure, sometimes it produces useful code. And often, it'll simply call the "doTheHardPart()" method. I've even caught it literally writing the wrong algorithm when asked to implement a specific and well known algorithm. For example, asking it "write a selection sort" and watching it write a bubble sort instead. No amount of re-prompts pushes it to the right algorithm in those cases either, instead it'll regenerate the same wrong algorithm over and over.
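For reference, selection sort really is just "repeatedly swap the minimum of the unsorted suffix into place", which makes confusing it with bubble sort all the more telling. A minimal Python sketch (function name and test values are mine, not from the thread):

```python
def selection_sort(items):
    """In-place selection sort: repeatedly select the minimum of the
    unsorted suffix and swap it to the front of that suffix."""
    a = list(items)  # work on a copy
    for i in range(len(a)):
        # index of the smallest element in a[i:]
        min_idx = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[min_idx] = a[min_idx], a[i]
    return a

print(selection_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The tell-tale difference: selection sort does one swap per outer iteration, while bubble sort swaps adjacent pairs repeatedly, so the two are easy to distinguish at a glance.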
Outside of programming, this is much worse. I've both seen online and heard people quote LLM output as if it were authoritative. That to me is the bigger danger of LLMs to society. People just don't understand that LLMs aren't high powered attorneys, or world renown doctors. And, unfortunately, the incorrect perception of LLMs is being hyped both by LLM companies and by "journalists" who are all to ready to simply run with and discuss the press releases from said LLM companies.
Still the elephant in the room. We need an AI technology that can output "don't know" when appropriate. How's that coming along?
That's not an LLM problem, but it is indeed quite bothersome. Don't tell me what ChatGPT told you; tell me what you know. Maybe you got it from ChatGPT and verified it. Great. But my jaw kind of drops when people cite an LLM and just assume it's correct.
The underlying cause: 3rd order ignorance:
3rd Order Ignorance (3OI)—Lack of Process. I have 3OI when I don't know a suitably efficient way to find out I don't know that I don't know something. This is lack of process, and it presents me with a major problem: If I have 3OI, I don't know of a way to find out there are things I don't know that I don't know.
(not from an LLM)
My process: use llms and see what I can do with them while taking their Output with a grain of salt.
I'm counting down the days until some AI hallucination makes its way all the way up to the C-suite. People are getting way too comfortable with AI and don't understand just how wrong it can be.
Some assumption will come from AI, no one will check it, and it'll become a basic business input. Then suddenly one day someone smart will say "that's not true" and someone will trace it back to AI. I know it.
I assume at that point in time there will be some general directive on using AI and not assuming it’s correct. And then AI will slowly go out of favor.
Claude is cheaper, faster, produces better code.
The solution is to be selective and careful like always
The same is true about the internet, and people even used to use these arguments to try to dissuade people from getting their information online (back when Wikipedia was considered a running joke, and journalists mocked blogs). But today it would be considered silly to dissuade someone from using the internet just because the information there is extremely unreliable.
Many programmers will say Stack Overflow is invaluable, but it's also unreliable. The answer is to use it as a tool and a jumping-off point to help you solve your problem, not to assume that it's authoritative.
The strange thing to me these days is the number of people who will talk about the problems with misinformation coming from LLMs, but then who seem to uncritically believe all sorts of other misinformation they encounter online, in the media, or through friends.
Yes, you need to verify the information you're getting, and this applies to far more than just LLMs.
Because in people's experience, LLMs are often correct.
You are right LLMs are not authoritative, but people trust it exactly because they often do produce correct answers.
Happened to me as well. Wanted it to quickly write an algorithm for standard deviation over a stream of data, which is a textbook algorithm. It did it almost right, but messed up the final formula and the code gave wrong answers. Weird, considering correct code for that problem exists on Wikipedia.
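The textbook streaming algorithm the parent is presumably describing is Welford's online method; a minimal Python sketch (class name and sample data are mine):

```python
import math

class RunningStddev:
    """Welford's online algorithm: numerically stable mean and variance
    over a stream, updated one value at a time."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        # sample standard deviation (n - 1 denominator)
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

rs = RunningStddev()
for v in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    rs.add(v)
print(round(rs.stddev(), 4))  # 2.1381
```

The easy-to-flub part is exactly the final formula: dividing `m2` by `n` instead of `n - 1` (or taking the square root in the wrong place) gives plausible-looking but wrong answers, which matches the failure described above.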
FWIW, here's 4o writing a selection sort: https://chatgpt.com/share/67e60f66-aacc-800c-9e1d-303982f54d...
Code created by LLMs doesn't compile: hallucinated APIs, invalid syntax, completely broken logic. Why would you trust it with someone's life?
So what? People are wrong all the time. What happens when people are wrong? Things go wrong. What happens then? People learn that the way they got their information wasn't robust enough and they'll adapt to be more careful in the future.
This is the way it has always worked. But people are "worried" about LLMs... Because they're new. Don't worry, it's just another tool in the box, people are perfectly capable of being wrong without LLMs.
The LLM doesn’t need to be perfect. Just needs to beat a typical human.
LLM opponents aren’t wrong about the limits of LLMs. They vastly overestimate humans.
If you're lucky it figures it out. If you aren't, it makes stuff up in a way that seems almost purposefully calculated to fool you into assuming that it's figured everything out. That's the real problem with LLMs: they fundamentally cannot be trusted because they're just a glorified autocomplete; they don't come with any inbuilt sense of when they might be getting things wrong.
What matters is speeding up how fast I can find information. Not only will LLMs sometimes answer my obscure questions perfectly themselves, but they also help to point me to the jargon I need to find that information online. In many areas this has been hugely valuable to me.
Sometimes you do just have to cut your losses. I've given up on asking LLMs for help with Zig, for example. It is just too obscure a language I guess, because the hallucination rate is too high to be useful. But for webdev, Python, matplotlib, or bash help? It is invaluable to me, even though it makes mistakes every now and then.
Spend some time with current reasoning models. Your experience is obsolete if you still hold this belief.
- LLMs are a miraculous technology that are capable of tasks far beyond what we believed would be achievable with AI/ML in the near future. Playing with them makes me constantly feel like "this is like sci-fi, this shouldn't be possible with 2025's technology".
- LLMs are fairly clueless for many tasks that are easy enough for humans, and they are nowhere near AGI. It's also unclear whether they scale up towards that goal. They are also worse programmers than people make them out to be. (At least I'm not happy with their results.)
- Achieving AGI doesn't seem impossibly unlikely any more, and doing so is likely to be an existentially disastrous event for humanity, and the worst fodder of my nightmares. (Also in the sense of an existential doomsday scenario, but even just the thought of becoming... irrelevant is depressing.)
Having one of these beliefs makes me the "AI hyper" stereotype, another makes me the "AI naysayer" stereotype and yet another makes me the "AI doomer" stereotype. So I guess I'm all of those!
In my opinion, there can exist no AI, person, tool, ultra-sentient omniscient being, etc. that would ever render you irrelevant. Your existence, experiences, and perception of reality are all literally irreplaceable, and (again, just my opinion) inherently meaningful. I don't think anyone's value comes from their ability to perform any particular feat to any particular degree of skill. I only say this because I had similar feelings of anxiety when considering the idea of becoming "irrelevant", and I've seen many others say similar things, but I think that fear is largely a product of misunderstanding what makes our lives meaningful.
When I tried that technology, 90% accuracy meant 1 out of every 10 things I wrote was incorrect. If it had been a keyboard I would have thrown it in the trash. That is where my Palm ended up.
People expect their technology to do things better, not almost as well as a human. Waymo, with LIDAR, hasn't killed people. Tesla, with cameras only, has done so multiple times. I will ride in a Waymo, never in a Tesla self-driving car.
Almost every counter-criticism of LLMs boils down to:
1. you're holding it wrong
2. Well, I use it at $DAYJOB and it works great for me! (And $DAYJOB is software engineering.)
I'm glad your wife was able to save 2 hours of work, but forgive me if that doesn't translate to the trillion dollar valuation OpenAI is claiming. It's strange you don't see the inherent irony in your post. Instead of your wife just directly uploading the dataset and a prompt, she first has to prompt it to write code. There are clear limitations and it looks like LLMs are stuck at some sort of wall.
But I share the confusion about why people are still so bullish on it. The current valuation exists because the market thinks these systems can write code like a senior engineer and will achieve AGI, because that's how the LLM providers market them.
I'm not even certain if they'll be ubiquitous after the venture capital investments are gone and the service needs to actually be priced without losing money, because they're (at least currently) mostly pretty expensive to run.
In markets, perception is reality, and the perception is that these companies are innovative. That’s it.
NFT is still a great tool if you want a bunch of unique tokens as part of a blockchain app. ERC-721 was proven a capable protocol in a variety of projects. What it isn't, and never will be, is an amazing investment opportunity, or a method to collect cool rare apes and go to yacht parties.
LLMs will settle in and have their place too, just not in the forefront of every investors mind.
Based on that alone it’s worth quite a lot.
They aren't building anything themselves. I find this to be disingenuous at best, and to me it's a sign of a bubble.
I also think that re-branding machine learning as AI is disingenuous.
These technologies of course have their use cases and excel in some things, but this isn’t the ushering of actual, sapient intelligence, that for the majority of the term’s existence was the de facto agreed standard for the term “AI”. This technology does lack the actual markers of what is generally accepted as intelligence to begin with
We saw this with the web. Pets.com was not a billion-dollar company, but the web was real.
I am actually of the belief that LLMs will be amazing but that rank and file companies are going to be the ones that benefit the most.
Just like the internet.
But Moore's law should kick in, shouldn't it?
No it's not. If it was valued for that it'd be at least 10X what it is now.
Blockchains are becoming real-time data structures where everyone has admin level read-only access to everyone.
It reminds me a lot of when I first started playing No Man's Sky (the video game). Billions of galaxies! Exotic, one of a kind life forms on every planet! Endless possibilities! I poured hundreds of hours into the game! But, despite all the variety and possibilities, the patterns emerge, and every 'new' planet just feels like a first-person fractal viewer. Pretty, sometimes kinda nifty, but eventually very boring and repetitive. The illusion wore off, and I couldn't really enjoy it anymore.
I have played with a LOT of models over the years. They can be neat, interesting, and kinda cool at times, but the patterns of output and mistakes shatters the illusion that I'm talking to anything but a rather expensive auto-complete.
It'd take more time for me to flesh this out than I want to give, but the basic idea is that I am not just sitting there "expecting things". I've been puzzled too at why so many people don't seem to get it or are as frustrated as this lady, and in my observation this is their common element. It just looks very passive to me, the way they seem to use the machines and expect a result to be "given" to them.
PS. It reminds me very strongly of how our parents' generation uses computers. Like the whole way of thinking is different; I cannot even understand why they would act certain ways or be afraid of acting in other ways. It's like they use a different compass or have a very different (and wrong) model in their head of how this thing in front of them works.
IMO there are two distinct reasons for this:
1. You've got the Sam Altmans of the world claiming that LLMs are, or nearly are, AGI and that ASI is right around the corner. It's obvious this isn't true even if LLMs are still incredibly powerful and useful. But Sam doing the whole "is it AGI?" dance gets old really quickly.
2. LLMs are an existential threat to basically every knowledge worker job on the planet. People's natural response to threats is to become defensive.
Just off the top of my head there are plenty of knowledge worker jobs where the knowledge isn’t public, nor really in written form anywhere. There just simply wouldn’t be anything for AI to train on.
Given the typical problems of LLMs, they are not. You still need to check their results. It's like FSD: impressive when it works, bad when it doesn't, and scary because you never know beforehand when it's failing.
I feel bad for people who haven't yet experienced how useful these models are for programming.
Some also just prefer manually entering everything. Those people I will never understand.
For reference, I program systems code in C/C++ in a large, proprietary codebase.
My experiences with OpenAI (a year ago or more), and more recently Cursor, Grok-v3 and Deepseek-r1, were all failures. The latter two started out OK and got worse over time.
What I haven't done is ask "AI" to whip up a more standard application. I have some ideas (an ncurses frontend to p4 written in Python, similar to tig, for instance), but haven't gotten around to it.
I want this stuff to work, but so far it hasn't. Now I don't think "programming" a computer in english is a very good idea anyway, but I want a competent AI assistant to pair program with. To the degree that people are getting results, to me it seems they are leveraging very high-level APIs/libraries of code which are not written by AI and solving well-solved, "common" problems(simple games, simple web or phone apps). Sort of like how people gloss over the heavy lifting done by language itself when they praise the results from LLMs in other fields.
I know it eventually will work. I just don't know when. I also get annoyed by the hype of folks who think they can become software engineers because they can talk to an LLM. Most of my job isn't programming. Most of my job is thinking about what the solution should be, talking to other people like me in meetings, understanding what customers really want beyond what they are saying, and tracking what I'm doing in various forms(which is something I really do want AI to help me with).
Vibe coding is aptly named because it's sort of the VB6 of the modern era. Holy cow! I wrote a Windows GUI App!!!. It's letting non-programmers and semi-programmers(the "I write glue code in Python to munge data and API ins/outs" crowd) create usable things. Cool! So did spreadsheets. So did Hypercard. Andrej tweeting that he made a phone app was kinda cool but also kinda sad. If this is what the hundreds of billions spent on AI(and my bank account thanks you for that) delivers then the bubble is going to pop soon.
That's okay.
It's not my responsibility to convince or convert them.
I prefer to just let them be and not engage.
As the parent says, while far from perfect, they're an incredible aid in so many areas. When used well, they help you produce not just faster but also better results. The only trick really is that you need to treat it as a (very knowledgeable but overconfident) collaborator rather than an oracle.
I say "intern" in the sense that its error-prone and kind of inexperienced, but also generally useful. I can ask it to automatically create a lot of the bootstrapping or tedious code that I always dread writing so that I can focus on the fun stuff, which is often the stuff that's pawned off onto interns and junior-level engineers. I think for the most part, when you treat it like that, it lives up to and sometimes even surpasses expectations.
I mean, I can't speak for everyone, but whenever I begin a new project, a large percentage of the first ~3 hours is simply copying and pasting and editing from documentation, either an API I have to call or some bootstrapping code from a framework or just some cruft to make built-in libraries work how you want. I hate doing all that, it actively makes me not want to start a new project. Being able to get ChatGPT to give me stuff that I need to actually get started on my project has made coding a lot more fun for me again. At this point, you can take my LLM from my cold dead hands.
I do think it will keep getting better, but I'm also at a point where even if it never improves I will still keep using it.
> "As of today, March 27, 2025, the latest stable version of Laravel is Laravel 11, which was released in March 2024. Laravel 12 has not been released yet (it's expected roughly in Q1 2026 based on the usual schedule). Could you please double-check the exact Laravel version you are using?"
So it did not believe me, and I had to convince it first that I was using a real version. This went on for a while, with Gemini not only hallucinating stuff, but also being very persistent and difficult to convince of anything else.
Well, in the end it was still certain that this method should exist, even though it could not provide any evidence for it and my searching through the internet and the Git history of the related packages did also not provide any results.
So I gave up and tried it with Claude 3.7 which could also not provide any working solution.
In the end, I found an entirely different solution for my problem, but that wasn't based on anything the AIs told me, but just my own thinking and talking to other software developers.
I would not go so far as to call these AIs useless. In software development they can help with simple stuff and boilerplate code, and I found them a lot more helpful in creative work. This is basically the opposite of what I would have expected 5 years ago ^^
But for any important tasks, these LLMs are still far too unreliable. They often feel like they have a lot of knowledge, but no wisdom. They don't know how to apply their knowledge ideally, and they often basically brute-force it with a mix of strange creativity and statistical models that are apparently based on a vast amount of internet content that has big parts of troll content and satire.
But instead, my productivity is hampered by issues with org communication, structure, siloed knowledge, lack of documentation, tech debt, and stale repos.
I have for years tried to provide feedback and get leadership to do something about these issues, but they do nothing and instead ask "How have you used AI to improve your productivity?"
Thing is, the LLMs that I use are all freeware, and they run on my gaming PC. Two to six tokens per second are alright honestly. I have enough other things to take care of in the meantime. Other tools to work with.
I don't see the billion dollar business. And even if that existed, the means of production would be firmly in the hands of the people, as long as they play video games. So, have we all tripled our salaries?
If we haven't, is that because knowledge work is a limited space that we are competing in, and LLMs are an equalizer because we all have them? Because I was taught that knowledge work was infinite. And the new tool should allow us to create more, and better, and more thoroughly. And that should get us all paid better.
Right?
The problems start when people start hyperventilating because they think since LLMs can generate tests for a function for you, that they'll be replacing engineers soon. They're only suitable for generating output that you can easily verify to be correct.
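A hedged illustration of that "easily verified" sweet spot (the function and test names here are made up): a trivial utility plus the kind of test an LLM will happily generate, where a human can confirm correctness at a glance:

```python
import re

def slugify(title: str) -> str:
    """Lowercase, and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# LLM-style generated tests: each case is obvious enough to eyeball,
# which is exactly why this is a safe thing to delegate.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces  ") == "spaces"
    assert slugify("already-fine") == "already-fine"
```

The moment the output stops being glanceable, the delegation stops being safe.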
LLM training is designed to distill a massive corpus of facts, in the form of token sequences, into a much, much smaller bundle of information that encodes (somehow!) the deep structure of those facts minus their particulars.
They’re not search engines, they’re abstract pattern matchers.
1. People creating or dealing with imprecise information. People doing SEO spam, people dealing with SEO spam, almost all creative arts people, people writing corporatese or legalese documents or mails, etc. For these tasks LLMs are god-like.
2. People dealing with precise information and/or facts. For these people an LLM is no better than a parrot.
3. Subset of 2 - programmers. Because of the huge amount of stolen training data, plus almost-perfect proofing software in the form of compilers, static analyzers, etc., LLMs are more or less usable for this case; the more data that was used, the better (JS is the best, as I understand it).
This is why people's reactions are so polarized. Their results differ.
The crisis in programming hasn’t been writing code. It has been developing languages and tools so that we can write less of it that is easy to verify as correct. These tools generate more code. More than you can read and more than you will want to before you get bored and decide to trust the output. It is trained on the most average code available that could be sucked up and ripped off the Internet. It will regurgitate the most subtle errors that humans are not good at finding. It only saves you time if you don’t bother reading and understanding what it outputs.
I don’t want to think about the potential. It may never materialize. And much of what was promised even a few years ago hasn’t come to fruition. It’s always a few years away. Always another funding round.
Instead we have massive amounts of new demand for liquid methane, infrastructure struggling to keep up, billions of gallons of fresh water wasted, all so that rich kids can vibe code their way to easy money and realize three months later they’ve been hacked and they don’t know what to do. The context window has been lost and they ran out of API credits. Welcome to the future.
- AI is great for disinformation
- AI is great at generating porn of women without their consent.
- Open source projects massively struggle as AI scrapers DDOS them.
- AI uses massive amounts of energy and water; most importantly, the expectation is that energy usage will rise drastically in a world where we need to lower it. If Sam Altman gets his way, we're toast.
- AI makes us intellectually lazy and worse thinkers. We were already learning less and less in school because of our impoverished attention span. This is even worse now with AI.
- AI makes us even more dependent on cloud vendors and third-parties, further creating a fragile supply chain.
Like AI ostensibly empowers us as individuals, but in reality I think it's a disservice, and the ones it truly empowers are the tech giants, as citizens become dumber and even more dependent on them and tech giants amass more and more power.
I have yet to see an AI-generated image that was "really cool".
AI images and videos strike me as the coffee pods of the digital world -- we're just absolutely littering the internet with garbage. And as a bonus, it's also environmentally devastating to the real world!
I live nearby a landfill, and go there often to get rid of yard waste, construction materials, etc. The sheer volume of perfectly serviceable stuff people are throwing out in my relatively small city (<200k) is infuriating and depressing. I think if more people visited their local landfills, they might get a better sense for just how much stuff humans consume and dispose. I hope people are noticing just how much more full of trash the internet has become in the last few years. It seems like it, but then I read this thread full of people that are still hyped about it all and I wonder.
This isn't even to mention the generated text... it's all just so inane and I just don't get it. I've tried a few times to ask for relatively simple code and the results have been laughable.
I don't have a proposal for what a better name would have been, naming things is hard, but AI carries quite a bit of baggage and expectations with it.
1. Some people are just uncomfortable with it because it “could” replace their jobs.
2. Some people are warning that the ecosystem bubble is significantly out of proportion. They are right, and having the whole stock market, companies, and US economy attached to LLMs is just downright irresponsible.
What jobs are seriously at risk of being totally replaced by LLMs? Even in things like copywriting and natural language translation, which is somewhat of a natural "best case" for the underlying tech, their output is quite subpar compared to the average human's.
Hossenfelder is a scientist. There's a certain level of rigour that she needs to do her job, which is where current LLMs often fall down. Arguably it's not accelerating her work to have to check every single thing the LLM says.
I think some people just aren't using them correctly or don't understand their limitations.
They are especially good at helping me get over thought paralysis when starting a new project.
But while they are fun to play with, anything that requires a real answer, but can’t be directly and immediately checked, like customer support, scientific research, teaching, legal advice, identifying humans, correctly summarizing text - LLMs are very bad at these things, make up answers, mix contexts inappropriately, and more.
I’m not sure how you can have played with LLMs so much and missed this. I hope you don’t trust what they say about recipes or how to handle legal problems or how to clean things or how to treat disease or any fact-checking whatsoever.
This is like a GPT3.5 level criticism. o1-pro is probably better at pure fact retrieval than most PhDs in any given field. I challenge you to try it.
In fact take the GPQA test yourself and see how you do then give the same questions to o1. https://arxiv.org/pdf/2311.12022
I wonder if people that are amazed by LLM lack this information gathering skill.
After all, I met plenty of architect- and senior-level people that just… had zero google and research skills.
To someone who doesn't actually check or have the knowledge or experience to check the output, it sounds like they've been given a real, useful answer.
When you tell the LLM that the API it tried to call doesn't exist it says "Oh, you're right, sorry about that! Here's a corrected version that should work!" and of course that one probably doesn't work either.
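A cheap partial defense (just a sketch, and it only catches name-level hallucinations, not wrong behavior) is to mechanically check that a suggested call even resolves before spending time on it:

```python
import importlib

def api_exists(dotted: str) -> bool:
    """Return True if e.g. 'numpy.linalg.solve' names a real attribute.

    A sanity check to run on LLM-suggested calls before trusting them.
    """
    module_name, _, attr_path = dotted.partition(".")
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    # Walk the remaining attribute path, bailing at the first miss.
    for part in attr_path.split(".") if attr_path else []:
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True
```

It won't tell you the call does what the LLM claims, but it instantly flags the "magic function that doesn't exist" class of answer.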
One takeaway from this is that labelling LLMs as "intelligent" is a total misnomer. They're more like super parrots.
For software development, there's also the problem of how up to date they are. If they could learn on the fly (or be constantly updated) that would help.
They are amazing in some ways, but they've been over-hyped tremendously.
When I saw GPT-3 in action in 2023, I couldn’t believe my eyes. I thought I was being tricked somehow. I’d seen ads for “AI-powered” services and it was always the same unimpressive stuff. Then I saw GPT-3 and within minutes I knew it was completely different. It was the first thing I’d ever seen that felt like AI.
That was only a few years ago. Now I can run something on my 8GB MacBook Air that blows GPT-3 out of the water. It’s just baffling to me when people say LLMs are useless or unimpressive. I use them constantly and I can still hardly believe they exist!!
Exactly how I feel. I probably write 50 prompts/day, and a few times a week I still think, "I can't believe this is real tech."
It's bad technology because it wastes a lot of labor, electricity, and bandwidth in a struggle to achieve what most human beings can with minimal effort. It's also a blatant thief of copyrighted materials.
If you want to like it, guess what, you'll find a way to like it. If you try to view it from another persons use case you might see why they don't like it.
It is an impressive technology but is it US$244.22bn [1] impressive (I know this stat is supposed to account for computer vision as well but seeing as to how LLMs are now a big chunk of that I think it's a safe assumption)? It's projected to grow to over US$1tr by 2031. That's higher than the market size of commercial aviation at its peak [2]. I'm sorry if I agree that a cool chatbot is not approximately as important as flying.
[1] https://www.statista.com/outlook/tmo/artificial-intelligence...
[2] https://www.statista.com/markets/419/topic/490/aviation/#sta...
You no longer have the console as the primary interface, but a GUI, which 99.9+% of computer users control via a mouse.
You no longer have the screen as the primary interface, but an AUI, which 99.9+% of computer users control via a headset, earbuds, or a microphone and speaker pair.
You mostly speak and listen to other humans, and if you're not reading something they've written, you could have it read to you in order to detach from the screen or paper.
You'll talk with your computer while in the car, while walking, or while sitting in the office.
An LLM makes the computer understand you, and it allows you to understand the computer.
Even if you use smart glasses, you'll mostly talk to the computer generating the displayed results, and it will probably also talk to you, adding information to the displayed results. It's LLMs that enable this.
Just don't focus too much on whether the LLM knows how high Mount Kilimanjaro is; its knowledge of that fact is simply a hint that it can properly handle language.
Still, it's remarkable how useful they are at analyzing things.
LLMs have a bright future ahead, or whatever technology succeeds them.
It used to be annoying enough just having to clean the trackball, but at least you knew when it wasn't working.
Personally, I look back at how many years ago it was that we were seeing claims that truck drivers were all going to lose their jobs and society would tear itself apart over it within the next few years… and yet here we still are.
That said, I do experience frustrations:
- Getting enraged when it messes up perfectly good code it wrote just 10 minutes ago
- Constantly reminding it we're NOT using jest to write tests
- Discovering it's created duplicate utilities in different folders
There's definitely a lot of hand-holding required, and I've encountered limitations I initially overlooked in my optimism.
But here's what makes it worthwhile: LLMs have significantly eased my imposter syndrome when it comes to coding. I feel much more confident tackling tasks that would have filled me with dread a year ago.
I honestly don't understand how everyone isn't completely blown away by how cool this technology is. I haven't felt this level of excitement about a new technology since I discovered I could build my own Flash movies.
But for larger tasks—say, around 2,000 lines of code—it often fails in a lot of small ways. It tends to generate a lot of dead code after multiple iterations, and might repeatedly fail on issues you thought were easy to fix. Mentally, it can get exhausting, and you might end up rewriting most of it yourself. I think people are just tired of how much we expect LLMs to deliver, only for them to fail us in unexpected ways. The LLM is good, but we really need to push to understand its limitations.
So far the industrial applications haven't been that promising, code writing and documentation is probably the most promising but even there it's not like it can replace a human or even substantially increase their productivity.
If you don’t constantly look for information, they might be less useful.
I did have a eureka moment the other day with deepseek and a very obscure bug I was trying to tackle. One api query was having a very weird, unrelated side effect. I loaded up cursor with a very extensive prompt and it actually figured out the call path I hadn't been able to track down.
Today, I had a very simple task that eventually only took me half an hour to manually track. But I started with cursor using very similar context as the first example. It just kept repeatedly dreaming up non-existent files in the PR and making suggestions to fix code that doesn't exist.
So what's the worth to my company of my very expensive time? Should I spend 10,20,50 percent of my time trying to get answers from a chatbot, or should I just use my 20 years of experience to get the job done?
Same vibe.
The quote about books being a mirror reflecting genius or idiocy seems to apply.
I see LLMs as a kind of hyper-keyboard: speeding up typing AND structuring content, completing thoughts, and inspiring ideas.
Unlike a regular keyboard, an LLM transforms input contextually. One no longer merely types but orchestrates concepts and modulates language, almost like music.
Yet mastery is key. Just as a pianist turns keystrokes into a symphony through skill, a true virtuoso wields LLMs not as a crutch but as an amplifier of thought.
In the 70's I read in some science book for kids about how one day we will likely be able to use light emitting diodes for illumination instead of light bulbs, and this "cold light" will save us lots of energy. Waited out that one too; it turned out so.
By the way, you don't need to be a 50+ year old nerd. Nerds are a special culture-pen where smart straight-A students from schools are placed so they can work, increase stakeholder revenues, and not even accidentally be able to do anything truly worthwhile that could redistribute wealth in society.
More like we note the frequency with which these tools produce shallow bordering on useless responses, note the frequency with which they produce outright bullshit, and conclude their output should not be taken seriously. This smells like the fervor around ELIZA, but with several multinational marketing campaigns behind it pushing.
https://www.ycombinator.com/companies/domu-technology-inc/jo...
If we judge a technology by how it transforms our lives, LLMs and GenAI have mostly been a net negative (at least that is how it feels).
Anyone who remembers further back than a decade or so remembers when the height of AI research was chess programs that could beat grandmasters. Yes, LLMs aren't C3PO or the like, but they are certainly more like that than anything we could imagine just a few years ago.
I remember seeing an AI lab in the late 1980's and thinking "that's never going to work" but here we are, 40 years later. It's finally working.
I feel like if teleportation was invented tomorrow, people would complain that it can't transport large objects so it's useless.
Basically people just doubling down on everything you just described. I can’t quite put a finger on it, but it has a tinge of insecurity or something like that; I hope that’s not the case and I’m just misinterpreting.
https://news.ycombinator.com/item?id=43504459
> And people are like, "Wah, it can't write code like a Senior engineer with 20 years of experience!"
But LLMs should be good enough to resolve this confusion, ask them!
... But I do not believe we're on the cusp of a Lawnmower-Man future where someone's Metaverse eats all the retail-conference-halls and movie-theaters and retail-stores across the entire globe in an unbridled orgy of mind-shattering investor returns.
Similarly, LLMs are neat and have some sane uses, but the fervor about how we're about to invent the Omnimind and usher in the singularity and take over the (economic) world? Nah.
What next, "This Internet thing was just a fad" or "The industrial age was a fad"?
As far as breaking our reality and society? Absolutely :(
Choose a very narrow domain that you know well, and you quickly realize they are just repeating the training data.
On the other hand, I saw github recently added Copilot as a code reviewer. For fun I let it review my latest pull request. I hated its suggestions but could imagine a not too distant future where I'm required by upper management to satisfy the LLM before I'm allowed to commit. Similarly, I've asked ChatGPT questions and it's been programmed to only give answers that Silicon Valley workers have declared "correct".
The thing I always find frustrating about the naysayers is that they seem to think how it works today is the end of it. Like I recently listened to an episode of EconTalk interviewing someone on AI and education. She lives in the UK and used Tesla FSD as an example of how bad AI is. Yet I live in California and see Waymo mostly working today and lots of people using it. I believe she wouldn't have used the Tesla FSD example, and would possibly have changed her world view at least a little, if she'd updated on seeing self-driving work.
Except this isn't true. The code quality varies dramatically depending on what you're doing, the length of the chat/context, etc. It's an incredible productivity booster, but even earlier today, I wasted time debugging hallucinated code because the LLM mixed up methods in a library.
The problem isn't so much that it's not an amazing technology, it's how it's being sold. The people who stand to benefit are speaking as though they've invented a god and are scaring the crap out of people making them think everyone will be techno-serfs in a few years. That's incredibly careless, especially when as a technical person, you understand how the underlying system works and know, definitively, that these things aren't "intelligent" the way they're being sold.
Like the startups of the 2010s, everyone is rushing, lying, and huffing hopium deluding themselves that we're minutes away from the singularity.
Thank goodness for that too. I want it to help me with my job, not replace me.
Both are Markov chains; that you erroneously thought a Markov chain was a way to make a chatbot, rather than a general mathematical process, is on you, not them.
Chatbots like in the sci-fi of your nostalgia? I never dreamed about that shit, sorry.
It was, more or less, the same narrative arc as Bitcoin, and was (is) headed for a crash.
That said, I've spent a few weeks with augment, and it is revelatory, certainly. All the marketing - aimed at a suite I have no interest in - managed to convince me it was something it wasn't. It isn't a replacement, any more than a power drill is a replacement for a carpenter.
What it is, is very helpful. "The world's most fully functioning scaffolding script", an upgrade from copilot's "the world's most fully functioning tab-completer". I appreciate it usefulness as a force multiplier, but I am already finding corners and places where I'd just prefer to do it myself. And this is before we get into the craft of it all - I am not excited by the pitch "worse code, faster", but the utility is undeniable in this capitalistic hell planet, and I'm not a huge fan of writing SQL queries anyway, so here we are!
Maybe Freud could explain.
It isn't ANY form of intelligence.
To quote Joel Spolsky, "When you’re working on a really, really good team with great programmers, everybody else’s code, frankly, is bug-infested garbage, and nobody else knows how to ship on time.", and that's the state we end up if we believe in the hype and use LLMs willy-nilly.
That's why people are annoyed: not because LLMs cannot code like a senior engineer, but because a lot of content marketing and company valuation is dependent on making people believe that's the case.
That's how I see LLMs and the hype surrounding them.
And people keep forgetting how new this stuff is.
This is like trashing video games in 1980 because Pong has awful graphics.
No, it provides responses. It does not talk.
I can ask Claude the most inane programming question and got an answer. If I were to do that on StackOverflow, I'd get downvoted, rude comments, and my question closed for being off-topic. I don't have to be super knowledgeable about the thing I'm asking about with Claude (or any LLM for that matter).
Even if you ignore the rudeness and elitism of power-users of certain platforms, there's no more waiting for someone to respond to your esoteric questions. Even if the LLM spews bullshit, you can ask it clarifying questions or rephrase until you see something that makes sense.
I love LLMs, I don't care what people say. Even when I'm just spitballing ideas[1], the output is great.
---
[1]: https://blog.webb.page/2025-03-27-spitball-with-claude.txt
It's incredibly frustrating when people think they're a miracle tool and blindly copy/paste output without doing any kind of verification. This is especially frustrating when someone who's supposed to be a professional in the field is doing it (copy-pasting non-working AI-generated code and putting it up for review).
On one hand, they multiply productive and useful information. On the other hand, they kill productivity and spread misinformation. That said, I still see them as useful, but not a miracle.
Truly amazing technology which is very good at generating and correcting texts is marketed as senior developer, talented artist, and black box that has solution to all your problems. This impression shatters on the first blatant mistake, e.g. counting elephant legs: https://news.ycombinator.com/item?id=38766512
https://youtu.be/aGnMbKwP36U?si=WbXzphhhP8Hak1OQ
It’s a human nature thing - we’re supposed to be collecting nuts in the forest.
But I will admit the dora muckbang feet shit is fucking insane. And that just flat out scares the pants off me.
Sorry but this is a total skill issue lol. 80% code failure rate is just total nonsense. I don't think 1% of the code I've gotten from LLMs has failed to execute correctly.
Almost every time I've tried using LLMs, I've fallen into the pattern of calling out, correcting, and arguing with them, which is of course silly in itself, because they don't learn; they don't really "get it" when they are wrong. There's none of the benefit of talking to a human.
Its also a slow burn issue - you have to use it for a while for what is obvious to users, to become obvious to people who are tech first.
The primary issue is the hype and forecasted capabilities vs actual use cases. People want something they can trust as much as an authority, not as much as a consultant.
If I were to put it in a single sentence? These are primarily narrative tools, being sold as factual /scientific tools.
When this is pointed out, the conversation often shifts to “well people aren’t that great either”. This takes us back to how these tools are positioned and sold. They are being touted as replacements to people in the future. When this claim is pressed, we get to the start of this conversation.
Frankly, people on HN aren’t pessimistic enough about what is coming down the pipe. I’ve started looking at how to work in 0 Truth scenarios, not even 0 trust. This is a view held by everyone I have spoken to in fraud, misinformation, online safety.
There’s a recent paper which showed that GAI tools improved the profitability of phishing attempts by something like 50x in some categories, and made previously loss-making (in $/hour terms) targets profitable. Schneier was one of the authors.
A few days ago I found out someone I know who works in finance, had been deepfaked and their voice/image used to hawk stock tips. People were coming to their office to sue them.
I love tech, but this is the dystopia part of cyberpunk being built. These are narrative tools, good enough to make people think they are experts..
If you ask it random things the output looks amazing, yes. At least at first glance. That's what they do. It's indeed magical, a true marvel that should make you go: Woooow, this is amazing tech: Coming across as convincing, even if based on hallucinations, is in itself a neat trick!
But is it actually useful? The things they come up with are untrustworthy and on the whole far less good than previously available systems. In many ways, insidiously worse: It's much harder to identify bad information than it was before.
It's almost like we designed a system to pass Turing tests with flying colours but forgot that usefulness is what we actually wanted, not authoritative, human-sounding bullshit.
I don't think the LLM naysayers are 'unimpressed', or that they demand perfection. I think they are trying to make statements aimed at balancing things:
Both the LLMs themselves, and the humans parroting the hype, are severely overstating the quality of what such systems produce. Hence, and this is a natural phenomenon you can observe in all walks of life, the more skeptical folks tend to swing the pendulum the other way, and thus it may come across to you as them being overly skeptical instead.
I'm trans, and I don't disagree that this technology has aspects that are problematic. But for me at least, LLMs have been a massive equalizer in the context of a highly contentious divorce where the reality is that my lawyer will not move a finger to defend me. And he's lawyer #5 - the others were some combination of worse, less empathetic, and more expensive. I have to follow up a query several times to get a minimally helpful answer - it feels like constant friction.
ChatGPT was a total game-changer for me. I told it my ex was using our children to create pressure - feeding it snippets of chat transcripts. ChatGPT suggested this might be indicative of coercive control abuse. It sounded very relevant (in a rare, candid moment, my ex even once admitted that she feels a need to control everyone around her), so I googled the term - essentially all the components were there except physical violence (with two notable exceptions).
Once I figured that out, I asked it to tell me about laws related to controlling relationships - and it suggested laws either directly addressing (in the UK and Australia), and the closest laws in Germany (Nötigung, Nachstellung, violations of dignity, etc., translating them to English - my best language). Once you name specific laws broken and provide a rationale for why there's a Tatbestand (ie the criterion for a violation is fulfilled), your lawyer has no option but to take you more seriously. Otherwise he could face a malpractice suit.
Sadly, even after naming specific law violations and pointing to email and chat evidence, my lawyer persists in dragging his feet - so much so that the last legal letter he sent wasn't drafted by him - it was ChatGPT. I told my lawyer: read, correct, and send to X. All he did was to delete a paragraph and alter one or two words. And the letter worked.
Without ChatGPT, I would be even more helpless and screwed than I am. It's far from clear I will get justice in a German court, but at least ChatGPT gives me hope, a legal strategy. Lastly - and this is a godsend for a victim of coercive control - it doesn't degrade you. Lawyers do. It completely changed the dynamics of my divorce (4 years - still no end in sight, lost my custody rights, then visitation rights, was subjected to confrontational and gaslighting tactics by around a dozen social workers - my ex is a social worker -, and then I literally lost my hair: telogen effluvium, tinea capitis, alopecia areata... if it's stress-related, I've had it), it gave me confidence when confronting my father and brother about their family violence.
It's been the ONLY reliable help, frankly, so much so I'm crying as I write this. For minorities that face discrimination, ChatGPT is literally a lifeline - and that's more true the more vulnerable you are.
WhY aRe PeOpLe BuLlIsH
LLMs produce midwit answers. If you are an expert in your domain, the results are kind of what you would expect for someone who isn’t an expert. That is occasionally useful but if I wanted a mediocre solution in software I’d use the average library. No LLM I have ever used has delivered an expert answer in software. And that is where all the value is.
I worked in AI for a long time, I like the idea. But LLMs are seemingly incapable of replacing anything of value currently.
The elephant in the room is that there is no training data for the valuable skills. If you have to rely on training data to be useful, LLMs will be of limited use.
If this were true, no one would hire junior employees and assistants. There's a huge amount of work that requires more time than expertise.
When an AI can say “Here’s how you make better, smaller, more powerful batteries, follow these plans”, then we will have a reason to worship AI.
When AI can bring us wonders like room-temperature superconductors, fast interstellar travel, anti-gravity tech, and solutions to world hunger and energy consumption, then it will have fulfilled the promise of what AI could do for humanity.
Until then, LLMs are just fancy search and natural language processors. Puppets with strings. It’s about as impressive as Google was when it first came out.
I think that there are two kinds of people who use AI: people who are looking for the ways in which AIs fail (of which there are still many) and people who are looking for the ways in which AIs succeed (of which there are also many).
A lot of what I do is relatively simple one off scripting. Code that doesn't need to deal with edge cases, won't be widely deployed, and whose outputs are very quickly and easily verifiable.
LLMs are almost perfect for this. It's generally faster than me looking up syntax/documentation, when it's wrong it's easy to tell and correct.
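A made-up but representative example of that category: a throwaway script whose output you can sanity-check at a glance, so a wrong answer costs you seconds, not hours:

```python
from collections import Counter
from pathlib import Path

def extension_counts(root: str) -> Counter:
    """Tally file extensions under a directory tree.

    Typical one-off scripting: if the numbers look off,
    you notice immediately and just fix or re-ask.
    """
    return Counter(
        p.suffix or "<none>"
        for p in Path(root).rglob("*")
        if p.is_file()
    )
```

That risk profile (cheap to verify, cheap to discard) is what makes delegation to an LLM comfortable here.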
Look for the ways that AI works, and it can be a powerful tool. Try and figure out where it still fails, and you will see nothing but hype and hot air. Not every use case is like this, but there are many.
-edit- Also, when she says "none of my students has ever invented references that just don't exist"...all I can say is "press X to doubt"
The problem is that I feel I am constantly being bombarded by people bullish on AI saying "look how great this is" but when I try to do the exact same things they are doing, it doesn't work very well for me
Of course I am skeptical of positive claims as a result.
Annoying response of course. But I’d never used an LLM to debug before, so I figured I’d give it a try.
First: it regurgitated a bunch of documentation and basic debugging tips, which might have actually been helpful if I had just encountered this problem and had put no thought into debugging it yet. In reality, I had already spent hours on the problem. So not helpful
Second: I provided some further info on environment variables I thought might be the problem. It latched on to that. “Yes that’s your problem! These environment variables are (causing the problem) because (reasons that don’t make sense). Delete them and that should fix things.” I deleted them. It changed nothing.
Third: It hallucinated a magic numpy function that would solve my problem. I informed it this function did not exist, and it wrote me a flowery apology.
Clearly AI coding works great for some people, but this was purely an infuriating distraction. Not only did it not solve my problem, it wasted my time and energy, and threw tons of useless and irrelevant information at me. Bad experience.
I see people say, "Look how great this is," and show me an example, and the example they show me is just not great. We're literally looking at the same thing, and they're excited that this LLM can do a college grad's job to the level of a third grader, and I'm just not excited about that.
Treat the AI as a freelancer working on your project. How would you ask a freelancer to create a Kanban system for you? By simply asking "Create a Kanban system", or by providing them a 2-3 pages document describing features, guidelines, restrictions, requirements, dependencies, design ethos, etc?
Which approach will get you closer to your objective?
The same applies to LLMs (when it comes to code generation). When well instructed, they can quickly generate a lot of working code and apply the necessary fixes/changes you request inside that same context window.
It still can't generate senior-level code, but it saves hours when doing grunt work or prototyping ideas.
"Oh, but the code isn't perfect".
Nor is the code of the average jr dev, but their code still makes it to production in thousands of companies around the world.
About 2 weeks ago I started on a streaming markdown parser for the terminal because none really existed. I've switched to human coding now, but the first version was basically all LLM prompting, and a bunch of the code is still LLM-generated (maybe 80%). It's a parser, and those are hard. There are stacks, states, lookaheads, look-behinds, feature flags, color spaces, support for things like links and syntax highlighting... all forward-streaming. Not easy.
https://github.com/kristopolous/Streamdown
Exactly this.
I once had a function that would generate several .csv reports. I wanted these reports to then be uploaded to s3://my_bucket/reports/{timestamp}/*.csv.
I asked ChatGPT: "Write a function that moves all .csv files in the current directory to an old_reports directory, calls a create_reports function, then uploads all the .csv files in the current directory to s3://my_bucket/reports/{timestamp}/*.csv, with the timestamp in YYYY-MM-DD format." And it created the code perfectly. I knew what the correct code would look like; I just couldn't be fucked to look up the exact calls to boto3, whether moving files was os.move or os.rename or something from shutil, and the exact way to format a datetime object.
It created the code far faster than I would have.
Like, I certainly wouldn't use it to write a whole app, or even a whole class, but individual blocks like this, it's great.
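For what it's worth, the block described above is small enough to sketch. Assuming the placeholder names from the comment (create_reports, my_bucket), the answer the LLM saves you from looking up is roughly:

```python
import glob
import os
import shutil
from datetime import datetime


def report_prefix(now=None):
    """S3 key prefix with the timestamp in YYYY-MM-DD format."""
    now = now or datetime.now()
    return f"reports/{now.strftime('%Y-%m-%d')}/"


def rotate_and_upload(bucket="my_bucket", create_reports=lambda: None):
    """Move existing .csv files aside, regenerate them, then upload.

    create_reports is assumed to write fresh .csv files to the cwd;
    the bucket name is the placeholder from the comment above.
    """
    import boto3  # third-party; the upload also needs AWS credentials

    os.makedirs("old_reports", exist_ok=True)
    for path in glob.glob("*.csv"):
        # shutil.move (not os.rename) also works across filesystems
        shutil.move(path, os.path.join("old_reports", path))

    create_reports()

    s3 = boto3.client("s3")
    for path in glob.glob("*.csv"):
        s3.upload_file(path, bucket, report_prefix() + path)
```

Ten minutes of doc-diving compressed into one prompt, which is exactly the point being made.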
I use it to produce whole classes, large sql queries, terraform scripts, etc etc. I then look over that output, iterate on it, adjust it to my needs. It's never exactly right at first, but that's fine - neither is code I write from scratch. It's still a massive time saver.
It doesn't just save me a ton of time, it results in me building automations that I normally wouldn't have taken on at all because the time spent fiddling with os.move/boto3/etc wouldn't have been worthwhile compared to other things on my plate.
But if you can do the task well enough to at least recognize likely-to-be-correct output, then you can get a lot done in less time than you would do it without their assistance.
Is that worth the second order effects we're seeing? I'm not convinced, but it's definitely changed the way we do work.
As you said, examples where I wouldn't expect LLMs to be good at from people who dismiss the scenarios where LLMs are great at. I don't want to convince anyone, to be honest - I just want to say they are incredibly useful for me and a huge time saver. If people don't want to use LLMs, it's fine for me as I'll have an edge over them in the market. Thanks for the cash, I guess.
I'm growing weary of trying to help people use these tools properly.
Automating the easy 80% sounds useful, but in practice I'm not convinced that's all that helpful. Reading and putting together code you didn't write is hard enough to begin with.
I’ve never seen it from my students. Why do you think this? It’s trivial to pick a real book/article. No student is generating fake material whole cloth and fake references to match. Even if they could, why would they risk it?
I know arguments from authority aren't primary, but I think this point highlights some important context: Dr. Hossenfelder has gained international renown by publishing clickbait-y YouTube videos that ostensibly debunk scientific and technological advances of all kinds. She's clearly educated and thoughtful (not to mention otherwise gainfully employed), but her whole public persona kinda relies on assuming the exclusively-critical standpoint you mention.
I doubt she necessarily feels indebted to her large audience expecting this take (it's not new...), but that certainly does seem like a hard cognitive habit to break.
"Garbage in, garbage out" as the law says.
Of course, it took a lot of trial and error for me to get to my current level of effectiveness with LLMs. It's probably our responsibility to teach these who are willing.
Those people, likely, will never change their opinion.
And that’s fine, because they won’t get the huge benefits that come from spending time learning how to use the tool properly.
Every once in a while I send a query off to ChatGPT and I'm often disappointed and jam on the "this was hallucinated" feedback button (or whatever it is called). I have better luck with Claude's chat interface but nowhere near the quality of response that I get with Cline driving.
What I am seeing is fanboys who offer me examples of things working well that fail any close scrutiny— with the occasional example that comes out actually working well.
I agree that for prototyping unimportant code LLMs do work well. I definitely get to unimportant point B from point A much more quickly when trying to write something unfamiliar.
And this doesn’t include lying and cheating, which LLMs can’t do.
On the other hand, AI is used to solve problems that are already solved. I recently got an ad for process-modeling software claiming that you don't always need to start from the ground up; you can ask the AI to, say, give you a customer-order process and start from that point. That is basically what templates are for, with much less energy consumption.
You hit the nail on the head with this one. Around me, I've noticed that the bashing of LLMs comes from smart people who want others to know they are smart.
It doesn't always correlate with narcissism, but it happens much more than chance.
Yes, somewhat. It's good for PowerShell/bash/cmd scripts and configs, but early models would hallucinate PowerShell cmdlets especially.
The use cases are vastly different and the first is just _not_ world changing. It’s great, don’t get me wrong, but it won’t change the world.
"I write code all day with LLMs, it's amazing!" is in the exact same category. The code you (general you, I'm not picking on you in particular) write using LLMs, and the code I write apart from LLMs: they are not the same. They are categorically different artifacts.
"I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error. I Google for the alleged quote, it doesn't exist. They reference a scientific publication, I look it up, it doesn't exist."
To experienced LLM users that's not surprising at all - providing citations, sources for quotes, useful URLs are all things that they are demonstrably terrible at.
But it's a computer! Telling people "this advanced computer system cannot reliably look up facts" goes against everything computers have been good at for the last 40+ years.
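The 404 problem in particular is mechanically checkable. A minimal sketch (my own, not something any vendor is known to ship) that verifies an emitted URL before showing it to the user:

```python
from urllib.request import Request, urlopen


def link_resolves(url, timeout=5):
    """Return True only if the URL answers with a non-error HTTP status."""
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "link-check"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    # URLError/HTTPError are OSError subclasses; malformed URLs raise ValueError
    except (ValueError, OSError):
        return False
```

It catches dead links but not plausible-looking links to the wrong paper, which is the harder half of the problem.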
And that’s honestly unfair to you since you do awesome realistic and level headed work with LLM.
But I think it’s important when having discussions to understand the context within which they are occurring.
Without the bulls she might very well be saying what you are in your last paragraph. But because of the bulls the conversation becomes this insane stratified nonsense.
Use google AI studio with search grounding. Provides correct links and citations every time. Other companies have similar search modes, but you have to enable those settings if you want good results.
Of late, deaf tech forums have been taken over by language-model debates over which model works best for speech transcription. (Multimodal language models are the state of the art in machine transcription. Everyone seems to forget that when complaining they can't cite sources for scientific papers yet.) The debates have gotten to the point that it's become annoying how much space they've taken over, just like here on HN.
But then I remember, oh yeah, there was no such thing as live machine transcription ten years ago. And now there is. And it's going to continue to get better. It's already good enough to be very useful in many situations. I have elsewhere complained about the faults of AI models for machine transcription - in particular when they make mistakes they tend to hallucinate something that is superficially grammatical and coherent instead - but for a single phrase in an audio transcription sporadically that's sometimes tolerable. In many cases you still want a human transcriber but the cost of that means that the amount of transcription needed can never be satisfied.
It's a revolutionary technology. I think in a few years I'm going have glasses that continuously narrate the sounds around me and transcribe speech and it's going to be so good I can probably "pass" as a hearing person in some contexts. It's hard not to get a bit giddy and carried away sometimes.
If everyone is using them wrong, I would argue that says something more about them than the users. Chat-based interfaces are the thing that kicked LLMs into the mainstream consciousness and started the cycle/trajectory we’re on now. If this is the wrong use case, everything the author said is still true.
There are still applications made better by LLMs, but they are a far cry from AGI/ASI in terms of being all-knowing problem solvers that don’t make mistakes. Language tasks like transcription and translation are valuable, but by no stretch do they account for the billions of dollars of spend on these platforms, I would argue.
Yes the costs of training AI models these days are really high too, but now we're just making a quantitative argument, not a qualitative one.
The fact that we've discovered a near-magical tech that everyone wants to experiment with in various contexts, is evidence that the tech is probably going somewhere.
Historically speaking, I don't think any scientific invention or technology has been adopted and experimented with so quickly and on such a massive scale as LLMs.
It's crazy that people like you dismiss the tech simply because people want to experiment with it. It's like some of you are against scientific experimentation for some reason.
What? Then what the hell do you call Dragon NaturallySpeaking and other similar software in that niche?
I have a minor speech impediment because of the hearing loss. They never worked for me very well. I don't speak like a standard American - I have a regional accent and I have a speech impediment. Modern speech recognition doesn't seem to have a problem with that anymore.
IBM's ViaVoice from 1997 in particular was a major step. It was really impressive in a lot of ways but the accuracy rate was like 90 - 95% which in practice means editing major errors with almost every sentence. And that was for people who could speak clearly. It never worked for me very well.
You also needed to speak in an unnatural way [pause] comma [pause] and it would not be fair to say that it transcribed truly natural speech [pause] full stop
Such voice recognition systems before about 2016 also required training on the specific speaker. You would read many pages of text to the recognition engine to tune it to you specifically.
It could not just be pointed at the soundtrack to an old 1980s TV show then produce a time-sync'd set of captions accurate enough to enjoy the show. But that can be done now.
It’s been a common mantra - at least in my bubble of technologists - that a good majority of the software engineering skill set is knowing how to search well. Knowing when search is the right tool, how to format a query, how to peruse the results and find the useful ones, what results indicate a bad query you should adjust… these all sort of become second nature the longer you’ve been using Search, but I also have noticed them as an obvious difference between people that are tech-adept vs not.
LLMs seem to have a very similar usability pattern. They’re not always the right tool, and are crippled by bad prompting. Even with good prompting, you need to know how to notice good results vs bad, how to cherry-pick and refine the useful bits, and have a sense for when to start over with a fresh prompt. And none of this is really _hard_; just like Search, none of us need to go take a course on prompting. IMO folks just need to engage with LLMs as a non-perfect tool they are learning how to wield.
The fact that we have to learn a tool doesn’t make it a bad one. The fact that a tool doesn’t always get it 100% on the first try doesn’t make it useless. I strip a lot of screws with my screwdriver, but I don’t blame the screwdriver.
On a side note, this lady is a fraud: https://www.youtube.com/watch?v=nJjPH3TQif0&themeRefresh=1
In no way am I credentialing her; lots of people can make astute observations about things they weren't trained in. But she has both mastered sounding authoritative and, at the same time, presents things to get the most engagement possible.
If you don't have that experience in this domain, you will spend approximately as much effort validating output as you would have creating it yourself, but the process is less demanding of your critical skills.
Since reasoning models came about, I've been significantly more bullish on them, purely because they are less bad. They are still not amazing, but they are at a point where I feel like including them in my workflow isn't an impediment.
They can now reliably complete a subset of tasks without me needing to rewrite large chunks of it myself.
They are still pretty terrible at edge cases (uncommon patterns, libraries, etc.), but when on the beaten path they can actually pretty decently improve productivity. I still don't think it's 10x (well, today was the first time I felt a 10x improvement, but I was moving frontend code from a custom framework to React, more tedium than anything else, and the AI did a spectacular job).
These critics don't seem to have learned the lesson that the perfect is the enemy of the good.
I use ChatGPT all the time for academic research. Does it fabricate references? Absolutely, maybe about a third of the time. But has it pointed me to important research papers I might never have found otherwise? Absolutely.
The rate of inaccuracies and falsehoods doesn't matter. What matters is whether it saves you time and increases your productivity. Verifying the accuracy of its statements is easy, while finding the knowledge it spits out in the first place is hard. The net balance is a huge positive.
People are bullish on LLMs because they can save you days' worth of work, like every day. My research productivity has gone way up with ChatGPT -- asking it to explain ideas, related concepts, relevant papers, and so forth. It's amazing.
For single statements, sometimes, but not always. For all of the many statements, no. Having the human attention and discipline to mindfully verify every single one without fail? Impossible.
Every software product/process that assumes the user has superhuman vigilance is doomed to fail badly.
> Automation centaurs are great: they relieve humans of drudgework and let them focus on the creative and satisfying parts of their jobs. That's how AI-assisted coding is pitched [...]
> But a hallucinating AI is a terrible co-pilot. It's just good enough to get the job done much of the time, but it also sneakily inserts booby-traps that are statistically guaranteed to look as plausible as the good code (that's what a next-word-guessing program does: guesses the statistically most likely word).
> This turns AI-"assisted" coders into reverse centaurs. The AI can churn out code at superhuman speed, and you, the human in the loop, must maintain perfect vigilance and attention as you review that code, spotting the cleverly disguised hooks for malicious code that the AI can't be prevented from inserting into its code. As qntm writes, "code review [is] difficult relative to writing new code":
-- https://pluralistic.net/2025/03/18/asbestos-in-the-walls/
I mean, how do you live life?
The people you talk to in your life say factually wrong things all the time.
How do you deal with it?
With common sense, a decent bullshit detector, and a healthy level of skepticism.
LLMs aren't calculators. You're not supposed to rely on them to give perfect answers. That would be crazy.
And I don't need to verify "every single statement". I just need to verify whichever part I need to use for something else. I can run the code it produces to see if it works. I can look up the reference to see if it exists. I can Google the particular fact to see if it's real. It's really very little effort. And the verification is orders of magnitude easier and faster than coming up with the information in the first place. Which is what makes LLMs so incredibly helpful.
And you don't have concerns about that? What kind of damage is that doing to our society, long term, if we have a system that _everyone_ uses and it's just accepted that a third of the time it is just making shit up?
Like, I can ask a friend and they'll mistakenly make up a reference. "Yeah, didn't so-and-so write a paper on that? Oh they didn't? Oh never mind, I must have been thinking of something else." Does that mean I should never ask my friend about anything ever again?
Nobody should be using these as sources of infallible truth. That's a bonkers attitude. We should be using them as insanely knowledgeable tutors who are sometimes wrong. Ask and then verify.
The net benefit is huge.
Main problem with our society is that two thirds of what _everyone_ says is made up shit / motivated reasoning. The random errors LLMs make are relatively benign, because there is no motivation behind them. They are just noise. Look through them.
Could it end up being a net benefit? Will the realistic-sounding but incorrect facts generated by AI make people engage with arguments more critically, and be less likely to believe random statements they're given?
Now, I don't know, or even think it is likely that this will happen, but I find it an interesting thought experiment.
LLMs will spit out responses with zero backing with 100% conviction. People see citations and assume it's correct. We're conditioned for it thanks to....everything ever in history. Rarely do I need to check a wikipedia entry's source.
So why do people not understand that this is absolutely going to pour jet fuel on misinformation in the world? And we as a society are allowed to hold a higher bar for what we'll accept being shoved down our throats by corporate overlords who want their VC payout.
Imagine there is a probabilistic oracle that can answer any yes/no question with success probability p. If p=100% or p=0%, then it is obviously very useful. If p=50%, then it is absolutely worthless. In other cases, such an oracle can be utilized in different ways to get the answer we want, and it is still a useful thing.
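The standard trick for "other cases" is repetition: any p above 50% can be amplified toward certainty by majority-voting over independent queries (and for p below 50%, you invert the answers first). A quick sketch:

```python
import random
from collections import Counter


def boosted_oracle(oracle, k):
    """Majority vote over k independent yes/no queries.

    Any accuracy p > 0.5 is amplified toward certainty as k grows;
    for p < 0.5, invert the oracle's answers first. Odd k avoids ties.
    """
    votes = Counter(oracle() for _ in range(k))
    return votes.most_common(1)[0][0]


# A toy oracle that is right 70% of the time (True is the correct answer).
random.seed(0)
print(boosted_oracle(lambda: random.random() < 0.7, 101))
```

With a 70%-accurate oracle and 101 votes, the majority is wrong only with vanishing probability. The catch with LLMs, of course, is that repeated queries are not independent in the way this model assumes.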
Unreliability is something we live in. It is the world. Controlling error, increasing signal over noise, extracting energy from the fluctuations. This is life, man. This is what we are.
I can use LLMs very effectively. I can use search engines very effectively. I can use computers.
Many others can’t. Imagine the sheer fortune to be born in the era where I was meant to be: tools transformative and powerful in my hands; useless in others’.
I must be blessed by God.
Its true success rate is by no means 100%, and sometimes is 0%, but it always tries to make you feel confident.
I’ve had to catch myself surrendering too much judgment to it. I worry a high school kid learning to write will have fewer qualms surrendering judgment
So we're trying to use tools like this to help solve deeper problems, and they aren't up to the task. We're still at the point where we need to start over and get better tools. Sharpening a bronze knife will never make it as sharp, or hold an edge as long, as a steel knife. Same basic elements, very different material.
Their usefulness comes down entirely to your ability both to find what you need without them and to verify the information they give you. If you put that on a matrix, this makes them useful in the quadrant of information that is hard to find but easy to verify. Which, at least in my daily work, is a reasonable amount.
There’s no question that we’re in a bubble which will eventually subside, probably in a “dot com” bust kind of way.
But let me tell you…last month I sent several hundred million requests to AI, as a single developer, and got exactly what I needed.
Three things are happening at once in this industry… (1) executives are over promising a literal unicorn with AGI, that is totally unnecessary for the ongoing viability of LLM’s and is pumping the bubble. (2) the technology is improving and delivery costs are changing as we figure out what works and who will pay. (3) the industry’s instincts are developing, so it’s common for people to think “AI” can do something it absolutely cannot do today.
But again…as one guy, for a few thousand dollars, I sent hundreds of millions of requests to AI that are generating a lot of value for me and my team.
Our instincts have a long way to go before we’ve collectively internalized the fact that one person can do that.
There are 2.6 million seconds in a month. You are claiming to have sent hundreds of requests per second to AI.
It is trivial for a server to send/receive 150 requests per second to the API.
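The back-of-envelope arithmetic is checkable in a couple of lines (taking "several hundred million" as 300 million, an assumption):

```python
seconds_per_month = 30 * 24 * 3600   # 2,592,000, i.e. roughly 2.6 million
requests = 300_000_000               # "several hundred million", assumed
print(round(requests / seconds_per_month))  # about 116 sustained requests/second
```

Well within what one batch-processing job against an API can sustain.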
This is what I mean by instincts...we're used to thinking of developers-pressing-keys as a fundamental bottleneck, and it still is to a point. But as soon as the tracks are laid for the AI to "work", things go from speed-of-human-thought to speed-of-light.
If you have a lot of GPU's and you're doing massive text processing like spam detection for hundreds of thousands of users, sure.
But "as a single developer", "value for me and my team"... I'm confused.