Don't believe the hype: why ChatGPT is not the “holy grail” of AI research

The first steam engines were also written off as being less powerful than a horse.

The first electric motors were written off as being less powerful than steam engines.

So it goes.

I think both of these views can be true at the same time: ChatGPT (or, LLMs really) are revolutionary and they won't revolutionize the world the way technologists/researchers say.

Early adopters will use the technology and do amazing things with it. Unions are already pushing back on AI (truckers, federal employees in Canada, writers in Hollywood) and maybe rightly so. At the same time, dismissing these technologies because they don't meet your high standards yet is probably foolish.

magicalist · 2 years ago

There's a strong argument for its actual merits but this particular line of argument isn't convincing.

> The first steam engines were also written off as being less powerful than a horse.

> The first electric motors were written off as being less powerful than steam engines

And the first Segways were written off, the first NFTs were written off, etc.

(in fact, we've seen this same argument for blockchains changing everything real soon now :)

MrPatan · 2 years ago

The first segways were written off... and now you have electric scooters on demand on every city of the planet.

The first NFTs were written off... by you and other writter-offers. If you didn't write them off back then and minted some you'd have made a pretty penny by now.

I don't know, I'm not sure those examples strengthen your case.

munificent · 2 years ago

> Unions are already pushing back on AI (truckers, federal employees in Canada, writers in Hollywood) and maybe rightly so.

Interesting bit of historical trivia: In the US, the main truck driver's union is the International Brotherhood of Teamsters. What is a teamster? Historically, it was a person who wrangled a team of horses or oxen to pull a wagon. That profession was effectively eliminated by the creation of the internal combustion engine.

Two of the other main trades that were involved in the rise of unions are longshoremen and stevedores. The former pull cargo off ships and get it onto land. The latter organize cargo on the ships.

The creation of shipping containers dramatically reduced the need for those jobs, leading to some of the longest strikes in the US. While the unions technically "won", the strikes mostly incentivizing shipping companies to push even farther into mechanization so that they were less reliant on labor. There are far fewer dockworkers and longshoremen today than there were before containerization.

Change is hard.

mediaman · 2 years ago

More on the port workers: today you'll find that skilled union port workers are extremely well-paid. Workers in the cranes that pull containers off the ship, for example, are paid >$300k.

They are quite skilled, but so are many other skilled tradespeople. Electricians and machinists for example do not make that kind of money despite their skill. So why do crane operators make so much?

When containerization, and subsequently other types of automation, hit ports, the union resisted fiercely but ultimately had to work out a deal with the port operators.

That deal was to reward the most tenured union tradespeople with much larger pay packages, at the cost of the less experienced tradespeople, who would have to find different work.

This agreement became agreeable to both sides, since the operators could still massively reduce the workforce, and the only price they'd have to pay would be high salaries for the union workers who remained. And the union was able to reward their longest tenured members.

It's hard to fault the union for this: the alternative was likely both huge reduction in workforce and less attractive pay, so they at least got good pay out of it for the remaining workers. But it made a lot of the less tenured union workers resentful because they felt the union sacrificed them in favor of the union 'insiders'.

The book 'The Box' by Marc Levinson is a great coverage of this topic, if steel shipping containers are the sort of thing that get you going.

Deleted Comment

slashdev · 2 years ago

I use it almost on a daily basis now, and I pay for the monthly subscription. It’s not magic, but it can save me a lot of time sometimes. I use it in my job as a software engineer, and I mostly use it to create unit tests. The code usually needs a lot of love, but it gets me started.

PartiallyTyped · 2 years ago

Copilot and chatgpt are some of the services i am perfectly content paying for.

ChatGPT is a great way to delve into new topics, and for that alone it’s worth it.

visarga · 2 years ago

> I think both of these views can be true at the same time: ChatGPT (or, LLMs really) are revolutionary and they won't revolutionize the world the way technologists/researchers say.

Yes, because people think that LLMs are almost AGI based on the social media reactions and can't imagine they still have unknown/unsolved problems. But if we take a look at the 14 years of self driving car development, it becomes clear how AI can be both amazing and not good enough at the same time.

Paul-Craft · 2 years ago

> Yes, because people think that LLMs are almost AGI....

Surprise, surprise... this has happened before:

> Lay responses to ELIZA were disturbing to Weizenbaum and motivated him to write his book Computer Power and Human Reason: From Judgment to Calculation, in which he explains the limits of computers, as he wants to make clear his opinion that the anthropomorphic views of computers are just a reduction of the human being and any life form for that matter.[29] In the independent documentary film Plug & Pray (2010) Weizenbaum said that only people who misunderstood ELIZA called it a sensation.[30]

https://en.wikipedia.org/wiki/ELIZA#Response_and_legacy

And, it's easy to see why. You can talk the damn thing, and it talks back! People love to anthropomorphize things, anyway, but if you can talk to it and it talks back, people think there's got to be something to it.

This time, though, is a little different. GPT-3 and GPT-4 actually do behave like they understand natural language to a great extent. That makes them directly analogous to Searle's Chinese room construct, and suggests that they could actually pass the Turing test (if suitably fine-tuned).

This is great, because, as you say, it's amazing. But I also think it's not good enough, because the fact that GPT-4 may be able to pass the Turing test really says more to me about the limitations of the Turing test than anything else. Likewise with the Chinese room analogy: we know what's in the box, and we know it shouldn't be trusted.

But, you're not going to get that kind of analysis from the general public.

https://en.wikipedia.org/wiki/Chinese_room

https://en.wikipedia.org/wiki/Turing_test

spacebanana7 · 2 years ago

LLMs are different from self driving cars in that they can be useful even when they make wrong decisions occasionally. Copilots, document drafting (legal, copy, etc) and summarisation are useful services that people and enterprises are currently enjoying.

cl42 · 2 years ago

AVs have also struggled with regulatory muddle, which is partly my point.

Self-driving is very possible in many situations and if there was a "Manhattan Project" for self-driving to be up and running by 2025 I think we could do it... But there are so many vested interests that this won't happen.

... and then everyone is disappointed.

BTW, I'm not saying this is all bad... Everyone asking for a 6-month AI research moratorium gets it indirectly via societal inertia and regulatory muddle!

somesortofthing · 2 years ago

The difference here is that steam engines were "less powerful than a horse" in an easily quantifiable, easily diagnosable way. They produced fewer newtons of force. You could tell this was the case because your mechanism just wouldn't move when you wanted it to. Most new technologies followed this pattern, they quantifiably underperformed alternatives until the field matured. But AI doesn't act like a dumb human who's missing information or is inept at the task presented to them. It doesn't refuse to answer if you give it a question that's too hard for it or requires info that it doesn't have in the training dataset, it confidently makes stuff up and then covers up for the fact that it made stuff up by burying it in marketing copy and extraneous info such that you need to be an expert in the topic you're using AI for to even tell that it failed. Better AI models do help with this, but they simultaneously improve the AI's obfuscation abilities to the point where fatal flaws in its output are going to be even harder to catch than they are now with human review. It doesn't have the same risk calculus as a human, it doesn't care whether the marketing copy you're writing describes your product as wonderful and perfect or if it's providing you completely bogus legal advice that'll land you in jail for a decade if you follow it.

And this is all before we even bring up the topic of prompt injection, a problem so intrinsic to the technology that OpenAI doesn't even take bug reports on it because bug reports "are for problems that can be fixed".

daveguy · 2 years ago

One of the biggest problems in AI for the last 60 years is the grounding problem. The ability of a model to be rooted in objective reality. In other words, for one of these LLMs to understand when they are being accurate vs hallucinating. None of the current crop of LLMs has come close to solving this problem. On the contrary, they make the problem blatantly obvious. No LLMs will achieve AGI until this is solved sufficiently that the answers of an LLM can be depended on without complete independent secondary verification.

mrguyorama · 2 years ago

A LANGUAGE model cannot solve this because truth and fiction is not a property of LANGUAGE

esjeon · 2 years ago

I think the analogy doesn't work here.

In my understanding, LLMs already hit a big wall. We can't increase the size of models mainly because it's too expensive, but also doing so may not be as effective as before. We've also run out of data. The free lunch is likely already over, for now. It's unlikely that we'll see huge improvements in the direction we've seen during recent years.

Instead, what I see is that the first letter 'L' is getting smaller. People are working on (relatively) smaller specialized models. But it means these models are unlikely outperform larger LLMs (in the direction mentioned above).

SilasX · 2 years ago

Don’t forget the infamous remark from the Slashdot mod about the first iPod, “No wireless. Less space than a nomad. Lame.”

https://slashdot.org/story/21026

pmoriarty · 2 years ago

they weren't wrong...

wnevets · 2 years ago

A more modern example is the first iPhone, the first gen was bad even by the standards of the day. If you look passed the novelty of having a lightsaber app on your phone it was terrible.

3-cheese-sundae · 2 years ago

I had a Motorola Q at the time and the first iPhone was light years beyond it, even if the only metric used to compare was browsing the internet. Most sites were barely functional in Windows Mobile IE.

Q6T46nT668w6i3m · 2 years ago

What? Safari on your phone was mind blowing and immediately useful. Also apps weren’t part of the original iPhone.

ghaff · 2 years ago

I'm not sure "bad" is fair. But the app ecosystem wasn't really developed, the network connectivity wasn't great, and there were probably a lot of other shortcomings especially in retrospect. I had a Treo at the time and didn't upgrade for a few years to the 3GS which, as I recall, was when the iPhone really took off.

The iPod had a somewhat similar trajectory. The first gen version was pretty much just another MP3 player and iTunes didn't even run on Windows at first.

kgwgk · 2 years ago

> the first iPhone, the first gen was bad even by the standards of the day

That's why it barely got over six million units sold.

charcircuit · 2 years ago

No, because next word predictors are fundamentally limited in their capabilites and have interest problems. This isn't something you can just iterate on to fix. You need a different architecture.

marban · 2 years ago

An iPod, a phone, an Internet communicator. Not so bad if you ask me.

sclarisse · 2 years ago

What are you referring to here? The iPhone didn’t have apps for over a year, until the same time as the iPhone 3G launch.

denimnerd42 · 2 years ago

the worst was it was ATT only and they had a coverage hole on the block where I lived so I had to get rid of it for something supported by verizon. didn't go back to an iphone until 10 years later.

hodgesrm · 2 years ago

> The first steam engines were also written off as being less powerful than a horse.

Which steam engine do you mean and do you have a citation for this comment? The first industrial steam engines were based on the Newcomen design and were used to pump water out of mines. Their big drawback was efficiency, not power. They were only economical in coal mines, which had fuel immediately available at near zero cost.

[0] https://en.wikipedia.org/wiki/Newcomen_atmospheric_engine

Brendinooo · 2 years ago

> I think both of these views can be true at the same time: ChatGPT (or, LLMs really) are revolutionary and they won't revolutionize the world the way technologists/researchers say.

This is pretty much what I've seen. ChatGPT (that's 3.5, right? GPT3 was interesting but still pretty laughable) was a massive step forward, and incredibly exciting to witness and interact with. But it still does have limitations, especially if you try to separate hype (which comes from an ecosystem of people who have incentives to hype it) from reality.

godelski · 2 years ago

> The first steam engines were also written off as being less powerful than a horse.

This may not be the best example considering that steam engines were around since at least 20BCE[0] but the first successful application wasn't till almost 1700.

[0] https://en.wikipedia.org/wiki/Aeolipile

Deleted Comment

Large Language Models aren't a silver bullet – they don't solve all your problems. But they are a holy grail – as a universal common sense module they give IT systems a capability they never had before, a capability which has been sought after from since computers became a thing, a capacity for common sense.

We now have that capacity and that alone will revolutionize the world. The chatbots aren't about chat, they are about common sense.

Like the article, I am only talking about technology that already exists although the progress in deep learning is still super-exponential.

We will certainly achieve AGI during this year as it will only require making these systems self-play like we did with AlphaGo -> AlphaZero -> MuZero. Self-play, or reinforcement learning with machine feedback will skyrocket the performance of these systems in language domain, which conveniently encompasses much of what is still missing for AGI.

bccdee · 2 years ago

You've got it exactly backwards. They're not about common sense; they're about chat.

LLMs act in insensible ways all the time. They contradict themselves. They hallucinate. If you ask an LLM to follow a simple but long logic puzzle and show you its work, it will often make extremely obvious errors and fail to notice even when you ask it to review its work.

What LLMs can do is coherently string together language. That requires sophisticated linguistic understanding, and LLMs are pretty impressive for it. But merely understanding language is not intelligence, or even common sense. I suspect we're approaching the limits of what we can get out of statistical language generation.

Actual AGI would require a logical model of the world, not a probabilistic model of language. Looking at non-human animals, we can see problem-solving evolved long before language did. Language is a second-order phenomenon we use to express first-order problem-solving conclusions about the world, and I'm skeptical that we'll ever manage to accurately recreate first-order problem-solving by training models on the second-order linguistic artefacts of problem-solving. It's actually very easy to create superficially-realistic second-order problem-solving artefacts which, when examined with first-order problem solving capabilities, don't stand up to scrutiny—i.e. contradictions, hallucinations, and faulty reasoning. I suspect that, when computers do learn to problem-solve, it will have been by training to solve problems.

There's also no way we'll get the kind of growth you're predicting from self-play. When it comes to a simple competition like a board game, it's easy to optimize for more skilled play. But there's no objective way to win a conversation. Maybe we'll get some gains out of training these systems against each other, but it's just not a clearly viable use case.

sillysaurusx · 2 years ago

> Like the article, I am only talking about technology that already exists although the progress in deep learning is still super-exponential.

> We will certainly achieve AGI during this year as it will only require making these systems self-play like we did with AlphaGo -> AlphaZero -> MuZero. Self-play, or reinforcement learning with machine feedback will skyrocket the performance of these systems in language domain, which conveniently encompasses much of what is still missing for AGI.

There’s an important difference between exponential and sigmoidal curves. The early stages are indistinguishable, and not enough time has passed to judge.

Personally, I don’t think AGI is possible with current techniques. You say all that’s needed is self play or RLHF. This is categorically not true. It doesn’t even guarantee that AIs will ever care whether they’re alive, a fundamental property of sentience.

ehsanu1 · 2 years ago

There is likely a definitional gap here. Sentience is unnecessary for intelligence for my definition of intelligence, but agreeing on a common definition has been tough when we understand it so poorly.

I happen to also disagree with "caring" (requires definition) being relevant to sentience, defined as the ability to perceive or feel things.

ribosometronome · 2 years ago

>It doesn’t even guarantee that AIs will ever care whether they’re alive, a fundamental property of sentience.

Since when?

mellosouls · 2 years ago

We will certainly achieve AGI during this year...

As exciting and transformative as GPT3+ is, let's not get too hypey.

You need to back up outlandish claims with actual evidence and references. As discussed many times in this forum there's no evidence of sentience or any reason to consider the current systems to be even on the path to AGI.

hammyhavoc · 2 years ago

Props up falling share prices though.

esjeon · 2 years ago

> a capacity for common sense.

Uh, nope. Being able to spur out text is far from understanding what common sense is. If it did have the common sense, why would OpenAI struggle so much with filtering? Because the model doesn't comprehend what it generates. It's only capable of interpolate textual data it witnessed. The sense of common sense is merely an illusion created by the brain, which also loves interpolating whatever there are.

rdedev · 2 years ago

It's pretty straightforward to build an RL environment for closed systems like chess but I don't think it's close enough for an AGI to learn. Like RLHF uses human feedback. Unless we come up with a way to scale that process AGI by this year doesn't seem possible

nerpderp82 · 2 years ago

Wait until they can watch TV to learn (I am serious). If you imbue them with competitive play, oh boy. We just gotta figure out what they think funny is.

belter · 2 years ago

"Learning Video Representations from Large Language Models" - https://arxiv.org/abs/2212.04501

NoZebra120vClip · 2 years ago

https://xkcd.com/1696/

keskival · 2 years ago

idopmstuff · 2 years ago

I'm surprised this article is getting upvoted - it feels like very lazy journalism to me.

> The discomforting reality is that, while Altman and his ilk have been predicting an exponential acceleration of productivity, we have been experiencing a deceleration.

This is a very big claim, and there is absolutely nothing to back it up. The only specific reference to productivity is about an MIT paper that showed increases in worker productivity (but the authors of this just wave that aside as unimportant because they didn't think the work it was doing was important).

> More dangerously, ChatGPT can make authoritative statements that sound believable but turn out to be false if investigated closely.

We get it! We know! But look, this is a bad use case for GPT. If you pretend that it only has a single use case, and you pick the use case that it's worst at, you will think it's bad. This is just so, so lazy. No references to summarizing docs or writing code/SQL queries/Excel formulas or any of the other things that it's genuinely useful at.

> At best, LLMs can be used for rough first drafts of low-value writing tasks with humans filling in the details and checking for rants and lies.

Rants? Come on - GPT hallucinates, but it's not an unhinged lunatic that goes ranting about stuff. Also, again, this is not all they can be used for - it just ignores all of the better use cases.

> What about Altman's vision of humans appreciating art and nature while most of the world's goods and services are produced by AI? We have a lot more respect for the work that people do than for the usefulness of LLMs.

Huh? It's great that you respect the work people do, but that has nothing to do with whether they'll affect society.

> ChatGPT is entertaining but it is, at most, a baby step towards an AI revolution and, at worst, a very expensive detour away from the holy grail of artificial general intelligence.

What? This is the closing to the article and it just throws out this enormous claim, which is backed up by absolutely nothing. It's demonstrably a big step towards an AI revolution - if nothing else, it's brought a ton of money and interest into the space, which is certainly important for a revolution.

But to say it's a detour away from AGI and then give absolutely no explanation of why that is or what direction AI research should be going? This is very poor journalism.

> This is very poor journalism

Then be the change you want to see. Write a counterargument and submit it. Better yet, use an LLM to write the article and state the prompts used.

substation13 · 2 years ago

It's really easy to get an LLM to hallucinate by asking an open ended question - the type typically answered by a Google search or checking Wikiedpia. However, this is not the best application of LLMs. This criticism is getting old.

LLMs are great at:

- Text synthesis given all of the facts in a prompt (expand these bullet points)

- Summarization (condense this text)

- Data extraction (fit this data into this schema)

- Fiction (virtual characters, scripts, etc.)

They will dramatically change these industries.

soperj · 2 years ago

> - Fiction (virtual characters, scripts, etc.)

I've found it bad for this, does not generate something I'd actually read.

vidarh · 2 years ago

It seems to do ok at coming up with starting points or give you options if you're stuck. But the quality of the prose it comes up with is indeed awful. It gets a bit better if you ask it to write in the style of a specific author, but marginally so.

I guess maybe it gets to mediocre fan fiction level.

That's still pretty impressive, but not very usable for creative writing yet.

sumtechguy · 2 years ago

It is good at inferring the correct people into a story. But the story many times leaves something to be desired.

Other times though I did have a lot of fun having it spit out SCP stories. As those can many times have a ton of template like logic to them. Due to the nature of SCP being written in a tone of a formal report. Plus well over 2000 different examples.

Also some of that could be due to lack of training data. Like a TV show might be 4 seasons long and a particular character may have had 3 or 4 lines total. It would be like asking it to write a story about Boba Fett given the original 2 movies where he showed up and had maybe 1 or two lines. There just is not enough to extrapolate anything. But you ask it to write something about Harry Potter and it probably could get the style close enough as there is more training data.

My biggest grip is sometimes it just gets stuck in a loop. Once you are in one, the thing just will dump out the same hallucinations over and over.

Try Anthropic's Claude[1]. I've found it to be better at creative writing than GPT4 or even Claude+.

That said, it's still not great, though sometimes you can luck on to finding a gem in what it writes.

I've also had luck in giving it examples of the sort of thing I wanted it to write and asking it to write something similar, but with certain modifications that I wanted it to make.

Giving two or more examples and asking it to combine them is also fun.

[1] - https://poe.com/Claude-instant

sharemywin · 2 years ago

I wonder if an LLM trained on your favorite author how many words/sentences paragraphs it could generate in the middle of a book that would be basically undetectable.

jhp123 · 2 years ago

LLMs also hallucinate during summarization tasks, adding topics that were not in the original

I've built internal systems that do summarization based on knowledge retrieval systems for specific nonpublic corporate information.

With GPT-4, I find very little hallucinating. It very rarely deviates from the source material. Every time I've found something unexpected, there was a problem in the source material provided to the model.

morelisp · 2 years ago

> However, this is not the best application of LLMs

It is, however, the commercial application everyone - including search engines! - is implementing.

iamjackg · 2 years ago

To be fair, the ones I've seen use a form of point 1 (giving all facts in the prompt) by allowing for searching the web, which becomes a version of point 2 (summarization).

I can see LLMs as a novel front-end for a traditional search engine.

iforgotpassword · 2 years ago

Regarding the last point: What I still find the most entertaining is how easily you can change its personality, especially via the system prompt. You can get it to be rather snarky, even sometimes insulting, which makes for hilarious IRC bots.

EMM_386 · 2 years ago

In less then 20 tokens, you can get ChatGPT simply via the web interface to become snarky and swearing like a drunken sailor.

And it is indeed hilarious at times.

galaxytachyon · 2 years ago

As the hype phase has passed (probably), now we will see a bit of overcorrection with these dismissive articles. Sure, LLMs as they are now aren't anywhere close to true AGI and even Microsoft admitted it. But its potential is not something anyone can ignore. The capabilities of LLMs has already been successfully used by millions of people and startups. It is a groundbreaking improvement that makes at least one field of study nearly obsolete (NLP). It captured attentions of both corporations and government who are pouring billions into it. All of this in the span of one year or less.

With the multimodal models coming next and still exabytes of videos, games, sound, musics, etc. data to train them, we aren't peaking yet. Sure, it isn't the holy grail. But it is a really valuable treasure that only a few exist, to use the same analogy. To view it so dismissively because of some drawbacks, which are entirely obvious and can be accounted for, is just arrogance.

ARandumGuy · 2 years ago

> It captured attentions of both corporations and government who are pouring billions into it. All of this in the span of one year or less.

Corporations and governments have thrown tons of money into technologies that ended up going nowhere. We're only a few years out from everyone dumping their money into "blockchain solutions", which turned out to go nowhere.

Investors and government stakeholders are easily swayed by hype. Sometimes this hype is well placed, but often the hype results in throwing money at projects that don't produce anything of value. Hype just isn't a good measure of a technology's long term viability.

When only a few of them followed the hype, yes, it can possibly go nowhere.

But when the entire industry, experts and non-experts included, are fascinated and obsessed with the same thing, it is more likely to be something real. An easy example is the first iPhone.

Another more negative example is bitcoin which even though it is probably a scam, its values and influence on society has massively grown more than what it was 1 year after released. Even though it has been a disappointment technologically.

jonwinstanley · 2 years ago

Sure, anyone that uses ChatGPT knows it's currently not perfect.

But there's a presumption that these tools are going to keep improving over time. Which is presumably why the AI hype is so strong.

Whether AI ends up displacing people from their jobs in the long term, well, that's impossible to know. Just because no technological advancement has ever done that in the past doesn't mean it will never happen in the future.

mbgerring · 2 years ago

As long as the accuracy of an LLM’s output is unknowable, there’s going to be a pretty hard limit on the kinds of jobs these tools can “replace”. And its not at all clear that this fundamental problem can be fixed at all with the current approach.

adam_arthur · 2 years ago

A tool doesn't have to obviate a worker's contributions 1:1 to replace them.

If one person can now do the work of 1.5 people, then the number of people needed for a profession shrinks, all else equal. For example, a professional translator may be able to do 2x the work by leveraging LLM/other AI, even though you still need them to validate the results. If productivity doubles, then only half the people are required to meet current needs.

The mistake is in believing that LLM's output should be deterministic to be useful.

Human output is not deterministic.

Fields with text-heavy output are already being upended by this. Being able to summarize long legal briefs, identify contract problems, do classification of discovery documents, or even write first drafts of common legal forms is already upending the legal discipline.

Chat-based customer support agents are seeing 25% productivity improvements based on two-year-old models for new employees, according to a study published in NBER.

Things like BabyAGI and other sequential "do anything" tools appear to be close to useless now, and unfortunately that is what is catching a lot of hype on Twitter. But actual industry applications are much quieter (often NDA) and much more impactful.

Humans can make mistakes and lie, and we've been able to deal with it by checking their work, giving feedback to help them improve, placing less trust in those who habitually lie, etc..

LLMs making mistakes and "hallucinating" can be dealt with in similar ways, and as this is an open area of research with lots of proposed solutions and probably many more in the years to come, we do/will have plenty of other ways to deal with it too.

ska · 2 years ago

It’s not the first time we’ve been here , either, with AI although t this time it’s a bit more in the public , ie retail sphere. There are people who will confidently tell you that LLM are the next transistor level invention, and people who will tell you it’s more incremental, like eg an electric pressure cooker - improvement in some ways over what came before, got lots of people using them, but not fundamental. I’m sure there is a better example.

Anyway, the truth is nobody actually knows at this point .

This reminds of all the hype for self-driving cars a few years back. Self-driving systems performed well for 95% of driving, and it seemed like only a matter of time before the last 5% was ironed out.

Turns out, the last 5% was both extremely difficult, and extremely important. It turns out that a self driving car that randomly makes dangerous maneuvers isn't desirable. Similarly, a LLM that occasionally outputs plausible sounding bullshit quickly turns from a useful tool to something actively harmful.

Heliosmaster · 2 years ago

As far as I understand, LLMs with 95% correct answers are much more useful than a car that doesn't crash 95% of times (if you need to pay attention to correct mistakes, you may well be driving).

A 95% correct LLM might be utter garbage in some areas but nearly flawless (thus reliable) in other, menial and time consuming tasks, such as summarization, rewording, providing new ideas, etc.

elfleco · 2 years ago

Also a 95% correct LLM is arguably the same or better than a human doing a similar task.

methodical · 2 years ago

I think a lot of people on here are for some reason believers in the idea that if a technology has detractors, then it must be another case of the steam engine, human flight, or some other technology that had doubters before completely revolutionizing our world. In reality, there is no such law of the universe that says that some technology will be wildly successful because it is heavily controversial, and, in some cases, it turns out that a lot people were correct in predicting a technology's short/long term uselessness (crypto, web3, AR). Every time some article is posted highlighting AI's shortcomings in relation to its posited ubiquity in professional settings about 10 people wax poetic about how the internet/cars/etc. were doubted heavily, when they clearly are not similar in nearly any regard. I wish we could appreciate new technology without blowing its applications out of proportion and then being disappointed when it falls short of an impossible bar, which is my main gripe with both AI doomers and people who are entirely dismissive of the technology (despite basically nobody saying anything of the sort).

> I think a lot of people on here are for some reason believers in the idea that if a technology has detractors, then it must be another case of the steam engine, human flight, or some other technology that had doubters before completely revolutionizing our world.

I think that's a misinterpretation - I don't think that it's a revolutionary technology because it has detractors or because it's controversial; I think it's revolutionary because of its capabilities. Those comparisons just serve to point out that there are plenty of historical examples of people criticizing things that turned out to be revolutionary, and the same may well turn out to be the case here.

a couple observations in favor of AGI getting here quicker than we anticipate:

ChatGPT isn't an LLM it's a product(blackbox) with one part being an LLM.

Go, Chess, Dota 2 would be better examples of things that AI mastered.

"NVIDIA GPU computing has given the industry a path forward -- and will provide a 1,000X speed-up by 2025"

LLM agents do seem to work better than LLMs on their own.

tim333 · 2 years ago

Maybe but it's seemed obvious for decades that AI will be a revolutionary thing. Not so for web3 etc.