teruakohatu · 8 months ago
I am in academia and worked in NLP although I would describe myself as NLP adjacent.

I can confirm LLMs have essentially confined a good chunk of historical research into the bin. I suspect there are probably still a few PhD students working on traditional methods knowing full well a layman can do better using the mobile ChatGPT app.

That said traditional NLP has its uses.

Using the VADER model for sentiment analysis, while flawed, is vastly cheaper than an LLM for getting a general idea. Traditional NLP is suitable for many tasks that people are now spending a lot of money asking GPT to do just because they know GPT.

I recently did an analysis on a large corpus: VADER was essentially free, while the cloud costs to run a Llama-based sentiment model were about $1000. I ran both because VADER costs nothing but minimal CPU time.

Traditional NLP can be wrong, but it can’t be jailbroken and it won’t make stuff up.
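
For anyone curious, here is roughly what the "essentially free" path looks like - a minimal sketch, assuming the standalone vaderSentiment package (the same analyzer also ships with NLTK):

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()

    texts = [
        "The delivery was fast and the food was great.",
        "Cold food, rude driver, never again.",
    ]

    for text in texts:
        # polarity_scores returns neg/neu/pos plus a normalized 'compound' score in [-1, 1]
        scores = analyzer.polarity_scores(text)
        print(f"{scores['compound']:+.3f}  {text}")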

Cheer2171 · 8 months ago
That's because VADER is just a dictionary mapping each word to a single sentiment weight and adding it up with some basic logic for negations and such. There's an ocean of smaller NLP ML between that naive approach and LLMs. LLMs are trained to do everything. If all you need is a model trained to do sentiment analysis, using VADER over something like DistilBERT is NLP malpractice in 2025.
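
For comparison, a minimal sketch of that middle ground, assuming the transformers library and the publicly released SST-2 DistilBERT checkpoint - still tiny next to an LLM, but an actual trained classifier rather than a word list:

    from transformers import pipeline

    # Small fine-tuned encoder; runs comfortably on CPU.
    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    print(classifier([
        "The delivery was fast and the food was great.",
        "Cold food, rude driver, never again.",
    ]))
    # -> [{'label': 'POSITIVE', 'score': ...}, {'label': 'NEGATIVE', 'score': ...}]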
teruakohatu · 8 months ago
> using VADER over something like DistilBERT is NLP malpractice in 2025.

Ouch. Was that necessary?

I used $1000 worth of GPU credits and threw in VADER because it’s basically free both in time and credits.

I usually do this on large datasets out of pure interest in how it correlates with expensive methods on English-language text.

I am well aware of how VADER works and its limitations; I am also aware of the limitations of all sentiment analysis.

crowcroft · 8 months ago
Price isn't a real issue in almost any imaginable use case either. Even a small open-source model would outperform it, and you're going to get a lot of tokens per dollar with that.
moffkalast · 8 months ago
> dictionary mapping each word to a single sentiment weight

That seems to me like it would flat out fail on sarcasm. How is that still considered a usable method today?

mgraczyk · 8 months ago
It currently costs around $2200 to run Gemini Flash Lite on all of English Wikipedia. It would probably cost around 10x that much to run sentiment analysis on every Yelp review ever posted. It's true that LLMs still cost a lot for some use cases, but for essentially any business case it's not worth using traditional NLP any more.
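
The back-of-envelope math is simple enough to sketch; every number below is a placeholder, so plug in the current corpus size and per-token prices before trusting any figure:

    def corpus_cost(input_tokens, usd_per_m_input, output_ratio=0.01, usd_per_m_output=0.0):
        """Estimate USD cost of running an LLM over a corpus.

        All arguments are assumptions supplied by the caller; output_ratio is the
        expected output tokens per input token (small for sentiment labels).
        """
        input_cost = input_tokens / 1e6 * usd_per_m_input
        output_cost = input_tokens * output_ratio / 1e6 * usd_per_m_output
        return input_cost + output_cost

    # Hypothetical: a ~6 billion token corpus at hypothetical flash-tier prices.
    print(corpus_cost(6e9, usd_per_m_input=0.10, output_ratio=0.01, usd_per_m_output=0.40))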


more-nitor · 8 months ago
idk why you're changing targets for comparison?

it's like:

"does apple cure cancer in monkeys?" vs "does blueberry cure diabetes in pigs?"

ahz001 · 8 months ago
I work in a non-profit and continue to use traditional NLP for the same reasons. I have lots of text, and LLMs are expensive. Also, our organization has restrictive policies on AIs, especially LLMs.

I try to get the best of both worlds by using LLMs to generate synthetic data to train NLP classifiers. First, I use LLMs to generate variations of human-labeled data. Second, I use LLMs to label unlabeled data.

As a future challenge, I want to use LLMs to generate data to train NER models for segmenting documents and extracting information.
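
For the classifier half of that workflow, a minimal sketch with scikit-learn - the CSV path and column names are hypothetical, and the labels are assumed to come from the LLM (or human) labeling step:

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # Hypothetical file of (text, label) pairs produced by the LLM labeling step.
    df = pd.read_csv("llm_labeled_examples.csv")

    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["label"], test_size=0.2, random_state=0)

    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))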

wickedsight · 8 months ago
So... What were the results? How did the Llama based model compare to VADER?
fouc · 8 months ago
*consigned a good chunk of historical research into the bin
jihadjihad · 8 months ago
To be fair it is still stuck there, so
scarface_74 · 8 months ago
Sentiment analysis using traditional means is really lacking. I can’t talk about the current project I’m working on. But I needed a more nuanced sentiment. Think of something like people commenting on the Uber Eats app versus people commenting on a certain restaurant.
intended · 8 months ago
VADER made me sad when it couldn’t do code-mixed analysis in 2020. I’m thinking of dusting off that project, but then I dread the thought of using LLMs to do the same sentiment analysis.
aitchnyu · 8 months ago
Does it work for sarcasm and typos which real world people tend to do?
qnleigh · 8 months ago
How well did VADER correlate with Llama? Did you try any other methods intermediate between them?
mootothemax · 8 months ago
I’d love to hear your thoughts on BERTs - I’ve dabbled a fair bit, fairly amateurishly, and have been astonished by their performance.

I’ve also found them surprisingly difficult and non-intuitive to train, eg deliberately including bad data and potentially a few false positives has resulted in notable success rate improvements.

Do you consider BERTs to be the upper end of traditional - or, dunno, transformer architecture in general to be a duff? Am sure you have fascinating insight on this!

teruakohatu · 8 months ago
That is a really good question, I am not sure where to draw the line.

I think it would be safe to say BERT is/was firmly in the non-traditional side of NLP.

A variety of task-specific RNN models preceded BERT, and the RNN as a concept has been around for quite a long time, with the LSTM being more modern.

Maybe word2vec ushered in the end of traditional NLP while simultaneously being the beginning of non-traditional NLP? Much like Newton has been said to be both the first scientist and the last magician.

I find discussing these kinds of questions with NLP academics to be awkward.

yieldcrv · 8 months ago
a) you can save costs on llama by running it locally

b) compute costs are plummeting. cloud inference costs have dropped over 80% in one year

c) similar to a), spending a little more and having a beefy enough machine is functionally cheaper after just a few projects

d) everyone trying to do sentiment analysis is trying to make waaaay more money anyway

so I don't see NLP’s even lower costs as being that relevant. it's like pointing out that I could use assembly instead of 10 layers of abstraction. It doesn't really matter

Al-Khwarizmi · 8 months ago
As an NLP professor, yes, I think we're mostly screwed - saying LLMs are a dead end or not a big deal, as some of the interviewees do, is just wishful thinking. A lot of NLP tasks that were the subject of active research for decades have just been wiped out.

Ironically, the tasks that still aren't solved well by LLMs and can still have a few years of life in them are the most low-level ones, which had become unfashionable in the last 15 years or so - part-of-speech tagging, syntactic parsing, NER. Of course, they have lost a lot of importance as well: you no longer need them for user-oriented downstream tasks. But they may still get some use: for example, NER for its own sake is used in biomedical domains, and parsing can be useful for scientific studies of language (say, language universals related to syntax, language evolution, etc.). Which is more than you can say about translation or summarization, which have been pretty much obsoleted by LLMs. Still, even with these tasks, NLP will go from broad applicability to niche.
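
Those low-level tasks are also the ones that still run locally for next to nothing; a minimal sketch with spaCy, assuming its small English model has been downloaded (python -m spacy download en_core_web_sm):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Aspirin reduced fever in patients treated at Boston Children's Hospital.")

    # POS tags and dependency parse
    for token in doc:
        print(token.text, token.pos_, token.dep_, token.head.text)

    # Named entities
    print([(ent.text, ent.label_) for ent in doc.ents])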

I'm not too worried for my livelihood at the moment (partly because I have tenure, and partly because the academic system works in such a way that zombie fields keep walking for quite a long time - there are still journals and conferences on the semantic web, which has been a zombie for who knows how long). But it's a pity: I got into this because it was fun and made an impact, and now it seems most of my work is going to be irrelevant, like that of those semantic web researchers I used to look down on. I guess my consolation is that computers that really understand human language were the dream that got me into this in the first place, and it has been realized early. I can play with it and enjoy it while I sink into irrelevant research, I guess :/ Or try to pivot into discrete math or something.

larodi · 8 months ago
Let's admit it - overnight it became much, much harder to be a convincing professor, given that each student can use GPTs of all sorts to contradict or otherwise intimidate you. Only a seasoned professor knows the feeling of being bullied by a smart-ass student. That brings down the total value of, and the incentive for, teaching, as does the avoidance GPTs silently imprint in students. I mean - why write a program/paper/research project when the GPT can do it for you and save you the suffering?

The whole area of algorithms suddenly became more challenging, as now you also have to understand folding multi-dimensional spaces, and retell this all as a nice story for students to remember.

We are very likely heading into some dark uncharted era for academia, which will very likely lead to academia shrinking massively. And given the talk of 'all science now happens in big corpos'... I can expect the universities to go back to the original state they started from - monasteries.

Saying this all having spent 20+ years as a part-time contributor to one such monastery.

Al-Khwarizmi · 8 months ago
Yes, my comment focused on NLP research, but the importance of university teaching has also taken a hit - not that I fear bullying, but now students can have a dedicated custom teacher with infinite time and patience who can answer questions at 3 AM, and obviously that reduces the relevance of the professor. While the human interaction in in-person teaching still provides some exclusive value, demand logically should go down. Although don't underestimate the power of inertia - one could also have thought that small no-name universities would go out of business when the likes of MIT began offering their courses to everyone online, and it didn't happen. I do think LLMs pose a higher risk than that, and a shrinking will indeed happen, but maybe not so dramatic. Let's see.

Regarding science, if we leave it exclusively to corporations we won't get very far, because most corporations aren't willing to do the basic/foundational science work. The Transformers and most of the relevant followup work that led to LLMs were developed in industry, but they wouldn't have been possible without the academics that kept working on neural networks while that field was actively scorned during the 90s-2000s AI winter. So I think research universities still should have a role to play. Of course, convincing the funders that this is indeed the case might be a different story.

simianwords · 8 months ago
Not totally related but I have wondered how someone who thinks they are an expert in a field may deal with contradictions presented by GPT.

For example, you may consider yourself an expert on some niche philosophy like say Orientalism. A student can now contradict any theories you can come up with using GPT and the scary thing is that they will be sensible contradictions.

I feel like the bar to consider yourself an expert is much higher - you not only have to beat your students but also know enough to beat GPT.

karel-3d · 8 months ago
I used to study NLP but before transformers, and now I don't work with NLP/ML/LLMs at all. Can you explain to me this view?

LLMs are NLP? We have a model that works, and that works great for many people and many different usages - shouldn't NLP be at its top now? LLMs are not conceptually that different from other models?

I worked with GIZA/MOSES statistical MT back in the day during my studies; at the end of the day it's just matrices that you don't really understand, same as with LLMs?

Al-Khwarizmi · 8 months ago
NLP is indeed at its top, NLP professors aren't :)

Imagine if you had stayed in academia and kept working in MT for the last two decades. First of all, now you would see how LLMs render all your work pretty much obsolete. That's already hard for many people. Not so much for me, as I consider myself to be rather adaptable, and maybe not for you either - you can embrace the new thing and start working on it yourself, right?

But the problem is that you can't. The main issue is not lack of explainability, but the sheer resources needed. In academia we can't run, let alone train, anything within even one or two orders of magnitude of ChatGPT. We can work with toy models, knowing that we won't get even remotely near the state of the art (many are now doing this, with the excuse of "green AI", sustainability and such, but that won't even hold much longer). Or we can work with the likes of ChatGPT as plain users, but then we are studying the responses of an "oracle" that we don't even have control of, and it starts looking more like religion than science.

Ten years ago an academic could beat the state of the art in MT and many other NLP tasks; now for many tasks that's just impossible (unless we count coming up with some clever prompt engineering, but again, religion). For those of us who were in this field because we liked it, not only to make a living, research has become quite unfulfilling.

justanotherjoe · 8 months ago
It's not the same at all. An LLM is big, yes, and that's part of it. But an LLM is small compared to an equivalent-performance machine built on something like n-gram statistical models - you'd need the whole universe, or something prohibitive like that, and it'd still be worse. People don't like it, but LLMs 'understand' text in a very real sense of the word, because that's the most compressive way to do the task they're trained on. Is it the same as human understanding? Most likely not, but that complaint is cheating.


horsh1 · 8 months ago
Unless we intend to surrender everything about human symbolic manipulation (all math, all proving, all computation, all programming) to LLMs in the near future, we still need some formal representations for engineering.

The major part of traditional NLP was about formal representations. We have yet to see efficient mining techniques to extract formal representations and analyses back from LLMs.

How would we solve traditional NLP problems - such as, for example, formalizing the law corpus of a given country - with an LLM?

As an approximation we can look at non-natural language processing, e.g. compiler technologies. How do we write an optimizing compiler on LLM technologies? How do we ensure stability, correctness and price?

In a sense, the traditional NLP field has just doubled, not died. In addition to humans as language-capable entities, who cannot really explain how they use language, we now also have LLMs as another kind of language-capable entity, which in fact also cannot explain anything. The only benefit is that it is cheaper to ask an LLM the same question a million times than a human.

whoaann_92 · 8 months ago
As someone deeply involved in NLP, I’ve observed the field’s evolution: from decades of word counting and statistical methods to a decade of deep learning enabling “word arithmetic.” Now, with Generative AI, we’ve reached a new milestone, a universal NLP engine.

IMHO, the path to scalability often involves using GPT models for prototyping and cold starts. They are incredible at generating synthetic data, which is invaluable for bootstrapping datasets, and at labelling a given dataset. Once a sufficient dataset is available, training a transformer model becomes feasible for data-intensive applications where the cost of using GPT would be prohibitive.

GPT’s capabilities in data extraction and labeling are to me the killer applications, making it accessible for downstream tasks.

This shift signifies that NLP is transitioning from a data science problem to an engineering one, focusing on building robust, scalable systems.

GardenLetter27 · 8 months ago
Reminds me of the whole Chomsky vs. Norvig debate - https://norvig.com/chomsky.html
whoaann_92 · 8 months ago
Thanks for the link, just read it, and the Chomsky transcript. Chomsky wanted deep structure, Norvig bet on stats, but maybe Turing saw it coming: kids talk before they know grammar, and so did the machines. It turns out we didn’t need to understand language to automate it.
AndrewKemendo · 8 months ago
If Chomsky was writing papers in 2020 his paper would’ve been “language is all you need.”

That is clearly not true, and as the article points out, large-scale forecasting models beat the hypothesis that you need an actual foundational structure for language in order to demonstrate intelligence - in fact it's exactly the opposite.

I’ve never been convinced by that hypothesis, if for no other reason than that we can demonstrate in the real world that intelligence is possible without linguistic structure.

As we’re finding, solving the Markov process iteratively is the foundation of intelligence.

Out of that process emerge novel state-transition processes - in some cases novel communication methods that have a structured mapping to the state encoding inside the actor.

Communication happens across species at various levels of fidelity, but it is not the underlying mechanism of intelligence; it is an emergent behavior that allows for shared mental mapping and storage.

aidenn0 · 8 months ago
Some people will never be convinced that a machine demonstrates intelligence. This is because for a lot of people, intelligence exists as a subjective experience that they have, and the belief that others have it too extends only insofar as others appear to be like the self.
meroes · 8 months ago
It doesn’t mean they tie intelligence to subjective experience. Take digestion. Can a computer simulate digestion? Yes. But no computer can “digest” if it’s just silicon in the corner of an office. There are two hurdles: the leap from simulating intelligence to intelligence, and the leap from intelligence to subjective experience. If the computer gets attached to a mechanism that physically breaks down organic material, that’s the first leap. If the computer gains a first-person experience of that process, that’s the second.

You can’t just short-circuit from simulates to does to has subjective experience.

And the claim that other humans don’t have subjective experience is such a non-starter.

simonw · 8 months ago
It's called the AI effect: https://en.wikipedia.org/wiki/AI_effect

> The author Pamela McCorduck writes: "It's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something—play good checkers, solve simple but relatively informal problems—there was a chorus of critics to say, 'that's not thinking'."

dekhn · 8 months ago
This is why I want the field to go straight to building indistinguishable agents- specifically, you should be able to video chat with an avatar that is impossible to tell from a human.

Then we can ask "if this is indistinguishable from a human, how can you be sure that anybody is intelligent?"

Personally I suspect we can make zombies that appear indistinguishable from humans (limited to video chat; making a robot that appears human to a doctor would be hard) but that don't have self-consciousness or any subjective experience.

AndrewKemendo · 8 months ago
Humans are basically incapable of recognizing that there’s something that’s more powerful than them

They’re never going to collectively admit that that’s the case, because humans collectively are so systematically arrogant and self-possessed that they’re not even open to the possibility of being lower on the intelligence totem pole

The only possible way forward for AI is to create the thing that everybody is so scared of so they can actually realize their place in the universe

AIPedant · 8 months ago
I will not be convinced a machine demonstrates intelligence until someone demonstrates a robot that can navigate 3D space as intelligently as, say, a cockroach. AFAICT we are still many years away from this, probably decades. A bunch of human language knowledge and brittle heuristics doesn't convince me at all.

This ad hominem is really irritating. People have complained since Alan Turing that AI research ignores simpler intelligence, instead trying to bedazzle people with fancy tricks that convey the illusion of human intelligence. Still true today: lots of talk about o3's (exaggerated) ability to do fancy math, little talk about its appallingly bad general quantitative reasoning. The idea of "jagged AI" is unscientific horseshit designed to sweep this stuff under the rug.

felipeerias · 8 months ago
In the natural world, intelligence requires embodiment. And, depending on your point of view, consciousness. Modern AI exhibits neither of those characteristics.
shmel · 8 months ago
How do they convince themselves that other people have intelligence too?
6stringmerc · 8 months ago
It is until proven otherwise because modern science still doesn’t have a consensus or standards or biological tests which can account for it. As in, highly “intelligent” people often lack “common sense” or fall prey to con artists. It’s pompous as shit to assert a black box mimicry constitutes intelligence. Wake me up when it can learn to play a guitar and write something as good as Bob Dylan and Tom Petty. Hint: we’ll both be dead before that happens.
emp17344 · 8 months ago
> As we’re finding: solving the markov process iteratively is the foundation of intelligence

No, the Markov process allows LLMs to make connections between existing representations of human intelligence. LLMs are nothing without a data set developed by an existing intelligence.

vjerancrnjak · 8 months ago
CNNs were outperforming traditional methods on some tasks before 2017.

The problem was that all of the low-level tasks, like part-of-speech tagging, parsing, named entity recognition, etc., never resulted in a good summarization or translation system.

Probabilistic graphical models worked a bit but not much.

Transformers were a leap where none of the low-level tasks had to be done for the high-level ones.

Pretty sure that equivalent leap happened in computer vision a bit before.

People were fiddling with low-level pattern matching and filters, and then it was all obliterated with an end-to-end CNN.

ActorNightly · 8 months ago
There was no leap in research. Everything had to do with availability of compute.

Neural nets are quite old, and everyone knew they were universal function approximators. The reason models never took off was that it was very expensive to train a model of even limited size. There was no readily available hardware to do this on, short of supercomputer clusters, which were all CPUs and thus wildly inefficient. But any researcher back then would have told you that you can figure anything out with neural nets.

Sometime in 2006, Nvidia realized that a lot of the graphics compute was just generic parallel compute and released Cuda. People started using graphics cards for compute. Then someone figured out you can actually train deep neural nets with decent speed.

Transformers weren't even that big of a leap. The paper makes it sound like it's some sort of novel architecture - in essence, instead of input*weights to the next layer, you do input*matrix1, input*matrix2, input*matrix3 (the query, key, and value projections) and multiply the results together. And as you can guess, to train it you need more hardware, because now you have to train three matrices rather than just one.
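
Concretely, a minimal single-head self-attention sketch in NumPy - dimensions are arbitrary, and this leaves out multi-head, masking, and the rest of the block:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    seq_len, d_model, d_head = 5, 16, 8
    rng = np.random.default_rng(0)

    x = rng.normal(size=(seq_len, d_model))  # token embeddings
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))  # the three matrices

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_head))  # (seq_len, seq_len) attention weights
    out = attn @ V                             # contextualized representations
    print(out.shape)                           # (5, 8)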

If we ever get something like ASICs for ML, at a certain point we will be able to iterate on the architectures themselves. The optimal LLM may be a combination of CNN, RNN, and Transformer blocks, all intertwined.

tomrod · 8 months ago
> ever get like ASIC for ml

Is this what you're mentioning?

[0] https://linearmicrosystems.com/using-asic-chips-for-artifici...

mistrial9 · 8 months ago
> never resulted in a good ... translating system

that seems too broad

> all obliterated with an end to end cnn

you mixed your nouns.. what you were saying about transformers was about transformers.. they specifically replaced CNNs. So, no

kadushka · 8 months ago
In NLP, transformers replaced RNNs. In computer vision, CNNs replaced previous methods (e.g. feature descriptors), and recently got replaced by visual transformers, though modern CNNs are still pretty good.
languagehacker · 8 months ago
Great seeing Ray Mooney (who I took a graduate class with) and Emily Bender (a colleague of many at the UT Linguistics Dept., and a regular visitor) sharing their honest reservations about AI and LLMs.

I try to stay as far away from this stuff as possible because when the bottom falls out, it's going to have devastating effects for everyone involved. As a former computational linguist and someone who built similar tools at reasonable scale for largeish social media organizations in the teens, I learned the hard way not to trust the efficacy of these models or their ability to get the sort of reliability that a naive user would expect from them in practical application.

ahnick · 8 months ago
How exactly is the bottom going to fall out? And are you really trying to present that you have practical experience building comparable tools to an LLM prior to the Transformer paper being written?

Now, there does appear to be some shenanigans going on with circular financing involving MSFT, NVIDIA, and SMCI (https://x.com/DarioCpx/status/1917757093811216627), but the usefulness of all the modern LLMs is undeniable. Given the state of the global economy and the above financial-engineering issues, I would not be surprised if at some point there is a contraction and the AI hype settles down a bit. With that said, LLMs could be made illegal and people would still continue running open-source models indefinitely, and organizations would build proprietary models in secret, because LLMs are that good.

Since we are throwing out predictions, I'll throw one out. Demand for LLMs to be more accurate will bring methods like formal verification to the forefront, and I predict that eventually models/agents will start to be able to formalize solved problems into proofs, using formal verification techniques to guarantee correctness. At that point you will be able to trust the outputs for things the model "knows" (i.e. has proved) and use the probably-correct answers the model spits out as we currently do today.

Probably something like the following flow:

1) Users enter prompts

2) Model answers questions and feeds those conversations to another model/program

3) Offline this other model uses formal verification techniques to try and reduce the answers to a formal proof.

4) The formal proofs are fed back into the first model's memory and then it uses those answers going forward.

5) Future questions that can be mapped to these formalized proofs can now be answered with almost no cost and are guaranteed to be correct.

throwaway314155 · 8 months ago
> And are you really trying to present that you have practical experience building comparable tools to an LLM prior to the Transformer paper being written?

I believe (could be wrong) they were talking about their prior GOFAI/NLP experience when referencing scaling systems.

In any case, is it really necessary to be so harsh about over-confidence and then go on to predict the future of solving hallucinations with your formal verification ideas?

Talk is cheap. Show me the code.

drpbl · 8 months ago
I have argued the same theme elsewhere. Formal reasoning over LLM output is the next step for AI. Where do we go for funding?
Legend2440 · 8 months ago
They are far far more capable than anything your fellow computational linguists have come up with.

As the saying goes, 'every time I fire a linguist, the performance of the speech recognizer goes up'

dunefox · 8 months ago
1. Sadly, they are for most tasks, yes.

2. Linguist, not computational linguist. ;)

suddenlybananas · 8 months ago
Don't try and say anything pro-linguistics here, people are weirdly hostile if you think it's anything but probabilities.
parpfish · 8 months ago
Over my years in academia, I noticed that the linguistics departments were always the most fiercely ideological. Almost every comment in a talk would get contested by somebody from the audience.

It was annoying, but as a psych guy I was also jealous of them for having such clearly articulated theoretical frameworks. It really helped them develop cohesive lines of research to delineate the workings of each theory

jdgoesmarching · 8 months ago
Maybe they could’ve tried to say something pro-linguistics, but the comment was entirely anti-LLM.
motorest · 8 months ago
> Don't try and say anything pro-linguistics here, (...)

Shit-talking LLMs without providing any basis or substance is not what I would call "pro-linguistics". It just sounds like petty, spiteful behavior - lashing out in frustration over old models being rendered obsolete.

PaulDavisThe1st · 8 months ago
The interesting question is whether just a gigantic set of probabilities somehow captures things about language and cognition that we would not expect ...
philomath_mn · 8 months ago
Curious what you are expecting when you say "bottom falls out". Are you expecting significant failures of large-scale systems? Or more a point where people recognize some flaw that you see in LLMs?
JumpCrisscross · 8 months ago
> learned the hard way not to trust the efficacy of these models or their ability to get the sort of reliability that a naive user would expect from them in practical application

But…they work. Linguistics as a science is still solid. But as a practical exercise, it seems to be moot other than for finding niches where LLMs are too pricey.

cainxinth · 8 months ago
Lots of great quotes in this piece but this one stuck out for me:

> TAL LINZEN: It’s sometimes confusing when we pretend that there’s a scientific conversation happening, but some of the people in the conversation have a stake in a company that’s potentially worth $50 billion.