> ChatGPT is so good at this form of interpolation that people find it entertaining: they’ve discovered a “blur” tool for paragraphs instead of photos, and are having a blast playing with it.
“‘blur’ tool for paragraphs” is such a good way of describing the most prominent and remarkable skill of ChatGPT.
It is fun, but it so obviously trades off against what makes paragraphs great. It is apt that this essay against ChatGPT's blurry language appears in The New Yorker, a publication so known for its literary particularism. ChatGPT smears are amusing, but they are probably also yet another nail in the coffin of the literary society. Nowadays we are not careful readers; we skim, skip, and seek tools to sum up whole books. Human knowledge is in the ability to produce the particular and the ability to recognize it. For philosophers such as Iris Murdoch, careful attention to the particular, to just the right adjective in describing a friend, is a moral attention.
With Google we took the job of storing quotes (and other decompressed writing) and made it the remit of the machine. But we still asked for it back uncompressed. Here with ChatGPT, we are beginning to accept back a blur, because the Declaration of Sock Independence is immediately, Tiktok-speedily funny, and no one’s reading the original declaration anymore anyway.
> because the Declaration of Sock Independence is immediately, Tiktok-speedily funny, and no one’s reading the original declaration anymore anyway
For those who, like me, had missed out on this nugget of comedic gold, here's what ChatGPT had to say when it was asked to "describe losing your sock in the dryer in the style of the declaration of independence":
> When in the course of household events, it becomes necessary for one to dissolve the bonds that have connected a sock to its mate, and to assume among the powers of the laundry room, the separate and equal station to which the laws of physics and of household maintenance entitle it, a decent respect to the opinions of socks requires that it should declare the causes which impel it to go missing.
>
> We hold these truths to be self-evident, that all socks are created equal, and are endowed by their manufacturer with certain unalienable rights...
It's a great metaphor and one we should use more. But there's a place for blurred photos: thumbnails.
On Hacker News we often complain about headlines because that's all we see at first. But I've been using Kagi's summarizer [1] and I think it's a great tool for getting the gist of certain things, like if you want to know what a YouTube video is about without watching it. (Google Translate is useful for similar reasons.)
Perhaps someday, Hacker News will have an AI-generated summary of the article at the top of each comment page?
Similarly, ChatGPT excels at fast answers for questions like "What is an X", where you just want a quick definition. It's probably in Wikipedia somewhere, but you don't have to read as much. And it might be wrong, but probably not as wrong as the definition you'd infer from context if you didn't look it up.
We probably would be better off if these things were called "artificial memory" rather than "artificial intelligence." It's an associative memory that often works like human memory in how frequently it confabulates. When you care, you write things down and look things up.

[1] https://labs.kagi.com/ai/sum
Thumbnails, image matching, low-bandwidth summaries... There are plenty of uses for smoothed images. Also, there are many interesting transformations in computer vision and image processing that start with a blur.
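One concrete instance of the "start with a blur" idea from the comment above is a difference-of-Gaussians edge/blob detector. A minimal sketch, assuming NumPy and SciPy are available (the toy image and sigma values are just illustrative):

```python
# Difference-of-Gaussians: blur the image at two radii and subtract.
# Strong responses in the result mark edges/blobs - a classic example of a
# useful transformation that literally starts with a blur.
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma_small=1.0, sigma_large=2.0):
    blurred_small = gaussian_filter(image.astype(float), sigma=sigma_small)
    blurred_large = gaussian_filter(image.astype(float), sigma=sigma_large)
    return blurred_small - blurred_large

# Toy usage: a white square on a black background; the response peaks at its border.
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
edges = difference_of_gaussians(img)
print(edges.max(), edges.min())
```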
If I try to map the first three into text, there are automatic TL;DRs like you said, document grouping, and search across entire document stores (as in, do documents in this store deal with this idea?). On "artificial document creation", there is that highly valuable service of answering stuff like "hey, that thing with sticks that rotate and pull a vehicle around, what is its name again?"
The amount of human-generated lowest-common-denominator English-language free content was already so high that I'm not sure the New Yorker has anything (more) to worry about. If you've been paying for the New Yorker already in the days of Medium, Buzzfeed, blogs, and what-have-you, does there being even more uncurated stuff change your equation? (It doesn't for me.)
More cynically: it'll be hard to kill the few legacy zombies that have survived so much destruction at the hand of free internet content already.
What he misses in this analogy is that part of what produces the "blur" is the superimposing of many relevant paragraphs found on the web into one. This mechanism can be very useful, because it could average out errors and give one a less one-sided perspective on a particular issue. It doesn't always work like this, but hopefully it will more and more. Even more useful would be to do a cluster analysis of the existing perspectives and give a representative synthesis of each of these, along with a weight representing their popularity. So there's a lot of room for improvement, but the potential, in my opinion, is there.
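To make the "cluster the existing perspectives and synthesize a representative of each, weighted by popularity" idea concrete, here is a minimal sketch assuming scikit-learn is available. The choice of TF-IDF features, k-means, and "document nearest the centroid" as the representative are illustrative assumptions, not anything the comment above prescribes:

```python
# Sketch: cluster documents about a topic, pick one representative per cluster,
# and report each cluster's share of documents as a rough "popularity" weight.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def representative_perspectives(docs, n_clusters=3):
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    perspectives = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # Representative = the member document closest to the cluster centroid.
        dists = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
        perspectives.append({
            "representative": docs[members[np.argmin(dists)]],
            "weight": len(members) / len(docs),
        })
    return perspectives
```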
If anything, the average has far more errors in it. It's a trope on Reddit that experts get downvoted while amateurs who reflect the consensus of other amateurs get upvoted and repeated. Amateurs tend to outnumber experts in real life anyway, and having their opinions become more authoritative (because some "AI" repeats them) is probably not a great direction to head in.
Supposedly, if I'm remembering a past discussion with a Japanese speaker correctly, the same stem is used for "blur" or "blurry" (bokeh, bokashi).

Which is a kind of interesting parallel here.

The overlap is that the verb "bokeru" and its root "boke" can be used to describe someone losing their mental faculties, e.g. through age or a disease such as Alzheimer's, and by extension it can be used as an insult to mean "stupid" as well. But etymologically there is no connection.

> Probably originally a transcription of Sanskrit मोह (moha, “folly”), used as a slang term among monks.

The syllables are different; baka is ばか [1], bokeh is ぼけ [2]. Could those really be from the same root?

[1] https://en.wiktionary.org/wiki/%E9%A6%AC%E9%B9%BF#Japanese

[2] https://en.wiktionary.org/wiki/%E6%9A%88%E3%81%91#Japanese
The blur is addictive because it feeds a feedback loop: rather than tiring out your brain on understanding one thing in detail, you can watch two summaries and have a vague sense of understanding. It lets you jump to the next novelty, always feeding the brain's System 1, while System 2 is rarely brought into the picture.

I wonder if this will lead to a stratification of work in society: a lot of jobs can operate on the blur. "Just give me enough to get my job done." But fewer (critical and hopefully highly paid) people will be engaged in professions where understanding the details is the job and there's no way around it.

In Asimov's Foundation novels this is a recurring theme: they can't find people who can work on designing or maintaining nuclear power. This eventually leads to stagnation. AI tools can prevent this stagnation only if mankind uses the mental burden freed up by AI to work on a higher set of problems. But if the tools are used merely as butlers, then the pessimistic outcome is more likely.

The general tendency to skip the details can also give an edge in some cases. Imagine everyone using similar AI tools to understand company annual reports via a nice, TikTok-style summary. Then an investor doing the dirty work of going through the details may find things that are missed by the 'algo'.
> ChatGPT smears are amusing, but they are probably also yet another nail in the coffin of the literary society.
As the author (Ted Chiang!!) notes, ChatGPT3 will be yet another nail in the coffin of ChatGPT5. At some point, OpenAI will find it impossible to find untainted training data. The whole thing will become a circular human centipede and die of malnutrition. Apologies for the mental image.
Going back to the "sock of independence" example (see /u/airstrike's comment for more context), ChatGPT's answer's accuracy is poor - but it's a funny question, and it gave a funny answer. So was it really a poor answer? My interpretation of their use of 'blur' as an analogy is that it did not simply answer ACCURATELY in the STYLE of the DoC; it merged or "blurred/smudged together" the CONTENT and STYLE of the story and the DoC. It's not good at understanding the question or the context... and therefore, a lot of its answers feel "blurry".
"Wonder why"? Because, human thoughts, opinions and language are inherently blurry, right? That's my view. Plus, humans have a whole nervous system which has a lot of self-correcting systems (e.g. hormones) that ML AI doesn't yet account for if its goal is human-level intelligence.
Huh? It isn't. It's a good description because it's figuratively accurate to what reading LLM text feels like, not because it's technically accurate to what it's doing.
This is very well written, and probably one of my favorite takes on the whole ChatGPT thing. This sentence in particular:
> Indeed, a useful criterion for gauging a large-language model’s quality might be the willingness of a company to use the text that it generates as training material for a new model.
It seems obvious that future GPTs should not be trained on the current GPT's output, just as future DALL-Es should not be trained on current DALL-E outputs, because the recursive feedback loop would just yield nonsense. But a recursive feedback loop is exactly what superhuman models like AlphaZero use. Further, AlphaZero is trained on its own output even during the phase where it performs worse than humans.
There are, obviously, a whole bunch of reasons for this. The "rules" for whether text is "right" or not are way fuzzier than the "rules" for whether a move in Go is right or not. But, it's not implausible that some future model will simply have a superhuman learning rate and a superhuman ability to distinguish "right" from "wrong" - this paragraph will look downright prophetic then.
I think what makes AlphaZero's recursion work is the objective evaluation provided by the game rules. Language models have no access to any such thing. I wouldn't even count user-based metrics of "was this result satisfactory": that still doesn't measure truth.
I generally respect the heck out of Chiang but I think it's silly to expect anyone to be happy feeding a language model's output back into it, unless that output has somehow been modified by the real world.
I don't expect it'll work for everything: as you say, for many topics truth must be measured out in the real world.
But, for a subset of topics, say, math and logic, a minimal set of core principles (axioms) is theoretically sufficient to derive the rest. For such topics, it might actually make sense to feed the output of a (very, very advanced) LLM back into itself. No reference to the real world is needed - only the axioms, and what the model knows (and can prove?) about the mathematical world as derived from those axioms.
Next, what's to say that a model can't "build theory", as hypothesized in this article (via the example of arithmetic)? If the model is fed a large amount of (noisy) experimental data, can it satisfactorily derive a theory that explains all of it, thereby compressing the data down to the theoretical predictions + lossy noise? Could a hypothetical super-model be capable of iteratively deriving more and more accurate models of the world via recursive training, assuming it is given access to the raw experimental data?
> Language models have no access to any such thing.
And this is exactly why MS is in such a hurry to integrate it into Bing. The feedback loop can be closed by analyzing user interaction. See Nadella’s recent interview about this.
Or if it was accompanied by human-written annotations about the quality of it, which could be used to improve its weightings.
Of course it might even be that the only instance of text describing some novel phenomenon available was itself an LLM paraphrase (i.e. the prompt contained novel information but has been lost).
There’s a version of this where the output is mediated by humans. Currently ChatGPT has a thumbs up/down UI next to each response. This feedback could serve as a signal for which generated output may be useful for future ingestion. Perhaps OpenAI is already doing this with our thumb signals.
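A minimal sketch of how such a thumbs signal could be turned into a data filter; the log record, field names, and threshold here are hypothetical, not OpenAI's actual pipeline:

```python
# Keep only the exchanges that humans explicitly endorsed, as candidate
# fine-tuning data. LoggedExchange is a made-up stand-in for whatever
# feedback logging actually looks like.
from dataclasses import dataclass

@dataclass
class LoggedExchange:
    prompt: str
    response: str
    rating: int  # +1 thumbs up, -1 thumbs down, 0 no feedback

def curate_for_training(logs, min_rating=1):
    return [
        {"prompt": ex.prompt, "completion": ex.response}
        for ex in logs
        if ex.rating >= min_rating
    ]

logs = [
    LoggedExchange("What is a heap?", "A heap is a tree-shaped priority structure...", +1),
    LoggedExchange("Quote the broadcasters", "Here are some invented quotes...", -1),
]
print(curate_for_training(logs))  # only the thumbs-up exchange survives
```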
> Indeed, a useful criterion for gauging a large-language model’s quality might be the willingness of a company to use the text that it generates as training material for a new model.
I don't find this a useful criterion. It is certainly something to worry about in the future as the snake begins to eat its own tail, but before we reach that point, we can certainly come up with actual useful criteria. First, what makes up "useful criteria"? Certainly it can't be "the willingness of a company to use the text that it generates as training material for a new model", because that is a hypothetical situation contingent on the future. So we should probably start with something like, well, is ChatGPT useful for anything in the present? And it turns out it is!
It's both a useful translator and a useful synthesizer.
When given an analytic prompt like, "turn this provided box score into an entertaining outline", it can reliably act as a translator, because the facts about the game were in the prompt.

And when given a synthetic prompt like, "give me some quotes from the broadcasters", it can reliably act as a synthesizer, because in fact the transcript of the broadcasters was not in the prompt.

https://williamcotton.com/articles/chatgpt-and-the-analytic-...
> This is very well written, and probably one of my favorite takes on the whole ChatGPT thing.
This is not a surprise, as the author is Ted Chiang, the award-winning novelist and author of "The Lifecycle of Software Objects", "Tower of Babylon", and other science fiction works. I had the pleasure of once having coffee with him while talking about his thoughts on some of the topics in "The Lifecycle of Software Objects", which is a very enjoyable book that may be of interest to some HN readers.
Chiang's short stories are beautiful; he reminds me of Stanislaw Lem, brilliant, creative, and ahead of his time. I was surprised they made Arrival into a movie (and that it was as good as it was).
> But, it's not implausible that some future model will simply have a superhuman learning rate and a superhuman ability to distinguish "right" from "wrong" - this paragraph will look downright prophetic then.

There is already a paper for that: https://arxiv.org/abs/2210.11610

Large Language Models Can Self-Improve
>Large Language Models (LLMs) have achieved excellent performances in various tasks. However, fine-tuning an LLM requires extensive supervision. Human, on the other hand, may improve their reasoning abilities by self-thinking without external inputs. In this work, we demonstrate that an LLM is also capable of self-improving with only unlabeled datasets. We use a pre-trained LLM to generate "high-confidence" rationale-augmented answers for unlabeled questions using Chain-of-Thought prompting and self-consistency, and fine-tune the LLM using those self-generated solutions as target outputs. We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4%->82.1% on GSM8K, 78.2%->83.0% on DROP, 90.0%->94.4% on OpenBookQA, and 63.4%->67.9% on ANLI-A3) and achieves state-of-the-art-level performance, without any ground truth label. We conduct ablation studies and show that fine-tuning on reasoning is critical for self-improvement.
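A rough sketch of the loop that abstract describes: sample several chain-of-thought answers per unlabeled question, keep the majority answer when agreement is high ("self-consistency"), and fine-tune on the retained rationale/answer pairs. The `sample_cot_answer` and `fine_tune` functions are placeholders, not real APIs from the paper:

```python
# Self-consistency filtering: the model's own agreement across samples is used
# as the "label", so no ground-truth annotations are required.
from collections import Counter

def self_improve(model, unlabeled_questions, samples_per_q=8, min_agreement=0.6):
    training_examples = []
    for q in unlabeled_questions:
        # Each sample is a (rationale, final_answer) pair produced by the model.
        samples = [sample_cot_answer(model, q) for _ in range(samples_per_q)]
        votes = Counter(answer for _, answer in samples)
        best_answer, count = votes.most_common(1)[0]
        if count / samples_per_q >= min_agreement:  # keep only "high-confidence" answers
            rationale = next(r for r, a in samples if a == best_answer)
            training_examples.append((q, rationale, best_answer))
    return fine_tune(model, training_examples)  # fine-tune on self-generated targets
```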
That part made the least sense to me. Since a more advanced version of an LLM would be better at extracting the truth of things from the given data, what could it possibly gain from ingesting the output of a less precise version of itself? It couldn't ever add anything useful, almost by definition.

An assumption disguised as fact. We simply do not know yet.
This is old. This is the reason why Google Translate sucks: it can't tell the difference between what it translated and what a competent person translated.
GPTZero will generate theorem proofs with logical language and use the final contradiction or proof to update its weights. The logical language will be a clever subset of normal language to limit GPT's hallucinations.
You can use the generated text for further training if you have a human curator who determines its quality. I've been training my model that helps generate melodies using some of the melodies I have created with it.
It's pretty evident. Its training would no longer be anchored to reality, and given its output is non-deterministic, the process would result in random drift. This can be concluded without having to test it.
Now, if training was modified to have some other goal, like consistency or something, with a requirement to continue to perform well against a fixed corpus of non-AI-generated text, you could imagine models bootstrapping themselves up to perform better at that metric, AlphaGo-style.

But merely training on current output, and repeating that process, given how the models work today, would most certainly result in random drift and an eventual descent into nonsense.

That's fantastic.

https://en.m.wikipedia.org/wiki/Pareidolia
It might have different effects over time. E.g. in the intermediate term it emphasizes certain topics/regions, which leads to embodied mastery, but over the long term it ossifies into stubbornness and broken-record repetition. Similar to how human minds work.
> Imagine what it would look like if ChatGPT were a lossless algorithm. If that were the case, it would always answer questions by providing a verbatim quote from a relevant Web page. We would probably regard the software as only a slight improvement over a conventional search engine, and be less impressed by it
The story is an impressive piece, but I think, as with many of us, it's a personal projection of expectations onto results. One example from my experience: in the book "Jim Carter - Sky Spy, Memoirs of a U-2 Pilot" there was an interesting story about the time a U-2 was used to photograph a large area of the Pacific to save the life of a lost seaman. The story was very interesting and I always wanted to know more: technical details, the people involved, etc. Searching with Google ten years ago didn't help; I rephrased the names and changed the date (even used the range operator) to no avail. And recently I asked several LLM-based bots about it. You can guess the result. They ignored my constraints at best and hallucinated at worst. One even invented a mixed-up story in which Francis Gary Powers flew not alone but with a co-pilot, and the latter ended up in the Pacific and was saved. Very funny, but I wasn't impressed. But if one of them had scraped the far corners of web discussion boards and saved a first-person account of someone who took part in it and gave it to me, I would be really impressed.
The compression & blur analogy applies to human minds as well. If you focus on fidelity, you have to increase storage and specialize in a narrow domain. If you want a bit of everything, then blurring and destructive compression is the only way. E.g. a "book smart" vs "street smart" difference.

"Mastery" can be considered a hyper-efficient destructive compression (experts are often unable to articulate or teach to beginners) that reduces latency of response to such extreme levels that they seem to be predicting the future or reacting at godlike speeds.

In fact there's a potent new theory (1) that human consciousness (and probably all mammalian "consciousness") is just a memory system involving some form of lossy compression. Your sense of awareness happens ~20-50 ms after the memory is created. A lot of life is buffering and filtering, and reading that lossy record is very much who we are. Einstein's brain must have been amazing at throwing away information about the natural world.

(1) https://pubmed.ncbi.nlm.nih.gov/36178498/
This is a decent summary. I've been thinking about how ChatGPT, by its very nature, destroys context and source reputation. When I search for something on the Internet, I get a link to the original content, which I can then evaluate based on my knowledge and the reputation of the original source. Wikipedia is the same, with a big emphasis on citation. ChatGPT and other LLMs destroy that context and knowledge, giving me no tools to evaluate the sources they're using.
If somebody asked me how heap sort works (my favorite sort!) I can sketch it out. If they ask me where I learned it, I really don't remember. Might be the Aho, Hopcroft, and Ullman book. I can't really say though.
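For anyone who wants the sketch: a compact heap sort (the textbook algorithm, not attributed to any particular source):

```python
# Heap sort: build an in-place max-heap, then repeatedly swap the max to the
# end of the unsorted region and restore the heap property.
def heap_sort(a):
    n = len(a)

    def sift_down(root, end):
        # Push a[root] down until the max-heap property holds within a[:end].
        while (child := 2 * root + 1) < end:
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    for i in range(n // 2 - 1, -1, -1):   # heapify the whole array
        sift_down(i, n)
    for end in range(n - 1, 0, -1):       # extract the max, one element at a time
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a

print(heap_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```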
Yes, and then I'll evaluate that answer by your reputation, either socially, organizationally, or publicly. I will value that summary differently if you are a random person on the street, a random person who works at a tech company, or a person wearing a name tag that says "Donald Knuth, Stanford University".
ChatGPT has little reputation of its own, and produces such a broad swath of knowledge that it becomes a "Jack of all trades, master of none."
What's interesting is that Microsoft's implementation of ChatGPT in Bing seems to include linking to references, which is a good step forward in my opinion.
The references seem wrong though. I'm looking at the response to a demo Bing query, "What cars should I consider buying that are AWD, go 0-60 in less than 6 seconds, seat 6 or more and have decent reviews?"
> The 2022 Kia Telluride is a midsize SUV that can seat up to eight passengers and has an AWD option. It has a 3.8-liter V6 engine that produces 291 hp and 262 lb-ft of torque. It can accelerate from 0 to 60 mph in 7.1 seconds [10] and has a combined fuel economy of 21 mpg. It also has excellent reviews from critics and owners, and won several awards, including the 2020 World Car of the Year [7].

[10] https://www.topspeed.com/cars/guides/best-awd-cars-for-2022/

[7] https://www.hotcars.com/best-6-seater-suvs-2022/

The references don't back up the 7.1 seconds or World Car of the Year claims.
I would love to know their plan for having new facts propagate into these models.
My idle speculation makes me think this is a hard problem. If ChatGPT kills Search it also kills the websites that get surfaced by search that were relying on money from search-directed users. So stores are fine, but "informational" websites are probably in for another cull. Paywall premium publications are probably still fine - the people currently willing to pay for new, somewhat-vetted, human content still will be. But things like the Reddits of the world might be in for a hard time since all those "search hack" type uses of "search google for Reddit's reviews of this product" are short-circuited, if this is actually more effective than search.
Meanwhile, SEO folks will probably try to maximize what they can get out of the declining search market by using these tools to flood the open web with even more non-vetted bullshit than it's already full of.
So as things change, how does one tiny-but-super-reliable amateur website (say, the individual blog of an expert on .NET runtime internals) make a dent in the "knowledge" of the next iteration of these models? How does it outweigh the even-bigger sea of crap that the rest of the web has now become when future training is done?
The other interesting thing is that if people stop using websites, it reduces revenue for those websites, and then development of new pages and sources stops or slows. How does ChatGPT improve if the information for it to learn from isn't there?
We need the source information to be continually generated in order for ChatGPT to improve.

The way I find them is from forums, chat, and links from other such sites. It all goes into my RSS reader.

I use search with !wiki or !mdn etc. most of the time.
The sources are there in the training dataset; they are just not linked to the response. I don't think this is an inherent property of LLMs though, and I imagine future iterations will have some sort of attention mechanism that highlights the contributing source materials.
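As a sketch of how a response could at least surface candidate sources, here is a retrieval-style approach: score stored passages against the query by embedding similarity and cite the top matches. The `embed` function is a placeholder, and this is attribution by retrieval, which is simpler than (and different from) attributing what the model actually absorbed during training:

```python
# Rank candidate source passages by cosine similarity to the query and return
# the URLs of the best matches as "citations".
import numpy as np

def cite_sources(query, passages, embed, top_k=3):
    q = embed(query)
    scored = []
    for p in passages:   # each passage: {"text": ..., "url": ...}
        v = embed(p["text"])
        cosine = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((cosine, p["url"]))
    scored.sort(reverse=True)
    return [url for _, url in scored[:top_k]]
```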
I don't like this analogy; I think why I don't like it is the intent. With JPEG, the intent is to produce an image indistinguishable from the original. Xerox didn't intend to create a photocopier that produces incorrect copies. The artifacts are failures of the JPEG algorithm to do what it's supposed to within its constraints.
GPT is not trying to create a reproduction of its source material and simply failing at the task. Compression and GPT are both mathematical processes, but they aren't the same process: JPEG takes the original image and throws away some of the detail. GPT processes content to apply weights to a model; if that is reversible to the original content, it is considered a failure.
Blurriness gets weird when you're talking about truth.
Depending on the application we can accept a few pixels here or there being slightly different colors.
I queried GPT to try and find a book I could only remember a few details of. The blurriness of GPT's interpretation of facts was to invent a book that didn't exist, complete with a fake ISBN number. I asked GPT all kinds of ways if the book really existed, and it repeatedly insisted that it did.
I think your argument here would be to say that being reversible to a real book isn't the intent, but that's not how it is being marketed nor how GPT would describe itself.
I think that strengthens my point. We consider a blurry image of something to still be a true representation of that thing. We should never consider a GPT representation of a thing to be true.
> Compression and GPT are both mathematical processes, but they aren't the same process:
They're not, but they are very related! GPT has a 1536-dimensional vector space that is conceptually related to principal component analysis and the dimensionality reduction used in certain compression algorithms. (A small numerical sketch follows this comment.)

This does mean that neural networks can overfit and be fully reversible, but that is hardly their only useful feature!
They are also very good at translating and synthesizing, depending on the nature of the prompt.
If given an analytic prompt like, "convert this baseball box score into an entertaining paragraph", ChatGPT does a reliable job of acting as a translator because all of the facts about the game are contained in the box score!
But when given a synthetic prompt like, "give me some quotes from the broadcasters", ChatGPT does a reliable job of acting as a synthesizer, because none of those facts about the spoken transcript of the broadcasters is in the prompt. And it is a good synthesizer as well, because those quotes sound real!

Terms, etc:

https://williamcotton.com/articles/chatgpt-and-the-analytic-...
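To ground the PCA comparison made earlier in this comment: projecting data onto its top principal components and reconstructing it is a lossy round trip, which is the sense in which dimensionality reduction behaves like compression. A minimal sketch, assuming only NumPy (the data and the choice of k are arbitrary):

```python
# Compress each row of X down to k numbers (its coordinates in the top-k
# principal directions), then reconstruct. The reconstruction error is the
# information thrown away, much like a lossy codec.
import numpy as np

def pca_compress_decompress(X, k):
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                  # the k retained directions
    codes = Xc @ components.T            # "compressed" representation: n x k
    reconstruction = codes @ components + mean
    return codes, reconstruction

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
codes, X_hat = pca_compress_decompress(X, k=4)
print(codes.shape, float(np.mean((X - X_hat) ** 2)))  # (100, 4) and a nonzero error
```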
> With JPEG, the intent is to produce an image indistinguishable from the original.
Not necessarily, and even if so, if you continuously opened and re-saved a JPEG image it would eventually turn into a potato-quality image; Xerox machines do the same thing. It happens all the time with memes and old homework assignments. (A small sketch of this re-encoding loop follows this comment.) What I fear is this happening to GPT, especially when people start outright using its content and putting it on sites. Then it becomes part of what GPT is trained on later, but what it had previously learned was wrong, so it just progressively gets more and more blurred, with people using the new models to produce content, in a feedback loop that starts to blur truth and facts entirely.

Even if you tie it to search results like Microsoft is doing, eventually the GPT-generated content is going to rise to the top of organic results because of SEO mills using GPT for content to goose traffic... then all the top results agree with the already-wrong AI-generated answer; or state actors begin gaming the system and feeding the model outright lies.

This happens with people too, sure, but in small subsets, not in monolithic fashion with hundreds of millions of people relying on the information being right. I have no idea how they can solve this eventual problem, unless they just supervise what it's learning all the time; but at that point it can become incredibly biased and limited.
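Here is the re-encoding loop mentioned above as a runnable sketch (assuming Pillow and NumPy are installed; the image, quality setting, and generation count are arbitrary). It only demonstrates drift from the original under repeated lossy encoding, which is the mechanical analogue of the worry about models training on their own output:

```python
# Save and reload the same image as JPEG repeatedly, then measure how far the
# final generation has drifted from the original.
import io
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
original = Image.fromarray(rng.integers(0, 256, (64, 64, 3), dtype=np.uint8))

img = original
for generation in range(50):
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=70)   # lossy encode
    buf.seek(0)
    img = Image.open(buf).convert("RGB")       # decode and feed back in

drift = np.abs(np.asarray(img, dtype=int) - np.asarray(original, dtype=int)).mean()
print(f"mean per-pixel drift after 50 re-encodes: {drift:.1f}")
```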
I don't think JPEG wants to produce an image indistinguishable from the original. It wants to reduce space usage without distorting "too" much. Failing to reduce space usage would be considered a "failure" of JPEG, just as much as distorting too much.
JPEG relies on the limitations of human vision to make an image largely indistinguishable from the original. It specifically throws away information that we are less likely to notice. So yes, a good JPEG should be indistinguishable (to humans) from the original. Obviously, the more you turn up the compression, the harder that is.
If I ask ChatGPT to explain something to me like I'm 5, its response is going to lose some quality compared to the same thing written in 1000 words.
But neither response should be a copy of existing text. The intent of JPEG is to produce a compressed copy of an original. The intent of GPT is not to be a compressed copy of the Internet; it's supposed to produce unique results from what it "knows".
This is an important distinction, especially when there are issues of copyright involved.