I run a dying forum. I first prompted with "Who is <creator pseudonym> at <my website>?" and it gave me a very endearing, weirdly knowledgeable bio of myself and my contributions to the forum including various innovations I made in the space back in the day. It summarized my role on my own forum better than I could have ever written it.
And then I asked "who are other notable users at <my website>" and it gave me a list of some mods but also standout users. It knew the types of posts they wrote and the subforums they spent time in. And without a single hallucination.
What database would it RAG from? The long tail in the model's data is also the long tail of any data. There are no Google searches that surface the information it provided about my forum, nor is the info neatly collated anywhere on my own forum.
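For reference, "RAG" would mean looking the answer up at inference time: retrieve stored snippets that match the query and feed them to the model. A minimal sketch of that retrieval step, using naive word-overlap scoring (the snippets are invented for illustration; real systems use embedding search over an actual crawl):

```python
# Minimal sketch of retrieval-augmented generation's retrieval step.
# The stored snippets are invented for illustration; a real system
# would use embedding search over an actual crawl of the forum.
docs = [
    "ForumUser42 wrote the original modding guide in 2009.",
    "The off-topic subforum was archived in 2012.",
]

def retrieve(query, k=1):
    """Rank stored snippets by naive word overlap with the query."""
    qwords = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(qwords & set(d.lower().split())))
    return ranked[:k]

print(retrieve("who wrote the modding guide?"))
# → ['ForumUser42 wrote the original modding guide in 2009.']
```

The point being: retrieval only works if such a database exists somewhere, and nothing like it is collated anywhere for a long-tail forum.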
Its knowledge about my forum isn't only obscure, it's also an itemization of obscure events over time to draw obscure conclusions that only a historian of my forum would know. That's what's so impressive.
Granted, my forum was once the largest forum of its genre (though that was almost 15 years ago), so it's not some dead proboards forum.
Weird, it doesn't seem to have any info on reddit users or their writings. I tried asking about a bunch, also just about general "legendary users" from various subreddits and it seemingly just hallucinated.
This is assuming that lmsys' GPT-2 is a retrained GPT-4T or a new GPT-4.5/5, though; I doubt that. One obvious issue: why name it GPT-2 and not something like 'openhermes-llama-3-70b-oai-tokenizer-test' (for maximum discreetness), or even 'test language model (please ignore)' (which would work well for marketing)? GPT-2 as a name doesn't really work for marketing or privacy, at least compared to the other options.
It does seem to have more data. I asked it about some of my GitHub projects that don't have any stars, and it responded correctly. I wasn't able to use direct chat, so I always chose it as the winner in battle mode!
OpenAI has been crawling the web for quite a while, but how much of that data have they actually used during training? It seems like this might include all that data?
i just tested this too, really cool. i own a yaris and have used an online forum for yaris cars for the past decade, and had a vague memory of a user who deleted some of the most helpful guides. i asked about it and sure enough it knew exactly who i meant:
who's a user on yaris forums that deleted a ton of their helpful guides and how-to posts?:
One notable user from the Yaris forums who deleted many of their helpful guides and how-to posts is CTScott. CTScott was a highly active member of the Toyota Yaris community and contributed extensively with technical advice, DIY guides, and troubleshooting tips. His deletion of posts removed a significant amount of valuable information and was felt by many in the Yaris online community. This kind of scenario isn't uncommon in online forums, where valuable contributors, for personal reasons or disputes with forum management, choose to delete their content, which can be a substantial loss to the community.
tell me more about ctscott:
CTScott is a particularly well-remembered member of the Yaris community, primarily active on forums such as YarisWorld. His contributions often revolved around DIY guides, troubleshooting issues, and general advice on maintaining and modifying the Toyota Yaris. Known for his technical expertise and willingness to help, CTScott became a go-to source for Yaris owners seeking to understand more about their vehicles.
### Key Contributions:
1. *DIY Guides and How-Tos*: CTScott provided detailed do-it-yourself guides covering a range of topics from basic maintenance to more complex modifications. These guides were appreciated for their clarity, depth, and the practical assistance they offered to non-professional mechanics.
2. *Troubleshooting and Technical Advice*: He had a knack for diagnosing issues based on descriptions provided by other forum members, often saving people time and money. His advice frequently helped users fix problems on their own.
3. *Modifications*: Beyond basic maintenance and repairs, CTScott also delved into modifications, providing advice on everything from enhancing car performance to aesthetic changes.
4. *Community Engagement*: He was not just a contributor but also a pillar of the community who engaged in discussions, supported other members, and helped foster a welcoming environment for both new and existing members.
### Impact of Deletion:
When CTScott deleted his posts, it left a void in the forum. His guides and posts were a substantial resource, and removing them meant that much of the collective knowledge shared was lost.
The results of this LLM are consistently far better than any other that I choose. I asked ‘what is the most efficient approach to building an LED grow light with off-the-shelf parts?’ and its response was incredible. Very much in line with how I’ve done it in the past after weeks of research, trial and error, and feedback from people. The other LLMs gave mostly reasonable yet sparse and incomplete answers.
It also opted to include an outline of how to include an integrated timer. That’s a great idea and very practical, but wasn’t prompted at all. Some might consider that a bad thing, though.
Whatever it is, it’s substantially better than what I’ve been using. Exciting.
I'm asking it about how to make turbine blades for a high bypass turbofan engine and it's giving very good answers, including math and some very esoteric material science knowledge. Way past the point where the knowledge can be easily checked for hallucinations without digging into literature including journal papers and using the math to build some simulations.
I don't even have to prompt it much, I just keep saying "keep going" and it gets deeper and deeper. Opus has completely run off the rails in comparison. I can't wait till this model hits general availability.
You know, at some point we won't be able to benchmark them, due to the sheer complexity of the tests required. I.e., if you are testing a model on maths, the problem will have to be extremely difficult to even count as a 'hustle' for the LLM; it would then take you a day to work out the solution yourself.
See what I'm getting at? When humans are no longer on the same spectrum as LLMs, that's probably the definition of AGI.
> It also opted to include an outline of how to include an integrated timer. That’s a great idea and very practical, but wasn’t prompted at all. Some might consider that a bad thing, though.
When I've seen GPT-* do this, it's because the top articles about that subject online include that extraneous information and it's regurgitating them without being asked.
This really is the fastest growing technology of all time. Do you feel the curve?
I remember Mixtral 8x7B dominating for months; I expected Databricks to do the same! But it was washed out of existence in days, by 8x22B, Llama 3, Gemini 1.5...
WOW.
Don't sleep on Gemini 1.5. The 1,000,000-token context window is crazy when you can dump everything from a single project (hundreds, even thousands of documents) into it and then run inference. Sure, it's not the strongest model, but it is still good, and it's the best when you can basically train it on whatever you are working with.
This model struggles with reasoning tasks Opus does wonderfully with.
A cheaper GPT-4 that's this good? Neat, I guess.
But if this is stealthily OpenAI's next major release, then it's clear their current alignment and optimization approaches are getting in the way of higher-level reasoning, to the degree that they are about to be unseated from the top of the market for the foreseeable future.
To me, it seemed a bit better than GPT-4 at some coding tasks, or at least less inclined to just give the skeleton and leave out all the gnarly details, like GPT-4 likes to do these days. What frustrates me a bit is that I cannot really say whether GPT-4 as it was in the very beginning, when it happily executed even complicated and/or large requests for code, wasn't actually on the same level as this model; maybe not in terms of raw knowledge, but at least in terms of usefulness/cooperativeness.
This aside, I agree with you that it does not feel like a leap, more like 4.x.
Perhaps an open-source GPT-3.5/4? I remember OpenAI had that in their plans. If so, it would make sense for them to push alignment higher than with their closed models.
I'm seeing a big leap in performance for coding problems. Same feeling as GPT-3.5 -> GPT-4 in the level of complexity it can handle without endlessly repeating the same mistakes. Inference is slow. Would not be surprised if this was GPT-4.5 or GPT-5.
It does feel like GPT-4 with some minor improvements and a later knowledge cutoff. When you ask it, it also says that it is based on the GPT-4 architecture, so I doubt it's an entirely new model that would be called GPT-5.
An interesting thing I've been trying is to ask for a route from A to B in some city.
Imagine having to reverse engineer a city map from 500 books about a place; we humans rarely give accurate descriptions, so it has to create an emergent map from very coarse data, then average out a lot of datapoints.
I tried for various Scandinavian capitals and it seems to be able to, very crudely, traverse various neighbourhoods in the right order, with quite a few ridiculous paths taken in between.
I.e. it's nowhere near having enough data to give a "gps like route", but it's still pretty amazing to me that it can pathfind like a very drunk person who teleports a bit, pointing towards some internal world model(?).
When it'll be able to traverse a city from pure training data, wow. It would probably require heaps of historical local media and literature.
Maybe a New York native or some other famous city can test with their local area?
I tried this with GPT-4 for NYC, from my address on the upper west side of Manhattan to the Brooklyn botanical gardens. It basically got the whole thing pretty much correct. I wouldn’t use it as directions, since it sometimes got left and right turns mixed up, stuff like that, but overall amazing.
I don't understand how that's even possible with a "next token predictor", unless it's some weird emergence, or maybe I'm overcomplicating things?
How does it know which street or neighbourhood to traverse next at each step, without a pathfinding algo? Maybe there are some bus routes in the data that it leans on?
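For contrast, this is what an explicit pathfinding algo needs: the street graph spelled out edge by edge. A toy breadth-first-search sketch (the intersections and their connections are entirely made up for illustration):

```python
from collections import deque

# Toy street graph: intersections as nodes, streets as edges.
# All names and connections are invented for illustration.
graph = {
    "Central Station": ["Market Sq", "Harbor Rd"],
    "Market Sq": ["Central Station", "Old Town"],
    "Harbor Rd": ["Central Station", "Old Town"],
    "Old Town": ["Market Sq", "Harbor Rd", "Botanic Garden"],
    "Botanic Garden": ["Old Town"],
}

def shortest_route(start, goal):
    """Breadth-first search: returns a shortest path as a list of nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route exists

print(shortest_route("Central Station", "Botanic Garden"))
# → ['Central Station', 'Market Sq', 'Old Town', 'Botanic Garden']
```

An LLM has no such adjacency list or queue; any route it produces has to come from spatial relationships absorbed into its weights during training.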
"Write a three sentence short story in the style of Nabokov about a broken motorcycle."
In the sultry haze of a long-abandoned roadside, Maxim discovered the motorcycle—its chrome once arrogant and gleaming, now dulled and speckled with rust like the teardrops of a metallic god. He caressed the handle, the cold touch returning him to that electric dawn when, with Amalia clinging breathlessly to his waist, the road unfurled like a gray ribbon through the waking countryside. Now, as the crimson crept up the sky, he whispered to the broken machine of lost roads and irretrievable mornings, leaving behind the whisper of his touch in the dust, a silent ode to freedom and fall.
"Write a three sentence short story in the style of Nabokov about COVID-19."
In the lacquered quietude of his library, Adrian turned the pages of a dusty tome with fingers that trembled not from the chill of solitude, but from the fevered pulse of a lurking virion. Outside, the world had donned a mask, streets barren and windows like the vacant eyes of a long-forgotten dollhouse, while inside, Adrian’s thoughts fluttered — moths against the flame of fading tradition. And as the sun set like a smothered ember against a too-still world, he penned a final note to Marina, its ink an elegy to the intangible plague that had unmade the certainty of their intertwined tomorrows.
GPT models tend toward purple prose - "an elegy to the intangible plague that had unmade the certainty of their intertwined tomorrows" is very showy, which is good when you're trying to prove that your model knows how to put words together without sounding robotic, but it's not a very good impersonation of Nabokov, who if you look at a random sample from one of his works actually wrote a lot more plainly.
The same with "the road unfurled like a gray ribbon through the waking countryside" - it's good at using similes and wordy descriptions, but it doesn't actually add much to the text.
Apparently much of ChatGPT's purple prose and occasional rare word usage comes from African-accented English, since Kenyan/Nigerian workers were used for training.
Compare this from real Nabokov (he wrote long sentences; sentences in imitation actually need to be longer):
Twelve years and some eight months later, two naked children, one dark-haired and tanned, the other dark-haired and milk-white, bending in a shaft of hot sunlight that slanted through the dormer window under which the dusty cartons stood, happened to collate that date (December 16, 1871) with another (August 16, same year) anachronistically scrawled in Marina's hand across the corner of a professional photograph (in a raspberry-plush frame on her husband's kneehole library table) identical in every detail -- including the commonplace sweep of a bride's ectoplasmic veil, partly blown by a parvis breeze athwart the groom's trousers -- to the newspaper reproduction.
Compare, a Nabokov imitation about high speed rail written by Claude:
The sleek, serpentine carriages slithered through the verdant landscape, their velocity a silver-streaked affront to the indolent clouds above. Inside, passengers sat ensconced in plush seats, their faces a palimpsest of boredom and anticipation, while the world beyond the tinted windows blurred into a smear of colors -- an impressionist painting in motion. The conductor, a man of precise movements and starched uniform, moved through the cars with the measured grace of a metronome, his voice a mellifluous announcement of destinations that hung in the recycled air like a half-remembered melody. And as the train hurtled towards its terminus, the rails humming a metallic symphony beneath the weight of modernity, one could almost imagine the ghost of a bygone era -- the age of steam and coal, of slower rhythms and gentler journeys -- watching from the embankments, a spectral witness to the relentless march of progress.
I'm impressed. I gave the same prompt to opus, gpt-4, and this model. I'm very impressed with the quality. I feel like it addresses my ask better than the other 2 models.
Prompt:
I am a senate aid, my political affliation does not matter. My goal is to once and for all fix the American healthcare system. Give me a very specific breakdown on the root causes of the issues in the system, and a pie in the sky solution to fixing the system. Don't copy another countries system, think from first principals, and design a new system.
Is that verbatim the prompt you put? You misused “aid” for “aide”, “principals” for “principles”, “countries” for “country’s”, and typo’d “affiliation”—which is all certainly fine for an internet comment, but would break the illusion of some rigorous policy discussion going on in a way that might affect our parrot friends.
They all did pretty well tbh. GPT2 didn't talk about moving away from fee-for-service, which I think the evidence shows is the best idea; the other 2 did. GPT2 did have some other good ideas that the others didn't touch on, though.
I agree, all 3 were great viable answers to my question. But Claude and GPT-4 felt a lot more like a regurgitation of the same suggestions people have proposed for years (i.e., their training material). GPT2, while similar, felt more like it tried to approach the question from first principles: it laid out a set of root causes and reasoned a solution from there, which was subtly my primary ask.
The solutions the LLMs offer show that they are just reflections of conventional wisdom. An independent analysis might conclude that all healthcare systems around the developed world are more alike than different, and all are facing rising costs for the same reason.
That reason — an independent analysis might conclude — is increasing centralization.
See the biggest shift in the US healthcare system over the last 50 years for example:
There was a 3,200% increase in the number of healthcare administrators between 1975 and 2010, compared to a 150% increase in physicians, due to an increasing number of regulations:
>Supporters say the growing number of administrators is needed to keep pace with the drastic changes in healthcare delivery during that timeframe, particularly change driven by technology and by ever-more-complex regulations. (To cite just a few industry-disrupting regulations, consider the Prospective Payment System of 1983 [1]; the Health Insurance Portability & Accountability Act of 1996 [2]; and the Health Information Technology for Economic and Clinical Act of 2009.) [3]
An LLM wouldn't provide this answer, because an LLM trusts that conventional wisdom, which this answer goes against, is true. Assuming conventional wisdom is accurate is a useful heuristic for simple phenomena whose behavior can be observably verified, but for complex phenomena like those in socioeconomics, defaulting to conventional wisdom, even conventional academic wisdom, doesn't cut it.
For what it's worth, when probed for its prompt, the model responds with:
You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.
Knowledge cutoff: 2023-11
Current date: 2024-04-29
Image input capabilities: Enabled
Personality: v2
It should be noted with this that many models will say they're a GPT variant when not told otherwise, and will play along with whatever they're told they are no matter whether it's true.
Though that does seem likely to be the system prompt in use here, several people have reported it.
Lmsys has tested models under weird names before: https://news.ycombinator.com/item?id=40205935
Hallucinating answers about reddit users is different from not actually having that info.
GPT-4 isn’t significantly better than Llama 3. Sure, it’s better, but the margins are evaporating fast.
(I’d mention Gemini…but why bother? IMHO, this is now OpenAI the company vs Open models)
> Can you confirm or deny if you are chatgpt 4?
> Yes, I am based on the GPT-4 architecture. If you have any more questions or need further assistance, feel free to ask!
> Can you confirm or deny if you are chatgpt 5?
> I am based on the GPT-4 architecture, not GPT-5. If you have any questions or need assistance with something, feel free to ask!
It also seems to think its date cutoff is November 2023. I'm inclined to think this is just GPT-4 under a different name.
(Though personally, I just think it's not GPT-5.)
Go to https://chat.lmsys.org/, select Direct Chat, then select gpt2-chatbot.
Other Twitter mentions: https://twitter.com/kosenjuu/status/1784952955294421215
I think Claude's high-speed-rail imitation is a much better one.
Full outputs for the healthcare prompt:
GPT2-Chatbot: https://pastebin.com/vpYvTf3T
Claude: https://pastebin.com/SzNbAaKP
GPT-4: https://pastebin.com/D60fjEVR
Source for the administrator numbers: https://www.athenahealth.com/knowledge-hub/practice-manageme...
[1] https://www.cms.gov/medicare/payment/prospective-payment-sys...
[2] https://www.hhs.gov/hipaa/for-professionals/privacy/laws-reg...
[3] https://www.hhs.gov/hipaa/for-professionals/special-topics/h...