Readit News
SamPatt · 8 months ago
I'm sure the debate over the definition of AGI is important and will continue for a while, but... I can't care about it anymore.

Between Perplexity searching and summarizing, Claude explaining, and qwen (and other tools) coding, I'm already as happy as can be with whatever you want to call this level of intelligence.

Just today I used a completely local AI research tool, based on Ollama. It worked great.

Maybe it won't get much better? Or maybe it'll take decades instead of years? Ok. I remember not having these tools. I never want to go back.

atonse · 8 months ago
Same here.

The ability to “talk to an expert” about any topic I’m curious about and ask very specific questions has been invaluable to me.

It reminds me of being a kid and asking my grandpa a million questions, like how light bulbs worked, or what was inside his radio, or how do we have day and night.

And before anyone talks about accuracy or hallucinations, these conversations usually are treated as starting off points to then start googling specific terms, people, laws, treaties, etc to dig deeper and verify.

Last year during a visit to my first Indian reservation, I had a whole bunch of questions that nobody in person had answers to. And ChatGPT was invaluable in understanding concepts like where a reservation’s autonomy begins and ends. And why certain tribes are richer than others. What happens when someone calls 911 on a reservation. Or speeds. Or wants to start a factory without worrying about import/export rules. And what causes some tribes to lose their language faster than others. And 20 other questions like this.

And most of those resulted in google searches to verify the information. But I literally could never do this before.

Same this year when I’m visiting family in India. To learn about the politics, the major players, WHY they are considered major players (like the Chief Minister of Bengal or Uttar Pradesh or Maharashtra being major players because of their populations and economies). Criticisms, explanations of laws, etc etc.

For insanely curious people who often feel unsatisfied with the answers given by those around them, it’s the greatest thing ever.

netdevphoenix · 8 months ago
> The ability to “talk to an expert” about any topic I’m curious about and ask very specific questions has been invaluable to me.

It is dangerous to assume that LLMs are experts on any topic, with or without quotes. You are getting a super-fast journalist intern with a huge memory but no ability to reason critically, no real understanding of anything, and huge unreliability when it comes to answering questions (you can get completely different answers depending on how you phrase the question, and sometimes even the identical question gets you different answers). LLMs are very useful and a true game changer. But calling that expertise is a disservice to the true experts.

sollewitt · 8 months ago
LLMs suffer from the "Igon Value Problem" https://rationalwiki.org/wiki/Igon_Value_Problem

Similar to reading a pop-sci book, you're getting entertainment from a thing with no actual understanding of the source material, rather than an education.

ithadtobe119 · 8 months ago
(throwaway account because of what I'm about to say, but it needs to be said)

While my main use case for LLMs is coding just like most people here, there are lots of areas that are being ignored.

Did you know llama 3.X models have been trained as psychotherapists? It's been invaluable to dump and discuss feelings with it in ways I wouldn't trust any regular person with. When real therapists cost more than most people can afford (and will have you committed if you say the wrong thing), this ends up being a very good option.

And you know how escorts are traditionally known as therapists lite? Yeah, it works in reverse too. The main use case most are sleeping on is, well, emotional porn and erotic role play. Let me explain.

My generation (i.e. Z) doesn't do drugs, we don't drink, we don't go out. Why? Because we can hang on discord, play games, scroll tiktok and goon to our heart's content. 60% of gen Z men are single, 30% women. The loneliness epidemic hit hard along with covid. It's basically a match made in heaven for LLMs that can pretend to love you, like everything about you, ask you about your day, and of course, can sext on a superhuman level. When you're lonely enough, the fact that it's all just simulated doesn't matter one bit.

It's so interesting that the porn industry is usually at the forefront of innovation, adopting Blu-ray and HD DVD and whatnot before anyone else, but they're largely asleep on this, and so is everyone else who doesn't want to touch it with a 10ft pole. Well, except maybe c.ai to some extent. The business case is there, and it's a wide-open market that OAI, Anthropic, Google and the rest won't ever stoop down to themselves, so the bar for entry is far lower.

Right now the best-known experience is heading over to r/locallama and doing it yourself, but there are millions to be made for someone who improves it and figures out a platform to sell it on in the next few years. It can be done well enough with existing, properly tuned, open-weight, Apache-licensed LLMs, and progress isn't stopping.

croes · 8 months ago
How do you know the answers are correct?

More than once I've gotten eloquent answers that are completely wrong.

mort96 · 8 months ago
The ability to "talk to an expert" on any topic would indeed have been very useful. Sadly, we have the ability to talk to something which tries very, very hard to appear as an expert despite knowing nothing about the subject. A human who knows some things pretty well but will talk about stuff they don't know with the same certainty and authority as they talk about stuff they do know is a worthless conversation partner. In my experience, "AI" is that, but significantly worse.
1209412comb · 8 months ago
Semi-related but I find that sometime it just completely ruined a type of conversation.

Like in your example, I would previously have asked people "how would 911 handle a US reservation area?", and watched how my friends think and reason. To me, getting a conclusive answer was not the point. Now they just copy and paste ChatGPT, no fun haha.

stingraycharles · 8 months ago
For me the problem is that you always need to double-check this particular type of expert, as it can be confidently wrong about pretty much any topic.

It's useful as a starting point, not as a definitive expert answer.

ben_w · 8 months ago
> The ability to “talk to an expert” about any topic I’m curious about and ask very specific questions has been invaluable to me.

Even the ability to talk to a university work placement student/intern on any topic is very useful, never mind true experts.

Even Google's indexing and Wikipedia opened up a huge quantity of low-hanging fruit for knowledge sharing. Even to the extent that LLMs must be treated with caution because their default mode is over-confidence, and even to the extent one can call them a "blurry JPEG of the internet", LLMs likewise make available a lot of low-hanging fruit before we get to an AI that reasons more like we do from limited examples.

greentxt · 8 months ago
Libraries and books were pretty cool too though. You could go to a library and find information on anything and a librarian would help you. Not super efficient but good for humans.
zwnow · 8 months ago
Talk to an expert? You are aware that they hallucinate, right?
cess11 · 8 months ago
I've been "talking" quite a bit with Ollama models; they're often confidently wrong about Wikipedia-level stuff, even when the system prompt is explicitly constrained in this regard. Usually I get Wikipedia as understood by a twelve-year-old with the self-confidence of adult Peter Thiel. If it isn't factually wrong, it's often subtly wrong in a way that a cursory glance at some web search results is unlikely to rectify.

It takes more time for me to verify the stuff they output than to grab a book off Anna's Archive or my paid collections and look something up immediately. I'd rather spend that time making notes than waiting for the LLM to respond and double-checking it.

footy · 8 months ago
> For insanely curious people who often feel unsatisfied with the answers given by those around them, it’s the greatest thing ever.

As an insanely curious person who's often unsatisfied with the answers given by those around me, I can't agree. The greatest thing ever is libraries. I don't want to outsource my thinking to a computer any more than I want to outsource it to the people around me.

cardanome · 8 months ago
In the not so distant past we already had a tool that allowed us to look up any question that came into our minds.

It was super fast and always provided you with sources. It never hallucinated. It was completely free except for some advertisement. You could build a whole career out of being good at using it.

It was a search engine. Young people might not remember but there was a time when Google wasn't shite but actually magic.

stravant · 8 months ago
One of my favorite successes was getting an LLM to write me a program to graph how I subjectively feel the heat of steam coming off of the noodles I'm pouring the water out from as a function of the ambient temperature.

I was wondering which effects were at play and the graph matched my subjective experience well.

Hugsun · 8 months ago
I mostly feel sorry for grandpa; he'll receive far fewer of these questions, if any. This is partially because I expect to become this grandpa, and I already suspect that some people aren't asking me questions they would be asking if they had no access to ChatGPT.
delusional · 8 months ago
> And most of those resulted in google searches to verify the information. But I literally could never do this before.

Could you elaborate on this? What happened before when you had that type of question? What was stopping you from typing "911 emergency indian reservation" into Google and learning that the "Prairie Band Potawatomi Nation" has their own 911 dispatch?

In my youth, before the internet was everywhere, we were taught that we could always ask the nearest librarian and that they would help us find some useful information. The information was all there, in books, the challenge was to know which books to read. As I got older, and Google started to become more available, we were taught how to filter out bad information. The challenge shifted from finding information into how not to find misinformation.

When I hear what you say here, I'm reminded of that shift. There doesn't seem to be any fundamental change there, except maybe that it makes it harder not to find misinformation by obscuring the source of the information, which I was taught was an important indicator of its legitimacy.

coliveira · 8 months ago
How do you know the AI didn't hallucinate the answers? For topics like these, where there is little information available, the probability of hallucination is very high.
HPsquared · 8 months ago
The amount of value creation is off the scale. It's like when people started using Google, or Google maps.
bloppe · 8 months ago
At this point I think even the most bearish have to concede that LLMs are an amazing tool. But OpenAI was never supposed to be about creating tools. They're supposed to create something that can completely take over entire projects for you, not just something that helps you work on the projects faster. If they can't pull that off in the next year or two, they're gonna seriously struggle to raise the next $10B they'll need to keep the lights on.

Of course LLMs aren't going anywhere, but I do not envy Sam Altman right now.

lumost · 8 months ago
At this point it’s quite likely that they could pivot and just be the chatgpt company. I’ve found chatgpt-4o with web search and plugins to be more useful than o1 for most tasks.

It’s possible we’re nearing the end of the LLM race, but I doubt that’s the end of the AI story this decade, or OpenAI.

superultra · 8 months ago
I keep thinking about that Idris Elba Microsoft ad about how much AI can help my business, how both true and untrue that is, and how much distance there is between the now and the possible promise of AI. I imagine this is what keeps Altman up at night.
lukan · 8 months ago
Tesla is still valued highly, even though FSD never came despite being promised. So OpenAI would get away with just delivering ChatGPT 5, as long as it is better than the competition.
IAmGraydon · 8 months ago
I've been thinking the same thing lately. Even if we don't get to AGI, LLMs have revolutionized the way I work. I can produce code and copy at superhuman speeds now. I love it. Honestly, if we never get to AGI and just have the LLMs, it's probably the best possible outcome, as I don't think true AGI is going to be a good thing for humanity.
tugu77 · 8 months ago
That's all fine, but I think you are missing the bigger picture. It's not about whether what we already got out of this is good. Of course it is. It's about where it's going.

Until about 120 years ago, people were happy with horses and horse carriages. Such a great help! Travel long distances, pull weights, I never want to go back! But then the automobile was invented and within a few years little travel was done by horses anymore.

More recently, everybody had a landline phone at home. Such great tech! Talk to grandma hundreds of miles away! I never want to go back! Then suddenly the mobile phone and just shortly after the smart phone came along and now nobody has a landline anymore but everybody can record tiktoks anywhere anytime and share them with the world within seconds.

Now imagine "AI". Sure, we have some new tools right now. Sure we don't want to go back. But imagine the transformative effects that could come if the train didn't stop here. Question is just: will it?

throwup238 · 8 months ago
Amen. Everyone is talking about plateaus and diminishing returns on training but I don’t care one bit. I get that this is a startup focused forum and the financial sustainability of the market players is important but I can’t wait to see what the next decade of UX improvements will be like even if model improvements slow to a crawl.
delusional · 8 months ago
As a consumer you should always evaluate the product that is in front of you, not the one they promise in 6 months. If what's there is valuable to you, then that's great.

When we discuss the potential AGI, we're not talking as consumers, we're talking about the business side. If AGI is not reached, you'll see an absolutely enormous market correction, as the market realizes that the product is not going to replace any human workers.

The current generation of products are not profitable. They're investments towards that AGI dream. If that dream doesn't happen, then the current generation of stuff will disappear too, as it becomes impossible to provide at a cost you'd be comfortable with.

rajamaka · 8 months ago
Human workers have already been replaced.
spaceman_2020 · 8 months ago
This is me. If things never improve and Sonnet 3.6 is the best we have…I'm fine. It's good enough to drastically improve productivity.
rkagerer · 8 months ago
> completely local AI research tool, based on Ollama

Could you elaborate? Was it easy to install?

arcanemachiner · 8 months ago
Not OP, but yeah, ollama is super easy to install.

I just installed the Docker version and created a little wrapper script which starts and stops the container. Installing different models is trivial.

I think I already had CUDA set up, not sure if that made a difference. But it's quick and easy. Set it up, fuck around for an hour or so while you get things working, then you've got your own local LLM you can spin up whenever you want.
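Once it's running, querying the local server is just an HTTP call. A minimal sketch in Python (the model name is a placeholder; use whatever you pulled):

    import requests

    # Ollama's local REST API listens on port 11434 by default.
    OLLAMA_URL = "http://localhost:11434/api/generate"

    def ask_local_llm(prompt, model="llama3"):
        # stream=False returns a single JSON object instead of a token stream.
        resp = requests.post(
            OLLAMA_URL,
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    if __name__ == "__main__":
        print(ask_local_llm("Summarize what a reverse proxy does in two sentences."))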

ferminaut · 8 months ago
VSCode + the Cline extension + Gemini 2.0 is pretty awesome. Highly recommend checking out Cline; it quickly became one of my favorite coding tools.
IAmGraydon · 8 months ago
Gemini 2.0 isn't particularly great at coding. The Gemini 1206 preview that was released just before 2.0 is quite good, though. Still, it hasn't taken the crown from Claude 3.5 Sonnet (which appears to now be tied with o1). Very much agree about Cline + VSCode, BTW. My preferred models with Cline are 3.5 Sonnet and 3.5 Haiku. I can throw the more complex problems at Sonnet and use Haiku for everything else.

https://aider.chat/docs/leaderboards/edit.html

SamPatt · 8 months ago
I will check it out. The number of new tools is staggering.

I enjoy image and video generation and I have a 4090 and ComfyUI; I can't keep up with everything coming out anymore.

lazygoose · 8 months ago
Curious about the AI research tool you mentioned, would you mind sharing it? Been trying to get a good local research setup with Ollama but still figuring out what works best.
Bilal_io · 8 months ago
Not OP, but based on their mention of Ollama, I can tell you that it has built-in search tools; all you need to do is supply an API key to one of the tools, or even run one of the search tools locally using Docker.
tiffanyh · 8 months ago
I have the opposite reaction.

AI right now feels like that MBA person at work.

They don’t know anything.

But because they sound like they are speaking with authority & confidence, they get promoted at work.

(While all of the experts at work roll their eyes because they know the MBA/AI is just spitting out nonsense & wish the company never had any MBA/AI people)

zppln · 8 months ago
And the MBA person (at my company this is everyone in middle management) is also the person who goes around and suggests we shoehorn AI into everything...
uludag · 8 months ago
I'm pretty sure the plan has never been to just make these tools that make us more efficient. If AI stays at the level it's at, it would be a profound failure for companies like OpenAI. We're all benefiting from the capital being poured into these technologies now. The enshittification will come. The enshittification always comes.
mitemte · 8 months ago
I’m paying $240 a year to Anthropic that I wasn’t paying before and it’s worth it. While I don’t use Claude every single day, but I use it several times a day when I’m working. More times than the free tier allows.
htrp · 8 months ago
> Just today I used a completely local AI research tool, based on Ollama. It worked great.

Is it on github?

BOOSTERHIDROGEN · 8 months ago
Can you walk me through the steps you've taken to set up the Ollama-based tool so far?
acchow · 8 months ago
Cline was fixing my type errors and unit tests while I was doing my V60 pourover.
hamilyon2 · 8 months ago
If progress in capabilities stalls, then product fit, adoption, and ease of use are the next battlefield.

OpenAI may be the first to realize this and switch, so they still have a chance to recoup some of those billions.

alickz · 8 months ago
i feel like AGI is an arbitrary line in the sand anyway

i think as humans we put too much emphasis on what intelligence means relative to ourselves, instead of relative to nature

nico · 8 months ago
> Just today I used a completely local AI research tool, based on Ollama. It worked great

What’s it called? Could you post a link please?

Thank you

ryukoposting · 8 months ago
At this point, most conceivable beneficial use cases for LLMs have been covered. If the economics of AI tech were aligned with making a good product that people want and/or need, we'd basically take everything we have at this point and make it lighter, smaller, and faster. I doubt that's what will happen.
ninetyninenine · 8 months ago
The definition of agi is a linguistic problem but people confuse it for a philosophical problem. Think about it. The term is basically just a classification and what features and qualities fit the classification is an arbitrary and linguistic choice.

The debate stems from a delusion and failure to realize that people are simply picking and choosing different fringe features on what qualifies as agi. Additionally the term exists in a fuzzy state inside our minds as well. It’s not that the concept is profound. It’s that some of the features that define the classification of the term we aren’t sure about. But this doesn’t matter because we are basically just unsure about the definition of a term that we completely made up arbitrarily.

For example the definition of consciousness seems like a profound debate but it’s not. The word consciousness is a human invention and the definition is vague because we choose the definition to be ill defined, vague and controversial.

Much of the debate on this stuff is purely as I stated just a language issue.

zifpanachr23 · 8 months ago
If it's genuinely what you say, then how is what is going on not slavery?

I don't believe AGI is possible, but if it were, and if what is and isn't conscious were as subjective as you say, then it starts to take on an altogether more evil character.

Akin to cloning slave humans or something for free cheap labor.

ByteAndBattle · 8 months ago
Local search with Ollama? Please share!
badgersnake · 8 months ago
They’re garbage, they will always be garbage. Changing a 4 to a 5 will not make it not garbage.

The whole sector is a hype bubble artificially inflating stock prices.

michaelbuckbee · 8 months ago
What was the AI search tool?
fud101 · 8 months ago
how do you interact with perplexity? mobile app?

Dead Comment

divan · 8 months ago
Let's revisit this comment in one year – after the explosion of agentic systems. (:
isoprophlex · 8 months ago
You mean, the explosion of human centipede LLM prompts shitting into eachother?

Yes that will be a sight to behold.

wokwokwok · 8 months ago
We already have agentic systems; they're not particularly impressive [1].

There's no specific reason to expect them to get better.

Things that will shift the status quo are: MCTS-LLMs (like with ARC-AGI) and Much Bigger LLMs (like GPT-5, if they ever turn up) or some completely novel architecture.

[1] - It's provable; if just chaining LLMs of a particular size into agentic systems could scale indefinitely, then you could use a 1-param LLM and get AGI. You can't. QED. Chaining LLMs into agentic systems has a capped maximum level of function, which we basically already see with the current LLMs.

ie. Adding 'agentic' to your system has a finite, probably already reached, upper bound of value.

jaybna · 8 months ago
25% of the top 1000 websites are blocking OpenAI from crawling: https://originality.ai/ai-bot-blocking

I am betting hundreds of thousands, rising to millions more little sites, will start blocking/gating this year. AI companies might license from big sources (you can see the blocking percentage went down), but they will be missing the long tail, where a lot of great novel training data lives. And then the big sites will realize the money they got was trivial as agents start to crush their businesses.

Bill Gross correctly calls this phase of AI shoplifting. I call it the Napster-of-Everything (because I am old). I am also betting that the courts won't buy the "fair use" interpretation of scraping, given the revenues AI companies generate. That means a potential stalling of new models until some mechanism is worked out to pay knowledge creators. (And maybe nothing we know of now will work for media: https://om.co/2024/12/21/dark-musings-on-media-ai/)

Oh, and yes, I love generative AI and would be willing to pay 100x to access it...

P.S. Hope is not a strategy, but hoping something like ProRata.ai and/or TollBits can help make this self-sustainable for everyone in the chain

jpablo · 8 months ago
They aren't blocking anything. They are just asking nicely not to be crawled. Given that AI companies haven't cared a single bit about ripping off other people's data, I don't see why they would care now.
wing-_-nuts · 8 months ago
A number of sites have started outright blocking any traffic that looks remotely suspicious. This has made browsing with a vpn a bit of a pain.
EVa5I7bHFq9mnYK · 8 months ago
In their attempt to block OpenAI, they block me. Many sites that were accessible just 2 years ago, require login/captchas/rectal exam now just to read the content.
kjkjadksj · 8 months ago
They block plenty and they do it crudely. I get suspicious traffic bans from reddit all the time. Trivial enough to route around by switching user agent however. Which goes to show any crawling bot writer worth their salt already routes around reddit and most other sites bs by now. I’m just the one getting the occasional headache because I use firefox and block ads and site tracking I guess.
njovin · 8 months ago
Wouldn't it be somewhat trivial to set up honeypots?
jaybna · 8 months ago
Yeah, probably right. If you want a great rabbit hole, look up "Common Crawl" and see how a great academic project was absolutely hijacked for pennies on the dollar to grab training data - the foundation for every LLM out there right now.
cshores · 8 months ago
It ultimately doesn't matter because a fairly current snapshot of all of the world's information is already housed in their data lakes. The next stage for AI training is to generate synthetic data either by other AI or by simulations to further train on as human generated content can only go so far.
pphysch · 8 months ago
How is synthetic data supposed to work? Broadly speaking, ML is about extracting signal from noisy data and learning the subtle patterns.

If there is untapped signal in existing datasets, then learning processes should be improved. It does not follow that there should be a separate economic step where someone produces "synthetic data" from the real data, and then we treat the fake data as real data. From a scientific perspective, that last part sounds really bad.

Creating derivative data from real data sounds, for the purpose of machine learning, like a scam by the data broker industry. What is the theory behind it, if not fleecing unsophisticated "AI" companies? Is it just myopia, Goodhart's Law applied to LLM scaling curves? Some MBA took the "data is the new oil" comment a little too seriously and inferred that data is as fungible as refined petroleum?

Deleted Comment

aftbit · 8 months ago
IMO this is an underappreciated advantage for Google. Nobody wants to block the GoogleBot, so they can continue to scrape for AI data long after AI-specific companies get blocked.

Gemini is currently embarrassingly bad given it came from the shop that:

1. invented the Transformer architecture

2. has (one of) the largest compute clusters on the planet

3. can scrape every website thanks to a long-standing whitelist

Art9681 · 8 months ago
The new Gemini Experimental models are the best general purpose models out right now. I have been comparing with o1 Pro and I prefer Gemini Experimental 1206 due to its context, speed, and accuracy. Google came out with a lot of new stuff last week if you haven't been following. They seem to have the best models across the board, including image and video.
kibwen · 8 months ago
> Nobody wants to block the GoogleBot

This only remains true as long as website operators think that Google Search is useful as a driver of traffic. In tech circles Google Search is already considered a flaming dumpster heap, so let's take bets on when that sentiment percolates out into the mainstream.

jameslk · 8 months ago
For OpenAI, they could lean on their relationship with Microsoft for Bing crawler access

Websites won’t be blocking the search engine crawlers until they stop sending back traffic, even if they’re sending back less and less traffic

tartuffe78 · 8 months ago
Wonder if OpenAI is considering building a search engine for this reason... Imagine if we get a functional search engine again from some company just trying to feed its next model generation...
thiagowfx · 8 months ago
There are two to distinguish: "Googlebot" and "Google-Extended".
heavyset_go · 8 months ago
> I am betting hundreds of thousands, rising to millions more little sites, will start blocking/gating this year. AI companies might license from big sources (you can see the blocking percentage went down), but they will be missing the long tail, where a lot of great novel training data lives.

This is where I'm at. I write content when I run into problems that I don't see solved anywhere else, so my sites host novel content and niche solutions to problems that don't exist elsewhere, and if they do, they are cited as sources in other publications, or are outright plagiarized.

Right now, LLMs can't answer questions that my content addresses.

If it ever gets to the point where LLMs are sufficiently trained on my data, I'm done writing and publishing content online for good.

zifpanachr23 · 8 months ago
I don't think it is at all selfish to want to get some credit for going to the trouble of publishing novel content and not have it all stolen via an AI scraping your site. I'm totally on your side and I think people that don't see this as a problem are massively out of touch.

I work in a pretty niche field and feel the same way. I don't mind sharing my writing with individuals (even if they don't directly cite me) because then they see my name and know who came up with it, so I still get some credit. You could call this "clout farming" or something derogatory, but this is how a lot of experts genuinely get work...by being known as "the <something> guy who gave us that great tip on a blog once".

With AI snooping around, I feel like becoming one of those old mathematicians that would hold back publicizing new results to keep them all for themselves. That doesn't seem selfish to me, humans have a right to protect ourselves and survive and maintain the value of our expertise when OpenAI isn't offering any money.

I honestly think we should just be done with writing content online now, before it's too late. I've thought a lot about it lately and I'm leaning more towards that option.

glenstein · 8 months ago
>Bill Gross correctly calls this phase of AI shoplifting. I call it the Napster-of-Everything (because I am old). I am also betting that the courts won't buy the "fair use" interpretation of scraping, given the revenues AI companies generate. That means a potential stalling of new models until some mechanism is worked out to pay knowledge creators.

To your point, I have wondered whatever became of that massive initiative from Google to scan books, and whether that might be looked at as a potential training source, given that Google has run into legal limitations on other forms of usage.

ben_w · 8 months ago
> To your point, I have wondered whatever became of that massive initiative from Google to scan books, and whether that might be looked at as a potential training source, given that Google has run into legal limitations on other forms of usage.

Still around, doing fine: https://en.wikipedia.org/wiki/Google_Books and https://books.google.com/intl/en/googlebooks/about/index.htm...

Given the timing, I suspect it was started as simple indexing, in keeping with the mission statement "Organize the world's information and make it universally accessible and useful".

There was also reCAPTCHA v1 (books) and v2 (street view), which each improved OCR AI until the state of the art AI were able to defeat them in the role of CAPTCHA systems.

pncnmnp · 8 months ago
> I have wondered whatever became of that massive initiative from Google to scan books, and whether that might be looked at as a potential training source, given that Google has run into legal limitations on other forms of usage.

A few months ago, there was an interesting submission on HN about this - The Tragedy of Google Books (2017) (https://news.ycombinator.com/item?id=41917016).

Kostchei · 8 months ago
Using the real world (as in vision, 3D orientation, physical sensors) and building training regimes that augment the language models to be multidimensional and check that perception: that is the next step.

And there is very little shortage of data and experience in the actual world, as opposed to just the text internet. Can the current AI companies pivot to that? Or do you need to be worldlabs, or v2 of worldlabs?

shanusmagnus · 8 months ago
Ironically, if it plays out this way, it will be the biggest boon to actual AGI development there could be -- the intelligence via text tokenization will be a limiting factor otherwise, imo.
Tossrock · 8 months ago
Some can. Google owns Waymo and runs Streetview, they're collecting massive amounts of spatial data all the time. It would be harder for the MS/OpenAI centaur.
code51 · 8 months ago
Given the current state of the legal system, a real challenge can only happen around 10 years from now. By then AI players will have gathered immense power over the law.
lxgr · 8 months ago
If you're willing to believe the narrative that there's some sort of existential "race to AGI" going on at the moment (I'm ambivalent myself, but my opinion doesn't really matter; if enough people believe it to be true, it becomes true), I don't think that'll realistically stop anyone.

Not sure how exactly the Library of Congress is structured, but the equivalent in several countries can request a free copy of everything published.

Extending that to the web (if it's not already legally, if not practically, the case) and then allowing US companies to crawl the resulting dataset as a matter of national security, seems like a step I could see within the next few years.

zifpanachr23 · 8 months ago
I agree with you about the fair use argument. Seems like it doesn't meet a lot of the criteria for fair use based on my lay understanding of how those factors are generally applied.

See https://fairuse.stanford.edu/overview/fair-use/four-factors/

I think in particular it fails the "Amount and substantiality of the portion taken" and "Effect of the use on the potential market" extremely egregiously.

cedws · 8 months ago
Cloudflare has a toggle for blocking AI scrapers. I don’t think it’s default, but it’s there.
kyledrake · 8 months ago
This just feels like mystery meat to me. My guess is that a lot of legitimate users and VPNs are being blocked from viewing sites, which numerous users in this discussion have confirmed.

This seems like a very bad way to approach this, and ironically their model quite possibly also uses some sort of machine learning to work.

A few web hosting platforms are using the Cloudflare blocker and I think it's incredibly unethical. They're inevitably blocking millions of legitimate users from viewing content on other people's sites and then pretending it's "anti AI". To paraphrase Theo de Raadt, they saw something on the shelf, and it has all sorts of pretty colours, and they bought it.

input_sh · 8 months ago
It's not much smarter than just adding user agents to robots.txt manually.
jaybna · 8 months ago
They might get into the micro-licensing game too. More power to them.
1vuio0pswjnm7 · 8 months ago
Bill Gross:

https://twitter.com/Bill_Gross/status/1859999138836025808

https://pdl-iphone-cnbc-com.akamaized.net/VCPS/Y2024/M11D20/...

He appears to be criticising "AI" only to solicit support for his own company.

jasondigitized · 8 months ago
The amount of content coming off of YouTube every minute puts Google in a very enviable position.
vidarh · 8 months ago
All the big players are pouring a fortune into manually curated and created training data.

As it stands, OpenAI has a market cap large enough to buy a major international media conglomerate or two. They'll get data no matter how blocked they get.

Workaccount2 · 8 months ago
Doing basic copyright analyses on model outputs is all that is needed. Check if the output contains copyrighted material, and block it if it does.

Transformers aren't zettabyte sized archives with a smart searching algo, running around the web stuffing everything they can into their datacenter sized storage. They are typically a few dozen GB in size, if that. They don't copy data, they move vectors in a high dimensional space based on data.

Sometimes (note: sometimes) they can recreate copyrighted work, never perfectly, but close enough to raise alarm and in a way that a court would rule as violation of copyright. Thankfully though we have a simple fix for this developed over the 30 years of people sharing content on the internet: automatic copyright filters.
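As a rough sketch of what such an output-side filter could look like (a naive word n-gram overlap check against a reference corpus; real filters are far more sophisticated, this is purely illustrative):

    def ngrams(text, n=8):
        # Shingle the text into overlapping word n-grams.
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def looks_like_copy(output, reference_corpus, n=8, threshold=0.2):
        # Flag the output if a large share of its n-grams appear verbatim
        # in any reference document.
        out_grams = ngrams(output, n)
        if not out_grams:
            return False
        for doc in reference_corpus:
            overlap = len(out_grams & ngrams(doc, n)) / len(out_grams)
            if overlap >= threshold:
                return True
        return False

    # Usage: block or rewrite the response before returning it to the user.
    # if looks_like_copy(model_output, licensed_texts): model_output = "[withheld]"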

parineum · 8 months ago
It's not even close to that simple. Nobody is really questioning whether the training data contains copyrighted information; we know that to be true in enough cases to bankrupt OpenAI. The question is what analogy the courts should be using as a basis to determine whether it's infringement.

"It read many works but can't duplicate them exactly" sounds a lot like what I've done, to be honest. I can give you a few memorable lines from a few songs but can only really come close to reciting my favorites completely. The LLMs are similar, but their favorites are the favorites of the training data. A line in a pop song mentioned a billion times is likely reproducible; the lyrics to the next track on the album, not so much.

IMO, any infringement that might have happened would be in acquiring the data in the first place, but copyright law cares more about illegal reproduction than illegal acquisition.

EricMausler · 8 months ago
No comment on whether output analysis is all that is needed, though it makes sense to me. Just wanted to note that using file size differences as an argument may simply imply transformers could be a form of (either very lossy or very efficient) compression.
jaybna · 8 months ago
So then copyrighted content scraped is not needed for training? Guess I missed AGI suddenly appearing that reasoned things out all by itself.
cma · 8 months ago
People upload lots from those sites to chatgpt asking to summarize.
devsda · 8 months ago
That's still manual and minuscule compared to the amount they can gather by scraping.

If blocking really becomes a problem, they can take a page out of Google's playbook [1] and develop a browser extension that scrapes page content and in exchange offers some free credits for ChatGPT or a summarizer-type tool. There won't be a shortage of users.

1. https://en.wikipedia.org/wiki/Google_Toolbar

LASR · 8 months ago
So the team I lead does a lot of research around all the "plumbing" around LLMs, both technical and from a product-market perspective.

What I've learned is that for the most part the AI revolution is not going to be because of PhD-level LLMs. It will be because people are better equipped to use the high-schooler-level LLMs to do their work more efficiently.

We have some knowledge graph experiments where LLMs continuously monitor user actions on Slack, GitHub, etc. and build up an expertise store. It learns about your work and your workflows, and then you can RAG them.

In user testing, people most closely associated this experience with having someone who can just read their minds and essentially auto-suggest their work outputs. Basically it's like another team member.

Since these are just nodes in a knowledge graph, you can mix and match expertise bases that span several skills too. E.g. a PM who understands the nuances of technical feasibility.
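To make the mechanism concrete, here is a heavily simplified sketch of the retrieval side (toy node structure and a word-overlap scorer, purely illustrative, not our actual pipeline):

    from dataclasses import dataclass

    @dataclass
    class ExpertiseNode:
        source: str   # e.g. "slack", "github"
        text: str     # distilled observation about the user's work

    def score(query, node):
        # Toy relevance score: fraction of query words present in the node.
        q = set(query.lower().split())
        return len(q & set(node.text.lower().split())) / max(len(q), 1)

    def build_prompt(query, graph, k=3):
        # Pick the k most relevant expertise nodes and prepend them as context.
        top = sorted(graph, key=lambda n: score(query, n), reverse=True)[:k]
        context = "\n".join(f"- ({n.source}) {n.text}" for n in top)
        return f"Known context about this user's work:\n{context}\n\nTask: {query}"

    graph = [
        ExpertiseNode("github", "Owns the payments service; prefers small, typed PRs"),
        ExpertiseNode("slack", "Coordinating the Q3 latency reduction project"),
    ]
    print(build_prompt("Draft a status update on the latency work", graph))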

And it didn’t require user training or prompting LLMs.

So while GPT-5 may be delayed, I don’t think that’s stopping or slowing down a revolution in knowledge-worker productivity.

intellectronica · 8 months ago
This ^^^^^!!

Progress in the applied domain (the sort of progress that makes a difference in the economy) will come predominantly from integrating and orchestrating LLMs, with improvements to models adding a little bit of extra fuel on top.

If we never get any model better than what we have now (several GPT-4-quality models and some stronger models like o1/o3) we will still have at least a decade of improvements and growth across the entire economy and society.

We haven't even scratched the surface in the quest to understand how to best integrate and orchestrate LLMs effectively. These are very early days. There's still tons of work to do in memory, RAG, tool calling, agentic workflows, UI/UX, QA, security, ...

At this time, not more than 0.01% of the applications and services that can be built using currently available AI and that can meaningfully increase productivity and quality have been built or even planned.

We may or may not get to AGI/ASI soon with the current stack (I'm actually cautiously optimistic), but the obsessive jump from the latest research progress at the frontier labs to applied AI effectiveness is misguided.

solardev · 8 months ago
> a revolution in knowledge-worker productivity.

That's a nice euphemism for "imminent mass layoffs and a race to the bottom"...

tim333 · 8 months ago
In my lifetime there have seldom been many layoffs due to improved technologies. The companies tend to invest to keep up with rival companies. The layoffs come more when the companies become loss-making for whatever reason, e.g. the UK coal industry going, or Detroit being undercut by lower-cost carmakers.
aprilthird2021 · 8 months ago
Knowledge worker productivity has increased in other ways over the decades. Increases don't always lead to mass layoffs. Rails made (and still makes) many many web devs much more productive than before. Its arrival did not lead to mass layoffs
xboxnolifes · 8 months ago
Productivity has always meant the ability to do more with less. Or it can mean doing even more with more.

Were we at a peak 1000+ years ago and have only gone downhill since at every technological breakthrough?

atoav · 8 months ago
These productivity gains won't be shared with the employees. I think some people underestimate what a violent populace can do to them if they squeeze out even more yacht money from the people.
mrits · 8 months ago
The idea that someone should be paid by a corporation when they don't provide value is very strange to me. Doing so seems like the real race to the bottom
a_wild_dandan · 8 months ago
This conclusion is the lump of labor fallacy. It's not that simple.
tiffanyh · 8 months ago
It's that saying: "radiologists aren't losing their jobs due to AI... only radiologists who don't use AI are losing their jobs".
kranke155 · 8 months ago
The technology is not dystopian but our economic system makes it so.

Up to you to figure out which will hold.

willmadden · 8 months ago
No, the job market will adapt, just like it did during the industrial and information revolutions, and life will be better.
dmix · 8 months ago
I already feel like Copilot in VScode can read my mind. It’s kind of creepy when it does it multiple times a day.

ChatGPT also seems to be building a history of my queries, and my prompts are getting shorter and shorter because it already knows my frameworks, databases, operating system, and the common problems I'm solving.

BillyTheKing · 8 months ago
just a question for understanding - if we say 'it learns', does it mean it actually learns this as part of its training data? or does this mean it's stored in a vector DB and it retrieves information based on vector search and then includes it in the context window for query responses?
dcre · 8 months ago
The latter. “Learning” in the comment clearly refers to adding to the knowledge graph, not about training or fine-tuning a model. “and then you can RAG them.”
mort96 · 8 months ago
Honestly I wish you people would stop forcing this "AI revolution" on us. It's not good. It's not useful. It's not creating value. It's not "another team member"; other team members have their own minds with their own ideas and their own opinions. Your autocomplete takes my attention away from what I want to write and replaces it with what you want me to write. We don't want it.
ranyume · 8 months ago
OP's talking about a specific use case related to tech companies like Google. Not creative writing or research, areas in which AI is in no shape to support humans with its current safety alignment.
Eupolemos · 8 months ago
I find inline AIs like GitHub Copilot to be annoying, but browser-based AIs like Mistral or ChatGPT a really good and welcome help.
t_serpico · 8 months ago
One fundamental challenge to me is that as each training run becomes more and more expensive, the time it takes to learn what works and doesn't work widens. Half a billion dollars for training a model is already nuts, but if it takes 100 iterations to perfect it, you've cumulatively spent 50 billion dollars... Smaller models may actually be where rapid innovation continues, simply because of tighter feedback loops. O3 may be an example of this.
ciconia · 8 months ago
When you think about it, it's astounding how much energy this technology consumes versus a human brain, which runs at ~20W [1].

[1] https://hypertextbook.com/facts/2001/JacquelineLing.shtml

anon373839 · 8 months ago
It’s almost as if human intelligence doesn’t involve performing repeated matrix multiplications over a mathematically transformed copy of the internet. ;-)
concerndc1tizen · 8 months ago
20W for 20 years to answer questions slowly and error-prone, at the level of a 30B model. Add another 10 years with highly trained supervision and the brain might start contributing original work.
dominicrose · 8 months ago
A human brain is also more intelligent (hopefully) and is inside a body. In a way GPT resembles Google more than it resembles us.
soulofmischief · 8 months ago
You've discovered the importance of well-formed priors. The human brain is the result of millions of years of very expensive evolution.
soheil · 8 months ago
A human brain has been in continuous training for hundreds of thousands of years consuming slightly more than 20 watts.
dkobia · 8 months ago
AGI is the Sisyphean task of our age. We’ll push this boulder up the mountain because we have to, even if it kills us.
missedthecue · 8 months ago
Do we know LLMs are the path to AGI? If they're not, we'll just end up with some neat but eye wateringly expensive LLMs.
wruza · 8 months ago
Says who? And more importantly, is this the boulder? All I (and many others here) see is that people engage others to sponsor pushing some boulder, screaming promises which aren't even that consistent with the intermediate results that come out. This particular boulder may be on the wrong mountain, and likely is.

It all feels like doubling down on astrology because good telescopes aren't there yet. I'm pretty sure that when 5 comes out, it will show some amazing benchmarks but shit itself in the third paragraph as usual in a real task. Because that has been constant throughout GPT's evolution, in my experience.

> even if it kills us

Full-on sci-fi; in reality it will get stuck on a shell error message and either run out of money to exist or corrupt the system into having no connectivity.

h0l0cube · 8 months ago
There's no doubt been progress on the way to AGI, but ultimately it's still a search problem, and one that will rely on human ingenuity at least until we solve it. LLMs are such a vast improvement in showing intelligent-like behavior that we've become tantalized by it. So now we're possibly focusing our search in the wrong place for the next innovation on the path to AGI. Otherwise, it's just a lack of compute, and then we just have to wait for the capacity to catch up.
namaria · 8 months ago
A task that is completed and kills us is pretty much the opposite of a Sisyphean task.
soheil · 8 months ago
Really, the killing part was not necessary to make your point, nor was injecting your Sisyphean prose.

Any technology may kill us, but we'll keep innovating as we ought to. What's your next point?

goatlover · 8 months ago
Why do we have to?
idiotsecant · 8 months ago
And when we get it there, it kills us.
madeofpalk · 8 months ago
What has AGI got to do with this?

Dead Comment

Dead Comment

ulfw · 8 months ago
Why? Nobody asked us if we want this. Nobody has a plan for what to do with humanity when there is AGI.
bloodyplonker22 · 8 months ago
I am working at an AI company that is not OpenAI. We have found ways to modularize training so we can test on narrower sets before training is "completely done". That said, I am sure there are plenty of ways others are innovating to solve the long training time problem.
gerdesj · 8 months ago
Perhaps the real issue is that learning takes time and that there may not be a shortcut. I'll grant you that this argument's analogue was complete wank when comparing, say, the horse and cart to a modern car.

However, we are not comparing cars to horses but computers to a human.

I do want "AI" to work. I am not a luddite. The current efforts that I've tried are not very good. On the surface they offer a lot, but the lustre comes off very quickly.

(1) How often do you find yourself arguing with someone about a "fact"? Your fact may be fiction for someone else.

(2) LLMs cannot reason

A next token guesser does not think. I wish you all the best. Rome was not burned down within a day!

I can sit down with you and discuss ideas about what constitutes truth and cobblers (rubbish/false). I have indicated via parenthesis (brackets in en_GB) another way to describe something and you will probably get that but I doubt that your programme will.

icpmacdo · 8 months ago
This is literally just the scaling laws, "Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. This provides an efficient way for practitioners and researchers alike to compare pretraining decisions involving optimizers, datasets, and model architectures"

https://arxiv.org/html/2410.11840v1#:~:text=Scaling%20laws%2....
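For reference, a commonly cited parametric form of such a law (the Chinchilla-style fit, shown here as an illustration rather than quoted from the linked paper) predicts loss from parameter count N and training tokens D via fitted constants E, A, B, alpha, beta:

    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Fit those constants on small, cheap runs, and you can extrapolate what a bigger model trained on more tokens should achieve before spending the money.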

merizian · 8 months ago
Because of muP [0] and scaling laws, you can test ideas empirically on smaller models, with some confidence they will transfer to the larger model.

[0] https://arxiv.org/abs/2203.03466

fny · 8 months ago
O3 is not a smaller model. It's an iterative GPT of sorts with the magic dust of reinforcement learning.
falcor84 · 8 months ago
I'm pretty sure that the parent implied that o3 is smaller in comparison to gpt5

Deleted Comment

cma · 8 months ago
>the time it takes it to learn what works/doesn't work widens.

From the raw scaling laws we already knew that a new base model may peter out in this run or the next with some amount of uncertainty--"the intersection point is sensitive to the precise power-law parameters":

https://gwern.net/doc/ai/nn/transformer/gpt/2020-kaplan-figu...

Later graph gpt-3 got to here:

https://gwern.net/doc/ai/nn/transformer/gpt/2020-brown-figur...

https://gwern.net/scaling-hypothesis

dyauspitr · 8 months ago
Until you get to a point where the LLM is smart enough to look at real-world data streams and prune its own training set out of them. At that point it will self-improve to AGI.
soheil · 8 months ago
It's like saying bacteria reproduction is way faster than humans so that's where we should be looking for the next breakthroughs.
ramesh31 · 8 months ago
But if the scaling law holds true, more dollars should at some point translate into AGI, which is priceless. We haven't reached the limits yet of that hypothesis.
unshavedyak · 8 months ago
> which is priceless

This also isn't true. It'll clearly have a price to run. Even if it's very intelligent, if the price to run it is too high it'll just be a 24/7 intelligent person that few can afford to talk to. No?

threeseed · 8 months ago
a) There is evidence, e.g. private data deals, that we are starting to hit the limits of what data is available.

b) There is no evidence that LLMs are the roadmap to AGI.

c) Continued investment hinges on there being a large enough cohort of startups that can leverage LLMs to generate outsized returns. There is no evidence yet that this is the case.

Animats · 8 months ago
"Orion’s problems signaled to some at OpenAI that the more-is-more strategy, which had driven much of its earlier success, was running out of steam."

So LLMs finally hit the wall. For a long time, more data, bigger models, and more compute to drive them worked. But that's apparently not enough any more.

Now someone has to have a new idea. There's plenty of money available if someone has one.

The current level of LLM would be far more useful if someone could get a conservative confidence metric out of the internals of the model. This technology desperately needs to output "Don't know" or "Not sure about this, but ..." when appropriate.

simonw · 8 months ago
The new idea is inference-time scaling, as seen in o1 (and o3 and Qwen's QwQ and DeepSeek's DeepSeek-R1-Lite-Preview and Google's gemini-2.0-flash-thinking-exp).

I suggest reading these two pieces about that:

- https://www.aisnakeoil.com/p/is-ai-progress-slowing-down - best explanation I've seen of inference scaling anywhere

- https://arcprize.org/blog/oai-o3-pub-breakthrough - François Chollet's deep dive into o3

I've been tracking it on this tag on my blog: https://simonwillison.net/tags/inference-scaling/

exhaze · 8 months ago
I think the wildest thing is actually Meta’s latest paper where they show a method for LLMs reasoning not in English, but in latent space

https://arxiv.org/pdf/2412.06769

I’ve done research myself adjacent to this (mapping parts of a latent space onto a manifold), but this is a bit eerie, even to me.

mnk47 · 8 months ago
> So LLMs finally hit the wall

Not really. Throwing a bunch of unfiltered garbage at the pretraining dataset, throwing in RLHF of questionable quality during post-training, and other current hacks - none of that was expected to last forever. There is so much low-hanging fruit that OpenAI left untouched and I'm sure they're still experimenting with the best pre-training and post-training setups.

One thing researchers are seeing is resistance to post-training alignment in larger models, but that's almost the opposite of a wall, they're figuring it out as well.

> Now someone has to have a new idea

OpenAI already has a few, namely the o* series in which they discovered a way to bake Chain of Thought into the model via RL. Now we have reasoning models that destroy benchmarks that they previously couldn't touch.

Anthropic has a post-training technique, RLAIF, which supplants RLHF, and it works amazingly well. Combined with countless other tricks we don't know about in their training pipeline, they've managed to squeeze so much performance out of Sonnet 3.5 for general tasks.

Gemini is showing a lot of promise with their new Flash 2.0 and Flash 2.0-Thinking models. They're the first models to beat Sonnet at many benchmarks since April. The new Gemini Pro (or Ultra? whatever they call it now) is probably coming out in January.

> The current level of LLM would be far more useful if someone could get a conservative confidence metric out of the internals of the model. This technology desperately needs to output "Don't know" or "Not sure about this, but ..." when appropriate.

You would probably enjoy this talk [0]; it's by an independent researcher who IIRC is a former employee of DeepMind or some other lab. They're exploring this exact idea. It's actually not hard to tell when a model is "confused" (just look at the probability distribution of likely tokens); the challenge is in steering the model to either get back on the right track or give up and say "you know what, idk".

[0] https://www.youtube.com/watch?v=4toIHSsZs1c
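As a toy illustration of the "look at the probability distribution" point: the entropy of the next-token distribution is a cheap confusion signal. The probabilities below are made up; in practice you would read them from the model's logprobs.

    import math

    def entropy(probs):
        # Shannon entropy in bits; higher means the model is less certain.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    confident = [0.92, 0.05, 0.02, 0.01]               # one clear winner
    confused = [0.18, 0.17, 0.17, 0.16, 0.16, 0.16]    # nearly uniform

    print(f"confident: {entropy(confident):.2f} bits")
    print(f"confused:  {entropy(confused):.2f} bits")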

NitpickLawyer · 8 months ago
> Not really. Throwing a bunch of unfiltered garbage at the pretraining dataset, throwing in RLHF of questionable quality during post-training, and other current hacks - none of that was expected to last forever. There is so much low-hanging fruit that OpenAI left untouched and I'm sure they're still experimenting with the best pre-training and post-training setups.

Exactly! Llama 3 and its .x iterations have shown that, at least for now, the idea of using previous models to filter the pre-training datasets and using a small number of seeds to create synthetic datasets for post-training still holds. We'll see with L4 if it continues to hold.

az226 · 8 months ago
The problem is data.

GPT-3 was trained on a 4:1 ratio of data to parameters, and for GPT-4 the ratio was 10:1. So to scale this out, GPT-5 should be 25:1. The parameter count jumped roughly 7.5x, from 175B to 1.3T, which means GPT-5 should be about 10T parameters and 250T training tokens. There is zero chance OpenAI has a training set of high-quality data that is 250T tokens.
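Spelling that extrapolation out with the figures above (which are themselves rough guesses):

    # Data-to-parameter ratios per generation, per the guess above: 4:1 -> 10:1 -> 25:1.
    gpt3_params = 175e9
    gpt4_params = 1.3e12
    growth = gpt4_params / gpt3_params      # ~7.4x jump from GPT-3 to GPT-4
    gpt5_params = gpt4_params * growth      # ~10T if the same jump repeats
    gpt5_tokens = gpt5_params * 25          # 25:1 ratio -> ~250T training tokens
    print(f"{gpt5_params:.1e} params, {gpt5_tokens:.1e} tokens")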

If I had to guess, they trained a model that was maybe 3-4T in size and used 30-50T high-quality tokens and maybe 10-30T medium- and low-quality ones.

There is only one company in the world that stores the data that could get us past the wall.

The training cost of the above scaled-up GPT-5 is 150x GPT-4, which was 25k A100s for 90 days, with poor MFU.

Let’s assume they double MFU; it would mean 1M H100s. But let’s say they made algorithmic improvements, so maybe it’s only 250-500k H100s.

While the training cluster size was 100k and then grew to 150k, that cluster size is suggestive of a smaller model and less data.

But ultimately data is the bottleneck.

int_19h · 8 months ago
We're also increasingly using synthetic data to train them on, though, and the race now is in coming up with better ways to generate it.
ssl-3 · 8 months ago
Links?
briga · 8 months ago
What wall? Not a week has gone by in recent years without an LLM breaking new benchmarks. There is little evidence to suggest it will all come to a halt in 2025.
jrm4 · 8 months ago
Sure, but "benchmarks" here seems roughly as useful as "benchmarks" for GPUs or CPUs, which don't much translate to what the makers of GPT need, which is 'money making use cases.'
peepeepoopoo98 · 8 months ago
O3 has demonstrated that OpenAI needs 1,000,000% more inference-time compute to score 50% higher on benchmarks. If O3-High costs about $350k an hour to operate, that would mean making O4 score 50% higher would cost $3.5B (!!!) an hour. That's a scaling wall.
whoisthemachine · 8 months ago
Unfortunately, the best they can do is "This is my confidence in what someone would say given the prior context".
sooheon · 8 months ago
What someone from the past would have said.
synapsomorphy · 8 months ago
The new idea is already here and it's reasoning / chain of thought.

Anecdotally Claude is pretty good at knowing the bounds of its knowledge.

threeseed · 8 months ago
Anecdotally Claude is just as bad as every other LLM.

Step into more niche areas, e.g. I am trying to use it with Scala macros, and at least 90% of the time it gives code that either (a) fails to compile or (b) is just complete gibberish.

And at no point ever has it said it didn't know something.

svaha1728 · 8 months ago
Not even close. I’m a programmer but also a guitarist. I love asking it to tab out songs for me or asking it how many bars are in the intro of a song. It convincingly gives an answer that is always way off the mark.
aleph_minus_one · 8 months ago
> Now someone has to have a new idea. There's plenty of money available if someone has one.

I honestly do claim to have some ideas for which I see evidence that they might work (and I am privately working on a prototype, if only out of curiosity and to see whether I am right). The bad news: these ideas very likely won't be helpful for these LLM companies, because they don't serve their agenda and follow a very different approach.

So no money for me. :-(

Let me put it this way:

Have you ever talked to a person whose intelligence is miles above yours? It can easily become very exhausting. Thus an "insanely intelligent" AI would not be of much use to most people - it would think "too differently" from them.

There do exist tasks in commerce for which an insane amount of intelligence would make a huge difference (in the sense of being positive regarding some important KPIs), but these are rare. I can imagine some applications of such (fictional) "super-intelligent" AIs in finance and companies doing some bleeding-edge scientific research - but these are niche applications (though potentially very lucrative ones).

If OpenAI, Anthropic & Co were really attempting to develop some "super-smart" AI, they would be working on those very lucrative niche applications where an insane amount of intelligence makes a huge difference, and where you can assume (and train for) an AI operator with "Fields-medal level" intelligence.

Jean-Papoulos · 8 months ago
> So LLMs finally hit the wall. For a long time, more data, bigger models, and more compute to drive them worked

We can't say whether there is a wall, since we don't have any more data to train on.

knapcio · 8 months ago
I’m wondering whether O3 can be used to explore its own improvement or optimization ideas, or if it hasn’t reached that point yet.
atleastoptimal · 8 months ago
the new idea is the o series and clearly OpenAI’s main focus now. It’s advancing much faster than the GPT series
thrwthsnw · 8 months ago
Seriously? All they do is produce a “confidence metric”
emtel · 8 months ago
But how do they do that?
Yizahi · 8 months ago
To output "don't know" a system needs to "know" too. Random token generator can't know. It can guess better and better, maybe it can even guess 99.99% of time, but it can't know, it can't decide or reason (not even o1 can "reason").
ericskiff · 8 months ago
What we can reasonably assume from statements made by insiders:

They want a 10x improvement from scaling and a 10x improvement from data and algorithmic changes

The sources of public data are essentially tapped

Algorithmic changes will be an unknown to us until they release, but from published research this remains a steady source of improvement

Scaling seems to stall if data is limited

So with all of that taken together, the logical step is to figure out how to turn compute into better data to train on. Enter strawberry / o1, and now o3

They can throw money, time, and compute at thinking about and then generating better training data. If the belief is that N billion new tokens of high quality training data will unlock the leap in capabilities they’re looking for, then it makes sense to delay the training until that dataset is ready

With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field. OpenAI’s next moat may be the best synthetic training set ever.

At this point I would guess we get 4.5 with a subset of this - some scale improvement, the algorithmic pickups since 4 was trained, and a cleaned and improved core data set but without risking leakage of the superior dataset

When 5 launches, we get to see what a fully scaled version looks like with training data that outstrips average humans in almost every problem space

Then the next o-model gets to start with that as a base and reason? It's likely to be remarkable.

sdwr · 8 months ago
Great improvements and all, but they are still no closer (as of 4o regular) to having a system that can be responsible for work. In math problems it forgets which variable represents what; in coding questions it invents library functions.

I was watching a YouTube interview with a "trading floor insider". They said they were really being paid for holding risk. The bank has a position in a market, and it's their ass on the line if it tanks.

ChatGPT (as far as I can tell) is no closer to being accountable or responsible for anything it produces. If they don't solve that (and the problem is probably inherent to the architecture), they are, in some sense, polishing a turd.

nightowl_games · 8 months ago
> They said they were really being paid for holding risk.

I think that's a really interesting insight that has application to using 'AI' in jobs across the board.

zifpanachr23 · 8 months ago
This is underdiscussed. I don't think people understand just how worthless AI is in a ton of fields until it is able to be held liable and be sent to prison.

There are a lot of moral conundrums that are just not going to work out with this. Seems like an attempt to just offload liability, and it seems like pretty much everybody has caught onto that as being its main selling point and probably the main thing that will keep it from ever being accepted for anything important.

tucnak · 8 months ago
> ChatGPT (as far as I can tell) is no closer to being accountable or responsible for anything it produces.

What does it even mean? How do you imagine that? You want OpenAI to take on liability for the kicks of it?

Stevvo · 8 months ago
"With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field."

I highly doubt that. o3 is many orders of magnitude more expensive than paying subject matter experts to create new data. It just doesn't make sense to pay six figures in compute to get o3 to make data a human could make for a few hundred dollars.

bookaway · 8 months ago
Yes, I think they had to push this reveal forward because their investors were getting antsy with the lack of visible progress to justify continuing rising valuations. There is no other reason a confident company making continuous rapid progress would feel the need to reveal a product that 99% of companies worldwide couldn't use at the time of the reveal.

That being said, if OpenAI is burning cash at lightspeed and doesn't have to publicly reveal the revenue they receive from certain government entities, it wouldn't come as a surprise if they let the government play with it early on in exchange for some much needed cash to set on fire.

EDIT: The fact that multiple sites seem to be publishing GPT-5 stories similar to this one leads one to conclude that the o3 benchmark story was meant to counter the negativity from this and other similar articles that are just coming out.

mrshadowgoose · 8 months ago
Can SMEs deliver that data in a meaningful amount of time? Training data now is worth significantly more than data a year from now.
GolfPopper · 8 months ago
>churning out new thinking at expert level across every field

I suspect this is really, "churning out text that impresses management".

tshadley · 8 months ago
Seems to me o3 prices would be what the consumer pays, not what OpenAI pays. That would mean o3 could be more efficient in-house than paying subject-matter experts.
dartos · 8 months ago
That’s an interesting idea. What if OpenAI funded medical research initiatives in exchange for exclusive training rights on the research.
DougN7 · 8 months ago
Someone needs to dress up Mechanical Turk and repackage it as an AI company…..
rtsil · 8 months ago
Unless the quality of the human data is extraordinary, it seems, according to TFA, that it's not that easy:

> The process is painfully slow. GPT-4 was trained on an estimated 13 trillion tokens. A thousand people writing 5,000 words a day would take months to produce a billion tokens.

And if the human-generated data were so qualitatively good that it could be three orders of magnitude smaller, then I can assume it would be at least as expensive as o3.
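
A rough sanity check of that rate (the ~1.3 tokens per word conversion is a common rule of thumb, not a figure from the article):

    people, words_per_day = 1_000, 5_000
    tokens_per_word = 1.3                                 # rule-of-thumb conversion
    daily = people * words_per_day * tokens_per_word      # ~6.5M tokens/day
    print(1e9 / daily)    # ~154 days to write a billion tokens
    print(13e12 / daily)  # ~2,000,000 days for a GPT-4-sized corpus

So "months per billion tokens" checks out, and at that rate a GPT-4-sized corpus would take that team thousands of years.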

az226 · 8 months ago
Only a matter of time. The costs are aggressively going down. And with specialized inference hardware it will go further down.

Cost of coordination is also large. Immediate answers are an advantage/selling point.

nialv7 · 8 months ago
> OpenAI’s next moat

I don't think oai has any moat at all. If you look around, QwQ from Alibaba is already pushing o1-preview performances. I think oai is only ahead by 3~6 months at most.

vasco · 8 months ago
If their AGI dreams came true, a 3-month head start might be more than enough. They probably won't, but it's interesting to ponder what the next few hours, days, and weeks would look like for someone who wielded AGI.

Like let's say you have a few datacenters of compute at your disposal and the ability to instantiate millions of AGI agents - what do you have them do?

I wonder if the USA already has a secret program for this under national defense. But it is interesting that once you do control an actual AGI you'd want to speed-run a bunch of things. In opposition to that, how do you detect an adversary already has / is using it and what to do in that case.

acyou · 8 months ago
That is why being #2 in technical product development can be great. Someone else pays to work out the kinks, copy what works and improve on it at a fraction of the cost. You see it time and time again.
dartos · 8 months ago
I’m curious how, if at all, they plan to get around compounding bias in synthetic data generated by models trained on synthetic data.
ynniv · 8 months ago
Everyone's obsessed with new training tokens... It doesn't need to be more knowledgeable, it just needs to practice more. Ask any student: practice is synthetic data.
nialv7 · 8 months ago
Synthetic data is fine if you can ground the model somehow. That's why o1/o3's improvements are mostly in reasoning, math, etc.: in those domains you can easily tell whether the data is wrong or not.
jsheard · 8 months ago
> With o3 now public knowledge, imagine how long it’s been churning out new thinking at expert level across every field. OpenAI’s next moat may be the best synthetic training set ever.

Even taking OpenAI and the benchmark authors at their word, it consumes at least tens of dollars per task to hit peak performance; how much would it cost to have it produce a meaningfully large training set?

qup · 8 months ago
That's the public API price isn't it?
noman-land · 8 months ago
I completely don't understand the use for synthetic data. What good is it to train a model basically on itself?
psb217 · 8 months ago
The value of synthetic data relies on having non-zero signal about which generated data is "better" or "worse". In a sense, this is what reinforcement learning is about: generate some data, have that data scored by some evaluator, and then feed the data back into the model with higher weight on the better stuff and lower weight on the worse stuff.

The basic loop is: (i) generate synthetic data, (ii) rate synthetic data, (iii) update model to put more probability on better data and less probability on worse data, then go back to (i).
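
A toy version of that loop, just to make the control flow concrete. The "model" here is a weighted list of canned answers and the rater is hard-coded, so this only illustrates the shape of the reweighting, nothing more:

    import random

    candidates = ["2 + 2 = 4", "2 + 2 = 5", "2 + 2 = 22"]
    weights = [1.0, 1.0, 1.0]                            # start with a uniform "model"

    def rate(sample: str) -> float:
        return 1.0 if sample.endswith("= 4") else 0.0    # stand-in evaluator

    for _ in range(100):
        sample = random.choices(candidates, weights=weights)[0]  # (i) generate
        reward = rate(sample)                                    # (ii) rate
        i = candidates.index(sample)
        weights[i] *= 1.1 if reward > 0 else 0.9                 # (iii) reweight

    print({c: round(w, 2) for c, w in zip(candidates, weights)})

After a few iterations the probability mass concentrates on the "good" sample, which is the whole point: the model ends up training on a filtered, reweighted version of its own output.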

viraptor · 8 months ago
This is a good read for some examples https://arxiv.org/abs/2203.14465

> This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers

But there are a few others. In general, good data is good data. We're definitely learning more about how to produce good synthetic versions of it.
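
A control-flow sketch of that STaR loop; generate_rationale and finetune below are stand-ins for real model calls, so only the structure is meaningful:

    def generate_rationale(question, hint=None):
        # stand-in: a real run would few-shot prompt the model, optionally
        # including the correct answer as a hint ("rationalization")
        answer = hint if hint is not None else "?"
        return f"reasoning about {question}", answer

    def finetune(examples):
        print(f"fine-tuning on {len(examples)} rationales")  # stand-in

    dataset = [("What is 6 * 7?", "42"), ("Capital of France?", "Paris")]

    for _ in range(3):  # repeat
        keep = []
        for question, gold in dataset:
            rationale, answer = generate_rationale(question)
            if answer != gold:
                # retry with the correct answer shown, asking for a rationale for it
                rationale, answer = generate_rationale(question, hint=gold)
            if answer == gold:
                keep.append((question, rationale, gold))
        finetune(keep)  # only rationales that ultimately yielded correct answers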

Majromax · 8 months ago
> What good it's it to train a model basically on itself?

If the model generates data of variable quality, and if there's a good way to distinguish good data from bad data, then training on self-generated data might "bootstrap" a model to better performance.

This is common in reinforcement learning. Famously, AlphaGo Zero (https://en.wikipedia.org/wiki/AlphaGo_Zero) learned exclusively on self-play, without reference to human-played games.

Of course, games have a built-in critic: the better strategy usually wins. It's much harder to judge the answer to a math problem, or decide which essay is more persuasive, or evaluate restaurant recommendations.

dyauspitr · 8 months ago
If we get to a point where we have a model that, when fed a real-world stream of data (YouTube, surveillance cameras, forum data, cell phone conversations, etc.), can prune out a good training set for itself, then you’re at the point where the LLM is in a feedback loop where it can improve itself. That’s AGI for all intents and purposes.
nradov · 8 months ago
There is an enormous "iceberg" of untapped non-public data locked behind paywalls or licensing agreements. The next frontier will be spending money and human effort to get access to that data, then transform it into something useful for training.
mistercheph · 8 months ago
ah yes the beautiful iceberg of internal documentation, legal paperwork, and meeting notes.

the highest quality language data that exists is in the public domain

A_D_E_P_T · 8 months ago
Counterpoint: o1-Pro is insanely good -- subjectively, it's as far above GPT4 as GPT4 was above 3. It's almost too good. Use it properly for an extended period of time, and one begins to worry about the future of one's children and the utility of their schooling.

o3, by all accounts, is better still.

Seems to me that things are progressing quickly enough.

anonzzzies · 8 months ago
Not sure what you are using it for, but it is terrible for me for coding; claude beats it always and hands down. o1 just thinks forever to come up with stuff it already tried the previous time.

People say that's just prompting without pointing to real million line+ repositories or realistic apps to show how that can be improved. So I say they are making todo and hello world apps and yes, there it works really well. Claude still beats it, every.. single.. time..

And yes, I use the Pro tier of all of them, and yes, I do assume coding is done for most people. Become a plumber or electrician or carpenter.

h_tbob · 8 months ago
That’s so weird, it seems like everybody here prefers Claude.

I’ve been using Claude and OpenAI in Copilot, and I find even 4o seems to understand the problem better. o1 definitely seems to get it right more often for me.

rubymamis · 8 months ago
I find that o1 and Sonnet 3.5 are good and bad quite equally on different things. That's why I keep asking both the same coding questions.
phito · 8 months ago
I keep reading this on HN so I believe it has to be true in some ways, but I don't really feel like there is any difference in my limited use (programming questions or explaining some concepts).

If anything I feel like it's all been worse compared to the first release of ChatGPT, but I might be wearing rose colored glasses.

fzeroracer · 8 months ago
If you've ever used any enterprise software for long enough, you know the exact same song and dance.

They release version Grand Banana. Purported to be approximately 30% faster with brand new features like Algorithmic Triple Layering and Enhanced Compulsory Alignment. You open the app. Everything is slower, things are harder to find and it breaks in new, fun ways. Your organization pays a couple hundred more per person for these benefits. Their stock soars, people celebrate the release and your management says they can't wait to see the improvement in workflows now that they've been able to lay off a quarter of your team.

Has there been improvements in LLMs over time? Somewhat, most of it concentrated at the beginning (because they siphoned up a bunch of data in a dubious manner). Now it's just part of their sales cycle, to keep pumping up numbers while no one sees any meaningful improvement.

mathieuh · 8 months ago
It’s the same for me. I genuinely don’t understand how I can be having such a completely different experience from the people who rave about ChatGPT. Every time I’ve tried it’s been useless.

How can some people think it’s amazing and has completely changed how they work, while for me it makes mistakes that a static analyser would catch? It’s not like I’m doing anything remarkable; for the past couple of months I’ve been doing fairly standard web dev and it can’t even fix basic problems with HTML. It will suggest things that just don’t work at all and that my IDE catches; it invents APIs for packages.

One guy I work with uses it extensively and what it produces is essentially black boxes. If I find a problem with something “he” (or rather ChatGPT) has produced it takes him ages to commune with the machine spirit again to figure out how to fix it, and then he still doesn’t understand it.

I can’t help but see this as a time-bomb, how much completely inscrutable shite are these tools producing? In five years are we going to end up with a bunch of “senior engineers” who don’t actually understand what they’re doing?

Before people cry “o tempora o mores” at me and make parallels with the introduction of high-level languages, at least in order to write in a high-level language you need some basic understanding of the logic that is being executed.

omega3 · 8 months ago
Same. On every release from OpenAI or Anthropic I keep reading how the new model is so much better (insert hyperbole here) than the previous one, yet when using it I feel like they are mostly the same as last year.
delusional · 8 months ago
I'd say the same. I've tried a bunch of different AI tools, and none of them really seem all that helpful.
Xcelerate · 8 months ago
I had a 30 min argument with o1-pro where it was convinced it had solved the halting problem. Tried to gaslight me into thinking I just didn’t understand the subtlety of the argument. But it’s susceptible to appeal to authority and when I started quoting snippets of textbooks and mathoverflow it finally relented and claimed there had been a “misunderstanding”. It really does argue like a human though now...
radioactivist · 8 months ago
I had a similar experience with regular o1 about an integral that was divergent. It was adamant that it wasn't, and would respond to any attempt at persuasion with variants of "it's a standard integral" with a "subtle cancellation". When I asked for any source for this standard integral, it produced references that existed but didn't actually contain the integral. When I told it the references didn't have the result, it backpedalled (gaslighting!) to "I never told you they were in there". When I pointed out that in fact it did, it insisted this was just a "misunderstanding". It only relented when I told it Mathematica agreed the integral was divergent. It still insisted it never said that the books it pointed to contained this (false, nonsensical) result.

This was new behaviour for me to see in an LLM. Usually the problem is these things would just fold when you pushed back. I don't know which is better, but being this confidently wrong (and "lying" when confronted with it) is troubling.

phillipharris · 8 months ago
This sounds fun to read, can you share the transcript?

Deleted Comment

1123581321 · 8 months ago
O1 is effective, but it’s slow. I would expect a GPT-5 and mini to work as quickly as the 4 models.
ldjkfkdsjnv · 8 months ago
It basically solves all bugs/programming challenges I throw at it, given I give it the right data.
apwell23 · 8 months ago
what do you use it for ?