sebzim4500 · a year ago
> If OpenAI had GPT-5, they would have shown it.

Not convinced. They had GPT-4 for ages before showing it.

>And the most important thing about the figure is that 4o is not a lot different from Turbo, which is not hugely different from 4.

It's twice as fast and half the cost.

>OpenAI has presumably pivoted to new features precisely because they don’t know how to produce the kind of capability advance that the “exponential improvement” would have predicted.

Or they think that users would rather have a faster GPT-4 than a smarter but slower one.

londons_explore · a year ago
> It's twice as fast and half the cost.

There are plenty of use cases that just want the smartest possible AI and could easily afford 1000x the price before paying a human expert.

How much would you pay a lawyer to write a letter for you? $200, maybe? Well, GPT-4o will do it for $0.0025. Would you pay 1000x more ($2.50) for an AI that leaves you with fewer legal footguns? Of course you would.
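A quick back-of-the-envelope on where numbers like that come from (the letter length and the per-token price below are illustrative assumptions, not official pricing):

```python
# Illustrative cost comparison; the token count and price are assumptions, not measurements.
letter_tokens = 500                      # assume a one-page letter is ~500 tokens
price_per_million_tokens_usd = 5.00      # assume pricing on the order of $5 per 1M tokens

cost_per_letter = letter_tokens * price_per_million_tokens_usd / 1_000_000
print(f"AI-drafted letter: ${cost_per_letter:.4f}")        # ~$0.0025
print(f"1000x that price:  ${cost_per_letter * 1000:.2f}") # ~$2.50, still far below a lawyer
```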

Likewise with time, I am totally happy to wait 10 minutes for a better result, rather than having a mediocre result in 10 seconds.

paulcole · a year ago
> Likewise with time, I am totally happy to wait 10 minutes for a better result, rather than having a mediocre result in 10 seconds

There's a massive gulf separating what people say they want and what they actually want, and another massive gulf separating what they say they'll pay and what they will actually pay.

While there are plenty of situations where you think you want the smartest possible AI, it's clear that a pretty smart, pretty quick one is good enough for a lot of things.

muyuu · a year ago
Expanding into a high-revenue but small niche is very challenging, and most importantly it's slow.

If a company is going to use a GPT version for critical work, they'd be insane to jump on it before testing the solution and building confidence over a long period, while also evaluating how exactly to incorporate it into their workflow without creating total dependency and other such ancillary issues.

You would pay 1000x more for certainty, which you can never have right away, together with the service provider taking on the legal liability.

sebzim4500 · a year ago
It could be that using prompt engineering/search techniques like CoT or ToT gives a bigger improvement than scaling up the model.
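As a rough illustration of what those techniques look like in practice (the `call_llm` helper below is a hypothetical placeholder for whatever completion API you use, not a real client):

```python
def call_llm(prompt: str) -> str:
    # Hypothetical placeholder for a chat/completions API call;
    # returns a canned string so the sketch runs without a network.
    return "(model output)"

question = "A train leaves at 3:40pm and the trip takes 2h 35m. When does it arrive?"

# Plain prompt: the model answers directly.
direct_answer = call_llm(question)

# Chain-of-thought (CoT): ask for intermediate reasoning before the answer,
# which tends to help on multi-step logic and arithmetic.
cot_answer = call_llm(
    "Think through this step by step, then give the final answer on its own line.\n\n"
    + question
)

# Tree-of-thought-style search goes further: sample several reasoning paths
# and pick (or vote on) the most consistent final answer.
candidate_answers = [call_llm("Reason step by step: " + question) for _ in range(5)]
```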
reissbaker · a year ago
And not only is it twice as fast at half the cost while benchmarking better; it is also multimodal over audio, including understanding turn-taking and ad hoc interruptions in human conversation and recognizing multiple speakers. That's a lot of extra capability: not every advancement is measured in logical reasoning on text.

Gary Marcus has long been an advocate of neuro-symbolic approaches to AI, as opposed to deep learning, and he has been incorrectly predicting that "deep learning is hitting a wall" since the GPT-2 days. His Substack consists of near-daily, low-content negative articles about LLMs, and on Twitter he has posted about it multiple times a day, without stopping, for years.

birracerveza · a year ago
> Or they think that users would rather have a faster GPT-4 than a smarter but slower one.

And they are absolutely right.

GPT-4 is already much more than enough for 90% of tasks while maintaining a sane dose of human double-checking.

Making it faster enables real-time workflows, and better energy efficiency lets them serve more requests and users while lowering costs. That knowledge likely carries over to GPT-5 or whatever comes next.

somnic · a year ago
My DuckDuckGo results are starting to include summaries, courtesy of Bing, that do not reflect the content of the associated site and contain plausible falsehoods, and the content-farming, keyword-spamming, AI-generated SEO slop goes without saying at this point. It'd be very nice if these models weren't also polluting the resources that people use to try to verify things.
jgalt212 · a year ago
> GPT-4 is already much more than enough for 90% of tasks while maintaining a sane dose of human double-checking.

There's the rub. Does the cost of double-checking a 90% solution beat the cost of current methods?

gtirloni · a year ago
So as the article is suggesting, Gen AI has probably peaked.
kappuchino · a year ago
> It's twice as fast and half the cost.

is a great trick with language, because if you just pay for computing time, half the time equals half the cost. But when you phrase it that way, it sounds to some people like a 4x improvement. So...

sebzim4500 · a year ago
For a lot of things you need to pay more for more speed; that's why I specified both.

Otherwise they could probably make it much faster but also much more expensive, e.g. by batching less aggressively, using speculative decoding, etc.


flexie · a year ago
As someone who actually uses the API for real products, I don't think the OP understands what the reduced latency and cost mean: everything related to building a more advanced RAG, for example building agentic features into it, sooner or later runs into the same issues of speed and cost. GPT-4 Turbo was simply too slow and too expensive for us to use fully. GPT-4 is plenty intelligent for many use cases.
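To make the compounding concrete, here is a minimal sketch (the call counts, latencies, and prices are illustrative assumptions, not measurements) of why per-call speed and cost dominate once a RAG or agentic pipeline chains several model calls per user request:

```python
# A single user request in an agentic RAG pipeline often fans out into several
# sequential model calls (rewrite the query, choose tools, summarize retrieved
# chunks, draft the answer, self-check), so per-call latency and price multiply.
calls_per_request = 5
latency_seconds = {"slower_model": 8.0, "faster_model": 4.0}   # assumed per call
price_usd = {"slower_model": 0.02, "faster_model": 0.01}       # assumed per call

for model in ("slower_model", "faster_model"):
    total_latency = calls_per_request * latency_seconds[model]
    total_cost = calls_per_request * price_usd[model]
    print(f"{model}: ~{total_latency:.0f}s and ~${total_cost:.2f} per user request")
```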

Also, why on Earth would OpenAI launch a dramatically better model as long as their competitors don't force them to? The smart move for OpenAI would be to let their competitors almost catch up to GPT-4 before launching GPT-5, and no competitor is truly there yet.

antupis · a year ago
> Also, why on Earth would OpenAI launch a dramatically better model as long as their competitors don't force them to? The smart move for OpenAI would be to let their competitors almost catch up to GPT-4 before launching GPT-5, and no competitor is truly there yet.

Is that how Silicon Valley has worked for the last 20+ years? You just deploy fast, get customer feedback, and then fix things based on that feedback. OpenAI holding back progress kind of goes against the ethos of SV.

ramoz · a year ago
He said nothing controversial, and yes, the model has regressed. This "flagship model" is faster and cheaper, but it is worse (regardless of benchmarks & charts; subjectively it is failing for me and many others).

The model seems better at logic, but something is off. It is repetitive of the user prompt, gets hooked on the logical meaning rather than the contextual one (given the provided conversation), and is too concise, as if it needs to provide a direct answer. Perhaps that's all fine for a benchmark, code, or simple logic flows/chains (e.g. AI validation, which gpt-3.5 is good at as well), but it's not desirable for advanced or creative content generation.

My anecdotal receipt: None of the 15 functions we run in prod are switching to the 50% off model as product quality is degraded with gpt-4o.

lioeters · a year ago
> it is worse... regardless of benchmarks & charts

It's interesting how common a sentiment this is; at least, I heard it expressed often for previous models too.

Do you think such weaknesses can be quantified by adding more test cases, and improving the benchmarks? Maybe it would help if they opened up to community contributions, so people could submit test cases that demonstrate issues that they (OpenAI, et al) are currently not seeing.

pieix · a year ago
> My anecdotal receipt: None of the 15 functions we run in prod are switching to the 50% off model as product quality is degraded with gpt-4o.

Interesting anecdata — how did you validate this?

ramoz · a year ago
My personal case is a flow/chain of many function calls for creative output. Flipping the switch resulted in noticeably different and undesired results. We tested a ton.

This tweeter usually has a narrative I turn away from, but the thread was useful: finding similar accounts at least made me feel like I wasn't losing my mind. https://x.com/bindureddy/status/1790127425705120149?s=46&t=y...

cs702 · a year ago
As I wrote only yesterday, "the usual critics will point out that LLMs like GPT-4o still have a lot of failure modes and suffer from issues that remain unresolved. They will point out that we're reaping diminishing returns from Transformers. They will question the absence of a "GPT-5" model. And so on..."[a]

Gary Marcus is one of those critics. He's always ready to "explain" why a new model is not yet intelligent. In the OP, he repeats the usual criticisms, making him sound, ahem, like a stochastic parrot. He largely ignores all the work that has gone into making GPT-4o behave and sound in ways that feel more natural, more human.

My suggestion is to watch the demos of GPT-4o, play with it, and reach your own conclusions. To me, it feels magical. It makes the AIs of many movies look like they are no longer in the realm of science fiction but in the realm of incremental product development.

---

[a] https://news.ycombinator.com/item?id=40346080

infecto · a year ago
It is interesting to me how quickly people jump on the negative bandwagon: "companies are wasting money", "there is no value in LLMs", "peak LLM"... the list goes on. There is a lot of value in these models, and the pace at which they are improving is impressive.
cs702 · a year ago
I think the benefits to society are going to be significant, but I suspect that most companies in the space will not recoup their investment. They're losing money or at best breaking even on each token. It's as if the return on their investments is accruing diffusely to the public instead of to them.
cess11 · a year ago
I saw two clips, which I think were such demonstrations. Both reeked of desperation: one was a dude talking with a synthesised voice imitating a young, overly pleasing woman; the other used the same voice for a pretty weird, trivial translation between English and Italian. Machine translation is quite old by now, and for practical use you'd want speech to translated text so it doesn't interrupt the conversation like in the demonstration.

Sure, I can imagine that people with a lot of spare time and few or no friends might want to substitute a machine for them, and that they might feel this has a magic to it. "Magical" here meaning roughly the same as it does when applied to Disneyland or similar entertainment simulacra.

edmara · a year ago
Gary Marcus was arguing in 2020 that scaling up GPT-2 wouldn't result in improvements in common sense or reasoning. He was wrong, and he continues to be wrong.

It's called the bitter lesson for a reason. Nobody likes seeing their life's work on some unicorn architecture get demolished by simplicity + scale.

FileSorter · a year ago
Why is it so hard for models to say "I don't know" or "That never happened"?

This seems to be a fundamental side effect of next-token prediction, the training data, etc.

navane · a year ago
The corpus of data wherein people say "I don't know" is small. Maybe we should all post more of that.
_heimdall · a year ago
Just include transcripts from Congressional hearings, there's plenty of "I don't recall" written there.
HarHarVeryFunny · a year ago
How would the LLM know whether it knows something or not? These models don't deal in facts or memories, just next-word probabilities, and even if all the probabilities are low, it might just be because the model has generated (i.e. sampled) an awkward turn of phrase with few common continuations.

There are solutions, but no quick band-aid.
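To make that concrete, about the only readily available "confidence" signal is the shape of the next-token distribution, and a flat distribution is ambiguous; a toy sketch (the logits below are made up for illustration):

```python
import torch

# Toy next-token distributions over a tiny vocabulary (made-up logits).
cases = {
    "confident continuation": torch.tensor([4.0, 0.5, 0.2, 0.1]),
    "flat / uncertain":       torch.tensor([0.3, 0.3, 0.3, 0.3]),
}

for name, logits in cases.items():
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.log()).sum()
    print(f"{name}: max p = {probs.max().item():.2f}, entropy = {entropy.item():.2f}")

# The catch: a flat distribution can mean the model lacks the fact, or merely
# that many phrasings are equally acceptable, so low next-token probability
# alone is not a reliable "I don't know" signal.
```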

somnic · a year ago
I have to assume that someone has tried training these models to output answers to factual questions along with numerical probabilities, using a loss function based on a proper scoring rule over those probabilities, and that it didn't work well. That's an obvious starting point, right? All the "safety" stuff uses methods other than next-token prediction.
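For reference, a proper scoring rule is just a loss that is minimized only when the stated probability matches the true frequency of being correct. A minimal PyTorch-style sketch of the idea (the surrounding training setup is assumed, not anything any lab has described):

```python
import torch

def brier_loss(stated_prob: torch.Tensor, was_correct: torch.Tensor) -> torch.Tensor:
    # Brier score: a proper scoring rule, minimized only when the model's
    # stated confidence matches the empirical frequency of being right.
    return ((stated_prob - was_correct) ** 2).mean()

# Toy example: the model attaches a confidence to each factual answer, and we
# score it against whether the answer actually turned out to be correct.
stated_prob = torch.tensor([0.9, 0.6, 0.2])   # model's stated confidences
was_correct = torch.tensor([1.0, 1.0, 0.0])   # ground-truth correctness (1 = right)
print(brier_loss(stated_prob, was_correct))   # lower means better calibrated
```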
EarthLaunch · a year ago
I wonder if the training to be compliant with the prompter is part of the problem. Both of those statements are similar to saying "I refuse to answer your query".

Or maybe this is inherent to continuation?

The behavior reminds me of the human subconscious, which doesn't say no, just raises up what it can.

zarzavat · a year ago
I will repost the comment I made 12 months ago.

https://news.ycombinator.com/item?id=35402163

> LLMs are still an active area of research. Research is unpredictable. It may take years to gather enough fundamental results to make a GPT-5 core model that is substantially better than GPT-4. Or a key idea could be discovered tomorrow.

> What OpenAI can do while they are waiting is more of the easy stuff, for example more multimodality: integrating DALL-e with GPT-4, adding audio support, etc. They can also optimize the model to make it run faster.

OpenAI is doing science, not just engineering. Science doesn’t happen according to a schedule. Adjust your expectations accordingly.

bradley13 · a year ago
"...evidence that we may have reached a phase of diminishing returns"

With the current crop of LLMs, yes. With such AI models in general, no. We need to extend their capabilities in different directions. Just as an example: one analysis I read pointed out that current LLMs live in a one-dimensional world. Everything is just a sequential string of tokens.

Think of a Turing machine, writing on a tape. Sure, theoretically it can perform any computation. Practically? Not so useful.

We need new ways of introducing context and knowledge into AI models. Their conversations may be one-dimensional (as, indeed, ours are), but they need another dimension to provide reasoning and depth.