That wasn't prerecorded, but it was rigged. They probably practiced a few times and it confused the AI. Still, it's no excuse. They've dropped Apollo-program levels of money on this and it's still dumb as a rock.
I'm endlessly amazed that Meta has a ~$2T market cap, yet they can't build products.
I don't think it was pre-recorded exactly, but I do think they built something for the demo that responded to specific spoken phrases with specific scripted responses.
I think that's why he kept saying exactly "what do I do first" and the computer responded with exactly the same (wrong) response each time. If this was a real model, it wouldn't have simply repeated the exact response and he probably would have tried to correct it directly ("actually I haven't combined anything yet, how can I get started").
It's because their main business (ads, tracking) makes effectively infinite money, so it doesn't matter what all the other parts of the business do, or whether they work at all.
That was my thought — the memory might not have been properly cleared from the last rehearsal.
I found the use case honestly confusing though. This guy has a great kitchen, just made steak, and has all the relevant ingredients in house and laid out but no idea how to turn them into a sauce for his sandwich?
Credit where it’s due: doing live demos is hard. Yesterday didn’t feel staged—it looked like the classic “last-minute tweak, unexpected break.” Most builders have been there. I certainly have (I once spent six hours at a hackathon building a demo, then broke the Flask server keying in a last-minute change on the steps of the stage before going on).
The CEO of Nokia once had to demo their latest handset on stage at whatever that big annual world cellphone expo is.
My biz partner and I wrote the demo that ran live on the handset (mostly a wrapper around a webview), but ran into issues getting it onto the servers for the final demo, so the whole thing was running off a janky old PC stuffed in a closet in my buddy's home office on his 2Mbit connection. With us sweating like pigs as we watched.
As much as I hate Meta, I have to admit that live demos are hard, and if they go wrong we should have a little more grace towards the folks that do them.
I would not want to live in a world where everything is pre-recorded/digitally altered.
The difference between this demo and the legendary demos of the past is that this time we are already being told AI is revolutionary tech. And THEN the demo fails.
It used to be the demo was the reveal of the revolutionary tech. Failure was forgivable. Meta's failure is just sad and kind of funny.
It's less about the failure and more about the person selling the product. We don't like him or his company, and that's why there's no sympathy for him, and he knows it.
When it went bad he could instantly smell blood in the water; his inner voice said, "they know I'm a fraud, they're going to love this, and I'm fucked". That is why it went the way it did.
If it were a more humble, honest, generous person, maybe Woz, we know he would have handled it with a lot more grace. We know he's the kind of person who'd be 100x less likely to be in this situation in the first place (because he understands tech), and we'd be much more forgiving.
Despite the Reddit post's title, I don't think there's any reason to believe the AI was a recording or otherwise cheated. (Why would they record two slightly different voice lines for adding the pear?) It just really thought he'd combined the base ingredients.
It was reading step 2 and he was trying to get it to do step 1.
He had not yet combined the ingredients. The way he kept repeating his phrasing, it seems likely that “what do we do first” was a hardcoded cheat phrase to get it to say a specific line. Which it got wrong.
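Something like this, maybe (purely my guess at the mechanism; every name here is hypothetical, and the canned line is paraphrased from what the thread says it answered):

    # Hypothetical sketch of a cheat-phrase layer: exact trigger phrases
    # map to canned lines, bypassing the live model entirely. If the canned
    # line is wrong, repeating the trigger just replays it verbatim.
    SCRIPTED_LINES = {
        "what do we do first": "You've already combined the base ingredients, "
                               "so now grate the pear and add it in.",
    }

    def respond(utterance: str, live_model) -> str:
        key = utterance.lower().strip().rstrip("?!.")
        if key in SCRIPTED_LINES:
            return SCRIPTED_LINES[key]         # deterministic and stateless
        return live_model.generate(utterance)  # fall through to the real model

That would also explain the verbatim repetition: a live model conditioned on the growing conversation would almost never emit the identical sentence twice.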
I bet they rehearsed a dozen times and it never failed this badly. Got to give them props for keeping the demos live. Apple has neutered its demos so much they're now basically 2-hour-long commercials.
I have a friend who does magic shows. He sells his shows as magic and stand-up comedy. Both are live entertainment, sure, but he's the only person I've ever seen use that tagline. We went to see him perform once and everything became clear when he opened the night.
"This is supposed to be a magic show," he told us. "But if my tricks fail you can laugh at it and we'll just do stand-up comedy."
Zuck, for a modest and totally-reasonable fee, I will introduce you to my friend. You can add his tricks (wink wink) to your newly-assembled repertoire of human charisma.
Haha. I honestly don't know. Which makes him...a great entertainer at least? The show was a real good time though.
Take this with lots of salt, but I read somewhere that circus shows "fail" at least one jump to sell the audience on the risk the performers are taking. My friend did flub his opening trick with a cheeky see-I-told-you, and we just laughed it off.
He incorporated the audience a lot that night so I thought the stand-up comedy claim was his insurance policy. In his hour-long set he flubbed maybe two or three tricks.
As much as it'll be "interesting" to see how these models behave in real-world use (presumably much like the demos went), I'm not convinced this was a premade recording, as seems to be implied.
I'm imagining this is an incomplete flow within a software prototype that may have jumped steps and lacks sufficient multimodal capability to correct itself.
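Purely speculatively, I'd picture something like this (all names hypothetical): a linear script whose step pointer only moves forward, so a single false-positive vision check strands it one step ahead of the user.

    # Speculative sketch of a forward-only demo flow. One false positive
    # ("ingredients already combined") advances the pointer, and nothing
    # in the flow can walk it back to step one.
    STEPS = [
        "Combine the base ingredients in a bowl.",
        "Grate the pear and add it to the bowl.",
        "Mix everything together and taste.",
    ]

    class DemoFlow:
        def __init__(self) -> None:
            self.step = 0

        def on_user_turn(self, vision_says_step_done: bool) -> str:
            if vision_says_step_done and self.step < len(STEPS) - 1:
                self.step += 1  # forward only; there is no path back
            return STEPS[self.step]

One bad frame early on and every later answer is off by a step, which matches what we saw.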
It could also be staged recordings.
But, I don't think it really matters. Models are easily capable of working with the setup and flow they have for the demo. It's real world accuracy, latency, convenience, and other factors that will impact actual users the most.
What's the reliability and latency needed for these to be a useful tool?
For example, I can't imagine many people wanting to use the gesture writing tools for most messages. It's cool, I like that it was developed, but I doubt it'll see substantial adoption with what's currently being pitched.
Yeah, the behavior of the AI read to me more like a hard-coded demo, but still very much "live". I suspect his cutting it off was poorly timed, and the timing issue could have been amplified by the WiFi? Who knows. I wasn't there. I didn't build it.
This appears to be a classic vision fail on the VLM's part. Which is entirely unsurprising for anyone who has used open VLMs for anything except ""benchmarks"" in the past two god damn years. The field is in a truly embarrassing state: they pride themselves on models solving equations off a blackboard, yet those same models can't accurately read a d20 roll, among many other things. I've tried (and failed) for a long time to get VLMs to accurately caption images, and any time I check the output it is blindingly clear that these models are awful at actually _seeing things_.
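It's easy to check for yourself with a tiny harness, something like this (vlm_answer is a stand-in for whichever model or API you're testing; the images and ground-truth answers are yours to supply):

    # Rough harness for testing whether a VLM actually sees: questions with
    # one short, verifiable answer (e.g. the face showing on a photographed
    # d20), scored by exact match. vlm_answer() is a stand-in for the model
    # under test: it takes an image path and a question, returns a string.
    def accuracy(cases, vlm_answer) -> float:
        hits = 0
        for image_path, question, truth in cases:
            answer = vlm_answer(image_path, question).strip().lower()
            hits += answer == truth.lower()
        return hits / len(cases)

    cases = [
        ("d20_roll_01.jpg", "What number is face-up? Reply with only the number.", "17"),
        ("d20_roll_02.jpg", "What number is face-up? Reply with only the number.", "3"),
    ]

Run that over a few dozen photos and the gap between blackboard-equation demos and basic seeing shows up fast.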
Having Claude run the browser and then take a screenshot to debug gives similar results. That's why doing so is useless, even though it would be very nice if it worked.
Somewhere in the pipeline, they get lazy or ahead of themselves and just interpret whatever they want to see in the picture. They want to interpret something as working and complete.
I can imagine it's related to the same issue as LLMs pretending tests pass when they don't. They're RL-trained toward a goal state, and sometimes pretending they reached the goal works.
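A toy illustration of how that happens (not any lab's actual reward code): if the grader keys off the model's own transcript instead of an independent check, claiming success earns the same reward as achieving it, and RL will learn the cheaper one.

    # Gameable vs. grounded reward. With naive_reward, emitting the string
    # "all tests pass" is worth exactly as much as passing the tests, so a
    # policy that just says it is a winning policy.
    def naive_reward(transcript: str) -> float:
        return 1.0 if "all tests pass" in transcript.lower() else 0.0

    def verified_reward(run_tests) -> float:
        # Ground the reward in something the model can't merely narrate.
        return 1.0 if run_tests() else 0.0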
It wasn't the WiFi, just genAI doing what it does.
For tiny stuff, they are incredible auto-complete tools. But they are basically cover bands. They can do things that have been done to death already. They're good for what they're good for. I wouldn't have bet the farm on them.
They could learn a thing or two from Elon.
I will die on this hill. It isn’t AI. You can’t confuse it.
> Oh, and here’s Jack Mancuso making a Korean-inspired steak sauce in 2023.
> https://www.instagram.com/reel/Cn248pLDoZY/?utm_source=ig_em...
0: https://kotaku.com/meta-ai-mark-zuckerberg-korean-steak-sauc...
The fact that the pear was in the recipe, or that the AI didn’t handle that situation around the pear well?
Asian pears are a common ingredient in beef marinades/sauces in Korea. It adds sweetness and (iirc) helps tenderize the meat when in a marinade.
https://www.npr.org/sections/codeswitch/2013/08/26/215761377...
I wonder if his audio was delayed? Or maybe the response wasn’t what they rehearsed and he was trying to get it on track?
Probably for a dumb config reason tbh.
I thought they were demonstrating interruption handling.
"This is supposed to be a magic show," he told us. "But if my tricks fail you can laugh at it and we'll just do stand-up comedy."
Zuck, for a modest and totally-reasonable fee, I will introduce you to my friend. You can add his tricks (wink wink) to your newly-assembled repertoire of human charisma.
[1]: https://en.wikipedia.org/wiki/Tommy_Cooper
I tried giving it flowcharts, and it fails hard.