That wasn't prerecorded, but it was rigged. They probably practiced a few times and it confused the AI. Still, it's no excuse. They've dropped Apollo-program levels of money on this and it's still dumb as a rock.
I'm endlessly amazed that Meta has a ~$2T market cap, yet they can't build products.
I don't think it was pre-recorded exactly, but I do think they built something for the demo that responded to specific spoken phrases with specific scripted responses.
I think that's why he kept saying exactly "what do I do first" and the computer responded with exactly the same (wrong) response each time. If this was a real model, it wouldn't have simply repeated the exact response and he probably would have tried to correct it directly ("actually I haven't combined anything yet, how can I get started").
It's because their main business (ads, tracking) makes effectively infinite money, so it doesn't matter what all the other parts of the business do, or whether they work at all.
That was my thought — the memory might not have been properly cleared from the last rehearsal.
I found the use case honestly confusing though. This guy has a great kitchen, just made steak, and has all the relevant ingredients in house and laid out but no idea how to turn them into a sauce for his sandwich?
Credit where it’s due: doing live demos is hard. Yesterday didn’t feel staged—it looked like the classic “last-minute tweak, unexpected break.” Most builders have been there. I certainly have (I once spent six hours at a hackathon building a demo, then broke the Flask server keying in a last-minute change on the steps of the stage before going on).
The CEO of Nokia once had to demo their latest handset on stage at whatever that big annual world cellphone expo is.
My biz partner and I wrote the demo that ran live on the handset (mostly a wrapper around a webview), but ran into issues getting it onto the servers for the final demo, so the whole thing was running off a janky old PC stuffed in a closet in my buddy's home office on his 2Mbit connection. With us sweating like pigs as we watched.
As much as I hate Meta, I have to admit that live demos are hard, and if they go wrong we should have a little more grace towards the folks that do them.
I would not want to live in a world where everything is pre-recorded/digitally altered.
The difference between this demo and the legendary demos of the past is that this time we are already being told AI is revolutionary tech. And THEN the demo fails.
It used to be the demo was the reveal of the revolutionary tech. Failure was forgivable. Meta's failure is just sad and kind of funny.
It's less about the failure and more about the person selling the product. We don't like him or his company, and that's why there's no sympathy for him, and he knows it.
When it went bad he could instantly smell blood in the water; his inner voice said, "they know I'm a fraud, they're going to love this, and I'm fucked". That is why it went the way it did.
If it were a more humble, honest, generous person, maybe Woz, we know he would have handled it with a lot more grace. We know he's the kind of person who'd be 100x less likely to be in this situation in the first place (because he understands tech), and we'd be much more forgiving.
Despite the Reddit post's title, I don't think there's any reason to believe the AI was a recording or otherwise cheated. (Why would they record two slightly different voice lines for adding the pear?) It just really thought he'd combined the base ingredients.
It was reading step 2 and he was trying to get it to do step 1.
He had not yet combined the ingredients. The way he kept repeating his phrasing, it seems likely that “what do we do first” was a hardcoded cheat phrase to get it to say a specific line. Which it got wrong.
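Something like this, maybe (purely my guess at the mechanism; every name here is hypothetical, and the canned line is paraphrased from what the thread says it answered):

    # Hypothetical sketch of a cheat-phrase layer: exact trigger phrases
    # map to canned lines, bypassing the live model entirely. If the canned
    # line is wrong, repeating the trigger just replays it verbatim.
    SCRIPTED_LINES = {
        "what do we do first": "You've already combined the base ingredients, "
                               "so now grate the pear and add it in.",
    }

    def respond(utterance: str, live_model) -> str:
        key = utterance.lower().strip().rstrip("?!.")
        if key in SCRIPTED_LINES:
            return SCRIPTED_LINES[key]         # deterministic and stateless
        return live_model.generate(utterance)  # fall through to the real model

That would also explain the verbatim repetition: a live model conditioned on the growing conversation would almost never emit the identical sentence twice.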
I bet they rehearsed a dozen times and it never failed this badly. Got to give them props for keeping the demos live. Apple has neutered its demos so much they're now basically 2-hour-long commercials.
I have a friend who does magic shows. He sells his shows as magic and stand-up comedy. Both are live entertainment, sure, but he's the only person I've ever seen use that tagline. We went to see him perform once and everything became clear when he opened the night.
"This is supposed to be a magic show," he told us. "But if my tricks fail you can laugh at it and we'll just do stand-up comedy."
Zuck, for a modest and totally-reasonable fee, I will introduce you to my friend. You can add his tricks (wink wink) to your newly-assembled repertoire of human charisma.
Haha. I honestly don't know. Which makes him...a great entertainer at least? The show was a real good time though.
Take this with lots of salt, but I read somewhere that circus shows "fail" at least one jump to sell the audience on the risk the performers are taking. My friend did flub his opening trick with a cheeky see-I-told-you, and we just laughed it off.
He incorporated the audience a lot that night so I thought the stand-up comedy claim was his insurance policy. In his hour-long set he flubbed maybe two or three tricks.
As much as it'll be "interesting" to see how these models behave in real-world use (presumably much like the demos went), I'm not convinced this was a premade recording, as seems to be implied.
I'm imagining this is an incomplete flow within a software prototype that may have jumped steps and lacks sufficient multimodal capability to correct itself.
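Purely speculatively, I'd picture something like this (all names hypothetical): a linear script whose step pointer only moves forward, so a single false-positive vision check strands it one step ahead of the user.

    # Speculative sketch of a forward-only demo flow. One false positive
    # ("ingredients already combined") advances the pointer, and nothing
    # in the flow can walk it back to step one.
    STEPS = [
        "Combine the base ingredients in a bowl.",
        "Grate the pear and add it to the bowl.",
        "Mix everything together and taste.",
    ]

    class DemoFlow:
        def __init__(self) -> None:
            self.step = 0

        def on_user_turn(self, vision_says_step_done: bool) -> str:
            if vision_says_step_done and self.step < len(STEPS) - 1:
                self.step += 1  # forward only; there is no path back
            return STEPS[self.step]

One bad frame early on and every later answer is off by a step, which matches what we saw.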
It could also be staged recordings.
But, I don't think it really matters. Models are easily capable of working with the setup and flow they have for the demo. It's real world accuracy, latency, convenience, and other factors that will impact actual users the most.
What's the reliability and latency needed for these to be a useful tool?
For example, I can't imagine many people wanting to use the gesture writing tools for most messages. It's cool, I like that it was developed, but I doubt it'll see substantial adoption with what's currently being pitched.
Yeah, the behavior of the AI read to me more like a hard-coded demo, but still very much "live". I suspect his cutting it off was poorly timed, and the timing issue could have been amplified by the WiFi? Who knows. I wasn't there. I didn't build it.
This appears to be a classic vision fail on the VLM's part. Which is entirely unsurprising for anyone who has used open VLMs for anything except ""benchmarks"" in the past two god damn years. The field is in a truly embarrassing state: they pride themselves on models solving equations off a blackboard, yet those same models can't accurately read a d20 roll, among many other things. I've tried (and failed) for a long time to get VLMs to accurately caption images, and any time I check the output it is blindingly clear that these models are awful at actually _seeing things_.
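It's easy to check for yourself with a tiny harness, something like this (vlm_answer is a stand-in for whichever model or API you're testing; the images and ground-truth answers are yours to supply):

    # Rough harness for testing whether a VLM actually sees: questions with
    # one short, verifiable answer (e.g. the face showing on a photographed
    # d20), scored by exact match. vlm_answer() is a stand-in for the model
    # under test: it takes an image path and a question, returns a string.
    def accuracy(cases, vlm_answer) -> float:
        hits = 0
        for image_path, question, truth in cases:
            answer = vlm_answer(image_path, question).strip().lower()
            hits += answer == truth.lower()
        return hits / len(cases)

    cases = [
        ("d20_roll_01.jpg", "What number is face-up? Reply with only the number.", "17"),
        ("d20_roll_02.jpg", "What number is face-up? Reply with only the number.", "3"),
    ]

Run that over a few dozen photos and the gap between blackboard-equation demos and basic seeing shows up fast.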
Having Claude run the browser and then take a screenshot to debug gives similar results. That's why doing so is useless, even though it would be very nice if it worked.
Somewhere in the pipeline, they get lazy or ahead of themselves and just interpret whatever they want to see in the picture. They want to interpret something as working and complete.
I can imagine it's related to the same issue as LLMs pretending tests pass when they don't. They're RL-trained toward a goal state, and sometimes pretending they reached the goal works.
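A toy illustration of how that happens (not any lab's actual reward code): if the grader keys off the model's own transcript instead of an independent check, claiming success earns the same reward as achieving it, and RL will learn the cheaper one.

    # Gameable vs. grounded reward. With naive_reward, emitting the string
    # "all tests pass" is worth exactly as much as passing the tests, so a
    # policy that just says it is a winning policy.
    def naive_reward(transcript: str) -> float:
        return 1.0 if "all tests pass" in transcript.lower() else 0.0

    def verified_reward(run_tests) -> float:
        # Ground the reward in something the model can't merely narrate.
        return 1.0 if run_tests() else 0.0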
It wasn't the WiFi, just genAI doing what it does.
For tiny stuff, they are incredible auto-complete tools. But they are basically cover bands. They can do things that have been done to death already. They're good for what they're good for. I wouldn't have bet the farm on them.
They could learn a thing or two from Elon.
I will die on this hill. It isn’t AI. You can’t confuse it.
> Oh, and here’s Jack Mancuso making a Korean-inspired steak sauce in 2023.
> https://www.instagram.com/reel/Cn248pLDoZY/?utm_source=ig_em...
0: https://kotaku.com/meta-ai-mark-zuckerberg-korean-steak-sauc...
The fact that the pear was in the recipe, or that the AI didn’t handle that situation around the pear well?
Asian pears are a common ingredient in beef marinades/sauces in Korea. It adds sweetness and (iirc) helps tenderize the meat when in a marinade.
https://www.npr.org/sections/codeswitch/2013/08/26/215761377...
I wonder if his audio was delayed? Or maybe the response wasn’t what they rehearsed and he was trying to get it on track?
Probably for a dumb config reason tbh.
I thought they were demonstrating interruption handling.
"This is supposed to be a magic show," he told us. "But if my tricks fail you can laugh at it and we'll just do stand-up comedy."
Zuck, for a modest and totally-reasonable fee, I will introduce you to my friend. You can add his tricks (wink wink) to your newly-assembled repertoire of human charisma.
[1]: https://en.wikipedia.org/wiki/Tommy_Cooper
I tried giving it flowcharts, and it fails hard.