Readit News
simonw · 9 months ago
I got access to the preview, here's what it gave me for "A pelican riding a bicycle along a coastal path overlooking a harbor" - this video has all four versions shown:

https://static.simonwillison.net/static/2024/pelicans-on-bic...

Of the four, one was a pelican riding a bicycle. One was a pelican just running along the road, one was a pelican perched on a stationary bicycle, and one had the pelican wearing a weird sort of pelican bicycle helmet.

All four were better than what I got from Sora: https://simonwillison.net/2024/Dec/9/sora/

yurylifshits · 8 months ago
There's another important contender in the space: Hunyuan model from Tencent

My company (Nim) is hosting the Hunyuan model, so here's a quick test (first attempt) at "pelican riding a bicycle" via Hunyuan on Nim: https://nim.video/explore/OGs4EM3MIpW8

I think it's as good as, if not better than, Sora / Veo

chrismorgan · 8 months ago
> A whimsical pelican, adorned in oversized sunglasses and a vibrant, patterned scarf, gracefully balances on a vintage bicycle, its sleek feathers glistening in the sunlight. As it pedals joyfully down a scenic coastal path, colorful wildflowers sway gently in the breeze, and azure waves crash rhythmically against the shore. The pelican occasionally flaps its wings, adding a playful touch to its enchanting ride. In the distance, a serene sunset bathes the landscape in warm hues, while seagulls glide gracefully overhead, celebrating this delightful and lighthearted adventure of a pelican enjoying a carefree day on two wheels.

What does it produce for “A pelican riding a bicycle along a coastal path overlooking a harbor”?

Or, what do Sora and Veo produce for your verbose prompt?

sashank_1509 · 8 months ago
Hard to say about Sora, but the video you shared is most definitely worse than Veo's.

The pelican is doing some weird flying motion, motion blur is hiding a lack of detail, the bicycle is moving fast so the background is blurred, etc. I would even say Sora is better, because I like the slow motion and detail, but it did do something very non-physical.

Veo is clearly the best in this example. It has high detail but also feels the most physically grounded among the examples.

dyauspitr · 8 months ago
Pretty good except the backwards body and the strange wing movement. The feeling of motion is fantastic though.
arjie · 8 months ago
I was curious how it would perform with prompt enhancement turned off. Here's a single attempt (no regenerations etc.): https://www.youtube.com/watch?v=730cb2qozcM

If you'd like to replicate this, the sign-up process was very easy and I was able to run a single generation attempt right away. Maybe later, when I seriously want to generate video, I'll use prompt enhancement. Without it, the video appears to lose any notion of direction. Most image-generation models I'm aware of do prompt enhancement; I've seen it on Grok+Flow/Aurora and ChatGPT+DALL-E.

    Prompt
    A pelican riding a bicycle along a coastal path overlooking a harbor
    Seed
    15185546
    Resolution
    720×480
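
For context, prompt enhancement is usually just an LLM rewriting a terse prompt into a verbose, cinematic one before generation. Here is a minimal sketch of that step, using the OpenAI chat client purely as a stand-in; the model name and instructions are illustrative assumptions, and the video services use their own internal enhancers:

    # Sketch of a typical prompt-enhancement step. The OpenAI client is a
    # stand-in; video services run their own internal models for this.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def enhance(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {
                    "role": "system",
                    "content": "Rewrite the user's video prompt into a rich, "
                               "cinematic description: camera movement, "
                               "lighting, setting, mood.",
                },
                {"role": "user", "content": prompt},
            ],
        )
        return resp.choices[0].message.content

    print(enhance("A pelican riding a bicycle along a coastal path overlooking a harbor"))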

gcr · 8 months ago
FYI your website shows me a static image on iOS 18.2 Safari. Strangely, the progress bar still appears to “loop,” but the bird isn’t moving at all.

Turning content blockers off does not make a difference.

dr_kiszonka · 8 months ago
Reddit says it is much better than Sora. Are you hosting the full version of Hunyuan? (Your video looks great.)
prometheon1 · 8 months ago
Is it still better if you copy his whole prompt instead of half of it?

Deleted Comment

c0brac0bra · 8 months ago
I mean, the pelican's body is backwards...
tim333 · 8 months ago
Here's one of a penguin paragliding and it's surprisingly realistic https://x.com/Plinz/status/1868885955597549624
0_____0 · 8 months ago
This is the first GenAI video to produce an "oh shit" reflex in me.

oh, shit!

p1necone · 9 months ago
As long as at least one option is exactly what you asked for, throwing variations at you that don't conform 100% to your prompt seems like it could be useful, since it gives the model leeway to improve the output in other respects.
oneshtein · 8 months ago
Here is my version of a pelican on a bicycle, made with hailuoai:

https://hailuoai.video/share/N9dlRd1L1o0p

nkingsy · 9 months ago
His little bike helmet is adorable
mckirk · 9 months ago
The AI safety team was really proud of that one.
AgentME · 8 months ago
It's funny having looked forward to Sora for a while, and then seeing it superseded so shortly after access to it was finally made public.
grumbel · 8 months ago
I am surprised that the top-right one still shows a cut and a switch to a different scene. I would assume that's something that could be trivially filtered out of the training data, as those discontinuities don't seem useful either for these short 6-second video segments or for building an understanding of the real world.
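
A minimal sketch of the kind of filter the parent suggests: flag hard cuts by comparing per-frame color histograms with OpenCV. The threshold and bin counts are illustrative assumptions; real data pipelines use more robust shot-boundary detectors:

    # Sketch: detect hard scene cuts via frame-to-frame histogram correlation.
    import cv2

    def find_cuts(path: str, threshold: float = 0.6) -> list[int]:
        cap = cv2.VideoCapture(path)
        cuts, prev_hist, idx = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
            cv2.normalize(hist, hist)
            if prev_hist is not None:
                # Correlation near 1.0 means similar frames; a sharp drop
                # between consecutive frames suggests a hard cut.
                if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                    cuts.append(idx)
            prev_hist, idx = hist, idx + 1
        cap.release()
        return cuts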
jerpint · 9 months ago
It looks much better than Sora, but still kind of in the uncanny valley
spaceman_2020 · 8 months ago
This is the worst it will ever be…
victorbjorklund · 8 months ago
That is surprisingly good. We are at a point where it seems to be good enough for at least b-roll content replacing stock video clips.
rob74 · 8 months ago
Well yeah, if you look closely at the example videos on the site, one of them is not quite right either:

> Prompt: The sun rises slowly behind a perfectly plated breakfast scene. Thick, golden maple syrup pours in slow motion over a stack of fluffy pancakes, each one releasing a soft, warm steam cloud. A close-up of crispy bacon sizzles, sending tiny embers of golden grease into the air. [...]

In the video, the bacon is unceremoniously slapped onto the pancakes, while the prompt sounds like it was intended to be a separate shot, with the bacon still in the pan. Or, alternatively, was everything described in the prompt supposed to be on the table at the same time?

So, yet again: AI produces impressive results, but it rarely does exactly what you wanted it to do...

soco · 8 months ago
Technically speaking, I'd say your expectation is definitely not laid out in the prompt, so anything goes. Believe me, I've had such requirements from users, and I, as a mere human programmer, am never quite sure what they actually want. So I take guesses just like the AI (because simply asking doesn't get you very far; you always have to show something) and take it from there. In other words, if the AI works like me, I can pack my stuff already.
jillyboel · 8 months ago
This tech is cute, but the only viable outcomes are going to be porn and mass-produced slop that'll be uninteresting before it's even created. Why even bother?
andybak · 8 months ago
There will be both of those things in abundance.

But I'm also seeing some genuinely creative uses of generative video, stuff I could argue has real creative validity. I am loath to dismiss an entire technique because it is mostly used to create garbage.

We'll have to figure out how to solve the slop problem - it was already an issue before AI, so maybe this is just hastening the inevitable.

bottled_poe · 8 months ago
Comments like this one are so predictable and incredulous. As if the current state of the art is the final form of this technology. This is just getting started. Big facepalm.
sigmar · 9 months ago
Winning 2:1 in user preference versus Sora Turbo is impressive. It seems to have very similar limitations to Sora; for example, the leg swapping in the ice skating video, and the beekeeper's jar is at a very unnatural acceleration (like it pops up). Though to my eye it's maybe slightly better at emulating natural movement and physics than Sora. The blog post has slightly more info:

> at resolutions up to 4K, and extended to minutes in length.

https://blog.google/technology/google-labs/video-image-gener...

torginus · 9 months ago
It looks like Sora is actually the worst performer in the benchmarks, with Kling being the best and the others not far behind.

Anyway, I strongly suspect the funny meme content that seems to be the practical use case of these video generators won't be possible on either Veo or Sora, because of copyright, PC, famous people, or other 'safety'-related reasons.

jonplackett · 9 months ago
I’ve been using Kling a lot recently and been really impressed, especially by 1.5.

I was so excited to see Sora out - only to see it has most of the same problems. And Kling seems to do better in a lot of benchmarks.

I can’t quite make sense of it: what OpenAI were showing when they first launched Sora was so amazing. Was it cherry-picked? Or was it using loads more compute than what they’ve released?

BugsJustFindMe · 9 months ago
> the jar is at a very unnatural acceleration (like it pops up).

It does pop up. Look at where his hand is relative to the jar when he grabs it vs when he stops lifting it. The hand and the jar are moving, but the jar is non-physically unattached to the grab.

lukol · 9 months ago
Last time Google made a big Gemini announcement, OpenAI owned them by dropping the Sora preview shortly after.

This feels like a bit of a comeback as Veo 2 (subjectively) appears to be a step up from what Sora is currently able to achieve.

htrp · 9 months ago
Some PM is literally sitting on this release waiting for their benchmarks to finish

Deleted Comment

esafak · 8 months ago
And it's going to be hard for OpenAI to do that again, now that Google's woken up.
jasonjmcghee · 9 months ago
I appreciate they posted the skateboarding video. Wildly unrealistic whenever he performs a trick - just morphing body parts.

Some of the videos look incredibly believable though.

visnup · 9 months ago
our only hope for verifying truth in the future is that state officials give their speeches while doing kick flips and frontside 360s.
stabbles · 9 months ago
sadly it's likely that video gen models will master this ability faster than state officials
markus_zhang · 9 months ago
Maybe they will do more in person talks, I guess. Back to the old times.
throw4321 · 8 months ago
What officials actually say doesn't make a difference anymore. People do not get bamboozled because of a lack of facts. People who get bamboozled are past facts.
kaonwarb · 9 months ago
This was my favorite of all of the videos. There's no uncanny valley; it's openly absurd, and I watched it 4-5 times with increasing enjoyment.
bahmboo · 9 months ago
Cracks in the system are often places where artists find the new and interesting. The leg swapping of the ice skater is mesmerizing in its own way. It would be useful to be able to direct the models in those directions.
johndough · 9 months ago
It is great to see a limitations section. What would be even more honest is a very large list of videos generated without any cherry-picking, to judge the expected quality for the average user. Anyway, the lack of more videos suggests that there might be something wrong somewhere.
dyauspitr · 9 months ago
The honey, Peruvian women, swimming dog, beekeeper, DJ etc. are stunning. They're short, but I can barely find any artifacts.
__float · 9 months ago
The prompt for the honey video mentions ending with a shot of an orange. The orange just...isn't there, though?
mattigames · 9 months ago
Just pretend it's a movie about a shape-shifting alien that's just trying its best at ice skating; art is subjective like that, isn't it? I bet Salvador Dali would have found those morphing body parts highly amusing.
cyv3r · 9 months ago
I don't know why they say the model understands physics when it still makes mistakes like that.
0xcb0 · 8 months ago
Imho this is stunning, yet what is happening here is super dangerous.

These videos will be too realistic.

Our society is not prepared for this kind of reality-bending media. These hyperrealistic videos will become a reason for hate and murder. Evil actors will use them to influence elections on a global scale, create cults around virtual characters, and deny the rules of physics and human reason. And yet there is no way for a person to instantly detect that he is watching a generated video. Maybe there is now, but in a year it will be indistinguishable from a real recorded video.

ks2048 · 8 months ago
Are Apple and other phone/camera makers working on ways to "sign" a video to attest that it's an unedited video from a camera? Does this exist now? Is it possible?

I'm thinking of simple cryptographic signing of a file, rather than embedding watermarks into the content, but that's another option.

I don't think it will solve the fake video onslaught, but it could help.
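
A minimal sketch of that simple file-level signing, using the Python cryptography package with Ed25519. In a real camera the private key would live in secure hardware and the maker's public key would be distributed out of band; the filename is a placeholder:

    # Sketch: sign a video file's bytes and verify them later.
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # In a real camera this key pair is generated once inside a secure element.
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    video_bytes = open("clip.mp4", "rb").read()
    signature = private_key.sign(video_bytes)

    # Verification with the maker's public key raises InvalidSignature
    # if even a single byte of the file has changed.
    public_key.verify(signature, video_bytes)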

jazzyjackson · 8 months ago
Leica M11 signs each photo ("Content Authenticity Initiative"): https://leica-camera.com/en-US/news/partnership-greater-trus...

There's a cute hack showing that it's kind of useless unless the user-facing UX does a better job of actually verifying that the certificate represents the manufacturer of the sensor (the author just uses a self-signed cert with "Leica Camera AG" as the name). Clearly cryptography literacy is lagging behind: https://hackaday.com/2023/11/30/falsified-photos-fooling-ado...
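
The hack's point in a few lines: anyone can mint a self-signed certificate bearing any subject name, so a UI that merely displays that name proves nothing; trust has to come from validating the chain to a known root. A sketch with the Python cryptography package, all values illustrative:

    # Sketch: a self-signed cert that merely *claims* to be Leica.
    from datetime import datetime, timedelta
    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec

    key = ec.generate_private_key(ec.SECP256R1())
    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Leica Camera AG")])
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)  # self-signed: issuer == subject
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(datetime.utcnow())
        .not_valid_after(datetime.utcnow() + timedelta(days=365))
        .sign(key, hashes.SHA256())
    )
    # Prints "CN=Leica Camera AG", which is exactly why verifiers must
    # check the issuing chain, not the displayed name.
    print(cert.subject.rfc4514_string())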

ttul · 8 months ago
I think this will be a thing one day, where photos are digitally watermarked by the camera sensor in a non-repudiable manner.
bravoetch · 8 months ago
Nikon has had digital signature ability in some of their flagship cameras since at least 2007, and maybe before then. The feature is used by law enforcement when documenting evidence. I assume other brands also have this available for the same reasons.
tomp · 8 months ago
We've had realistic sci-fi and alternate history movies for a very long time.
oldmanhorton · 8 months ago
Which take millions of dollars and huge teams to make. These take one bored person, a sentence, and a few minutes to go from idea to posting on social media. That difference is the entire concern.

Dead Comment

krapp · 8 months ago
We already have hate and murder, evil actors influencing elections on a global scale, denial of physics and reason, and cults of personality. We also already have the ability to create realistic videos - not that it matters because for many people the bar of credulity isn't realism but simply confirming their priors. We already live in a world where TikTok memes and Facebook are the primary sources around which the masses base their reality, and that shit doesn't even take effort.

The only thing this changes is not needing to pay human beings for work.

dtquad · 8 months ago
Instead of calling for regulation, the big tech companies should run big campaigns educating the public, especially boomers, that they can no longer trust images, videos, and audio on the Internet. Put paid articles and ads about this in local newspapers around the world so even the least online people get educated about this.
WickyNilliams · 8 months ago
Do we really want a world where we can't trust anything we see, hear, or read? Where people need to be educated not to trust their senses, the very things we use to interpret reality and the world around us?

I feel this kind of hypervigilance will be mentally exhausting, and not being able to trust your primary senses will have untold psychological effects

Retr0id · 8 months ago
What would motivate "big tech" to warn people about their own products, if not regulations?
jprete · 8 months ago
Don't forget text. You can't trust text either.

And no big tech company would run the ads you're suggesting, because they only make money when people use the systems that deliver the untrustworthy content.

onel · 8 months ago
The same things could be said when everyone could print their own newspapers or books. How would people distinguish between fake and real news?

I think we will need the same healthy media diet.

dbbk · 8 months ago
There wasn't even a healthy media diet before generative AI given the amount of 'fake news' in 2016 and 2020.
golergka · 8 months ago
Photoshop has been a thing for over 30 years.
EForEndeavour · 8 months ago
Isn't the whole point of OP that we're currently watching the barrier to generating realistic assets go from "spend months grinding Photoshop tutorials" to "type what you want into this box and wait a few minutes"?
dbbk · 8 months ago
I still don't really know why we're doing this. What is the upside? Democratising Hollywood? At the expense of... enormous catastrophic disinformation and media manipulation.
ddalex · 8 months ago
Society voted with its money. Google refrained from launching its early chatbots and image-generation tools due to perceived risks of generating unsafe and misleading content, and got beaten to the punch in the market. Of course now they'll launch early and often; the market has spoken.
Retr0id · 8 months ago
We have constructed a society where market forces feel inevitable, but it doesn't have to be that way.
veryrealsid · 9 months ago
FWIW it feels like Google should dominate text/image-to-video, since they have unfettered access to YouTube. Excited to see what the reception is here.
paxys · 9 months ago
Everyone has access to YouTube. It’s safe to assume that Sora was trained on it as well.
Jeff_Brown · 9 months ago
All you can eat? Surely they charge a lot for that, at least. And how would you even find all the videos?
bangaladore · 9 months ago
Does everyone have "legal" access to YouTube?

In theory that should matter to something like Open(Closed)AI. But who knows.

hirako2000 · 9 months ago
They also had a good chunk of the web's text indexed, millions of people's emails sent every day, Google Scholar papers, and the massive Google Books project that digitized most books ever published. They even invented transformers.
fernly · 8 months ago
Superficially impressive, but what is the actual use case of the present state of the art? It makes 10-second demos, fine. But can a producer get a second shot of the same scene and the same characters, with visual continuity? Or a third, etc.? In other words, can it be used to create a coherent movie, even a 60-second commercial, with multiple shots having continuity of faces, backgrounds, and lighting?

This quote suggests not: "maintaining complete consistency throughout complex scenes or those with complex motion, remains a challenge."

okdood64 · 8 months ago
B-roll for YouTube videos.
hersko · 8 months ago
This is still early. It's only going to get better.
becquerel · 8 months ago
Fun. Fun! I find it a lot of fun to have a computer spit out pixels based on silly ideas I have. It is very amusing to me
m3kw9 · 8 months ago
You blend them and extend the videos, and then you connect enough of them for a 2-minute short
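
The stitching half of that workflow is mundane: ffmpeg's concat demuxer joins clips without re-encoding, provided every clip shares the same codec and parameters. A sketch from Python, with placeholder filenames:

    # Sketch: concatenate generated clips into one short (requires ffmpeg).
    import subprocess

    clips = ["clip1.mp4", "clip2.mp4", "clip3.mp4"]  # placeholder filenames
    with open("clips.txt", "w") as f:
        for c in clips:
            f.write(f"file '{c}'\n")

    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0",
         "-i", "clips.txt", "-c", "copy", "short.mp4"],
        check=True,
    )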
fernly · 8 months ago
That's what I think the tech at this stage cannot do. You make two clips from the same prompt with a minor change, e.g.

> a thief threatens a man with a gun, demanding his money, then fires the gun (etc add details)

> the thief runs away, while his victim slowly collapses on the sidewalk (etc same details)

Would you get the same characters, wearing identical clothing, with the same lighting and identical background details? You need all these elements to stay the same; that's what filmmakers call "continuity". I doubt that Veo or any of the generators would actually produce continuity.

sdenton4 · 8 months ago
Dank memes.

Deleted Comment

exodust · 8 months ago
> "what is the actual use case of the art?"

Not much. Low-quality, over-saturated advertising? Short films made by untalented, lazy filmmakers?

When text prompts are the only source, creativity is absent: no craft, no art. Audiences won't gravitate towards fake crap that oozes out of AI vending machines, unrefined and artistically uncontrolled.

Imagine visiting a restaurant because you heard the chef is good. You enjoy your meal but later discover the chef has a "food generator" where he prompts the food into existence. Would you go back to that restaurant?

There's one exception: video-to-video and image-to-video, where your own original artwork, photos, drawings, and videos are the source of the generated output. Even then, it's like outsourcing production to an unpredictable third party. Good luck getting the lighting and details exactly right.

I see the role of this AI gen stuff as background filler, such as populating set details or distant environments via green screen.

eddd-ddde · 8 months ago
> Imagine visiting a restaurant because you heard the chef is good. You enjoy your meal but later discover the chef has a "food generator" where he prompts the food into existence. Would you go back to that restaurant?

That's an obvious yes from me. I liked it, and not only that, I can reasonably assume it will be consistently good in the future, something lots of places can't manage.

AuthConnectFail · 8 months ago
Short-video creation tools; it's a huge market
gloflo · 8 months ago
Misinformation
xnx · 9 months ago
This looks great, but I'm confused by this part:

> Veo sample duration is 8s, VideoGen’s sample duration is 10s, and other models' durations are 5s. We show the full video duration to raters.

Could the positive result for Veo 2 mean the raters simply like longer videos? Why not trim Veo 2's output to 5s for a better-controlled test?
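
Trimming for such a controlled comparison is a one-liner with ffmpeg; a sketch from Python, with a placeholder filename (stream copy cuts on keyframes, which should be close enough for a rating study):

    # Sketch: keep only the first five seconds of a sample, no re-encode.
    import subprocess

    subprocess.run(
        ["ffmpeg", "-i", "veo2_sample.mp4",  # placeholder input
         "-t", "5",      # keep the first 5 seconds
         "-c", "copy",   # stream copy instead of re-encoding
         "veo2_5s.mp4"],
        check=True,
    )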

I'm not surprised this isn't open to the public by Google yet; there's still a huge amount of volunteer red-teaming to be done by the public, as on other services like hailuoai.video.

P.S. The skate tricks in the final video are delightfully insane.

echelon · 9 months ago
> I'm not surprised this isn't open to the public by Google yet,

Closed models aren't going to matter in the long run. Hunyuan and LTX both run on consumer hardware and produce videos similar in quality to Sora Turbo, yet you can train them and prompt them on anything. They fit into the open-source ecosystem, which makes building plugins and controls super easy.

Video is going to play out in a way that resembles images: Stable Diffusion and Flux-like players will win. There might be room for one or two Midjourney-type players, but by and large the most activity will happen in the open ecosystem.

sorenjan · 9 months ago
> Hunyuan and LTX both run on consumer hardware

Are there other versions than the official ones?

> An NVIDIA GPU with CUDA support is required.
> Recommended: We recommend using a GPU with 80GB of memory for better generation quality.

https://github.com/Tencent/HunyuanVideo

> I am getting CUDA out of memory on an Nvidia L4 with 24 GB of VRAM, even after using the bfloat16 optimization.

https://github.com/Lightricks/LTX-Video/issues/64

WillyWonkaJr · 9 months ago
I wonder if the more decisive aspect is the data, not the model. Will closed data win over open data?

With the YouTube corpus at their disposal, I don't see how anyone can beat Google for AI video generation.

dyauspitr · 9 months ago
Stable Diffusion and Flux did not win, though. Midjourney and ChatGPT won.