The first thing I will do when I get access to this is ask it to generate a realistic chess board. I have never gotten a decent-looking chessboard out of any image generator: no deformed pieces, the correct number of squares, squares properly in a checkerboard pattern, pieces placed in the correct positions, the board oriented properly (white on the right!), and not an otherwise illegal position. It seems to be an "AI complete" problem.
Similarly the Veo example of the northern lights is a really interesting one. That's not what the northern lights look like to the naked eye - they're actually pretty grey. The really bright greens and even the reds really only come out when you take a photo of them with a camera. Of course the model couldn't know that because, well, it only gets trained on photos. Gets really existential - simulacra energy - maybe another good AI Turing test, for now.
Human eyes are basically black and white in low light since rod cells can't detect color. But when the northern lights are bright enough you can definitely see the colors.
The fact that some things are too dark to be seen by humans but can be captured accurately with cameras doesn't mean that the camera, or the AI, is "making things up" or whatever.
Finally, nobody wants to see a video or a photo of a dark, gray, and barely visible aurora.
I can see what you mean, and the video is somewhat unlike the real thing. I have lived in northern Norway most of my life and watched auroras a lot. They certainly look green and pink most of the time. Fainter ones would perhaps appear gray, I guess? Red, when viewed from a more southern viewpoint.
I work at Andøya Space, where perhaps most of the space research on aurora has been done by sending scientific rockets into space for the last 60 years.
That's not true. They look grey when they aren't bright enough, but they can look green or red to the naked eye if they are bright. I have seen it myself, and yes, I was disappointed to see only grey ones last week.
To be fair, the prompt isn’t asking for a realistic interpretation; it’s asking for a timelapse. What it’s generated is absolutely what most timelapses look like.
> Prompt: Timelapse of the northern lights dancing across the Arctic sky, stars twinkling, snow-covered landscape
That doesn't seem in any way useful, though... To use a very blunt analogy, are color blind people intelligent/sentient/whatever? Obviously, yes: differences in perceptual apparatus aren't useful indicators of intelligence.
For decades, game engines have been working on realistic rendering, bumping up quality here and there.
The gold standard for rendering has always been cameras. It’s always photo-realistic rendering. Maybe this won’t be true for VR, but so far most effort is to be as good as video, not as good as the human eye.
Any sort of video generation AI is likely to have the same goal. Be as good as top notch cameras, not as eyes.
What struck me about the northern lights video was that it showed the Milky Way crossing the sky behind the northern lights. That bright part of the Milky Way is visible in the southern sky, but the aurora hugging the horizon like that indicates the viewer is looking north. (Swap directions for the southern hemisphere and the aurora australis.)
That's a bad example, since the only images of aurora borealis are brightly colored ones. What I expect of an image generator is to output what is expected of it.
Ha, wow, I’d never seen this one before. The failures are pretty great. Even repeatedly trying to correct ChatGPT/Dall-e with the proper number of squares and pieces, it somehow makes it worse.
This is what dall-e came up with after trying to correct many previous iterations: https://imgur.com/Ss4TwNC
As someone who criticizes AI a lot: this actually looks pretty cool! AI is not better at surrealism than a good artist, but at least its work is enjoyable as surreal art. It justifies the name Dall-e pretty well, too.
This strikes me as equally "AI complete" as drawing hands, which is now essentially a solved problem... No one test is sufficient, because you can add enough training data to address it.
Tiring, but so is the relentless over-marketing. Each new demo implies new use cases and flexible performance. But the reality is they're very brittle and blunder most seemingly simple tasks. I would personally love an ongoing breakdown of the key weaknesses. I often wonder "can it X?" The answer is almost always "almost, but not a useful almost".
Most generative AI will struggle when given a task that requires exactness rather than something more or less right. They're probably pretty good at making something "chessish".
Conventionally this term means the opposite -- problems that AI unlocks that conventional computing could not do. Conventional computing can render a very wide range of different stylized chess boards, but when an ML technique like diffusion is applied to this mundane problem, it falls apart.
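To make the contrast concrete, here's a minimal sketch of the conventional approach, using the python-chess library (assuming it's available, e.g. pip install chess) to render a board that is guaranteed to have 64 squares in a proper checkerboard, correct orientation, and a legal position:

    # Render a legal chess position deterministically -- no diffusion involved.
    # Requires: pip install chess
    import chess
    import chess.svg

    board = chess.Board()      # the standard starting position, always legal
    board.push_san("e4")       # python-chess rejects illegal moves outright
    with open("board.svg", "w") as f:
        f.write(chess.svg.board(board))   # a correct 8x8 board, every time

Diffusion models have no such hard constraints; every square count and piece shape is merely statistically likely, which is exactly why they fall apart on this mundane problem.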
Mine is generating any actual IBM PC/XT computer. Either the training sets didn't include actual IBM PCs, or they labeled all PC compatibles "IBM PC". Whatever the reason, no generative AI today, whether commercial or open-source, can generate a picture of an IBM PC 5150. Once that situation improves, I'll start taking notice.
I would like a bit more convincing that the text watermark will not be noticeable. AI text already has issues with using certain words too frequently. Messing with the weights seems like it might make that issue worse.
Not to mention, when does it get applied? If I'm asking an LLM to transform some data from one format to another, I don't expect any changes other than the format.
It seems really clever, especially the encoding of a signature into LLM token-probability selections. I wonder if SynthID will trigger some standardization in the industry. I don't think there's much incentive to, though; open-source gen AI will still exist. What does Google expect to happen? I guess they're just trying to present themselves as 'ethically pursuing AI'.
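For the curious, a generic sketch of the "green list" style of token watermark from the published research (this is NOT Google's actual SynthID scheme, whose details I'm not claiming to reproduce) boils down to something like this:

    # Toy statistical text watermark: bias token sampling with a keyed hash,
    # then detect the bias later without visibly changing the text.
    import hashlib

    SECRET_KEY = b"watermark-demo"

    def is_green(prev_token: int, token: int) -> bool:
        # The key plus the previous token pseudo-randomly splits the
        # vocabulary into two halves.
        h = hashlib.sha256(SECRET_KEY + prev_token.to_bytes(4, "big")
                           + token.to_bytes(4, "big")).digest()
        return h[0] % 2 == 0

    def bias_logits(prev_token: int, logits: list[float], delta: float = 2.0) -> list[float]:
        # At generation time: nudge "green" tokens up by delta before sampling.
        return [l + delta if is_green(prev_token, t) else l
                for t, l in enumerate(logits)]

    def green_fraction(tokens: list[int]) -> float:
        # At detection time: unwatermarked text scores ~0.5; watermarked, higher.
        hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
        return hits / max(1, len(tokens) - 1)

Because the nudge only reshuffles probability among already-plausible tokens, the text stays fluent, but a detector holding the key can measure the bias over a few hundred tokens.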
From a filmmaking standpoint I still don't think this is impactful.
For that it needs a "director" to say: "turn the horse's head 90˚ the other way, trot 20 feet, and dismount the rider" and "give me additional camera angles" of the same scene.
Otherwise this is mostly b-roll content.
That sounds actively harmful. Often we want storyboards to be less specific, so as not to have some non-artist decision maker ask why the result doesn't look like the storyboard.
And when we want it to match exactly in an animatic or whatever, it needs to be far more precise than this, matching real locations etc.
Perhaps the only industries that immediately benefit from this are short ads and perhaps TikTok. But even then it is very dubious, as people seem to actually enjoy being the directors of their own thing themselves, not handing it off to somebody else.
Maybe this works for ads for a döner place or a shisha bar in some developing country.
I’ve seen generated images used for menus in such places.
But I doubt serious filmmaking can be done this way. And if it can, it'd again be thanks to some smart concept on the part of humans.
Stock videos are indeed crucial, especially now that we can easily search for precisely what we need. Take, for instance, the scene at the end of 'Don't Look Up' featuring a native dance in Peru. The dancer's movements were captured from a stock video, and the falling comet was seamlessly edited in.
Now imagine having near-infinite stock videos tailored to the situation.
Stock photographers are already having issues with piracy due to very powerful AI watermark-removal tools. And I suspect the companies are using these people's content to train the models, too.
I don't think "turn the horse's head 90˚" is the right path forward. What I think is more likely and more useful is: here is a start keyframe and here is a stop keyframe (generated by text-to-image, using other things like ControlNet to control positioning etc.), and then having the AI generate the frames in between. Don't like the way it generated the in-between? Choose a keyframe, adjust it, and rerun with the segment before and the segment after.
This appeals to me because it feels auditable and controllable... But at the pace these things have been progressing the last 3 years, I could imagine the tech leapfrogs all conventional understanding real soon, likely outputting Gaussian-splat-style scenes where the scene is separate from the camera and all pieces can be independently tweaked from a VR director's chair.
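To illustrate the loop I mean, here's a toy sketch where plain pixel-space interpolation stands in for whatever learned in-betweening model would really fill the gap (the function names are hypothetical):

    # Naive "tweening" between two keyframes by linear interpolation in pixel
    # space. A real system would interpolate in a learned latent space, but
    # the editing loop (spot a bad in-between, fix it, re-run) looks the same.
    import numpy as np

    def tween(start: np.ndarray, end: np.ndarray, n_frames: int) -> list[np.ndarray]:
        ts = np.linspace(0.0, 1.0, n_frames)
        return [(1.0 - t) * start + t * end for t in ts]

    # If frame k looks wrong, promote a corrected version of it to a keyframe
    # and regenerate only the two segments on either side of it.
    def refine(frames: list[np.ndarray], k: int, fixed: np.ndarray) -> list[np.ndarray]:
        left = tween(frames[0], fixed, k + 1)
        right = tween(fixed, frames[-1], len(frames) - k)
        return left + right[1:]

The design point is that each regeneration is local: you only re-run the segments touching the keyframe you adjusted, which is what makes the process auditable.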
They claim it can accept an "input video and editing command" to produce a new video output. Also, "In addition, it supports masked editing, enabling changes to specific areas of the video when you add a mask area to your video and text prompt." Not sure if that specific example would work or not.
For most things I view on the internet B-roll is great content, so I'm sure this will enable a new kind of storytelling via YouTube Shorts / Instagram, etc at minimum.
I wouldn't be so sure it's coming. NNs currently don't have the structures for long-term memory and development. These are almost certainly necessary for creating longer works with real purpose and meaning. It's possible we're on the cusp with some of the work to tame RNNs, but it's taken us years to really harness the power of transformers.
There's also the whole "oh you have no actual model/rigging/lighting/set to manipulate" for detail work issue.
That said, I personally think the solution will not be coming that soon, but at the same time, we'll be seeing a LOT more content made with current tools, even if that means a (severe) dip in quality due to the cost it might save.
This leads me to the question of why there hasn't been an effort to do this with 3D content (that I know of).
Because camera angles/lighting/collision detection/etc. at that point would be almost trivial.
I guess with the "2D only" approach that is based on actual, acquired video you get way more impressive shots.
But the obvious application is for games. Content generation in the form of modeling and animation is actually one of the biggest cost centers for most studios these days.
I think with AI content, we'd need to not treat it like expecting fine grained control. E.g. instead like "dramatic scene of rider coming down path, and dismounting horse, then looking into distance", etc. (Or even less detail eventually once a cohesive story can be generated.)
HN has always been notoriously negative, and wrong a lot of the time. One of my personal favorites is Brian Armstrong's post about an exciting new company he was starting around cryptocurrency and needing a co-founder... Always a good one to go back and read when I've been staying up late working on side projects and need a mental boost.
Yeah, I've made a lot of images, and it sure is amazing if all you're interested in is, like, "Any basically good image," but if you start needing something very particular, rather than "anything that is on a general topic and is aesthetically pleasing," it gets a lot harder.
And there are a lot more degrees of freedom to get something wrong in film than in a single still image.
I can't wait to see what the big video camera makers are going to do with tech similar to this. Since Google clearly has zero idea what to do with it, and they lack the creativity, it's up to ARRI, Canon, Panasonic, etc. to create their own solutions for this tech. I can't wait to see what Canon has up its sleeve with their new offerings coming in a few months.
The videos in this demo are pretty neat. If this had been announced just four months ago we'd all be very impressed by the capabilities.
The problem is that these video clips are very unimpressive compared to the Sora demonstration which came out three months ago. If this demo was announced by some scrappy startup it would be worth taking note. Coming from Google, the inventor of the Transformer and owner of the largest collection of videos in the world, these sample videos are underwhelming.
Having said that, Sora isn't publicly available yet, and maybe Veo will have more to offer than what we see in those short clips when it gets a full release.
They didn't really do a very good job of selecting marketing examples. The only good one, that shows off creative possibilities, is the knit elephant. Everything else looks like the results of a (granted fairly advanced) search through a catalog of stock footage.
Even search, in and of itself, is incredibly amazing but fairly commoditized at this point. They should've highlighted more unique footage.
The faster the tech cycle, the faster we become accustomed to it. Look at your phone, an absolute, wondrous marvel of technology that would have been utterly and totally sci-fi just 25 years ago. Yet we take it for granted, as we do with all technology eventually. The time frames just compress, is all, for better or for worse.
On some level, it's healthy to retain a sense of humility at the technological marvels around us. Everything about our daily lives is impressive.
Just a few years ago, I would have been absolutely blown away by these demo videos. Six months ago, I would have been very impressed. Today, Google is rolling out a product that seems second best. They're playing catch-up in a game where they should be leading.
I will still be very impressed to see videos of that quality generated on consumer grade hardware. I'll also be extremely impressed if Google manages to roll out public access to this capability without major gaffes or embarrassments.
This is very cool tech, and the developers and engineers that produced it should be proud of what they've achieved. But Google's management needs to be asking itself how they've allowed themselves to be surpassed.
Honestly, if Veo becomes public faster than Sora, they could win the video AI race. But what am I wishfully thinking - it's Google we're talking about!
> But what am I wishfully thinking - it's Google we're talking about!
Google the company known to launch way too many products? What other big company launches more stuff early than them? What people complain about Google is that they launch too much and then shut them down, not that they don't launch things.
Same impression here. The scene changes very abruptly from a sky view to following the car. The cars meld with the ground frequently, and I think I saw one car drive through another at one point.
So… much… bloom. I like it, but still holy shit. I hate that I like it because I don’t want this art form to be reduced by overuse. Sadly, it’s too late.
It's also probably because it's easier to spot fake humans than fake cats or camels. We are more attuned to the faces of our own species.
That is, AI humans can look "creepy" whereas AI animals may not. The cowboy looks pretty good precisely because it's all shadow.
CGI animators can probably explain this better than I can ... they have to spend way more time on certain areas and certain motions, and all the other times it makes sense to "cheat" ...
It explains why CGI characters look a certain way too -- they have to be economical to animate
Actually, there is one in the last demo. It's not an individual clip, but one shot where a team uses this model to create a scene with a human in it: they created an image of a black woman, but only from her head up.
I would generally agree, though: it's not normal that they didn't show more humans.
Gemini still won't generate images of humans or even other hominids. They're missing here probably for the same reason. Namely that they're trying to figure out how to balance diverse representation with all the various other factors.
Not nearly as impressive as Sora. Sora was impressive because the clips were long and had lots of rapid movement since video models tend to fall apart when the movement isn't easy to predict.
By comparison, the shots here are only a few seconds long and almost all look like slow motion or slow panning shots cherrypicked because they don't have that much movement. Compare that to Sora's videos of people walking in real speed.
The only shot they had that can compare was the cyberpunk video they linked to, and it looks crazy inconsistent. Real shame.
Interesting to see that OpenAI was successful in creating their own reality distortion spells, just like Apple's reality distortion field which has fooled many of these commenters here.
It's quite early to race to the conclusion that one is better than the other when not only they are both unreleased, but especially when the demos can be edited, faked or altered to look great for optics and distortion.
EDIT: It appears there is at least one commenter who replied below that is upset with this fact above.
It is OK to cope, but the truth really doesn't care especially when the competition (Google) came out much stronger than expected with their announcements.
I believe it was clear that Air Head was an edited video.
The intention wasn't to show "This is what Sora can generate from start to end" but rather "This is what a video production team can do with Sora instead of shooting their own raw footage."
Maybe not so obvious to others, but for me it was clear from how the other demo videos looked.
> Sora was impressive because the clips were long and had lots of rapid movement
Sora videos ran at 1 beat per second, so everything in the image moved at the same beat and often too slow or too fast to keep the pace.
It is very obvious when you inspect the images and notice that there are keyframes at every whole second mark and everything on the screen suddenly goes in their next animation step.
That really limits the kind of videos you can generate.
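That claim is checkable, by the way. Here's a rough sketch with OpenCV (assuming opencv-python is installed) that measures inter-frame change and reports where the biggest jumps land; if they cluster near whole seconds, the one-beat-per-second cadence is real:

    # Rough check for a fixed animation cadence: compute mean absolute
    # inter-frame difference and report where the largest jumps occur.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture("clip.mp4")
    fps = cap.get(cv2.CAP_PROP_FPS)
    diffs, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None:
            diffs.append(np.abs(gray - prev).mean())
        prev = gray
    cap.release()

    # Timestamps (in seconds) of the ten largest jumps. If they sit near
    # whole numbers, the video is stepping on a one-second beat.
    jumps = np.argsort(diffs)[-10:]
    print(sorted((j + 1) / fps for j in jumps))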
Comparing two children is a good one. My girlfriend has taken to pointing out when I’m engaging in “punditry”. They're an engineer like I am and we talk about tech all the time, but sometimes I talk about which company is beating which company like it’s a football game, and they call me out for it.
Video models are interesting, and to some extent trying to imagine which company is gonna eat the other’s lunch is kind of interesting, but sometimes that’s all people are interested in and I can see my girlfriend's reasoning for being disinterested in such discussion.
I’m fairly certain Google just has a big stack of these in storage but never released, or the moment someone pulls ahead it’s all hands on deck to make the same thing.
Sora is also movement limited to a certain range if you look at the clips closely. Probably something like filtering by some function of optical flow in both cases.
> The shots here [..] almost all look like slow motion or slow panning shots.
I think this is arguably better than the alternative. With slow-mo generated videos, you can always speed them up in editing. It's much harder to take a fast-paced video and slow it down without terrible loss in quality.
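As a concrete example (assuming ffmpeg is installed), a 2x speed-up is a one-liner with the setpts filter, wrapped here in Python:

    # Double the playback speed of a generated slow-mo clip with ffmpeg.
    # setpts=0.5*PTS halves each frame's presentation timestamp (2x speed).
    import subprocess

    subprocess.run([
        "ffmpeg", "-i", "slow_input.mp4",
        "-filter:v", "setpts=0.5*PTS",
        "-an",                      # drop audio; handle it separately if needed
        "fast_output.mp4",
    ], check=True)

Going the other way (slowing footage down) requires inventing frames that don't exist, which is exactly why slow-mo-by-default is the safer failure mode.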
A commercially available tool that can turn still images into depth-conscious panning shots is still tremendously impactful across all sorts of industries, especially tourism and hospitality. I’m really excited to see what this can do.
Not just that, but anything with a subject in it felt uncanny-valley-ish... like that cowboy clip: the gait of the horse stood out as odd, and then I gave it some attention. It seems like a camel's gait. And the whole thing seems to be hovering, gliding rather than walking. Sora indeed seems to have an advantage.
I thought a camel's gait was more like both legs on the same side moving almost at the same time. Granted, I don't see camels often. Out of curiosity, can you explain that more?
Could also be Google's doing. If Veo screws up, the weight falls on Alphabet stock, while OpenAI is not public and doesn't have to worry about anything. Even if OpenAI faked some of their AI videos (not saying they did), it wouldn't affect them the way it would affect Veo -> Google -> Alphabet.
From a 2014 Wired article [0]:
"The average shot length of English language films has declined from about 12 seconds in 1930 to about 2.5 seconds today"
I can see more real-world impact from this (and/or Sora) than most other AI tools
This is very noticeable. Watching movies from the 1970s is positively serene for me, whereas the shot time in modern films often leaves me wondering, "wait, what just happened there?"
And I'm someone who is fine playing fast action video games. Can't imagine what it's like if you're older or have sensory processing issues.
The first time I watched The Rise of Skywalker, it was just too much being thrown at my brain. The second and third watches were much easier to process, of course. I'm a big fan of older movies and have noticed the shot-length difference anecdotally; Lawrence of Arabia and Ben-Hur are two of my favorites. So it all makes sense to me now that there's actually a measured comparison.
Shot length, yes - but the scene stays the same. Getting continuity with just prompts seems not yet figured out.
Maybe it's easy, and you feed continuity stills into the prompt. Maybe it's not, and this will always remain just a more advanced storyboarding technique.
But then again, storyboards are always less about details and more about mood, dialog, and framing.
How many of those 2.5 second "shots" are back-and-forths between two perspectives (ex. of two characters talking to one another) where each perspective is consistent with itself? This would be extremely relevant for how many seconds of consistent footage are actually needed for an AI-generated "shot" at film-level quality.
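Measuring this on actual footage is pretty approachable; here's a sketch using the PySceneDetect library (assuming pip install scenedetect[opencv]) that lists detected shot lengths; grouping visually similar shots to count the back-and-forths would be the next step:

    # List detected shot lengths for a video using PySceneDetect.
    # Requires: pip install scenedetect[opencv]
    from scenedetect import detect, ContentDetector

    scenes = detect("film_clip.mp4", ContentDetector())
    lengths = [end.get_seconds() - start.get_seconds() for start, end in scenes]
    print(f"{len(lengths)} shots, mean length "
          f"{sum(lengths) / len(lengths):.2f}s")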
On what causes the different aurora colours, see: https://theconversation.com/what-causes-the-different-colour...
I echo what some other posters here have said: they're certainly not gray.
> This is what dall-e came up with after trying to correct many previous iterations: https://imgur.com/Ss4TwNC
https://www.reddit.com/r/dalle2/comments/1afhemf/is_it_possi...
https://www.reddit.com/r/dalle2/comments/1cdks71/a_hand_with...
It seems that SynthID is not only for AI-generated video, but also for images, text, and audio.
> For that it needs a "director" to say: "turn the horse's head 90˚ the other way, trot 20 feet, and dismount the rider" [..]
I'm sure this is coming.
And let the robot tween?
Vs an imperative for "tween this by turning the horse's head left"
The Brian Armstrong post mentioned above: https://news.ycombinator.com/item?id=3754664
> Coming from Google, the inventor of the Transformer and owner of the largest collection of videos in the world, these sample videos are underwhelming.
Wow, the speed at which we become blasé is terrifying. Six months ago this was not possible, and it felt years away!
They're not underwhelming to me; they're beyond anything I thought would ever be possible.
Are you genuinely unimpressed? Or maybe just trying to play it cool?
I’ve switched to Opus from GPT-4 for coding and it was non-trivially easy
I’ll just go back to living under a rock.
The most impressive Sora demo was heavily edited.
https://www.fxguide.com/fxfeatured/actually-using-sora/
https://www.youtube.com/watch?v=KFzXwBZgB88 (posted the day after the short debuted)
https://openai.com/index/sora-first-impressions (no mention of editing, nor do they link to the above making-of video)
I think comparing them now is probably not that useful outside of this AI hype train. Like comparing two children. A lot can happen.
The bigger message I'm getting from this is that it's clear OpenAI won't have a super-AI monopoly.
It's impressive as hell though. Even if it would only be used to extrapolate existing video.
Being cautious often puts a dent in innovation.
https://www.bbc.com/news/technology-67650807
[0] https://www.wired.com/2014/09/cinema-is-evolving/
I can tell what's going on, but I always end up feeling agitated.
Just worth keeping that in mind. You could not just switch between multiple shots like you can today.