Imagen 4 is now generally available

tmvphil · 4 months ago

The way it totally disregards the many explicit instructions given in the "four panel" comic strip.

topato · 4 months ago

Right? Came to the comments specifically for this, but am confused by people's responses. With prompt adherence this bad, is it worth the 2 cents you spent on it? I don't see how it's even useful for deciding if you want to use the ultra version, or for anything else really.... Maybe if you want to redo it in Photoshop? But at that point, breaking out the old Wacom tablet and making a composite image would probably be just as time intensive, but with much higher image quality (and none of the telltale signs of AIgen)

ben_w · 4 months ago

Even if you only earn $12/hour, 2 cents is worth it to save just 6 seconds.

An image has to be much worse than that to fail to save you 6 seconds.

That said, this is their own chosen example of what it can do, so I'd have to assume it is much worse than that on average.
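For anyone checking the arithmetic above, here is a minimal sketch of the break-even calculation. The function name and the choice to work in whole cents are my own assumptions for illustration; only the $12/hour and 2 cent figures come from the comment.

```python
def break_even_seconds(cost_cents, wage_cents_per_hour):
    """Seconds of work time that cost the same as one generation.

    Hypothetical helper: both arguments are in cents so the
    division stays exact for the figures used in the comment.
    """
    seconds_per_hour = 3600
    return cost_cents * seconds_per_hour / wage_cents_per_hour


# $12/hour is 1200 cents/hour; one image costs 2 cents.
print(break_even_seconds(2, 1200))  # 6.0
```

So the image only has to save you more than 6 seconds of work at that wage to pay for itself.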

thanhhaimai · 4 months ago

> Imagen 4 Ultra: When your creative vision demands the highest level of detail and strict adherence to your prompts, Imagen 4 Ultra delivers highly-aligned results.

It seems that you may need the "Ultra" version if you want strict prompt adherence.

It's an interesting strategy. Personally, I notice that most of the time I don't actually need strict prompt adherence for image generation. If it looks nice, I'll accept it. If it doesn't, I'll click generate again. For creative tasks, following the prompt too strictly might not be the outcome the users want.

mikepurvis · 4 months ago

I've found this is an interesting balance with Copilot specifically. Like, on the one hand I'm glad it aims for the bare minimum and doesn't try to refactor my whole codebase on every shot... at the same time, there's certain obvious things where I wish it was able to think a bit bigger picture, or even engage me interactively, like "hey, I can do a self-contained implementation here, but it's a bit gross; it looks like adding dependency X to the project keeps this a one liner— which way should it go?"

chatmasta · 4 months ago

I’ve had good experience with iterative prompting when generating images with Gemini (idk which model — it’s whatever we get with our enterprise subscription at work, presumably the latest.) It’s noticeably better than ChatGPT at incorporating its previous image attempt into my instructions to generate the next iteration.

cubefox · 4 months ago

Though that was only Imagen 4 Fast, not Imagen 4 or Imagen 4 Ultra.

ajd555 · 4 months ago

Same for the poster. Asks for the ship to be going towards the right, and it's clearly doing the opposite

smokel · 4 months ago

As seen from the AI's perspective.

math_dandy · 4 months ago

To the left of the "detailed spaceship" I think I see a distortion pattern reminiscent of a cloaked Klingon bird of prey moving to the right. Or I'm just hallucinating patterns in nebular noise.

Jare · 4 months ago

The ship is reminiscent of Galactica's oldschool vipers. Different, but very similar overall structure.

weego · 4 months ago

Hopefully it's better than midjourney at least. Ignoring key parts of the prompt seems to be a feature.

vunderba · 4 months ago

Midjourney scores the absolute lowest in terms of prompt adherence against any of the other SOTA models (Kontext, Imagen, gpt-image-1, etc). At this point, its biggest feature is probably as an "exploratory tool" for visualizations by cranking up the chaos and weirdness parameters.

userbinator · 4 months ago

In the little experimentation I did with AI image generation, it seems more a game of trying multiple times until you get something that actually looks right, so I wonder how many attempts they did.

typpilol · 4 months ago

I asked Copilot basically the same thing and got a much better result lol

https://i.imgur.com/kSuqCYg.jpeg

arjie · 4 months ago

Interesting how Imagen doesn't suffer this yellow tint effect.

typpilol · 4 months ago

I assume that's from the retro word in the prompt

cobbzilla · 4 months ago

Makes one wonder if there’s a hidden pre/system prompt for Imagen that’s interfering with optimal results.

ctippett · 4 months ago

Clicking on "Read the documentation" leads to a page that documents nothing about the latest Imagen models and only provides examples using Gemini 2.0 Flash.

typpilol · 4 months ago

Classic Google

HocusLocus · 4 months ago

I have found Imagen to be a good general purpose editor and we use it to clean up bitmaps, and adjust black points and white points and curves on greyscale, so it is good for preparing B&W greyscale photographs for print to compensate for dot gain in halftone screens on laser printers. Its 'color separation' capability is rudimentary/first draft though and is ridiculously close to inverse RGB rather than CMYK. For good color seps we use Photoshop so I can control undercolor removal.

neom · 4 months ago

Are you talking about this google product, or another tool altogether?

anonymousiam · 4 months ago

They're probably talking about the original Imagen printing product line from the 1980s. I thought I might be the only one to remember them in this thread, so I did a search for printer and found the GP comment.

https://tug.org/TUGboat/tb02-2/tb03imagen.pdf

mattxxx · 4 months ago

I guess it's kinda nicely genuine that the "four panel comic strip" has some errors in it (misunderstanding caption + cat high-fiving itself in the bonus fifth panel)

jug · 4 months ago

I was just thinking that. It has many, many errors.

1. Not seen browsing ”ai.dev”.

2. The text ”Imagen 4 is now generally available!” is spoken, not a comic caption.

3. Invalid second panel.

4. Hallucinates ”Meet Imagen 4 fast!”

5. Hallucinates ”It offers low..” etc. (this is the second part of a single sentence said by the cat)

6. Hallucinates ”You can export images in 2K!” (this sentence is not asked for)

7. Doesn’t have the cat and the dog in the fourth panel.

—

Here’s the gpt-image-1 counterpart with the issues I could find:

https://chatgpt.com/share/689f7e4b-01e4-8011-8997-0f37edf8c2...

1. The text ”Imagen 4 is now generally available!” is still spoken, not a caption.

2. ”low latency” -> ”low-laten”

(3. Has that ugly gpt-image-1 trademark yellow filter requiring work in post to avoid.)

I didn’t bring up the ”retro comic look” thing. I certainly think it’s an issue with Imagen 4’s version. It doesn’t look very old school at all. But I can’t judge the OpenAI one either on that, I’m no comic book expert, so I just skipped that one.

typpilol · 4 months ago

I got this result with the basic copilot app

https://i.imgur.com/kSuqCYg.jpeg

razster · 4 months ago

Ran your same prompt, copypasta, got this. https://i.imgur.com/wOocci9.png Cat on panel 3 seems a bit off. I like the first panel.

edaemon · 4 months ago

The cat also has more fingers on one hand than the other. It's a small, inconsequential thing but it always draws my eye in generated images.

pogue · 4 months ago

What do you have to do to remove the watermark? Is Google's SynthID watermark on top of the image as well or is it embedded in EXIF data?

latexr · 4 months ago

> I didn’t bring up the ”retro comic look” thing. (…) I’m no comic book expert, so I just skipped that one.

I’m no Scott McCloud, but the OpenAI version definitely does a better job with the retro style. The yellow filter you criticised actually helps to sell the illusion. The Imagen version utterly fails in the retro area, that style is very much modern.

But there are other important flaws in the OpenAI version. The fourth panel has a different cat (the head shape and stripes are wrong) and it bleeds into the previous panel. Technically that could be a stylistic choice, except that the floor/table is inconsistent, making it clear it was a mistake.

math_dandy · 4 months ago

I was going to nitpick the missing apostrophe in movie posters caption ("STARFALLS REVENGE") but its missing from the prompt, too.

sowbug · 4 months ago

> its

Muphry's Law strikes again.

decimalenough · 4 months ago

> Muphry's

Indeed.

cco · 4 months ago

Just proves my pet opinion that English apostrophe rules are all universally wrong and confusing.

It's and its are backwards. The latter breaks the possessive s rule.

Speaking of, the possessive s should _always_ be added, no reason to sometimes omit it if the name ends in an s.

Ass backwards, all of it.

qoez · 4 months ago

Looks so much better than the yellow tinted chatgpt output in my eyes

tripplyons · 4 months ago

After manually white balancing to remove the tint, I find GPT-Image-1 (the model used in ChatGPT) to be better.
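For readers wondering what removing the tint can look like in code, below is a rough sketch of gray-world white balancing, which is one common automatic approach. To be clear, this is not what was done above (that was manual), and the function name and the plain list-of-tuples pixel format are assumptions chosen to keep the example dependency-free.

```python
def gray_world_balance(pixels):
    """Rescale each channel so all three channel means match.

    pixels: list of (r, g, b) tuples with values in 0..255.
    Gray-world assumes the scene averages out to neutral gray,
    so a yellow cast shows up as a low blue mean and gets boosted.
    """
    n = len(pixels)
    if n == 0:
        return []
    # Mean of each channel across the whole image.
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    # Per-channel gain; leave an all-zero channel untouched.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [
        tuple(min(255, round(p[c] * gains[c])) for c in range(3))
        for p in pixels
    ]


# Two yellowish pixels (strong red/green, weak blue).
tinted = [(200, 180, 100), (100, 90, 50)]
print(gray_world_balance(tinted))  # [(160, 160, 160), (80, 80, 80)]
```

Gray-world can overcorrect images that are legitimately dominated by one colour, so a manual adjustment may still give better results on stylised output like this.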

vunderba · 4 months ago

I've updated my GenAI Comparison site to include Imagen4 Ultra, so now we have four Google related generative models (Gemini Flash, Imagen3, Imagen4, and Imagen4 Ultra).

Despite claims that Ultra supports improved strict prompt adherence, we saw no evidence that it scored any better than Imagen 4 and in some cases seemed to ignore the prompt altogether (see the "Not the Bees" comic). In many cases, it also seemed much less steerable than Imagen3 requiring many of the prompts to be rewritten.

https://genai-showdown.specr.net?models=IMAGEN_3,IMAGEN_4,IM...

gizmodo59 · 4 months ago

Looks like OpenAI imagegen is still the SOTA?

BoorishBears · 4 months ago

LMArena (https://lmarena.ai/?chat-modality=image) currently has a model codenamed `nano-banana` that is generally strictly better than gpt-image-1

There's some speculation it's Gemini 3's multi-modal output, and other speculation that it's an OpenAI model. Hard to say definitively since these models tend to hallucinate when interrogated.