Very interesting that ChatGPT seems to prompt Dall-E via the client, rather than keeping that interaction entirely server-side. Keeping it server-side would be less likely to leak details as seen here, and makes it less susceptible to tampering.
Also nice to see that Dall-E 3 seeds were finally fixed. That must have happened within the last week or so; they weren't working last I checked (ChatGPT always used a fixed seed of 5000).
> Midjourney and Stable Diffusion both have a “seed” concept, but as far as I know they don’t have anything like this capability to maintain consistency between images given the same seed and a slightly altered prompt.
I suspect this is more a function of Midjourney's prompt adherence being fairly poor right now. Even so, the images often aren't dramatically different. Example:
> Very interesting that ChatGPT seems to prompt Dall-E via the client, rather than keeping that interaction entirely server-side. Keeping it server-side would be less likely to leak details as seen here, and makes it less susceptible to tampering.
I don't have access to test, but given OpenAI's record on stuff like this, it would be a good idea for someone to check whether users can intercept or resend those requests to directly control the prompts that are sent to Dall-E without going through GPT.
Most likely they're only part of conversation history and they're unmodifiable, but I wouldn't necessarily take it as a given, and it would be quick for anyone with access who knows their way around the browser dev tools to check.
It's the same as how the Bing search/web browsing works: GPT-4 spits out function calls as JSON, and then another system picks those up and invokes the actual code on the back end.
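A minimal sketch of that dispatch pattern (the tool name `generate_image` and the argument shapes here are made up for illustration; they are not OpenAI's actual schema):

```python
import json

# Stand-in for a real back-end function the model can "call".
def generate_image(prompt, seed=None):
    return f"image for {prompt!r} (seed={seed})"

# Registry mapping the model's declared tool names to real code.
TOOLS = {"generate_image": generate_image}

def dispatch(model_output):
    """Parse a JSON function call emitted by the model and invoke it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

print(dispatch('{"name": "generate_image", '
               '"arguments": {"prompt": "a robot", "seed": 5000}}'))
# prints: image for 'a robot' (seed=5000)
```

The point is that the model only ever emits structured text; a separate, dumb layer decides what actually runs, which is also why those calls are visible (and potentially tamperable) when they pass through the client.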
> People have been trying to figure out hacks to get Midjourney to create consistent characters for the past year, and DALL-E 3 apparently has that ability as an undocumented feature! [by reusing the seed]
Using a constant seed to produce similar images has been the technique from the very start, but it has limitations. You cannot, e.g., keep the character consistent between different poses this way.
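For intuition on why seed reuse works at all: diffusion models start from seeded random noise, so the same seed pins down the same starting latent even when the prompt changes slightly. A toy sketch with a plain Python RNG as a stand-in for the noise tensor (not any real model's API):

```python
import random

def initial_noise(seed, n=16):
    """Deterministic 'latent': the same seed always yields the same noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

a = initial_noise(5000)   # first generation
b = initial_noise(5000)   # rerun with the same seed and a tweaked prompt
c = initial_noise(1234)   # different seed

assert a == b   # identical starting point -> similar composition
assert a != c   # different starting point -> a different image
```

The denoising steps still depend on the prompt, which is why same-seed images come out similar in composition but not reliably consistent across poses.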
Yeah, ControlNet and IP-Adapter allow really fine-grained control. The quality and creativity of DALL-E 3 beat Stable Diffusion (all models), but this fine-grained control is missing.
DALL-E 2 and now DALL-E 3 have given me more laughs than anything in years.
There used to be a video game magazine which would rate games by "Improvements through improper play." That's exactly how I feel about DALL-E. There are several Subreddits and Facebook Groups I've submitted some seriously cursed AI output to.
GPT-4V is a total marvel too. I just used one of the medieval Chaucer images from the recent HN post about their digitization, and told GPT my wife had left me a funny note this morning that I needed to read. It transcribed and translated it perfectly, even though it was practically unreadable.
The fact that you get access to DALL-E 3 "for free" if you're already subscribed to ChatGPT Plus is going to give MJ and other competitors a serious run for their money.
Also, being able to reuse a seed to emulate an InstructPix2Pix architecture is a game changer.
And it's far better in my opinion - ChatGPT actually creates its own prompts from what you give it and feeds those to DALL-E (you can see this by clicking on the images it returns and reading what the actual prompt was), although you can disable this by just firmly telling ChatGPT not to modify your prompt.
Additionally, the image generation itself is quite different. ChatGPT's DALL-E seems to create much more stylized images - much harder to get plain shots that don't heavily embellish your description.
Maybe I'm not their typical target demographic, but OpenAI's products are completely useless to me, given the way they've neutered them and restricted the output. For images I'd rather run Stable Diffusion locally. You own what you produce with it too (with some caveats). GPT-3 was cool before they came out with the chat version, but it's been all downhill from there.
I don't find the quality bad at all, and I find them extremely useful. So far it's my daily assistant for software architecture, business plans, marketing content, debugging, summarisation, and more.
The filter categories for OpenAI's moderation API are hate, harassment, self-harm, sexual, sexual/minors, and violence. Is it really the end of the world that DALL-E is rated PG-13?
There's no shortage of available models that generate adult content, but having one that doesn't makes it embeddable in other applications.
Where in the ChatGPT console can you access DALL-E 3? I am a ChatGPT Plus subscriber, but all I can access, in addition to GPT-4, are the beta features (plugins and Advanced Data Analysis).
Those prompts are wild. It is deeply impressive that it works. Think about what would happen if you gave such instructions to a human. Would they be able to comply? How big is the overlap between people who are creative enough to produce the kind of pictures Dall-E produces and disciplined enough to follow complex instructions so rigorously?
It also is just straight up impossible to convert those instructions to "regular" code.
I can't help but feel that any perceived overlap is coincidental. An illusion similar to seeing a face in an abstract drawing, one that the artists, or in this case the algorithm developers, are keen to exploit. Our need to find the familiar in something that is ultimately completely alien to our way of thinking.
But with enough existing prompts and training data, it will continue to learn and better trick our senses.
I totally agree that putting those instructions into code would be outrageously complicated; the biggest strength here is its ability to get the gist of what we are trying to convey.
A couple of days ago I didn't notice that ChatGPT was still set to Dall-E, because it was just helping me along "as usual" with my programming tasks without giving any hint of being in Dall-E mode.
When I noticed this, I asked it to generate an image of what had been discussed so far, and the first image turned out to be pretty nice. [0]
We were dealing a lot with timestamps, NumPy, Pandas stuff.
What’s interesting to me from this is that the technique they are using to integrate DALL-E into ChatGPT is pretty much the same as the one they use for plugins.
> Even so, the images often aren't dramatically different. Example:

https://analyzer.transfix.ai/?db=josh&q=%28robot+%7C%7C+andr...
> Also nice to see that Dall-E 3 seeds were finally fixed.

Generating images in one chat: https://i.imgur.com/sIKSfCy.png

Reproducing exactly in another: https://i.imgur.com/C8Tqo48.png
> OpenAI's products are completely useless to me, the way they've neutered them and are restricting the output.

Have you used GPT-4?
> You own what you produce with it too (with some caveats).

As I understand it, this is incorrect: all of these outputs are the creations of a machine, and as such are not eligible for copyright protection.
[0] https://imgur.com/a/SBZ36KT