Very interesting that ChatGPT seems to prompt Dall-E via the client, rather than keeping that interaction entirely server-side. Keeping it server-side would be less likely to leak details as seen here, and makes it less susceptible to tampering.
Also nice to see that Dall-E 3 seeds were finally fixed. That must have happened within the last week or so; they weren't working last I checked (ChatGPT always used a fixed seed of 5000).
> Midjourney and Stable Diffusion both have a “seed” concept, but as far as I know they don’t have anything like this capability to maintain consistency between images given the same seed and a slightly altered prompt.
I suspect this is more a function of Midjourney's prompt adherence being fairly poor right now. Even so, the images often aren't dramatically different. Example:
> Very interesting that ChatGPT seems to prompt Dall-E via the client, rather than keeping that interaction entirely server-side. Keeping it server-side would be less likely to leak details as seen here, and makes it less susceptible to tampering.
I don't have access to test, but given OpenAI's record on stuff like this, it would be a good idea for someone to check whether users can intercept or resend those requests to directly control the prompts that are sent to Dall-E without going through GPT.
Most likely they're only part of conversation history and they're unmodifiable, but I wouldn't necessarily take it as a given, and it would be quick for anyone with access who knows their way around the browser dev tools to check.
It's the same as how the Bing search/web browsing works: GPT-4 spits out function calls as JSON, and then another system picks those up and invokes the actual code on the back end.
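A minimal sketch of that dispatch pattern (the tool name `generate_image` and the argument shapes here are made up for illustration; they are not OpenAI's actual schema):

```python
import json

# Stand-in for a real back-end function the model can "call".
def generate_image(prompt, seed=None):
    return f"image for {prompt!r} (seed={seed})"

# Registry mapping the model's declared tool names to real code.
TOOLS = {"generate_image": generate_image}

def dispatch(model_output):
    """Parse a JSON function call emitted by the model and invoke it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

print(dispatch('{"name": "generate_image", '
               '"arguments": {"prompt": "a robot", "seed": 5000}}'))
# prints: image for 'a robot' (seed=5000)
```

The point is that the model only ever emits structured text; a separate, dumb layer decides what actually runs, which is also why those calls are visible (and potentially tamperable) when they pass through the client.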
> People have been trying to figure out hacks to get Midjourney to create consistent characters for the past year, and DALL-E 3 apparently has that ability as an undocumented feature! [by reusing the seed]
Using a constant seed to produce similar images has been the technique from the very start, but it has limitations. You cannot, e.g., keep the character consistent between different poses this way.
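For intuition on why seed reuse works at all: diffusion models start from seeded random noise, so the same seed pins down the same starting latent even when the prompt changes slightly. A toy sketch with a plain Python RNG as a stand-in for the noise tensor (not any real model's API):

```python
import random

def initial_noise(seed, n=16):
    """Deterministic 'latent': the same seed always yields the same noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

a = initial_noise(5000)   # first generation
b = initial_noise(5000)   # rerun with the same seed and a tweaked prompt
c = initial_noise(1234)   # different seed

assert a == b   # identical starting point -> similar composition
assert a != c   # different starting point -> a different image
```

The denoising steps still depend on the prompt, which is why same-seed images come out similar in composition but not reliably consistent across poses.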
Yeah, ControlNet and IP-Adapter allow really fine-grained control. The quality and creativity of DALL-E 3 beat Stable Diffusion (all models), but this fine-grained control is missing.
DALL-E 2 and now DALL-E 3 have given me more laughs than anything in years.
There used to be a video game magazine which would rate games by "Improvements through improper play." That's exactly how I feel about DALL-E. There are several Subreddits and Facebook Groups I've submitted some seriously cursed AI output to.
GPT-4V is a total marvel too. I just used one of the medieval Chaucer images from the recent HN post about their digitization, and told GPT my wife had left me a funny note this morning that I needed to read. It transcribed and translated it perfectly, even though it was practically unreadable.
The fact that you get access to DALL-E 3 "for free" if you're already subscribed to ChatGPT Plus is going to give MJ and other competitors a serious run for their money.
Also, being able to reuse a seed to emulate an InstructPix2Pix architecture is a game changer.
And it's far better in my opinion - ChatGPT actually creates its own prompts from what you give it and feeds those to DALL-E (you can see this by clicking on the images it returns and reading what the actual prompt was), although you can disable this by just firmly telling ChatGPT not to modify your prompt.
Additionally, the image generation itself is quite different. ChatGPT's DALL-E seems to create much more stylized images - much harder to get plain shots that don't heavily embellish your description.
Maybe I'm not their typical target demographic, but OpenAI's products are completely useless to me, given the way they've neutered them and restricted the output. For images I'd rather run Stable Diffusion locally. You own what you produce with it too (with some caveats). GPT-3 was cool before they came out with the chat version, but it's been all downhill from there.
I don't find the quality bad at all, and I find them extremely useful. So far it's my daily assistant for software architecture, business plans, marketing content, debugging, summarisation, and more.
The filter categories for OpenAI's moderation API are hate, harassment, self-harm, sexual, sexual/minors, and violence. Is it really the end of the world that DALL-E is rated PG-13?
There's no shortage of available models that generate adult content, but having one that doesn't makes it embeddable in other applications.
Where in the ChatGPT console can you access DALL-E 3? I am a ChatGPT Plus subscriber, but all I can access, in addition to GPT-4, are the beta features (plugins and Advanced Data Analysis).
Those prompts are wild. It is deeply impressive that it works. Think about what would happen if you gave such instructions to a human. Would they be able to comply? How big is the overlap between people who are creative enough to produce the kind of pictures Dall-E produces and disciplined enough to follow complex instructions so rigorously?
It also is just straight up impossible to convert those instructions to "regular" code.
I can't help but feel that any perceived overlap is coincidental. An illusion similar to seeing a face in an abstract drawing, one that the artists, or in this case the algorithm developers, are keen to exploit. Our need to find the familiar in something that is ultimately completely alien to our way of thinking.
But with enough existing prompts and training data, it will continue to learn and better trick our senses.
I totally agree that putting those instructions into code would be outrageously complicated; the biggest strength here is its ability to get the gist of what we are trying to convey.
A couple of days ago I didn't notice that ChatGPT was still set to Dall-E, because it was just helping me along "as usual" with my programming tasks without giving any hint of being in Dall-E mode.
When I noticed this, I asked it to generate an image of what had been discussed so far, and the first image turned out to be pretty nice. [0]
We were dealing a lot with timestamps, NumPy, Pandas stuff.
What’s interesting to me from this is that the technique they are using to integrate DALL-E into ChatGPT is pretty much the same as the one they use for plugins.
> Even so, the images often aren't dramatically different. Example:

https://analyzer.transfix.ai/?db=josh&q=%28robot+%7C%7C+andr...
> Also nice to see that Dall-E 3 seeds were finally fixed.

Generating images in one chat: https://i.imgur.com/sIKSfCy.png

Reproducing exactly in another: https://i.imgur.com/C8Tqo48.png
> OpenAI's products are completely useless to me, the way they've neutered them and are restricting the output.

Have you used GPT-4?
> You own what you produce with it too (with some caveats).

As I understand it, this is incorrect: all of these outputs are the creations of a machine, and as such are not eligible for copyright protection.
[0] https://imgur.com/a/SBZ36KT