I used to enjoy Translation Party, and over the weekend I realized that we can build the same feedback loop with DALL-E 3 and GPT-4 Vision. Start with a text prompt, let DALL-E 3 generate an image, have GPT-4 Vision turn that image back into a text prompt, let DALL-E 3 create another image, and so on.
You need to bring your own OpenAI API key (costs about $0.10/run).
Some prompts are very stable, others go wild. If you bias GPT4's prompting by telling it to "make it weird" you can get crazy results.
Here's a few of my favorites:
- Gnomes: https://dalle.party/?party=k4eeMQ6I
- Start with a sailboat but bias GPT4V to "replace everything with cats": https://dalle.party/?party=0uKfJjQn
- A more stable one (but everyone is always an actor): https://dalle.party/?party=oxpeZKh5
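For anyone curious about the shape of the loop, here is a minimal sketch in Python. It is not the site's actual code: `run_party`, `fake_generate`, and `fake_describe` are hypothetical names, and the two stubs stand in for the DALL-E 3 and GPT-4 Vision calls so it runs without an API key.

```python
from typing import Callable, List, Tuple


def run_party(
    start_prompt: str,
    generate_image: Callable[[str], str],
    describe_image: Callable[[str], str],
    iterations: int = 4,
) -> List[Tuple[str, str]]:
    """Alternate prompt -> image -> prompt and return every (prompt, image) pair."""
    history: List[Tuple[str, str]] = []
    prompt = start_prompt
    for _ in range(iterations):
        image = generate_image(prompt)   # stand-in for the image model call
        history.append((prompt, image))
        prompt = describe_image(image)   # stand-in for the vision model call
    return history


# Offline stubs (assumptions for illustration only).
def fake_generate(prompt: str) -> str:
    return f"<image of: {prompt}>"


def fake_describe(image: str) -> str:
    # Recover the text inside the fake image and drift it slightly.
    return image[len("<image of: "):-1] + ", but weirder"


if __name__ == "__main__":
    for step, (prompt, _image) in enumerate(
        run_party("a sailboat", fake_generate, fake_describe, 3), start=1
    ):
        print(step, prompt)
```

Passing the two model calls in as functions keeps the loop itself independent of any particular API client, so the stubs could be swapped for real calls later.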
The starting prompt (or at least, the theme) was suggested by one of my kids. Watch in awe as a regular goat rampage accelerates into full cosmic-horror, universe-ending madness. Friggin awesome:
https://dalle.party/?party=vCwYT8Em
[1]: https://x.com/venturetwins/status/1728956493024919604?s=20
"An infinite loop, on an unknown influencer's machine, prompted GPT-5 to 'make it more.'
13 hours later, lights across the planet began to go out."
There's probably a disproportionate amount of Satanic material in the dataset #tinfoilhat #deepstate
> Write a prompt for an AI to make this image. Just return the prompt, don't say anything else, but also, increase the intensity of any adjectives, resulting in progressively more fantastical and wild prompts. Really oversell the intensity factor, and feel free to add extra elements to the existing image to amp it up.
I played with it a bit before I got results I liked - one of the key factors, I think, was giving the model permission to add stuff to the image, which introduced enough variation between images to have a nice sense of progression. Earlier attempts without that instruction were still cool, but what I noticed was that once you ask it to intensify every adjective, you pretty much go to 11 within the first iteration or two - so you wind up having 1 image of a silly cat or goat and then 7 more images of world-shattering kaiju.
The goat one (which again, was an idea from one of my kids) was by far the best in terms of "progression to insanity" that I got out of the model. Really fun stuff!
The longer the Icon of Sin is on Earth, the more powerful it becomes!
...wow that's pretty dramatic.
"Think hard about every single detail of the image, conceptualize it including the style, colors, and lighting.
Final step, condensing this into a single paragraph:
Very carefully, condense your thoughts using the most prominent features and extremely precise language into a single paragraph."
https://dalle.party/?party=1lSMniUP
https://dalle.party/?party=cEUyjzch
https://dalle.party/?party=14fnkTv-
https://dalle.party/?party=wstiY-Iw
Praise the Basilisk, I finally got rate-limited and can go to bed!
I guess it's not that far-fetched as your brain has to do the same to figure out if a scene (or an AI-generated one for that matter) has some weird issue that should pop out. So in a sense your brain does this too.
Interesting that for one and only one iteration, the anthropomorphized cardboard boxes it draws are almost all Danbo: https://duckduckgo.com/?q=danbo+character&ia=images&iax=imag...
It was surprising to see a recognizable character in the middle of a bunch of more fantastical images.
It's very interesting to observe how the relationship between the wolf and Redhood evolved from dark and menacing to serene and friendly.
Simply a cat, evolving into a lounging cucumber, and finally opposite world:
https://dalle.party/?party=pqwKQVka
Vibrant gathering of celestial octopus entities:
https://dalle.party/?party=lHNDUvtp
I'd love to see an alternative viewing mode here which shows the image and the following prompt. Then you need to click a button to reveal the next image. This allows you to picture in your mind what the image might look like while reading the prompt.
Thanks for making this fun little app!
Update: I just realized you can get this effect by going into mobile mode (or resizing the window). You can then scroll down to see the image after reading the prompt.
Starting prompt: "A futuristic hybrid of a steam engine train and a DaVinci flying machine"
Results: https://dalle.party/?party=14ESewbz
(Addendum: In case anyone was curious how costs scale by iteration, the full ten iterations in this result billed $0.21 against my credit balance.)
Starting prompt: "A futuristic hybrid of a steam engine train and a DaVinci flying machine"
Results: https://dalle.party/?party=qLHPB2-o
Cost: Eight iterations @ $0.44 -- which suggests to me that the API is getting additional hits beyond the run. I confirmed that the share link isn't passing along the key (via a separate browser and a separate machine), so I'm not clear why this might be.
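As a rough comparison of the two runs above, here is the per-iteration arithmetic. It assumes the billed totals are exactly as quoted ($0.21 for ten iterations, $0.44 for eight) and that cost is spread evenly across iterations, which may not be how the API actually bills. `cost_per_iteration` is a hypothetical helper.

```python
def cost_per_iteration(total_usd: float, iterations: int) -> float:
    """Average billed cost per iteration, assuming an even spread."""
    return total_usd / iterations


# Figures quoted in the two comments above.
runs = {
    "first run": (0.21, 10),
    "second run": (0.44, 8),
}

for name, (total, n) in runs.items():
    print(f"{name}: ${cost_per_iteration(total, n):.3f} per iteration")
```

That works out to about $0.021 versus $0.055 per iteration, so the second run was billed at roughly 2.6 times the rate of the first. That fits the suspicion of extra API hits, though it doesn't prove it.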
These images are incredible but I often notice stuff like this and it kind of ruins it for me.
#3 & #4 are good too, when the tracks are smoking, but not the train.
It consistently shows a robot painting on a canvas. The first 4 are paintings of robots, the next 3 are galaxies, and the final 2 are landscapes.
My best guess to try to explain this would be that “gnome + art style + mushroom” will draw from a lot more concrete examples in the training data, whereas the AI is forced to reach a bit wider to try to concoct some image for the weird scenario given in the cat example.
Got stuck on Van Gogh's "Starry Night" after a while.
https://dalle.party/?party=LOcXREfq
Also, love the simplicity of this idea, would love a "fork" option. And to be able to see the graph of where it originated.
"Write a prompt for an AI to make this image. Just return the prompt, don't say anything else. Replace everything with corgi."
Then it takes that new prompt and feeds it to Dall-E to generate a new image. And then it repeats.
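The instruction quoted above looks like a fixed base sentence with a user-supplied bias tacked on the end. A minimal sketch of that composition, assuming plain concatenation (how the app really assembles it is a guess, and `build_instruction` is a hypothetical name):

```python
# Base text taken from the instruction quoted in this thread.
BASE_INSTRUCTION = (
    "Write a prompt for an AI to make this image. "
    "Just return the prompt, don't say anything else."
)


def build_instruction(bias: str = "") -> str:
    """Append an optional bias phrase to the base vision instruction."""
    bias = bias.strip()
    return f"{BASE_INSTRUCTION} {bias}" if bias else BASE_INSTRUCTION


print(build_instruction("Replace everything with corgi."))
```

With an empty bias the base instruction is sent unchanged, which would correspond to the more stable runs people describe.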