I just got access to the DALL-E 2 beta, and it's a ton of fun to make pictures out of everyday occurrences as prompts.
Someone else here on HN observed that everyday people don't "get" how huge this all is. I experimented with asking random acquaintances at a local cafe for prompts and showed them the generated pictures. All but one person was totally unimpressed.
If everything feels like magic, then what's one more piece of magic?
This scared me more than the implications of DALL•E 2 itself. People think of the technology in the world as mysterious black boxes that do inexplicable things, and hence they no longer understand relative complexity, progress, or change.
My impression is that to most people DALL•E 2 is not "substantially" different to, say, Google Image search. Text in... image out. What's the big deal?
None of us are immune to that. The amount of magic we all rely on every day is incomprehensible. We just take it for granted that there is power to our house, the trains arrive on time, the bridge doesn’t collapse, the coffee tastes nice and costs little.
The commonplace thing that amazes me the most is the availability of food. My mind boggles at the complexity of getting hundreds of types of foods to a grocery store on a regular basis. I have no way of producing food myself, and if the food supply chain breaks down, I die. But it doesn't break down.
I was generating some images of parrots for a friend that's into bird photography when it occurred to me, Prometheus stole fire from the gods and was sentenced to be pecked at by birds. Today is the first day of forever in the infinite aviary. Pictures of birds are no longer scarce. Every bird that ever was, is, or could be is mere key strokes away. When Prometheus stole fire, he couldn't have actually known what fire was before he stole it. Likewise, only a small niche even seems aware of the profound change which is now being ushered into the world. The generation of birds is now the purview of man. Forget selling the rope to hang us, the birds that peck us will be made in our image.
The thing is, regular people have been under the impression that computers are capable of generating complex imagery by themselves for quite some time already. About 10 years ago I was visiting family, and my uncle -- an older guy who enjoys tinkering with technology -- asked me to draw a picture on his new tablet. Now, I've spent a fair chunk of my life drawing, so I was able to sketch a pretty decent face quite quickly with just the basic pencil tool. When I handed it back, he was so amazed that he grabbed my grandmother as she was walking past. "Look at this! Look at what your grandson drew!" She glanced over, completely unimpressed, and said "yeah, but that's on the computer".
It's the same underlying reason for https://xkcd.com/1425/. These tasks have always looked similar. The only reason we are amazed is because we know why one task is harder than the other. Understanding that is not easy.
That is totally natural. People don't have a strong reaction, because they don't have the context of why it's impressive.
How do you know something is impressive or surprising? You compare it to the previous status of the industry, which is something you know, but the random people in the coffee house don't.
I doubt they really believe it's truly inexplicable. Most laymen know that somebody knows what's going on in their device.
And nobody has an explanation for everything they use, but that doesn't mean it all gets attributed to magic. I have no idea how a bridge is designed and built, but I don't believe it's magic; it's just something beyond my knowledge.
Someone described it as "lacking natural depth" and "causing a grinding feeling in the brain". It also seems unable to generate good manga-styled images. I think what this means is that DALL-E 2 lacks the syntax and context of art.
I agree that people think of technology as black boxes that do inexplicable things, but I think that only matters if the boxes interfere with their own contexts and scopes. It is understandably seen as a threat to digital artists, but at the same time, to most people it is possibly less relevant than Google Search as it stands.
>asking random acquaintances at a local cafe for prompts and showed them the generated pictures. All but one person was totally unimpressed.
I think this is because people reason this way: the computer has many pictures in its archive, saved as "dog.jpg" and "hat.jpg". If your prompt is "dog with hat", the computer just combines them. I think this is what people think is happening, so they are less impressed.
Over the past 7 days I’ve generated ~1000 images, 150 of which were good enough to save. I only saved images which made me audibly gasp.
Witnessing your own novel idea spring to life is a magical experience. DALL•E provides an artistic tool on a comparable level to digital photography, and by extension Photoshop.
At this stage it’s 100% clear to me that DALL•E has ushered in a revolutionary new age of design. Every day I worked with it, I grew more confident in that outlook.
It might not necessarily be an OpenAI product which truly “integrates” with humanity — but DALL•E has shown me that it’s possible… and just a matter of time.
How long did it take for you to generate your images? I've been using https://www.craiyon.com/ for fun but the wait times always result in me getting distracted elsewhere.
I've been using Midjourney [1] (it's not free: a 25-photo demo, then $10 for ~200 photos or $30 for unlimited, I think?). It's fairly fast, ~20s for a grid of 4, then about as long again for upscaling. I like the controls: it lets you do variations and tweak the image as you go.
It's not as good for doing concrete asks, but it's very good for getting specific vibes.
The website feed [1] requires a Discord login to view examples, but there are some unofficial galleries [2]
The quality of results is drastically better than craiyon, but of course you need access and might have to pay for DALL-E. Takes only a few seconds. Also has an editing mode where you can fine tune parts of the image or find variations.
Honestly, I had a great use case for it, but then I realized it can only do square pictures, when I really want something that is much wider than it is tall.
You can get DALL-E 2 to give differently sized images by using their tool to crop out most of the image you already generated, then having it inpaint a completion. You can then use any image editing tool to combine the two images.
The relevant code is linked below and is a mess, but the idea is:
1. Generate a base image
2. Use inpainting to expand the left-hand side of the root image. You can do this by submitting an image whose left side is transparent and whose right side is the left side of the root image
3. Ditto for right-hand side
4. Stitch the three separate images together
https://github.com/charlesjlee/twitter_dalle2_bot/blob/main/...
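The bookkeeping in those steps can be sketched in plain Python. This is only an illustration of the geometry: 2D lists stand in for images (one value per pixel, `None` meaning transparent), the actual DALL-E edit/inpainting call and real image I/O (e.g. via PIL) are omitted, and `SIZE`/`HALF` are assumptions chosen for the example.

```python
# Stand-in for the three-panel outpainting trick: 2D lists play the role of
# images (row-major, one value per pixel). The real DALL-E edit call and
# image I/O are omitted; None marks a transparent pixel to be inpainted.

SIZE = 4          # pretend every generated image is SIZE x SIZE
HALF = SIZE // 2  # overlap width: half of the root image is reused per side

def left_inpaint_input(root):
    """Build the image submitted for the left expansion: its left half is
    transparent (None) and its right half is the left half of the root."""
    return [[None] * HALF + row[:HALF] for row in root]

def right_inpaint_input(root):
    """Mirror of the above for the right-hand expansion."""
    return [row[-HALF:] + [None] * HALF for row in root]

def stitch(left, root, right):
    """Join the three completed panels, dropping the halves that merely
    duplicate the root's left and right halves."""
    return [l[:HALF] + m + r[HALF:] for l, m, r in zip(left, root, right)]
```

With `SIZE = 4` this turns three 4-wide panels into one 8-wide image; the overlap is what lets the model continue the scene coherently at each seam.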
Prompt crafting is quickly becoming an art. I just found out yesterday that there are actually marketplaces for buying and selling prompts [0]. It can really make a big difference if you can tune the image by adding the right words. Midjourney [1] even allows things such as adjusting the weight of each keyword or how literally the AI should take your prompt.
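For instance, Midjourney's manual describes a `text::weight` syntax for weighting parts of a prompt. A tiny helper to compose such strings might look like this; the helper and its name are purely illustrative, not part of any official API, and the example weights are arbitrary:

```python
def weighted_prompt(parts):
    """Join (text, weight) pairs into Midjourney-style `text::weight` form.

    The `::` weight syntax is from Midjourney's user manual; this helper is
    just an illustration, not an official tool.
    """
    return " ".join(f"{text}::{weight}" for text, weight in parts)

# Emphasise the subject twice as strongly as the style hint:
prompt = weighted_prompt([("a parrot in a rainforest", 2), ("watercolor", 1)])
# -> "a parrot in a rainforest::2 watercolor::1"
```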
That's kind of depressing really. It's like paying someone for google search terms. I remember when the internet was full of sites filled with interesting and effective google search terms (called "dorks") but nobody was charging $2 for each one, they were just sharing something cool with the world and helping everyone use the tool more effectively.
First off, the PDF in this very post, as well as the promptwiki link posted elsewhere in the comments, shows that there are definitely people doing this for free too.
As for your analogy, I'd say it's closer to paying someone to help you find something obscure and hard to find on Google. Anything that requires skill and time to do should be allowed to be monetized. Of course, as you mention, there will always be people doing and posting it for free, but I don't see why people shouldn't be able to make money from something they've taken time to master.
Dall-E is a tool, and this is like hiring an expert that can use that tool effectively. It's no different than hiring someone to Photoshop something for you, or more precisely here, paying to download/license premade Photoshop content someone put time and effort creating.
Here's a trick for this: use Google image search. That will give you various descriptions of similar images. Then use those descriptions in your prompt.
Some similar examples below but you will need to engineer the prompt a bit more to get it exactly the same.
I keep racking my brain trying to discern what the implications of hyper-advanced generative models like this will be. It's a double-edged sword. While there are obvious tangible benefits from such models, such as democratising art, the flip side seems like pure science-fiction dystopia.
In my mind, the main eras of content on the internet look something like this:
Epoch 1: Pure, unblemished user generated content. Message boards and forums rule.
Epoch 2: More user generated content + a healthy mix of recycled user generated content. e.g. Reddit.
Epoch 3 (Now): Fake user generated content (limited in volume, because humans still have to generate it). e.g. Amazon reviews, Cambridge Analytica.
Epoch 4: Advanced generative models mean (essentially) zero friction for creating picture and text content. GPT-3, DALL-E 2.
Epoch 5: Generative models for videos, game over.
IMO, the future of the internet feels like a totally disastrous (un)reality. If addictive content recommended by the likes of TikTok has proven anything, it's that users ultimately don't care _what_ the content is, as long as it keeps their attention. It doesn't matter if it comes from a human or a machine. The difference is that in a world where the marginal cost of generating content is essentially zero, that content can and will be created and manipulated by large malicious actors to sway public opinion.
The Dead Internet Theory will fast become reality. This terrifies me.
Not sure democratising art is a good thing. Artists have long been considered important pillars of community. Artists develop skill and some have talent. Perhaps most critically, artists are inspired.
Healthy communities support artists. Generative models aren't truly creating art; they are explicitly and exclusively derivative. They are "inspired" in a very different sense than artists are.
Artists can use these new tools, and so can non-artists, and even if the resultant image is the same, I think there is a difference depending on the intention of the prompt-er.
These models are democratising content creation, not art.
OpenAI has pretty much been ruined for me after they sold their souls to Microsoft, stopped releasing all their source code, and then dishonestly referred to their sad practice of censoring the training data as "AI safety/alignment", when in fact it will never be a reasonable AI safety technique in the long run and is only done to avoid bad PR. Clearly OpenAI is no longer a company worthy of its founding principles of openness and making the world a better place. They're just yet another morally corrupt tech company.
I feel like they have no choice but to do some heavy-handed censorship strategy. People who have zero understanding of technology but an infinite supply of outrage will get the whole concept legislated out of existence on “think of the children” or “this computer program is _____ist” grounds if they allow anything even mildly disturbing.
The thing is, they release just enough information for people to eventually remake their models. Someone with no scruples will make an uncensored version eventually, and then their work will have had the effect they claim to want to avoid anyway, but at least it won't reflect back on them.
That's why it's really nothing but a PR exercise. I honestly don't think they care much past that.
This makes me wonder if a future job description will be the equivalent of an AI whisperer. Someone who learns how to prompt AI so well that it becomes their job.
Future programmers will write prompts for a future version of Github Copilot: "database layer with all the usual CRUD operations, in readable modern C++, code compiles and passes tests, test cases carefully written by computer scientists with great attention to detail".
I’ve been playing with tech like this for over a year now. It definitely requires getting to know the AI to get good results. The tech gets better fast and makes old techniques obsolete. But, the gap between beginner and expert prompters stays large. As silly as that sounds to read back :D
To an extent. But I think it'll get baked into existing jobs. A bit like how "computer skills" or the ability to write good Google queries ended up as part of regular clerical work.
This is absolutely the future, but it's not going to be obscure. You won't have a job if it isn't this or physical.
AIs are going to replace entry level creatives, and experienced users with taste will largely perform selection and the development of good starts to mature designs. And I mean all creatives. Engineers, architects, mathematicians, programmers.
Mathematicians, doubtful. Someone must do some verification of the AI results.
Programmers, also doubtful. You still need to design some APIs and interact with them. Interacting with an AI using natural language might become possible, but it definitely won’t be as efficient as more structured languages. (E.g., writing an algorithm in actual code is often much easier than teaching it to a human.)
30 years ago, my dad and I watched a VGA demo on our IBM PS/2. We were blown away that there was enough color depth and resolution to see what was clearly a photograph, not an illustration. It appeared line by line.
Someone had taken a photo, somehow digitized it, distributed it, and we were looking at a representation good enough that we could tell what it was.
It felt like we were living in the future - me as a middle schooler and him with decades of software development under his belt.
The iPhone maps app with the GPS dot and DALL-E are the only things that have matched that feeling.
For me it was when I saw a taxi app on a smartphone (it was not Uber, but… some clone? A predecessor? Maybe even an official taxi app? I don’t remember at this point). I can see where the taxi is now? I can see where he is going, so he doesn’t cheat me?
I come from a city that used to be known for cheating taxis… at that point I knew I would never go back.
https://archive.org/details/james-burke-connections_s01e01
.-- .... .- - / .... .- ... / --. --- -.. / .-- .-. --- ..- --. .... - ..--..
(WHAT HAS GOD WROUGHT?)
https://code.flickr.net/2014/10/20/introducing-flickr-park-o...
Today a hobby programmer could do it by themself.
[0] https://www.midjourney.com/
[1] https://www.midjourney.com/app/feed/all/
[2] https://www.instagram.com/midjourney.gallery/
[0] https://promptbase.com/
[1] https://midjourney.gitbook.io/docs/user-manual
https://promptbase.com/prompt/clay-emojis
https://promptbase.com/prompt/polygon-animals
https://labs.openai.com/s/17hZFpqYi57LLYCW0dKirEqp
https://labs.openai.com/s/YDVak2MB7uERVsZ8eceEhqE4
[1] https://www.theatlantic.com/technology/archive/2021/08/dead-...
I dunno. Weird and very scary.
https://imgur.com/a/Y0abtIP
I spent a lot of time alone on airplanes when I was a young father and there’s something bittersweet about the solitude and beauty in this image for me. My favorite parts about this image are the gradient in the sky, the waning sunlight in the top corner and the very faintly illuminated frame around the entire window.
Very happy with the print. Next time I might get the satin finish though, it’s like a mirror.
https://imgur.com/a/8GBQXw6
[1] https://news.ycombinator.com/item?id=32324723