I just got access to the DALL-E 2 beta, and it's a ton of fun to make pictures out of everyday occurrences as prompts.
Someone else here on HN observed that everyday people don't "get" how huge this all is. I experimented with asking random acquaintances at a local cafe for prompts and showed them the generated pictures. All but one person was totally unimpressed.
If everything feels like magic, then what's one more piece of magic?
This scared me more than the implications of DALL•E 2 itself. People think of the technology in the world as mysterious black boxes that do inexplicable things, and hence they no longer understand relative complexity, progress, or change.
My impression is that to most people DALL•E 2 is not "substantially" different to, say, Google Image search. Text in... image out. What's the big deal?
None of us are immune to that. The amount of magic we all rely on every day is incomprehensible. We just take it for granted that there is power to our house, the trains arrive on time, the bridge doesn’t collapse, the coffee tastes nice and costs little.
The commonplace thing that amazes me the most is the availability of food. My mind boggles at the complexity of getting hundreds of types of foods to a grocery store on a regular basis. I have no way of producing food myself, and if the food supply chain breaks down, I die. But it doesn't break down.
I was generating some images of parrots for a friend that's into bird photography when it occurred to me, Prometheus stole fire from the gods and was sentenced to be pecked at by birds. Today is the first day of forever in the infinite aviary. Pictures of birds are no longer scarce. Every bird that ever was, is, or could be is mere key strokes away. When Prometheus stole fire, he couldn't have actually known what fire was before he stole it. Likewise, only a small niche even seems aware of the profound change which is now being ushered into the world. The generation of birds is now the purview of man. Forget selling the rope to hang us, the birds that peck us will be made in our image.
The thing is, regular people have been under the impression that computers are capable of generating complex imagery by themselves for quite some time already. About 10 years ago I was visiting family, and my uncle -- an older guy who enjoys tinkering with technology -- asked me to draw a picture on his new tablet. Now, I've spent a fair chunk of my life drawing, so I was able to sketch a pretty decent face quite quickly with just the basic pencil tool. When I handed it back, he was so amazed that he grabbed my grandmother as she was walking past. "Look at this! Look at what your grandson drew!" She glanced over, completely unimpressed, and said "yeah, but that's on the computer".
It's the same underlying reason for https://xkcd.com/1425/. These tasks have always looked similar. The only reason we are amazed is because we know why one task is harder than the other. Understanding that is not easy.
That is totally natural. People don't have a strong reaction, because they don't have the context of why it's impressive.
How do you know something is impressive or surprising? You compare it to the previous status of the industry, which is something you know, but the random people in the coffee house don't.
I doubt they really believe it's truly inexplicable. Most laymen know that somebody knows what's going on in their device.
And nobody has an explanation for everything they use, but that doesn't mean it all gets attributed to magic. I have no idea how a bridge is designed and built, but I don't believe it's magic; it's just something beyond my knowledge.
Someone described it as "lacking natural depth" and "causing a grinding feeling in the brain". It also seems unable to generate good manga-styled images. I think what this means is that DALL-E 2 lacks the syntax and context of art.
I agree that people think of technology as black boxes that do inexplicable things, but I think that only matters if the boxes interfere with their own contexts and scopes. It is understandably seen as a threat to digital artists, but at the same time, to most people it is possibly less relevant than Google Search as it stands.
>asking random acquaintances at a local cafe for prompts and showed them the generated pictures. All but one person was totally unimpressed.
I think this is because people reason this way: the computer has many pictures in its archive, saved as "dog.jpg" and "hat.jpg". If your prompt is "dog with hat", the computer just combines them. I think this is what people think is happening, so they are less impressed.
Over the past 7 days I’ve generated ~1000 images, 150 of which were good enough to save. I only saved images which made me audibly gasp.
Witnessing your own novel idea spring to life is a magical experience. DALL•E provides an artistic tool on a comparable level to digital photography, and by extension Photoshop.
At this stage it’s 100% clear to me that DALL•E has ushered in a revolutionary new age of design. Every day I worked with it, I grew more confident in that outlook.
It might not necessarily be an OpenAI product which truly “integrates” with humanity — but DALL•E has shown me that it’s possible… and just a matter of time.
How long did it take for you to generate your images? I've been using https://www.craiyon.com/ for fun but the wait times always result in me getting distracted elsewhere.
I've been using Midjourney [1] (it's not free: a 25-photo demo, then $10 for ~200 photos or $30 for unlimited, I think?). It's fairly fast, ~20s for a grid of 4, then about as long again for upscaling. I like the controls: it lets you do variations and tweak the image as you go.
It's not as good for doing concrete asks, but it's very good for getting specific vibes.
The website feed [1] requires a Discord login to view examples, but there are some unofficial galleries [2]
The quality of results is drastically better than craiyon, but of course you need access and might have to pay for DALL-E. Takes only a few seconds. Also has an editing mode where you can fine tune parts of the image or find variations.
Honestly, I had a great use case for it, but then I realized it can only do square pictures, when I really want something that is much wider than it is tall.
You can get DALL-E 2 to give differently sized images by using their tool to crop out most of the image you already generated, then having it inpaint a completion. You can then use any image editing tool to combine the two images.
The relevant code is linked below and is a mess, but the idea is:
1. Generate a base image
2. Use inpainting to expand the left-hand side of the root image. You can do this by submitting an image whose left side is transparent and whose right side is the left side of the root image
3. Ditto for right-hand side
4. Stitch the three separate images together
https://github.com/charlesjlee/twitter_dalle2_bot/blob/main/...
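The bookkeeping in those steps can be sketched in plain Python. This is only an illustration of the geometry: 2D lists stand in for images (one value per pixel, `None` meaning transparent), the actual DALL-E edit/inpainting call and real image I/O (e.g. via PIL) are omitted, and `SIZE`/`HALF` are assumptions chosen for the example.

```python
# Stand-in for the three-panel outpainting trick: 2D lists play the role of
# images (row-major, one value per pixel). The real DALL-E edit call and
# image I/O are omitted; None marks a transparent pixel to be inpainted.

SIZE = 4          # pretend every generated image is SIZE x SIZE
HALF = SIZE // 2  # overlap width: half of the root image is reused per side

def left_inpaint_input(root):
    """Build the image submitted for the left expansion: its left half is
    transparent (None) and its right half is the left half of the root."""
    return [[None] * HALF + row[:HALF] for row in root]

def right_inpaint_input(root):
    """Mirror of the above for the right-hand expansion."""
    return [row[-HALF:] + [None] * HALF for row in root]

def stitch(left, root, right):
    """Join the three completed panels, dropping the halves that merely
    duplicate the root's left and right halves."""
    return [l[:HALF] + m + r[HALF:] for l, m, r in zip(left, root, right)]
```

With `SIZE = 4` this turns three 4-wide panels into one 8-wide image; the overlap is what lets the model continue the scene coherently at each seam.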
Prompt crafting is quickly becoming an art. I just found out yesterday that there are actually marketplaces for buying and selling prompts [0]. It can really make a big difference if you can tune the image by adding the right words. Midjourney [1] even allows things such as adjusting the weight of each keyword or how literally the AI should take your prompt.
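For instance, Midjourney's manual describes a `text::weight` syntax for weighting parts of a prompt. A tiny helper to compose such strings might look like this; the helper and its name are purely illustrative, not part of any official API, and the example weights are arbitrary:

```python
def weighted_prompt(parts):
    """Join (text, weight) pairs into Midjourney-style `text::weight` form.

    The `::` weight syntax is from Midjourney's user manual; this helper is
    just an illustration, not an official tool.
    """
    return " ".join(f"{text}::{weight}" for text, weight in parts)

# Emphasise the subject twice as strongly as the style hint:
prompt = weighted_prompt([("a parrot in a rainforest", 2), ("watercolor", 1)])
# -> "a parrot in a rainforest::2 watercolor::1"
```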
That's kind of depressing really. It's like paying someone for google search terms. I remember when the internet was full of sites filled with interesting and effective google search terms (called "dorks") but nobody was charging $2 for each one, they were just sharing something cool with the world and helping everyone use the tool more effectively.
First off, the PDF in this very post, as well as the promptwiki link posted elsewhere in the comments, shows that there are definitely people doing this for free too.
As for your analogy, I'd say it's closer to paying someone to help you find something obscure and hard to find on Google. Anything that requires skill and time to do should be allowed to be monetized. Of course, as you mention, there will always be people doing and posting it for free, but I don't see why people shouldn't be able to make money from something they've taken time to master.
Dall-E is a tool, and this is like hiring an expert that can use that tool effectively. It's no different than hiring someone to Photoshop something for you, or more precisely here, paying to download/license premade Photoshop content someone put time and effort creating.
Here's a trick for this: use Google image search. That will give you various descriptions of similar images. Then use those descriptions in your prompt.
Some similar examples below but you will need to engineer the prompt a bit more to get it exactly the same.
I keep racking my brain trying to discern what the implications of hyper-advanced generative models like this will be. It's a double-edged sword. While there are obvious tangible benefits from such models, such as democratising art, the flip side seems like pure science-fiction dystopia.
In my mind, the main eras of content on the internet look something like this:
Epoch 1: Pure, unblemished user generated content. Message boards and forums rule.
Epoch 2: More user generated content + a healthy mix of recycled user generated content. e.g. Reddit.
Epoch 3 (Now): Fake user generated content (limited in volume, because humans still have to generate it). e.g. Amazon reviews, Cambridge Analytica.
Epoch 4: Advanced generative models mean (essentially) zero friction for creating picture and text content. GPT-3, DALL-E 2.
Epoch 5: Generative models for videos, game over.
IMO, the future of the internet feels like a totally disastrous (un)reality. If addictive content recommended by the likes of TikTok has proven anything, it's that users ultimately don't care _what_ the content is, as long as it keeps their attention. It doesn't matter if it comes from a human or a machine. The difference is that in a world where the marginal cost of generating content is essentially zero, that content can and will be created and manipulated by large malicious actors to sway public opinion.
The Dead Internet Theory will fast become reality. This terrifies me.
Not sure democratising art is a good thing. Artists have long been considered important pillars of community. Artists develop skill and some have talent. Perhaps most critically, artists are inspired.
Healthy communities support artists. Generative models aren't truly creating art; they are explicitly and exclusively derivative. They are "inspired" in a very different sense than artists are.
Artists can use these new tools, and so can non-artists, and even if the resultant image is the same, I think there is a difference depending on the intention of the prompt-er.
These models are democratising content creation, not art.
OpenAI has pretty much been ruined for me after they sold their souls to Microsoft, stopped releasing all their source code, and then dishonestly referred to their sad practice of censoring the training data as "AI safety/alignment", when in fact it will never be a reasonable AI safety technique in the long run and is only done to avoid bad PR. Clearly OpenAI is no longer a company worthy of its founding principles of openness and making the world a better place. They're just yet another morally corrupt tech company.
I feel like they have no choice but to do some heavy-handed censorship strategy. People who have zero understanding of technology but an infinite supply of outrage will get the whole concept legislated out of existence on “think of the children” or “this computer program is _____ist” grounds if they allow anything even mildly disturbing.
The thing is, they release just enough information for people to eventually remake their models. Someone with no scruples will make an uncensored version eventually, and then their work will have had the effect they claim to want to avoid anyway, but at least it won't reflect back on them.
That's why it's really nothing but a PR exercise. I honestly don't think they care much past that.
This makes me wonder if a future job description will be the equivalent of an AI whisperer. Someone who learns how to prompt AI so well that it becomes their job.
Future programmers will write prompts for a future version of Github Copilot: "database layer with all the usual CRUD operations, in readable modern C++, code compiles and passes tests, test cases carefully written by computer scientists with great attention to detail".
I’ve been playing with tech like this for over a year now. It definitely requires getting to know the AI to get good results. The tech gets better fast and makes old techniques obsolete. But, the gap between beginner and expert prompters stays large. As silly as that sounds to read back :D
To an extent. But I think it'll get baked into existing jobs. A bit like how "computer skills" or the ability to write good Google queries ended up as part of regular clerical work.
This is absolutely the future, but it's not going to be obscure. You won't have a job if it isn't this or physical.
AIs are going to replace entry level creatives, and experienced users with taste will largely perform selection and the development of good starts to mature designs. And I mean all creatives. Engineers, architects, mathematicians, programmers.
Mathematicians, doubtful. Someone must do some verification of the AI results.
Programmers, also doubtful. You still need to design some APIs and interact with them. Interacting with an AI using natural language might become possible, but it definitely won’t be as efficient as more structured languages. (E.g., writing an algorithm in actual code is often much easier than teaching it to a human.)
30 years ago, my dad and I watched a VGA demo on our IBM PS/2. We were blown away that there was enough color depth and resolution to see what was clearly a photograph, not an illustration. It appeared line by line.
Someone had taken a photo, somehow digitized it, distributed it, and we were looking at a representation good enough that we could tell what it was.
It felt like we were living in the future - me as a middle schooler and him with decades of software development under his belt.
The iPhone maps app with the GPS dot and DALL-E are the only things that have matched that feeling.
For me it was when I saw a taxi app on a smartphone (it was not Uber, but… some clone? A predecessor? Maybe even an official taxi app? I don’t remember at this point). I can see where the taxi is now? I can see where he is going, so he doesn’t cheat me?
I come from a city that used to be known for cheating taxis… at that point I knew I would never go back.
https://archive.org/details/james-burke-connections_s01e01
.-- .... .- - / .... .- ... / --. --- -.. / .-- .-. --- ..- --. .... - ..--..
(WHAT HAS GOD WROUGHT?)
https://code.flickr.net/2014/10/20/introducing-flickr-park-o...
Today a hobby programmer could do it by themself.
[0] https://www.midjourney.com/
[1] https://www.midjourney.com/app/feed/all/
[2] https://www.instagram.com/midjourney.gallery/
[0] https://promptbase.com/
[1] https://midjourney.gitbook.io/docs/user-manual
https://promptbase.com/prompt/clay-emojis
https://promptbase.com/prompt/polygon-animals
https://labs.openai.com/s/17hZFpqYi57LLYCW0dKirEqp
https://labs.openai.com/s/YDVak2MB7uERVsZ8eceEhqE4
[1] https://www.theatlantic.com/technology/archive/2021/08/dead-...
I dunno. Weird and very scary.
https://imgur.com/a/Y0abtIP
I spent a lot of time alone on airplanes when I was a young father and there’s something bittersweet about the solitude and beauty in this image for me. My favorite parts about this image are the gradient in the sky, the waning sunlight in the top corner and the very faintly illuminated frame around the entire window.
Very happy with the print. Next time I might get the satin finish though, it’s like a mirror.
https://imgur.com/a/8GBQXw6
[1] https://news.ycombinator.com/item?id=32324723