I was supposed to be making a video game, but got a bit sidetracked when DALL·E came out and made this website on the side: http://dailywrong.com/ (yes I should get SSL).
It's like The Onion, but all the articles are made with GPT-3 and DALL·E. I start with an interesting DALL·E image, then describe it to GPT-3 and ask it for an Onion-like article on the topic. The results are surprisingly good.
Somehow these articles are more readable than typical AI-generated search engine fodder... Is it because I'm entering the site with an expectation of nonsense?
Feels like the headlines could be generated in the style of "They Fight Crime!"
"He's a hate-fuelled neurotic farmboy searching for his wife's true killer. She's a tortured insomniac snake charmer from a family of eight older brothers. They fight crime!"
>He's an unconventional gay paranormal investigator moving from town to town, helping folk in trouble. She's a violent motormouth wrestler from the wrong side of the tracks. They fight crime!
>He's a Nobel prize-winning sweet-toothed rock star who believes he can never love again. She's a strong-willed communist widow with a knack for trouble. They fight crime!
>He's an obese white trash barbarian with a secret. She's a virginal thirtysomething traffic cop with the power to bend men's minds. They fight crime!
https://theyfightcrime.org/
Here's an implementation in Perl:
http://paulm.com/toys/fight_crime.pl.txt
The results with artworks or more general concepts are fascinating, but there is for sure something creepy going on with "photorealistic" human eyes and faces...
If you want to see some really creepy AI-generated human "photo" faces, take a look at Bots of New York:
https://www.facebook.com/botsofnewyork
Unfortunately the content of that project is now a hostage of Facebook - like ransomware gangsters, they force you to do something to get the data; in this case you need to create an account and take part in that global surveillance network. I do not understand why people do that.
We joke about it, but an early and very cheap robotic floor cleaner I had was one of those weasel balls constrained in a flat ring harness with a dusting cloth underneath. It was entertaining and not completely useless.
Put a guinea pig in there and you'd get the same effect.
Actually got a chuckle out of the duck one (http://dailywrong.com/man-finally-comfortable-just-holding-a...). Thanks! I hope you keep generating them. Kind of wish there weren't a newsletter nag, but on the other hand it adds to the realism. Could be worthwhile to generate the text of the nag with GPT too; call it a kind of lampshading.
Haha, I was in a very similar boat when I built https://novelgens.com -- I was also supposed to be making a video game, but got a bit sidetracked with VQGAN+CLIP and other text/image generation models.
Now I'm using that content in the video game. I wonder if you could use these articles as some fake news in your game, too. :)
At first I came up with them myself, but found that GPT-3 often comes up with better ones, so I ask it for variations.
I think I got it to even fill in the title given a picture, something like “Article picture caption: Man holding an apple. Article title: ...”. Might experiment more with that in the future.
This is a fucking fantastic site, it’s absolutely hilarious, and I’ve bookmarked it - I kinda unironically want to set it as my home page - but just a heads up that the CSS is broken for me on my iPhone SE2.
The images don’t scale properly with the rest of the site, they’re massive compared to the content.
I’m curious: if they’re only making DALL-E accessible now, and GPT-3 was never really accessible (as far as I know), how do you have access to these things to generate text and images?
How do you generate the original image? And what about the subsequent images, do they come automatically from the text? I'd love to know more about the process.
I have been having a blast with DALL-E, spending about an hour a day trying out wild combinations and cracking my friends up. I cannot imagine getting bored of it; it's like getting bored with visual stimuli, or art in general.
In fact, I've been glad to have a 50/day limit, because it helps me contain my hyperfocus instincts.
The information about new pricing is, to me as someone just enjoying making crazy images, a huge drag. It means that to do the same 50/day I'd be spending $300/month.
OpenAI: introduce a $20/month non-commercial plan for 50/day, and I'll be at the front of the line.
I think people don't realize how huge these models really are.
When they're free, it's pretty cool. But charge an amount where there's actual profit in the product? Suddenly seems very expensive and not economically viable for a lot of use cases.
We are still in the "you need a supercomputer" phase of these models for now. Something like DALLE mini is much more accessible but the results aren't good enough. Early early days.
> I think people don't realize how huge these models really are.
They really aren't that large by the contemporary scaling race standards.
DALL-E 2 has 3.5B parameters, which should fit on an old GPU like an Nvidia RTX 2080, especially if you optimize your model for inference [1][2], which is commonly done by ML engineers to minimize costs. With an optimized model, your memory footprint is ~1 byte per parameter, plus a ratio of less than 1 (commonly ~0.2) of the parameter count again to store intermediate activations.
You should be able to run it on Apple M1/M2 with 16GB RAM via CoreML pretty fine, if an order of magnitude slower than on an A100.
Training isn't unreasonably costly either: you can train such a model for O($100k), which is less than the yearly salary of a mid-tier developer in Silicon Valley.
There is no reason these models shouldn't be trained cooperatively and run locally on our own machines. If someone is interested in cooperating with me on such a project, my email is in the profile.
1. https://arxiv.org/abs/2206.01861
2. https://pytorch.org/blog/introduction-to-quantization-on-pyt...
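For a rough sanity check of the numbers above, here's a back-of-envelope memory estimate plus a minimal dynamic int8 quantization sketch in the spirit of [2]. The 3.5B parameter count and ~0.2 activation ratio are the comment's figures, and the toy model just stands in for a real inference graph:

```python
import torch
import torch.nn as nn

# Back-of-envelope memory estimate using the figures cited above.
params = 3.5e9          # DALL-E 2 parameter count, per the comment
bytes_per_param = 1     # int8 weights after quantization
activation_ratio = 0.2  # extra memory for intermediate activations

total_gb = params * bytes_per_param * (1 + activation_ratio) / 1e9
print(f"~{total_gb:.1f} GB")  # ~4.2 GB, within an RTX 2080's 8 GB

# Minimal dynamic quantization of a toy model's Linear layers to int8;
# a real deployment would quantize the full inference graph.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```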
How hard would it be to spin off a variant of this with more focused data models that cater to specific styles or art-types? Like say, a data model only for drawing animals. Or one only for creating new logos?
I’ve been creating generative art since 2016 and I’ve been anxiously waiting for my invite. I won’t be able to afford to generate the volume of images it takes to get good ones at this price point.
I can afford $20/mo for something like this, but I just can’t swing the $200 to $300 it realistically takes to get interesting art out of these CLIP-centric models.
Heck, the initial 50 images isn’t even enough to get the hang of how the model behaves.
MidJourney is a good alternative. Maybe not quite as good as DALL-E, but close enough, without a waitlist and with hobby-friendly prices ($10/month for 200 images/month, or $30 for unlimited)
If you’re technically inclined, I urge you to explore some newer Colabs being shared in this space. They offer vastly more configurable tools, work great for free on Google Colab, and are straightforward to run on a local machine.
Meanwhile we should prepare ourselves for a future where the best generative models cost a lot more as these companies slice and dice the (huge) burgeoning market here.
Yeah, I've been having fun with it recreating bad Heavy Metal album art (https://twitter.com/P_Galbraith/status/1548597455138463744). It's good, but surprisingly difficult to direct when you have a composition in mind. For a few of these I burned through 20-30 prompts, and I can't see myself forking over hundreds of dollars to roll the dice.
My brother is a digital artist and while excited at first he found it to be not all that useful. Mainly because it falls apart with complex prompts, especially when you have a few people or objects in a scene, or specific details you need represented, or a specific composition. You can do a lot with in-painting but it requires burning a lot of credits.
I'm sure the novelty wears off. But I'm already coming up with several applications for it.
On the personal side, I've been getting into game development, but the biggest roadblock is creating concept art. I'm an artist but it takes a huge amount of time to get the ideas on paper. Using DALLE will be a massive benefit and will let me expedite that process.
It's important to note that this is not replacing my entire creative process. But it solves the issue I have, where I'm lying in bed imagining a scene in my mind, but don't have the time or energy to sketch it out myself.
I don't know how to say this without sounding like a jerk, even if I bend over backwards to preface that this isn't my intent: this statement says more about your creativity and curiosity than about any ceiling on how entertaining DALL-E can be to someone who could keep multiple instances busy, like grandma playing nine bingo cards at once.
Knowing that it will only get better - animation cannot be far behind - makes me feel genuinely excited to be alive.
Same. I generated several thousand images and found it a chore, outside of the daily theme on the Discord server, to even think of anything to query. It was also discouraging when sometimes you'd hit pure gold for 4-5 of the 6 images, then be lucky to get 1 out of 6 worth saving over several more queries. Now it's down to 4 images and... yeah...
I'm not going to try and profit from the images, and I don't need them for any business uses, so to me it was fun for a while and now just something I'll largely put out of mind.
I was actually forcing myself to go through the whole 50/day because I knew it wouldn't be free forever, and I wanted to get better at it. I'm glad I did, but I wish I'd done more.
MidJourney gives ~unlimited generation for $30/month, and is nearly as good. Unlike DALL-E it doesn't deliberately nerf face generation. I've been having a blast.
> trying out wild combinations and cracking my friends up
Wait until the next edition comes out where it automatically learns the sorts of things that crack you up and starts generating them without any input from you.
Since many people will start generating their first images soon, be sure to check out this amazing DALL-E prompt engineering book [0]. It will help you get the most out of DALL-E.
[0]: https://dallery.gallery/wp-content/uploads/2022/07/The-DALL%... (PDF)
I hope every science teacher who can provides this to every student. This is the future they live in now. They should know these tools as well as they know how to install an app on a device.
Wait until we have a DALL-E-enabled custom emoji stream, whereby every text you send out has its corresponding DALL-E resultant image --
Then we can compare images from different people at different times where the prompt was identical... and see what the resultant library of emoji<-->prompt pairs looks like.
What about using DALL-E as a watermark, an 'NFT'-style signature or 'notary' for an email?
If DALL-E provided a unique PID for every image - and that PID was a key that only the original runner of the image has - it could be used to authenticate an image to a text source. (Assuming that no two prompts ever have the same result, a unique ID that can be used to replay the image, verifying it was generated when the original email/SMS was actually sent, could be a unique way to timestamp the authenticity/provenance of a thing.)
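A minimal sketch of how that PID idea could work, assuming (this is purely hypothetical, not anything DALL-E offers) the generator's operator holds a secret key and signs a record binding prompt, timestamp, and image bytes together; only the key holder can later verify it:

```python
import hashlib
import hmac
import json
import time

# Hypothetical: a key held only by the image generator's operator.
SECRET_KEY = b"held-only-by-the-generator-operator"

def issue_pid(prompt: str, image_bytes: bytes) -> dict:
    """Return a record whose 'pid' binds prompt, time, and image together."""
    record = {
        "prompt": prompt,
        "timestamp": int(time.time()),
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    msg = json.dumps(record, sort_keys=True).encode()
    record["pid"] = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return record

def verify_pid(record: dict) -> bool:
    """Only someone holding SECRET_KEY can confirm the record is authentic."""
    msg = json.dumps({k: v for k, v in record.items() if k != "pid"},
                     sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(record["pid"], expected)
```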
Thanks for this! A bit of prompt engineering know-how will help me get the most bang for the buck out of this beta. I also just want to say that dallery.gallery is delightfully clever naming.
Surprised by the lack of comments on the ethics of DALL-E being trained on artists’ content, whereas Copilot threads are chock full of devs up in arms over models trained on open source code. Isn’t it the same thing?
I recently talked with a concept artist about DALL-E and first thing they mentioned was "you know that's all stolen art, right?" Immediately made me think of GitHub Copilot.
However, the artists being featured in DALL-E's newsletters can't stop gushing about 'the new instrument they are learning how to play' and other such metaphors that are meant to launder what's going on.
My theory is that the professions most at-risk for automation are acting on their anxieties. Must not be a lot of freelance artists on HN, and a whole lot of programmers.
I think the artists have an even clearer case. I don't think GitHub Copilot is ready to steal anyone's job yet. But DALL-E is poised to replace all formerly commissioned filler art for magazines, marketing sites, and blogs. Now the only point to hiring a human is to say you hired a human. Our filler art is farm-to-table.
Having used Copilot for over a year now, it isn't there to replace programmers. It isn't called GitHub Pilot, and it doesn't do well at generating original ideas. If your job is to create sign-up forms in HTML then sure, it'll do your job in a second, but if you're creating more complex systems, Copilot is just there to help save you time when writing code (which is just implementing ideas).
Think of it like a set of powertools saving you time over manual tools.
I first read the artist's reply as "you know all art is stolen, right", which made more sense to me. If you look at the history of art, you'll see that it's true.
> My theory is that the professions most at-risk for automation are acting on their anxieties
That's not my problem with Copilot. I think tools and methods that can free humans from some amount of work are good in a correctly organized society. They have existed for a long time, too. They free up time for other stuff that can't be automated; this extra free time could theoretically give us more leisure or rest as well. I also trust myself to be able to learn another job if mine is ever automated.
But I don't want my work to be reused under terms I don't approve of. There are some things I don't want to help with my work, and this is reflected in the licenses I choose. I totally sympathize with artists who don't want their work to be reused in ways they don't like; I don't find this hard to understand. Nor do I find it hard to understand that an artist whose work you are supposed to pay to use is not happy with it being reused without payment. They should get paid a tiny bit for each generated piece if theirs is in the training set, and only if they approve this use. That would only be fair; the set would not be possible without those artists.
(Good for me, my personal code is not on GitHub for other, older, reasons)
This entire concept of AI learning using copyrighted works is going to be really tested in courts at some point, perhaps very soon, if not already.
However, if the result is adequately different, I don't see how it is different from someone viewing others' work, being "inspired", and creating something new. If you think about it, the vast majority of things are built on top of existing ideas.
Quite true. Best case, we're seeing DJ Spooky style culture jamming/remixing. But more likely it is as you write.
On the other hand, the market for stock photography was already decimated by the internet. Where previously skilled photographers would create libraries of images to exemplify various terms and sell these as stock, in the last decade or so, an art director with the aid of a search engine could rapidly produce similar results.
Of course. Because the majority of the tech bros on this site are self-centered and think of the arts as a lowly field deserving of no respect, while learning from something slightly resembling some boilerplate Lego code they wrote is a criminal act.
If you really want to learn, visit github.com. There are over 200 million freely available, open source code repositories for you to study and learn from.
Surely being surprised by the lack of comments on the ethics of DALL-E on HN is the same as the lack of comments on the ethics of Copilot on some artists' forum. I highly doubt you're going to find r/artists or whatever up in arms about Copilot, even if they are about DALL-E.
Well, I can't go ask Caravaggio or Gentileschi to paint my query, since they've been dead for hundreds of years. But being able to feed in a query containing much more modern concepts and get a baroque painting in that specific style is wonderful.
Plus what has already been said about a lot of art being an imitation/derivation of previous works.
It's because the furor over AI replicating human artists already played out over earlier AI iterations. Remember when thisfursonadoesnotexist.com was flamed for stealing furry art? Turns out that many artists shared an extremely generic style that the AI could easily replicate.
It feels like it would be good for this to not be a legal grey area. Whether it's considered a large copyright infringement conspiracy or a form of fair use, it would be good if the law reached a position on that sooner rather than later.
There are a few of those discussions going on in artists' circles these days. I imagine they'll get sued for doing this, but it'll probably take a very famous artist or a hell of a class action suit to make it happen.
> Preventing harmful images: We’ve made our content filters more accurate so that they are more effective at blocking images that violate our content policy — which does not allow users to generate violent, adult, or political content
What is defined as political content? Can I prompt DALL-E to draw “Fat Putin”?
Something I haven’t seen anyone talking about with these huge models: how do future models get trained when more content online is model-generated to start with? Presumably you don’t wanna train a model on autogenerated images or text, but you can’t necessarily know which is which.
https://en.m.wikipedia.org/wiki/Ouroboros
This precise thing is causing a funny problem in specialty areas. People are using e.g. Google Lens to identify plants, birds and insects, and it sometimes returns wrong answers: say it sees a picture of a Summer Tanager and calls it a Cardinal. If people then post "Saw this Cardinal" and the model picks up that picture/post and incorporates it into its training set, it's just reinforcing the wrong identification.
That's not really a new problem, though. At one point someone got some bad training data about an old Incan town, the misidentification spread, and nowadays we train new human models to call it Machu Picchu.
It's a cybernetic feedback system. DALL-E is used to create new images; the images that people find most interesting and noteworthy get shared online and reincorporated into the training data, but now filtered through human desire.
I wonder if human artists can demand that their work not be used for modelling. Then, while the robots are stuck using older styles for their creations, the humans will keep creating new styles of art.
One interesting comment about this is that some models actually benefit from being fed their own output. AlphaFold, for instance, was fed its own 'high likelihood' outputs (as Demis Hassabis described in his Lex Fridman interview).
Training on auto-generated images collected off the Internet is gonna be fine for a while, since the images surfacing will be curated (i.e. selected as good/interesting/valuable) still mostly by humans.
Getting humans to refine your data is the best solution right now, and many companies and researchers go with this approach.
> Getting humans to refine your data is the best solution right now
Source?
All those big models are trained with data for which the source is not known or vetted. The amount of data needed is not human-refinable.
For example, for language models we train mostly on subsets of CommonCrawl plus other sources. CommonCrawl data is “cleaned” by filtering out known bad sources and with some heuristics such as the ratio of text to other content, sentence length, etc.
The final result is a not-too-dirty but not clean huge pile of data that comes from millions of sources that no human has vetted and that no one on the team using the data knows about.
The same applies to large image datasets, e.g. LAION-400M, which also comes from CommonCrawl and is not curated.
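To make the "heuristics" concrete, here is a minimal sketch of the kind of page-level filtering described above; the thresholds are illustrative, not from any published pipeline:

```python
def keep_page(text: str, html_len: int) -> bool:
    """Crude CommonCrawl-style cleaning: keep pages that look like prose."""
    sentences = [s for s in text.split(".") if s.strip()]
    if not sentences:
        return False
    text_ratio = len(text) / max(html_len, 1)  # text vs. markup and boilerplate
    avg_words = sum(len(s.split()) for s in sentences) / len(sentences)
    return (
        text_ratio > 0.3            # page is mostly text, not markup
        and 5 < avg_words < 60      # sentences are prose-length
        and len(text.split()) > 50  # page isn't trivially short
    )
```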
But how would you know? A random string of text, or an image with the watermark removed, is going to be very hard to classify as generated or human-written.
I think with the terms requiring users to explicitly disclose which images/parts were generated, they could be filtered out to prevent a feedback loop of "generated in/generated out" images. I'm sure there will be some illegal/against-the-terms use cases there, but the majority should represent fair use.
I fully expect stock image sites to be swamped by DALL-E generated images that match popular terms (e.g. "business person shaking hands"). Generate the image for $0.15. Sell it for $1.00.
DALLE images are still only 1024 px wide. Which has its uses, but I don’t think the stock photo industry is in real danger until someone figures out a better AI superresolution system that can produce larger and more detailed images.
You can obtain any size by using the source image with the masking feature. Take the original and shift it, then mask out part of the scene and re-run. Sort of like a patchwork quilt, it will build variations of the masked areas with each generation.
Once the API is released, this will be easier to do in a programmatic fashion.
Note: Depending on how many times you do this... I could see there being a continuity problem with the extremes of the image (eg: the far left has no knowledge of the far right). An alternative could be to scale the image down and mask the borders then later scale it back up to the desired resolution.
This scale and mask strategy also works well for images where part of the scene has been clipped that you want to include (EG: Part of a character's body outside the original image dimensions). Scale the image down, then mask the border region, and provide that to the generation step.
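For anyone who wants to script the patchwork trick once an API exists, here's a sketch of the image prep with PIL. `generate_edit` and `stitch` are hypothetical stand-ins for an inpainting endpoint and a compositing step; only the shift-and-mask part is real code:

```python
from PIL import Image

TILE = 1024     # DALL-E 2 output size
OVERLAP = 256   # how much existing content the model gets to see

def shift_right(canvas: Image.Image):
    """Build the next tile: keep the right edge of the canvas as context,
    leave the rest transparent for the model to fill in."""
    tile = Image.new("RGBA", (TILE, TILE), (0, 0, 0, 0))
    right_edge = canvas.crop((canvas.width - OVERLAP, 0, canvas.width, TILE))
    tile.paste(right_edge, (0, 0))
    # Mask: opaque where content is kept, blank where it is to be generated.
    mask = Image.new("L", (TILE, TILE), 0)
    mask.paste(255, (0, 0, OVERLAP, TILE))
    return tile, mask

# canvas = Image.open("start.png").convert("RGBA")
# tile, mask = shift_right(canvas)
# new_tile = generate_edit(image=tile, mask=mask, prompt="...")  # hypothetical
# canvas = stitch(canvas, new_tile, OVERLAP)                     # hypothetical
```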
Another commenter mentioned Topaz AI upscaling, and Pixelmator has the "ML Super Resolution" feature; both work remarkably well IMO. There are a number of drop-in and system default resolution enhancement processes that work in a pinch, but the quality is lacking compared to the commercial solutions. There are still some areas where DALL-E 2 is lacking in realism, but anyone handy with photo editing tools could amend those shortcomings fairly quickly.
On-demand stock photo generation probably is the next step, particularly when combined with other free media services (Unsplash immediately comes to mind). Simply choose a "look" or base image, add contextual details, and out pops a 1 of 1 stock photo at a fraction of the cost of standard licensing. It'll be very exciting seeing what new products/services will make use of the DALL-E API, how and where they integrate with other APIs, use cases, value adds like upscaling and formatting, etc.
https://apps.apple.com/us/app/waifu2x/id1286485858
I paid extra to get the higher quality model using the in-app purchase option. It crushes the phone's battery life, but runs in only ~10 seconds on an iPhone 13 Pro for a single 1000x1000 input image.
Makes me imagine stock image sites in the near future. Where your search term ("man looks angrily at a desktop computer") gets a generated image in addition to the usual list of stock photos.
Maybe it would be cheaper. I imagine it would one day. And maybe it would have a more liberal usage license.
At any rate, I look forward to this. And I look forward to the inevitable debates over which is better: AI generation or photographer.
In my experience it doesn’t require that much cherry-picking if you use a carefully crafted prompt. For example: “A professional photography of a software developer talking to a plastic duck on his desk, bright smooth lighting, f2.2, bokeh, Leica, corporate stock picture, highly detailed”
And this is the first picture I got: https://labs.openai.com/s/lSWOnxbHBYQAtli9CYlZGqcZ
It went a bit strong on the depth of field and I don’t like the angle, but I could iterate a few times and get a good one.
If the price is low enough, you can have humans rank generated images (maybe using Mechanical Turk or a similar service), and from that ranking choose only the highest quality DALL-E generated images.
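A minimal sketch of that selection step, assuming the ratings have already been collected (e.g. from Mechanical Turk) as (image_id, score) pairs:

```python
from collections import defaultdict

def top_images(ratings, keep_fraction=0.1):
    """Average each image's human scores and keep the best slice."""
    scores = defaultdict(list)
    for image_id, score in ratings:
        scores[image_id].append(score)
    ranked = sorted(scores, key=lambda i: sum(scores[i]) / len(scores[i]),
                    reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]

# top_images([("a", 5), ("a", 4), ("b", 2), ("c", 5)]) -> ["c"]
```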
So what's the loss? It's not like stock photos are the highest art form. Sure, for some people it means they need to change their business model, but for all those just needing pictures to illustrate something, the process will be much smoother.
There has been trouble with generating lifelike eyes, but a second pass with a model tuned for making realistic faces has been very successful at fixing that.
One of the commercial use cases this post mentions is authors who want to add illustrations to children's stories.
I wonder if there is a way for DALL-E to generate a character, then persist that character over subsequent runs. Otherwise, it would be pretty difficult to generate illustrations that depict a coherent story.
Example ...
Image 1 prompt: A character named Boop, a green alien with three arms, climbs out of its spaceship.
Image 2 prompt: Boop meets a group of three children and shakes hands with each one.
You can't do that. I can't see this working well for children's book illustrations unless the story was specifically tailored in a way that makes continuity of style and characters irrelevant.
As an aside, Ursula Vernon did pretty well under the constraint you described. She set a comic in a dreamscape and used AI to generate most of the background imagery: https://twitter.com/UrsulaV/status/1467652391059214337
It's not the "specify the character positions in text" proposed, but still a neat take on using this sort of AI for art.
You mean just generate a single large image with all the stuff you want for the whole story, and then use cropping and inpainting to get only the piece you want for each page?
Then use inpainting to only preserve that pose and generate new content around it. It’s definitely not perfect.
Then put that at the side of a transparent image, and use as the prompt, "Two identical aliens side by side. One is jumping"
Wait until someone trains a model like this, for porn.
There seems to be a post-DALL-E obscenity detector on OpenAI's tool; so far I've found it to be entirely robust against deliberate typos designed to avoid simple 'bad word lists'.
Ask it for a "pruple violon" and you get purple violins... you get the idea.
"Metastable" prompts that may or may not generate obscene results (content with nudity, guns, or violence, as I've found) sometimes show non-obscene generations, and sometimes trigger a warning.
Why do that? Just refusing to run my query is sufficient. Who is harmed if I continue to bang my head against that wall?
I’ve thought about this and in fact porn generation sounds like a good thing?? It ensures that it’s victimless. Of course, there is a problem with generation of illegal (underage) porn but other than this, I think it could be helpful for this world.
If all of the child porn industry switched to generated images they'd still be horrible people but many kids would be saved from having these pictures taken. So a commercial model should certainly ban it, but I don't think it's the biggest thing we have to worry about.
If I had to guess, I'd bet they have a supervised classifier trained to recognize bad content (violence, porn, etc.) that they use to filter the generated images before passing them to the user, on top of the bad-word lists.
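If that guess is right, the serving path would look something like this sketch; `classify` is a stand-in for whatever supervised model they actually use, which isn't public:

```python
def filter_generations(images, classify, threshold=0.5):
    """classify(image) -> estimated probability the image violates policy."""
    safe, blocked = [], []
    for img in images:
        (blocked if classify(img) >= threshold else safe).append(img)
    return safe, blocked
```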
This is mentioned, "content filters" are "blocking images that violate our content policy — which does not allow users to generate violent, adult, or political content, among other categories" and they "limited DALL·E’s exposure to these concepts by removing the most explicit content from its training data."
I suspect it's more a business restriction than a moral one. If OpenAI allows people to make porn with these tools, people will make a ton of it. OpenAI will become known as "the company that makes the porn-generating AIs," not "the company that keeps pushing the boundaries of AI." Being known as the porn-ai company is bad for business, so they restrict it.
This was really funny :)
http://dailywrong.com/man-finally-comfortable-just-holding-a...
"He's a hate-fuelled neurotic farmboy searching for his wife's true killer. She's a tortured insomniac snake charmer from a family of eight older brothers. They fight crime!"
https://theyfightcrime.org/
Here's an implementation in Perl.
http://paulm.com/toys/fight_crime.pl.txt
>He's an unconventional gay paranormal investigator moving from town to town, helping folk in trouble. She's a violent motormouth wrestler from the wrong side of the tracks. They fight crime!
>He's a Nobel prize-winning sweet-toothed rock star who believes he can never love again. She's a strong-willed communist widow with a knack for trouble. They fight crime!
>He's an obese white trash barbarian with a secret. She's a virginal thirtysomething traffic cop with the power to bend men's minds. They fight crime!
Easiest way for free SSL would be to just throw the domain on CloudFlare :)
lol
Hot dang. Some Reddit subs can be auto-generated now.
https://www.reddit.com/r/SubSimulatorGPT2/
But yeah, now it will have generated images too.
http://dailywrong.com/wp-content/uploads/2022/07/DALL%C2%B7E...
http://dailywrong.com/wp-content/uploads/2022/07/DALL%C2%B7E...
Can DALL-E render Bat Boy?
Thank you! Bookmarked!
Why do the images load so slow though?
What are the resources needed to train this model?
If someone just gave you the model for free, what resources would you need to use it to generate new results?
Multimodal.art (https://multimodal.art/) is working on a free version of something like DALLE, though it's not that good as of yet.
This belongs on /r/linkedinlunatics
As long as DALL-E isn't caught painting out a 1-to-1, reverse-searchable copy of an image, it's not really as bad as Copilot, IMO.
The issue isn't just that Copilot is trained on my GPL code; it's that it might decide to copy-paste lines from it, including my comments, etc.
CLIP was trained on 400,000,000 images; GPT is roughly 180B tokens, and at ~1.5 tokens per word, that's about 120,000,000,000 words.
Heck: if the cost of entry is low enough, they might do it at a loss and take over the site.
Unless I'm missing something, these seem pretty darn good.