It's quite fascinating to play around with this tool. Like user `stared` mentions below, there are words like epic, realistic, etc. that you can add to shape the pictures quite a lot. Auto-generated text like this is not so great at producing interesting results.
For these prompts, I like the ones generated by Midjourney much more.
While they are less diverse, they are more consistent for a digital art style. Thank you for making this comparison!
The generated images are underwhelming. Other than the snake king one, they all kind of have the same album-art look. The generated text also seems like it's trying to have some cohesive theme to it, but doesn't bring it together like one might expect in a short story. And the writing style is filled with cliches.
I can see it being good for inspiration, but it's pretty rough for final consumption.
Midjourney seems to have a better hunch for artistic drama. Though I'm not very satisfied with the AI outcomes, probably because they don't watch movies (yet!).
[0]You see a figure in the distance, waving at you. As you get closer, you realize that the figure is your doppelganger. Your doppelganger is wearing your clothes and has your face, but their eyes are black voids. You hear a voice in your head that says,
“We are one. You are me and I am you.”
Wow, OpenAI should truly be taught in business schools around the world as a case study in product marketing. Despite the dozens or hundreds of similar tools, models, colab notebooks, etc. created by a wide, flourishing community, these two products are top of mind for everyone. Both terribly disappointing and envy-inspiring for us in the field!
GPT-3 and DALL-E (in my subjective opinion) feel much better than other models at their respective tasks.
DALL-E in particular kind of blows the VQGAN+CLIP messing around I've done out of the water. GPT-3 feels markedly better than other text generation or chatbots I've tried.
Definitely these are well marketed, and not the only models around, but they also feel ahead of other things I've tried. Can you point out some of these other tools/models?
It's not as simple as "x is better than y". They all have their own flavours. I've had results from JAX CLIP Guided Diffusion that I can't get from anything else, and some of my early experiments with Disco Diffusion have a quality that is unique. I think people will always mix and match models due to their unique qualities.
Having said that, I'm on the beta for Stable Diffusion ( https://stability.ai/ ) and it's remarkably capable across a broad range of styles. Dall-E probably still has the edge for more complex semantic prompts and photographic coherence, but Stable Diffusion is very good and it's got a very open strategy.
I’ll dare make a prediction: Transformers will be declared a dead end in AI in a few years and GPT-4 might be the last in line. The method is able to produce highly convincing gimmicks (which is what it comes to once initial excitement splashes) but that is not how art or intelligence works.
> The method is able to produce highly convincing gimmicks (which is what it comes to once initial excitement splashes) but that is not how art or intelligence works.
Uh, art is often about reproducing highly convincing gimmicks.
Language model != Transformers. The scaling power of the transformer architecture allows LLMs to exist, but LMs are just one application of transformers. The architecture itself is extremely generic and is applicable to nearly every problem.
I'm pretty certain whatever comes after GPT-4 will still be using layered self-attention.
Everything is eventually called a dead end in AI, no? Once it has moved from the bleeding edge to something the public uses, the limitations become known and everyone says, "of course a computer can do it, but that's not intelligence."
That's because we aren't building systems capable of self-analysis and awareness. As far as I'm aware, there are only AI/ML systems capable of analyzing some data and giving some output with little to no nuance.
no, the OP says it in a different sense, namely that even if you are duly impressed by these advancements, they will stop in a few years and a new line of research will be needed
Prompts matter - here, it is interesting to see the cascade of prompts, i.e. from ones you pass to GPT3 to those generated by it.
I also like the topic choice - getting DALLE2 esoteric, dreamy, surreal. I find generating things that do not exist more interesting - as it stretches the limits of AI creativity and imagination. (Plug: I generated quite a few religious, symbolic, and esoteric images, both pleasant and gloomy, https://pmigdal.medium.com/dall-e-2-and-transcendence-3a3a40...)
Check out https://text-generator.io for an API-compatible switch from GPT-3 for generating text or code, but at a reasonable price. There are some upcoming alternatives to DALL-E too, like Disco Diffusion or stability.ai, so one day it could be an easy switch from DALL-E to a cheaper or better-quality (or both) service. Keep an eye out for the competition. There’s monopoly-level pricing at OpenAI right now, so be careful, y’all.
https://hackmd.io/@einarmagnus/midjourney-hallucinations
Here is the output for the first prompt[0]: https://imgur.com/a/18NRrx4
[0]You see a figure in the distance, waving at you. As you get closer, you realize that the figure is your doppelganger. Your doppelganger is wearing your clothes and has your face, but their eyes are black voids. You hear a voice in your head that says, “We are one. You are me and I am you.”
I wonder if there is a trade-off between a larger data set, which makes the model more accurate, and how creative it can be.
tbh I'm surprised they don't.
Uh, art is often about reproducing highly convincing gimmicks.
See https://en.wikipedia.org/wiki/Realism_(arts)
I think you mean that Transformers will become kitsch, which they probably will.
Is the fire god going to tell me to write a book report on Call of the Wild? ( https://www.youtube.com/watch?v=KXVdTT1Sis0&t=6m42s )
https://play.aidungeon.io/