Readit News
anibalin commented on FLUX.1 Kontext   bfl.ai/models/flux-kontex... · Posted by u/minimaxir
echelon · 10 months ago
gpt-image-1 (aka "4o") is still the most useful general purpose image model, but damn does this come close.

I'm deep in this space and feel really good about FLUX.1 Kontext. It fills a much-needed gap, and it makes sure that OpenAI / Google aren't the runaway victors of images and video.

Prior to gpt-image-1, the biggest problems in images were:

  - prompt adherence
  - generation quality
  - instructiveness (e.g. "put the sign above the second door")
  - consistency of styles, characters, settings, etc. 
  - deliberate and exact intentional posing of characters and set pieces
  - compositing different images or layers together
  - relighting

Fine tunes, LoRAs, and IPAdapters fixed a lot of this, but they were a real pain in the ass. ControlNets solved for pose, but it was still awkward and ugly. ComfyUI was an orchestrator of this layer of hacks that kind of got the job done, but it was hacky and unmaintainable glue. It always felt like a fly-by-night solution.

OpenAI's gpt-image-1 solved all of these things with a single multimodal model. You could throw out ComfyUI and all the other pre-AI garbage and work directly with the model itself. It was magic.

Unfortunately, gpt-image-1 is ridiculously slow, insanely expensive, and highly censored (you can't use a lot of copyrighted characters or celebrities, and a lot of totally SFW prompts are blocked). It can't be fine tuned, so you're stuck with the "ChatGPT style" and (as the community calls it) the "piss filter" (perpetually yellowish images).

And the biggest problem with gpt-image-1 is that, because it puts image and text tokens in the same space to manipulate, it can't retain the exact, pixel-precise structure of reference images. Because of that, it cannot function as an inpainting/outpainting model whatsoever. You can't use it to edit existing images if the original image matters.

Even with those flaws, gpt-image-1 was a million times better than Flux, ComfyUI, and all the other ball of wax hacks we've built up. Given the expense of training gpt-image-1, I was worried that nobody else would be able to afford to train the competition and that OpenAI would win the space forever. We'd be left with only hyperscalers of AI building these models. And it would suck if Google and OpenAI were the only providers of tools for artists.

Black Forest Labs just proved that wrong in a big way! While this model doesn't do everything as well as gpt-image-1, it's within the same order of magnitude. And it's ridiculously fast (10x faster) and cheap (10x cheaper).

Kontext isn't as instructive as gpt-image-1. You can't give it multiple pictures and ask it to copy characters from one image into the pose of another image. You can't have it follow complex compositing requests. But it's close, and that makes it immediately useful. It fills a much-needed gap in the space.

Black Forest Labs did the right thing by developing this instead of a video model. We need much more innovation in the image model space, and we need more gaps to be filled:

  - Fast
  - Truly multimodal like gpt-image-1
  - Instructive 
  - Posing built into the model. No ControlNet hacks. 
  - References built into the model. No IPAdapter, no required character/style LoRAs, etc. 
  - Ability to address objects, characters, mannequins, etc. for deletion / insertion. 
  - Ability to pull sources from across multiple images with or without "innovation" / change to their pixels.
  - Fine-tunable (so we can get higher quality and precision) 
 
Something like this that works in real time would literally change the game forever.

Please build it, Black Forest Labs.

All of those feature requests stated, Kontext is a great model. I'm going to be learning it over the next weeks.

Keep at it, BFL. Don't let OpenAI win. This model rocks.

Now let's hope Kling or Runway (or, better, someone who does open weights -- BFL!) develops a Veo 3 competitor.

I need my AI actors to "Meisner", and so far only Veo 3 comes close.

anibalin · 10 months ago
Thanks for the detailed post!
anibalin commented on Airfoil   ciechanow.ski/airfoil/... · Posted by u/todsacerdoti
pcurve · 2 years ago
This man to me is a modern day Da Vinci of the Web
anibalin · 2 years ago
It's truly impressive. The amount of time and dedication is uncanny.
anibalin commented on iPhone 15 and iPhone 15 Plus   apple.com/newsroom/2023/0... · Posted by u/mikece
zyang · 3 years ago
It's too polished. COVID is over. Bring back the live demos.
anibalin · 3 years ago
I was thinking the same thing. They should go back to the stage.
anibalin commented on Ask HN: Is it just me or GPT-4's quality has significantly deteriorated lately?    · Posted by u/behnamoh
bbotond · 3 years ago
Yes. Before the update, when its avatar was still black, it solved pretty complex coding problems effortlessly and gave very nuanced, thoughtful answers to non-programming questions. Now it struggles with just changing two lines in a 10-line block of CSS and printing this modified 10-line block again. Some lines are missing, others are completely different for no reason. I'm sure scaling the model is hard, but they lobotomized it in the process.

The original GPT-4 felt like magic to me, I had this sense of awe while interacting with it. Now it is just a dumb stochastic parrot.

anibalin · 3 years ago
Same happened with Dalle-2. It went downhill after a couple of weeks.
anibalin commented on Show HN: WhyBot, making GPT-4 question itself   whybot-khaki.vercel.app/... · Posted by u/johnqian
anibalin · 3 years ago
This is very clever. Thanks for sharing.
anibalin commented on Show HN: I want to change how people buy health supplements   backoflabel.com/... · Posted by u/richarlidad
anibalin · 3 years ago
This is great. I was thinking about this very same issue while driving today. What are the chances.
anibalin commented on Ask HN: Are ChatGPT answers getting worse for anyone else?    · Posted by u/raydiatian
anibalin · 3 years ago
The same effect on Dalle-2.
anibalin commented on Ask HN: How do you keep track of all the content you encounter?    · Posted by u/vvoruganti
fragmede · 3 years ago
Save the shit out of all of it. Only available on Apple Silicon. https://www.rewind.ai/
anibalin · 3 years ago
I was super hyped about this till I saw the price. 50 bucks per month.
anibalin commented on ChatGPT is a ‘code red’ for Google’s search business   nytimes.com/2022/12/21/te... · Posted by u/gnicholas
SubiculumCode · 3 years ago
chatgpt is often very wrong.
anibalin · 3 years ago
Don't forget chatgpt is three weeks old. Google is 20 years old. To me chatgpt already replaced google on some searches. Give it more time, there is only room for improvement here.
anibalin commented on Show HN: TromPhone, a Trombone for Your Phone    · Posted by u/depsypher
anibalin · 3 years ago
haha very cool!

u/anibalin

Karma: 19
Cake day: November 8, 2017
About
Books, Cybersecurity, DevOps, Education, Hacking, Open Source, Philosophy, Privacy, Programming, Remote Work, Research, Science, Social Impact, Technology, UI/UX Design, Web Development