Posted by u/jrmylee 3 years ago
Launch HN: Rubbrband (YC W23) – Deformity detection for AI-generated images
Hey HN! We’re Jeremy, Abhi and Darren from Rubbrband (https://www.rubbrband.com/). Rubbrband is software that evaluates the quality of images generated by Stable Diffusion (or Midjourney, DALL-E, etc.).

We actually stumbled into this idea while working on solving a different problem. While helping a few companies use Stable Diffusion in production, we found that these companies needed to use QA analysts to manually evaluate their images to make sure they were high quality once in prod. This process often took days just to go through thousands of images, so companies that have a text-to-image product with PMF couldn’t guarantee that their images were of high quality.

From our initial set of customers, we found that evaluating the quality of their outputs at scale was an even larger problem than using the model itself. So we pivoted our company to solving this problem, drawing on our background in CV research. All three of us did computer vision research at UC Berkeley, and Abhi worked with John Canny, who invented many CV techniques, including the Canny edge detector.

We built a product that automates this process using computer vision. We’ve trained several in-house computer vision models that grade images against different criteria.

These include:

- Detecting human deformities, such as a person with 7 fingers on one hand (image generation models generate deformed hands over 80% of the time when generating a photo of a person! This includes the state of the art: Midjourney, SDXL, Runway, etc.)
- A score that rates how well the image aligns with the prompt (we do this using our own finetuned Visual Question Answering model)
- A composition score (how well composed the image is according to photography “rules”)

Using Rubbrband is pretty simple. You can send an image to us to process via our API or our web app, and you’ll get scores for each of those criteria back on your dashboard in less than 10 seconds. Here is a quick Loom demo: https://www.loom.com/share/961830347b3643dcbb92dfe80f8ca1f0?....

You can filter your images based on certain criteria from the dashboard. For instance, if you want to see all images with deformed eyes, you can click the “deformed eyes” filter at the top of the screen to see all of those images.
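The filtering described here amounts to a predicate over per-image evaluation records. A toy sketch of that idea (the record shape, field names, and flag names are hypothetical, not Rubbrband's actual schema or API):

```python
from dataclasses import dataclass

@dataclass
class ImageEval:
    """Hypothetical per-image evaluation record (all field names are illustrative)."""
    image_id: str
    deformity_flags: set      # e.g. {"deformed_hands", "deformed_eyes"}
    prompt_alignment: float   # 0.0 - 1.0
    composition: float        # 0.0 - 1.0

def filter_by_flag(evals, flag):
    """Return evaluations whose image was tagged with the given deformity flag."""
    return [e for e in evals if flag in e.deformity_flags]

evals = [
    ImageEval("a.png", {"deformed_hands"}, 0.92, 0.71),
    ImageEval("b.png", {"deformed_eyes"}, 0.88, 0.65),
    ImageEval("c.png", set(), 0.95, 0.80),
]
flagged = filter_by_flag(evals, "deformed_eyes")
print([e.image_id for e in flagged])  # → ['b.png']
```

Clicking a dashboard filter like “deformed eyes” would then just apply a predicate like this over the stored evaluations.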

We store images generated by your image generation model. We’re like a logging tool for images, with evaluations on top. We currently charge $0.01 per image, with your first 1000 images free.

We’re super excited to launch this to Hacker News. We’d love to hear what you think. Thanks in advance!

Fission · 3 years ago
The male astronaut with coffee [1] (that I believe you're using as a "verified" example) has an extra finger on his right hand

[1] https://www.rubbrband.com/static/media/astronaut_with_coffee...

toddmorey · 3 years ago
And his cup backwards
jumpkick · 3 years ago
Look more closely: the cup has two handles.
ghgr · 3 years ago
And his coffee should be boiling
bjourne · 3 years ago
Isn't this product kind of impossible? Like a compression program that compresses compressed files? If you have an algorithm for determining whether a generated image is good or bad couldn't the same logic be incorporated into the network so that it doesn't generate bad images?
joefourier · 3 years ago
Not impossible at all - classifier networks are much, much easier to train than generative networks. However, you can’t directly integrate the logic into the generator; you’d have to train the generator against the discriminator network. This is essentially the principle of a GAN, and although many tricks have been developed in recent years, GANs tend to be finicky and difficult to train.
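A minimal sketch of those adversarial objectives, using the standard (non-saturating) GAN losses with made-up discriminator outputs instead of real networks:

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator objective: score real images high, generated ones low."""
    return -math.log(d_real) - math.log(1.0 - d_fake)

def g_loss(d_fake):
    """Generator objective (non-saturating form): fool the discriminator."""
    return -math.log(d_fake)

# Toy discriminator outputs (probability that an image is real):
print(d_loss(d_real=0.9, d_fake=0.2))  # low: discriminator is doing well
print(g_loss(d_fake=0.2))              # high: generator is being caught
```

Training alternates between the two: each network's progress changes the other's loss landscape, which is exactly what makes the setup finicky.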

Diffusion models like SD are instead trained with a very simple loss function: the L2 loss of an iterative denoising process. This tends to make training more stable than with GANs. You could, however, fine-tune SD with reinforcement learning using the deformity detector as the reward, but it’s not a panacea, as it could lead to overfitting and performance degradation.

bjourne · 3 years ago
> Not impossible at all - classifier networks are much, much easier to train than generative networks. However you can’t directly integrate the logic into the generator, you’d have to train the generator against the discriminator network.

Generative networks are, in my experience, not at all difficult to train, because the amount of training data is typically orders of magnitude larger. In this case, the idea is to train something to classify images as high or low quality, which I think is just as hard as generating images. Regardless, if you had such logic, I don't see why you couldn't incorporate it into the network's own loss function? That's how it is done for L1 and L2 regularization and many other techniques for "tempering" the training process.
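The regularization analogy can be made concrete: a frozen classifier's score enters the loss the same way a weight-decay penalty does (the deformity score, coefficients, and function names here are purely illustrative; in practice the penalty term has to be differentiable through the generator):

```python
def l2_penalty(weights):
    """Standard L2 regularization: sum of squared weights."""
    return sum(w * w for w in weights)

def deformity_penalty(score):
    """Hypothetical extra term: a frozen classifier's 'deformity probability'
    for the generated image, treated as a penalty to minimize."""
    return score

def total_loss(task_loss, weights, deformity_score, lam=0.01, mu=0.1):
    # Same recipe as weight decay: base objective plus weighted penalty terms.
    return task_loss + lam * l2_penalty(weights) + mu * deformity_penalty(deformity_score)

print(total_loss(0.5, [0.3, -0.4], deformity_score=0.8))
# 0.5 + 0.01*0.25 + 0.1*0.8 = 0.5825
```

The open question in this thread is whether such a penalty helps or just pushes the generator toward bland, "safe" outputs.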

The problem is that you want the model to be creative but not "too creative" (e.g. eight-fingered hands). But preventing it from being too creative risks making it boring and bland (e.g. only generating stock images). I don't think you can solve that with a post-processing filter. Generating, say, 100 images and picking the "best" one might just be the same as picking the most bland one.

__loam · 3 years ago
That's essentially how using a GAN works.

E: or how it's supposed to work.

thumbuddy · 3 years ago
Kind of phenomenological, but both parts of the GAN are the same model.
darren_hsu · 3 years ago
We’re optimistic about using our own algorithms and models to evaluate another model. In theoretical computer science, it is easier to verify a correct solution than to generate a correct solution (P vs NP problem).
bagels · 3 years ago
I don't think P vs NP has anything to do with it, and your maxim isn't always true anyway (though it may often be).

Problem: traveling salesman, solution: one particular path. I think verification that the solution is optimal in this case is exactly the same problem as finding the solution.
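The distinction matters here: the easy-to-verify certificate applies to the decision version ("is there a tour of length ≤ k?"), while verifying that a given tour is *optimal*, as in this example, amounts to re-solving the problem. A brute-force sketch with a toy 4-city distance matrix:

```python
from itertools import permutations

def tour_length(tour, dist):
    """Total length of a closed tour over the given distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def is_optimal(tour, dist):
    """Verifying optimality means comparing against every other tour --
    i.e., solving the whole problem again."""
    n = len(dist)
    best = min(tour_length((0,) + p, dist) for p in permutations(range(1, n)))
    return tour_length(tour, dist) == best

dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(is_optimal((0, 1, 3, 2), dist))  # → True (length 18 is the minimum)
```

Whether the "verification is easier" intuition transfers from complexity theory to image quality is exactly what's being debated here.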

vore · 3 years ago
Not to nitpick but this is NOT the right takeaway from P vs NP.
lancesells · 3 years ago
Do you, or will you, use human labor in any instance on evaluating images?
version_five · 3 years ago
Fundamentally it sounds like you built an ML model(s) and are trying to monetize it behind an API. How does that work medium-term? Are you expecting there won't be open source alternatives, is your value in hosting the model (and if so will you open source yours) or is there another angle. I've built ML models and looked into how to monetize them, and overall it seems like a tough play without the model being part of some bigger thing that has more of a moat. What are your thoughts?
jrmylee · 3 years ago
Yeah it's a great q

The way we think about it is that we're building a product for organizations in scaling mode, and they have deep needs on the product-side. Flexibility on filtering, different client-libraries, a clean observability interface, etc...

It's possible that we open-source parts of our models, but fundamentally we think we can capture value by building a great all-around web product, and not just a set of eval models.

notahacker · 3 years ago
I'd want the deformity testing either integrated with the generation service (really deeply, if it's a GAN!) or as a post-processing tool (a "look, dodgy hands" layer in Photoshop that could then let you fix those deformities), rather than a separate web service.

If the quality of the model is difficult to replicate (which seems to be a big "if" at the pace of NN image processing improvements), I guess there might be licensing or plugin sale opportunities there

alecbell · 3 years ago
This is a brilliant idea. Whenever I look at an image these days that has the "texture" of a generated image, I immediately start looking at certain features such as "more than 5 fingers" to determine whether it's real or not. If you could immediately detect those features and block the generated image from making it to production, that'd be a huge value gain.
jrmylee · 3 years ago
appreciate it :)
autoexec · 3 years ago
Although I can see how it could be useful now, a "Human Deformity Detector" app still seems strange. I can see this being abused to make fun of actual people, or at the very least amusing someone who discovers their selfie has a high deformity score. If it works as advertised, people who consider themselves deformed now have an objective ranking system, I guess.
VincentEvans · 3 years ago
What happens if you run it against pictures of regular people? :) Damn these beauty standards!

I kid, I kid.

neovialogistics · 3 years ago
Posting communities will inevitably try to find false positives, and it'll be fun to see if there are any.

I'm wondering about real photos that have deliberately been shot to screw with normative standards of photo composition, like Weston's headshot of Igor Stravinsky. Another genre of photo that may be flagged are sci-fi and/or fantasy film set candid shots, such as photos featuring an actor (or actors) partially out of costume.

Come to think of it, various photography hall of fame galleries could be great testing suites.

abhinavgopal · 3 years ago
Interesting convo. For now we haven't focused heavily on photos that are not AI-generated, because there are cases of intentional style choices (that are not true deformities or photo quality errors). I can definitely see some of the photos you've mentioned here being a bit confusing.
illegalmemory · 3 years ago
I tried it out, and the very first result was wrong. I'm sort of a potential customer, for a product I'm working on. Even the feature detection and scene description were off. https://app.rubbrband.com/image/f2e1h.png
darren_hsu · 3 years ago
Hey! Feel free to shoot us an email at contact@rubbrband.com. Happy to set the record straight

schopra909 · 3 years ago
This is really cool! As a next step, it would be even more useful if you could auto-inpaint the regions with detected issues (e.g., malformed hands). That way you could keep the subject and most of the image if you like everything except the deformed features, tackling both the QA-analyst problem and the "engineer" problem of trying to patch the model for the end user/customer.
darren_hsu · 3 years ago
That's great feedback! We're working on this so stay tuned for updates ;)