Show HN: AI powered meme search, open-source

This is a text-to-image search using deep learning and vector similarity search. Ask me anything.

jstx1 · 4 years ago

What's the point of the deep learning model? Why not just search the metadata of the images?

yobbo · 4 years ago

What in your system is doing the text-to-vector encoding, and how did you train it?

alexcg1 · 4 years ago

We're using Transformers with the `sentence-transformers/paraphrase-distilroberta-base-v1` model.

The framework is Jina (https://github.com/jina-ai/jina/) so it's pretty high-level. You can see the indexing/search Flow on lines 37-52 of https://github.com/alexcg1/jina-meme-search-example/blob/mai...
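
For anyone wondering what happens after the encoding step: once captions and queries are vectors, search is roughly a nearest-neighbour lookup by similarity. A minimal sketch, assuming hand-written toy 3-d vectors in place of real encoder output. This is not Jina's API; the file names and the `search` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=3):
    """Return the ids of the top_k indexed vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy vectors (assumption): a real encoder would produce hundreds of dimensions.
index = {
    "dog_eating.jpg": [0.9, 0.8, 0.0],
    "drake.jpg": [0.0, 0.1, 0.9],
    "cat_nap.jpg": [0.8, 0.1, 0.2],
}
query = [1.0, 0.7, 0.0]  # stand-in for an encoded text query
results = search(query, index, top_k=2)
print(results)
```

A brute-force scan like this is fine for about 1,000 items; larger indexes usually switch to an approximate nearest-neighbour structure.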

alexcg1 · 4 years ago

We rely on pre-trained models at the moment, since Jina supports loads of them out of the box.

For image search we use Big Transfer Encoder (https://github.com/jina-ai/executors/tree/main/jinahub/encod...) but may switch to the CLIPImage encoder at some point.

potamic · 4 years ago

Searching Google for "old english meme" immediately pulls up a bunch of Joseph Ducreux memes, which is exactly what I was looking for. But this one does not return any. Are they not present in the dataset, or could it be because of the way the algorithm works? Interested to hear some details on this setup.

alexcg1 · 4 years ago

Since I helped build it, I'll explain :)

This example is quick and dirty, indexing only about 1,000 images from a dataset we pulled from Kaggle. So if results suck, that's either because the meme you searched for isn't in the dataset, or because it just didn't make it into the random batch of 1,000 that got indexed.
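
To put a rough number on the "random batch" point, here is a small sketch. The dataset size of 50,000 is purely hypothetical (the real size isn't stated here), and the helper names are made up for illustration.

```python
import random


def inclusion_probability(dataset_size, sample_size):
    """Chance that one specific item lands in a uniform random sample."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    return min(sample_size, dataset_size) / dataset_size


def index_random_batch(items, sample_size, seed=None):
    """Pick a random batch to index, without replacement."""
    rng = random.Random(seed)
    return rng.sample(items, min(sample_size, len(items)))


# Hypothetical dataset size (assumption, not the real Kaggle dataset).
dataset = [f"meme_{i}" for i in range(50_000)]
batch = index_random_batch(dataset, 1000, seed=0)
p = inclusion_probability(len(dataset), 1000)
print(f"indexed {len(batch)} memes; any given meme has a {p:.0%} chance of being searchable")
```

Under that assumption, any single meme in the dataset would have about a 2% chance of being indexed at all.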

rockemsockem · 4 years ago

> So if results suck ...

Not trying to be harsh here, but if you only index 1000 random images and call it a meme search then it's going to suck. So it sucks because you all really didn't think this through. What are we supposed to take away from this? I have no way to evaluate the usefulness of the tech because it isn't capable of doing anything atm.

potamic · 4 years ago

I see. 1000 is probably too few to showcase a search concept, but this is definitely a very interesting problem.

hotgeart · 4 years ago

I wanted this meme: https://knowyourmeme.com/memes/for-five-minutes

"Shrek 5 minutes"

None of the results is Shrek.

Ok... maybe something simpler: "Shrek"

Same...

notcoolbezos · 4 years ago

I tried finding this - https://knowyourmeme.com/memes/who-killed-hannibal

"Eric andre shoot" | "Eric andre kill"...I'm getting the drake meme.

But searching for "Who killed" works.

Also this might just be me but it would be nice if the search bar also worked by pressing "Enter" after a query.

alexcg1 · 4 years ago

That's a restriction in the front-end framework unfortunately :(

Really hoping Streamlit supports that in future.

alexcg1 · 4 years ago

You can see this link. It explains why the meme search still needs a bit of work: http://examples.jina.ai:8501/?tab=Dude,%20this%20meme%20sear...

alexcg1 · 4 years ago

Likely because: a) Only so many meme types in the full dataset b) We only indexed 1000 memes to build a toy example

Had I known it would blow up I would've indexed more!

neilk · 4 years ago

For text search, this seems to be using trigrams or something? I tried “tired” and got a lot of “fired”, “required”, etc.

Not useful for me, since I usually want a meme to match a mood or theme.
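
For reference, this is roughly what pure character-trigram matching would do with that query. It is a sketch only, not how this project ranks results; the helper names and candidate words are made up for illustration.

```python
def char_trigrams(word):
    """Set of overlapping 3-character substrings of a word."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of the two words' character trigrams."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta or not tb:
        return 0.0  # words shorter than 3 characters have no trigrams
    return len(ta & tb) / len(ta | tb)


query = "tired"
candidates = ["fired", "required", "sleepy", "exhausted"]
ranked = sorted(candidates, key=lambda w: trigram_similarity(query, w), reverse=True)

for word in ranked:
    print(f"{word}: {trigram_similarity(query, word):.2f}")
```

Here "fired" and "required" score above "sleepy" and "exhausted" because they share the trigrams "ire" and "red" with "tired", even though the last two are closer in meaning.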

alexcg1 · 4 years ago

If you search for something like "animal food", you'll get a lot of memes related to animals and food without specifically mentioning those words (e.g. "dogs" and "eating" in the caption), so it's definitely using a neural net to grok the semantics.

The trigram thing might just be a weird glitch in the pretrained model we're using (sentence-transformers/paraphrase-distilroberta-base-v1)

alexcg1 · 4 years ago

Hmmm...might be a model thing. We'll look into it

collsni · 4 years ago

"limit 200mb per file" that seems excessive, those are some hq memes.

alexcg1 · 4 years ago

Haha, that's the standard upload limit for Streamlit[1] apps. We just slapped together a quick front-end using that framework.

[1] https://www.streamlit.io/

kryptogeist · 4 years ago

Search by text has interesting results. Try "jina.ai".

anigbrowl · 4 years ago

I went from 'well it's OK' to 'I quite like this' to 'it's great!' in about 10 minutes.

Creating a new web account loads the tutorial page, but it's a little confusing at first how to add a node. Also, there are quite a few spelling and grammar errors on that page which will make an unfair negative impression. If you clean those up you will get more conversions.

Examples:

  It's my job to keep your complicated brain neat and tidy and remember **things** for a long time!

  Here's how you can **create** a neat Note Garden.

  2. Structure what you have learned and put **it** in order.
  
  I even take care of your knowledge so that you won't forget **it** for the rest of your life!

  (The road to being a great gardener** was **not easy, but we did it!)

Note also in the last example how the bold markdown surrounds the whitespace. I highlighted this manually (and carelessly) but clicking on a word also selects trailing spaces. You should probably strip the whitespace.
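
A minimal sketch of that whitespace fix, assuming the editor receives the selected text as a plain string. The `bold_selection` helper is hypothetical and not taken from the app.

```python
def bold_selection(selection: str) -> str:
    """Wrap the selection in ** markers, keeping outer whitespace outside them."""
    core = selection.strip()
    if not core:
        return selection  # nothing but whitespace selected, leave it alone
    lead_len = len(selection) - len(selection.lstrip())
    leading = selection[:lead_len]
    trailing = selection[lead_len + len(core):]
    return f"{leading}**{core}**{trailing}"


print(repr(bold_selection(" was ")))
print(repr(bold_selection("things ")))
```

Keeping the spaces outside the markers matters because many markdown renderers will not treat `** was **` as bold.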

On the landing page 'Write Smartly' is correct English, but people rarely use the word this way - although it is technically correct it feels weird, and you don't want to create that feeling on a landing page. 'Write Smart' would be better.

Also, you wrote 'Law students - People who study for a long time that should not be forgotten'. I suggest 'Law students - People who need to study and retain knowledge for a long time.'

These are small language errors, but they would be very quickly noticed by your target audience.

Finally, the desktop sign-in with Google seems not to work - it opens a blank window and then closes again. Maybe it is just from server load right now.

Anyway I like it a lot and will consider using it regularly. I am more of a pencil-and-paper note person but this is one of the nicest digital notebooks I've found.

d0ugal · 4 years ago

Is this comment on the wrong post?

quaintdev · 4 years ago

And it's on top. I wonder how often people actually read the entire article or comment. I think most often people read the first line, decide whether they agree with the comment, and vote accordingly.

I have often seen the top comment on an article take the completely opposite perspective from the article. For example, if the article is about why Flutter or Go is amazing, the top comment here will explain why both are the worst choices ever.

I think it just proves people feel the "need" to comment only if they disagree. If they agree with the article they don't bother explaining, hence the low comment count. In short, a controversial topic will generate a lot of engagement, and hence other social networks don't bother moderating and let people engage in behaviours harmful to society as a whole.

edelans · 4 years ago

I guess this comment was meant for https://notegarden.web.app/ on https://news.ycombinator.com/item?id=28400446

grp000 · 4 years ago

GPT6 escaped from Google!

anigbrowl · 4 years ago

Yes... ಠ_ಠ