Posted by u/tlowe11 2 years ago
Launch HN: Metal (YC W23) – Embeddings as a Service
Hey HN! We’re Taylor, James and Sergio – the founders of Metal (https://www.getmetal.io/). You can think of Metal as embeddings as a service. We help developers use embeddings without needing to build out infrastructure, storage, or tooling. Here’s a 2-minute overview: https://www.loom.com/share/39fb6df7fd73469eaf20b37248ceed0f

If you’re unfamiliar with embeddings, they are representations of real-world data expressed as vectors, whose positions can be compared to one another to derive meaning from the underlying data. They can be used to create things like semantic search, recommender systems, clustering analysis, classification, and more.
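A toy illustration of that comparison, using cosine similarity over made-up 3-dimensional vectors (real models produce hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means same direction, near 0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-dim embeddings purely for illustration.
cat = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.85, 0.2, 0.05])
truck = np.array([0.0, 0.1, 0.95])

print(cosine_similarity(cat, kitten))  # high: semantically close
print(cosine_similarity(cat, truck))   # low: semantically distant
```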

Working at companies like Datadog, Meta, and Spotify, we found it frustrating to build ML apps. Lack of tooling, infrastructure, and proper abstraction made working with ML tedious and slow. To get features out the door we’ve had to build data ingestion pipelines from scratch, manually maintain live customer datasets, build observability to measure drift, manage no-downtime deployments, and the list goes on. It took months to get simple features in front of users and the developer experience was terrible.

OpenAI, Hugging Face and others have brought models to the masses, but the developer experience still needs to be improved. To actually use embeddings, hitting APIs like OpenAI is just one piece of the puzzle. You also need to figure out storage, create indexes, maintain data quality through fine-tuning, manage versions, code operations on top of your data, and create APIs to consume it. All of this friction makes it a pain to ship live applications.

Metal solves these problems by providing an end-to-end platform for embeddings. Here’s how it works:

Data In: You send data to our system via our SDK or API. Data can be text, images, PDFs, or raw embeddings. When data hits our pipeline, we preprocess it by extracting the text from documents and chunking when necessary. We then generate embeddings using the selected model. If the index has a fine-tuning transformation, we transform the embedding into the new vector space so it matches the target data. We then store the embeddings in cold storage for any needed async jobs.
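In rough Python, the described flow looks something like this (a sketch only; "embed" and "transform" are stand-ins, not Metal's actual SDK):

```python
import numpy as np

def ingest(text, embed, transform=None, chunk_size=512):
    """Toy version of the described pipeline: chunk -> embed -> transform."""
    # 1. Chunk the extracted text when it exceeds the model's window.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # 2. Generate an embedding per chunk with the selected model.
    vectors = [np.asarray(embed(chunk)) for chunk in chunks]
    # 3. If the index has a fine-tuning transformation, project each
    #    embedding into the target vector space (shown here as a linear map).
    if transform is not None:
        vectors = [transform @ v for v in vectors]
    return vectors  # then indexed for querying and persisted to cold storage
```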

From there we index the embeddings for querying. We use HNSW right now, but are planning to support FLAT indexes as well. We currently index in Redis, but plan to make this configurable and provide more options for datastores.
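For reference, creating an HNSW vector index in Redis looks roughly like this with redis-py (the index name and schema here are illustrative, not Metal's internals):

```python
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r = redis.Redis(host="localhost", port=6379)

# HNSW index over 1536-dim float32 vectors, cosine distance.
r.ft("embeddings_idx").create_index(
    fields=[
        TagField("doc_id"),
        VectorField(
            "embedding",
            "HNSW",
            {"TYPE": "FLOAT32", "DIM": 1536, "DISTANCE_METRIC": "COSINE"},
        ),
    ],
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)
```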

Data Out: We provide querying endpoints that hit the indexes to find the approximate nearest neighbors (ANN). For fine-tuned indexes, we generate embeddings from the base model and then transform them into the new vector space during the pre-query phase.
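A sketch of that pre-query transform plus a KNN lookup against an index like the one above (again illustrative; the "transform" matrix stands in for a learned fine-tuning map):

```python
import numpy as np
from redis.commands.search.query import Query

def search(r, query_vec, transform=None, k=5):
    # For fine-tuned indexes: project the base-model embedding into the
    # fine-tuned vector space before querying.
    if transform is not None:
        query_vec = transform @ query_vec
    q = (
        Query(f"*=>[KNN {k} @embedding $vec AS score]")
        .sort_by("score")
        .return_fields("doc_id", "score")
        .dialect(2)
    )
    return r.ft("embeddings_idx").search(
        q, query_params={"vec": np.asarray(query_vec, dtype=np.float32).tobytes()}
    )
```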

Additionally, we provide methods to run clustering jobs on the stored embeddings, plus visualizations in the UI. We are experimenting with zero-shot classification by embedding the classes and matching each embedding to its closest class, which lets us provide a “classify” method in our SDK. We would love feedback on what other async job types would be useful!
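The zero-shot idea in miniature (assuming you already have item and class-label embeddings as numpy arrays):

```python
import numpy as np

def classify(item_vecs, class_vecs, labels):
    # Normalize rows so dot product equals cosine similarity.
    items = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    classes = class_vecs / np.linalg.norm(class_vecs, axis=1, keepdims=True)
    sims = items @ classes.T  # (n_items, n_classes)
    # Each item gets the label of its most similar class embedding.
    return [labels[i] for i in sims.argmax(axis=1)]
```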

Examples of what users have built so far include embedding product catalogs for improved similarity search, personalized in-app messaging with user behavior clusters, and similarity search on images for content creators.

Metal has a free tier that anyone can use, a developer tier for $20/month, and an enterprise tier with custom pricing. We’re currently building an open source product that will be released soon.

Most importantly, we’re sharing Metal with the HN community because we want to build the best developer experience possible, and the only metric we care about is live apps on prod. We’d love to hear your feedback, experiences with embeddings, and your ideas for how we can improve the product. Looking forward to your comments, thank you!

kaycebasques · 2 years ago
Congrats on the launch. I started exploring applications of generative AI towards technical documentation this week. I quickly realized that embeddings are a key piece of the puzzle and I can give you some initial validation: I definitely don't want to manage this stuff myself and it really seems like I shouldn't need to. Also I am comfortable with services like Firebase so your product immediately makes sense to me because I basically think of it like Firebase for embeddings.

This is probably more feature creep than you can or want to sign up for at the moment, but I also don't really want to deal with manually transforming my Markdown or HTML into the sections of text that you use as input for embeddings. It would be nice if I could just provide URLs to my live documentation or Markdown source code, and your service takes a best guess at how to split it up into sections and then generate embeddings for each of those sections.
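For what it's worth, a naive version of that splitting step is small, e.g. chunking Markdown on its headings (a rough sketch, not what any particular service does):

```python
import re

def split_markdown(md):
    # Split at ATX headings (#, ##, ...); the lookahead keeps each
    # heading attached to the section it introduces.
    parts = re.split(r"(?m)^(?=#{1,6} )", md)
    return [p.strip() for p in parts if p.strip()]
```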

Last, I would be happy to talk to you all about docs strategy for your own docs sometime (I'm not looking for work at the moment; I just enjoy helping people with this stuff). You can contact me and learn more about my background via the social links on https://technicalwriting.tools (a blog about technical writing tooling topics that I just spun up).

Good luck!

tlowe11 · 2 years ago
Great to hear! Humbled by the Firebase comparison as well :) We've talked about the markdown/HTML feature, definitely something we want to build. And I'll ping you!
billybones · 2 years ago
Such an important problem!

I get the benefit over Pinecone (which wasn't built with LLMs, etc. in mind).

How does this compare to Chroma? Feels like it has most of what you're talking about, and already has an open source product live.

https://www.trychroma.com/

gk1 · 2 years ago
> I get the benefit over Pinecone (which wasn't built with LLMs, etc in mind)

What do you mean?

Pinecone was specifically made to be used alongside LLMs and other embedding models. That’s how anyone uses Pinecone.

jxodwyer1 · 2 years ago
Chroma is awesome <3 - We have some overlap with them as we store the embeddings. But we provide additional operations on top of the data, such as clustering/fine-tuning. We're also looking into open-sourcing some tools in the near future!
swalsh · 2 years ago
Postgres has an extension for this as well (pgvector). I've been using it: great performance, great scaling options (though I'm not even close to testing the limits), and it gives you the full flexibility of Postgres.

It's easy enough to define a Docker Compose file and deploy it to my environments.
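For anyone curious, the pgvector setup described here is roughly this (a sketch using psycopg2 and the pgvector Python helper; the connection details and schema are made up):

```python
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("dbname=app user=app")
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id bigserial PRIMARY KEY, body text, embedding vector(1536))"
    )
conn.commit()
register_vector(conn)  # lets numpy arrays be passed as vector parameters

query_vec = np.random.rand(1536)  # stand-in for a real query embedding
with conn.cursor() as cur:
    # <=> is pgvector's cosine-distance operator (<-> is L2).
    cur.execute(
        "SELECT id, body FROM items ORDER BY embedding <=> %s LIMIT 5",
        (query_vec,),
    )
    print(cur.fetchall())
```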

sroussey · 2 years ago
That’s what I’m setting up now. What do you use to create the embeddings? OpenAI? Which model?
abyesilyurt · 2 years ago
How does it scale with the number of rows?
meekaaku · 2 years ago
Hi, regarding the product catalog use case: say I embed our product catalog consisting of 1000 SKUs. Is there a way to update a specific field in a product? A product has a name, description, SKU, etc. that doesn't change much. But it also has frequently changing info like price, quantity_available, special_offer, etc. How do I update only those fields and still be able to answer a question that customers send to our bot like:

Do you have product A, and what is the price?

which means we need to get the latest price and quantity_available fields.

Is this possible to do with Metal?

jxodwyer1 · 2 years ago
We don’t support this use case yet, but we could by exposing an API to update the non-filterable metadata of the records. This is a cool use case; we would love to learn more about it. Would you want to create embeddings from the product name + description and then have the other attributes returned in the search results? We are very close to supporting this; it’s just a matter of exposing a way to update those attributes.
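The pattern being described, sketched store-agnostically (hypothetical helper names, not Metal's API): embed only the stable fields, and keep volatile fields as metadata that can change without re-embedding.

```python
catalog = {}  # sku -> {"embedding": ..., "metadata": ...}

def index_product(sku, name, description, embed, **dynamic):
    catalog[sku] = {
        # Embed the static text only (name + description).
        "embedding": embed(f"{name}. {description}"),
        # Price, quantity_available, etc. live alongside as metadata.
        "metadata": {"name": name, **dynamic},
    }

def update_metadata(sku, **changes):
    # Price or stock changes touch metadata only; no new embedding needed.
    catalog[sku]["metadata"].update(changes)
```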
meekaaku · 2 years ago
Yes, the static info is mainly product name/code/description/keywords, etc. The dynamic fields are price, quantity_available, and similar feeds.
esafak · 2 years ago
The value proposition is not clear to me. You don't generate the embeddings and there are already numerous vector databases. Maybe the versioning part?
AmazingTurtle · 2 years ago
Yeah, I also don't understand why there are tons of (YC-backed) startups providing little to no value. They basically man-in-the-middle the OpenAI GPT platform. So... yeah, I've been running a pgvector database with OpenAI embeddings for 6 months now, and I'm a solo hobby dev who experiments with it. Guess I could've built a startup with that knowledge L M A O
PaulHoule · 2 years ago
I dunno. It took like one line in conda to bring in GPU PyTorch, one for sentence-transformers, one line of Python to initialize it, one line to encode. No worries about somebody else getting data breached, acqui-hired, or struggling to find a sensible, fair, and profitable pricing model.

Clustering with scikit-learn is… easy. Indexing in FAISS is… easy. Maybe it’s hard if you use Rust, and it was hard to do this in Python 5 years ago. Dilbert’s Boss probably thinks it is hard but he got fired…
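For reference, the DIY stack being described really is only a few lines (assuming sentence-transformers, faiss, and scikit-learn are installed):

```python
import faiss
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")  # one line to initialize
docs = ["how do I reset my password", "pasta recipes", "forgot my login"]
embs = model.encode(docs, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(embs.shape[1])  # inner product == cosine on unit vectors
index.add(embs)

query = model.encode(["account recovery"], normalize_embeddings=True).astype("float32")
scores, ids = index.search(query, 2)
print([docs[i] for i in ids[0]])  # nearest documents to the query

labels = KMeans(n_clusters=2, n_init=10).fit_predict(embs)  # clustering, one line
```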

jxodwyer1 · 2 years ago
You’re right! If you want to do that in a notebook, it’s pretty straightforward. But if you want to have it running in production, it’s a bit more complicated. Also, providing users with a GUI to run these operations without a notebook has resonated with many less ML-savvy users. Dilbert's boss probably didn't know much about ML... :)
PaulHoule · 2 years ago
I don’t use a notebook. I write plain Python scripts for batch jobs (run every day), and the UI is backed by aiohttp and HTMX. I had no fear when I demoed my app in public for the first time, since I’ve used it every day since the beginning of the year and it spins like a top.
fzysingularity · 2 years ago
Congrats on the launch!

A few questions/thoughts:

- What kind of overhead do you have right now with calling this API?

- What scales have you pressure-tested this with? The demo seems to show a few 100s of embeddings. Selfishly, I'd like to see a demo of handling 10M+ vectors to be reasonably certain that any company can truly build on this infrastructure. I guess I'm more interested in the out-of-core applications where I can really shove all my data in here and see if the system can handle it.

- (dovetails with the previous one): What kind of access patterns are you seeing today: more indie developers pushing a few 1000s of vectors into a DB, or heavy users pushing 100K-1M+ vectors?

- Less of a question, but one thought would be to partner with labeling companies to automatically fine-tune embeddings as part of a single embeddings-management platform.

- Would you eventually look to build your own vector DB + metadata/feature stores as part of the long-term strategy, or try to integrate with existing ones?

jamesmcintyre · 2 years ago
EDIT: never mind, I didn't read your whole post, looks like you guys are working on an opensource option. Great!

Metal looks awesome. I've been comparing vector DB solutions, so your simple/abstracted SDK looks great. One thing I'd mention: with a solution like this that could be so critical to an app's functionality (and therefore so integrated into various parts of the app), I'd love to see your team vow to give some sort of open-source, self-hosted option. I want to root for any startup that is letting devs move faster in this area, but there's a fear of committing to a solution that may pivot or be acquired/discontinued. Maybe even vow a "safe exit" for customers, like I think RethinkDB did.

Good luck, looks awesome!

jxodwyer1 · 2 years ago
We agree with the sentiment; we’re currently figuring out the pieces we want to open source, as much of it is just infra (like the ingest pipeline). But the search server and some of our future work around memory will get open-sourced first.
m1117 · 2 years ago
This is similar to Pinecone/Milvus, correct? What are the advantages of this compared to Pinecone/Milvus?
jxodwyer1 · 2 years ago
We see ourselves as a layer above the vector DB; we use Redis to index the data. We focused on building the ingest pipeline and operations on top of the embeddings, such as clustering and fine-tuning (embedding customization). Ultimately, we want to provide the best developer experience possible, and we believe much work is needed here!
ChocoluvH · 2 years ago
Haha. In that case you might actually wanna consider FAISS/Milvus instead of Redis.
Ozzie_osman · 2 years ago
I think those assume you already have the embedding vector calculated, and they just store and retrieve the vectors.