If you’re unfamiliar with embeddings, they are representations of real-world data expressed as vectors, where a vector’s position can be compared to other vectors to derive meaning from the data. They can be used to build semantic search, recommender systems, clustering analysis, classification, and more.
Working at companies like Datadog, Meta, and Spotify, we found it frustrating to build ML apps. Lack of tooling, infrastructure, and proper abstraction made working with ML tedious and slow. To get features out the door we’ve had to build data ingestion pipelines from scratch, manually maintain live customer datasets, build observability to measure drift, manage no-downtime deployments, and the list goes on. It took months to get simple features in front of users and the developer experience was terrible.
OpenAI, Hugging Face, and others have brought models to the masses, but the developer experience still needs to be improved. To actually use embeddings, calling an API like OpenAI’s is just one piece of the puzzle. You also need to figure out storage, create indexes, maintain data quality through fine-tuning, manage versions, code operations on top of your data, and create APIs to consume it. All of this friction makes it a pain to ship live applications.
Metal solves these problems by providing an end-to-end platform for embeddings. Here’s how it works:
Data In: You send data to our system via our SDK or API. Data can be text, images, PDFs, or raw embeddings. When data hits our pipeline we preprocess it by extracting the text from documents and chunking when necessary. We then generate embeddings using the selected model. If the index has a fine-tuning transformation, we transform the embedding into the new vector space so it matches the target data. We then store the embeddings in cold storage for any needed async jobs.
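To make the pipeline concrete, here is a minimal sketch of the steps above. The function names (`chunk_text`, `ingest`) and the `embed`/`transform` callables are illustrative placeholders, not Metal's actual SDK:

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split a document into chunks of at most max_chars on word boundaries."""
    chunks, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > max_chars:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        chunks.append(current)
    return chunks

def ingest(document: str, embed, transform=None) -> list:
    """Preprocess -> chunk -> embed -> optionally map into the
    fine-tuned vector space -> return vectors ready for storage."""
    vectors = []
    for chunk in chunk_text(document):
        v = embed(chunk)           # call out to the selected model
        if transform is not None:  # fine-tuning transformation, if configured
            v = transform(v)
        vectors.append(v)
    return vectors
```

In a real pipeline `embed` would be an API call to the chosen model and `transform` the learned fine-tuning map; here they are just parameters so the flow of data is visible.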
From there we index the embeddings for querying. We use HNSW right now, but are planning to support FLAT indexes as well. We currently index in Redis, but plan to make this configurable and provide more options for datastores.
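For context on the trade-off between the two index types: FLAT is brute-force exact search over every stored vector, which HNSW approximates with a layered proximity graph to answer queries much faster at scale. A toy FLAT index in numpy (illustrative only, not Metal's implementation):

```python
import numpy as np

class FlatIndex:
    """Toy FLAT index: exact nearest-neighbor search by scanning all vectors.
    HNSW trades a little recall for much faster queries by walking a
    layered graph instead of scanning everything."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, vecs) -> None:
        self.vectors = np.vstack([self.vectors, np.asarray(vecs, dtype=np.float32)])

    def query(self, q, k: int = 5):
        # Exact search: L2 distance to every stored vector, O(n * dim) per query.
        dists = np.linalg.norm(self.vectors - np.asarray(q, dtype=np.float32), axis=1)
        return np.argsort(dists)[:k]
```

The linear scan in `query` is what makes FLAT exact but slow past a few million vectors, and why approximate structures like HNSW exist.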
Data Out: We provide querying endpoints to hit the indexes, finding the approximate nearest neighbors (ANN). For fine-tuned indexes, we generate embeddings from the base model used and then transform the embedding into the new vector space during the pre-query phase.
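Assuming the fine-tuning transformation is a learned linear map (an assumption on my part; the post does not specify its form), the pre-query step might look like:

```python
import numpy as np

def transform_query(base_embedding, W):
    """Map a base-model embedding into the fine-tuned vector space with a
    learned linear transformation W, then re-normalize so cosine similarity
    against the stored (transformed) vectors is meaningful."""
    v = np.asarray(W) @ np.asarray(base_embedding)
    return v / np.linalg.norm(v)
```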
Additionally, we provide methods to run clustering jobs on the stored embeddings, with visualizations in the UI. We are experimenting with zero-shot classification by embedding the class labels and assigning each embedding to the closest class, allowing us to provide a “classify” method in our SDK. We would love feedback on what other async job types would be useful!
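The zero-shot approach described (embed the class labels, assign each item to its nearest class) can be sketched in a few lines; `classify` here is an illustrative function, not necessarily the SDK method:

```python
import numpy as np

def classify(embedding, class_embeddings: dict):
    """Zero-shot classification: return the class whose label embedding is
    most cosine-similar to the item embedding."""
    def cosine(a, b):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(class_embeddings, key=lambda c: cosine(embedding, class_embeddings[c]))
```

The appeal of this scheme is that adding a new class requires no retraining, just one more label embedding in the dictionary.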
Examples of what users have built so far include embedding product catalogs for improved similarity search, personalized in-app messaging with user behavior clusters, and similarity search on images for content creators.
Metal has a free tier that anyone can use, a developer tier for $20/month, and an enterprise tier with custom pricing. We’re currently building an open source product that will be released soon.
Most importantly, we’re sharing Metal with the HN community because we want to build the best developer experience possible, and the only metric we care about is live apps on prod. We’d love to hear your feedback, experiences with embeddings, and your ideas for how we can improve the product. Looking forward to your comments, thank you!
This is probably more feature creep than you can or want to sign up for at the moment, but I also don't really want to deal with manually transforming my Markdown or HTML into the sections of text that you use as input for embeddings. It would be nice if I could just provide URLs to my live documentation or Markdown source code, and your service takes a best guess at how to split it up into sections and then generate embeddings for each of those sections.
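For what it's worth, a naive first pass at that splitting step is only a few lines; a real implementation would need smarter handling of code fences, front matter, and nested sections:

```python
import re

def split_markdown(md: str) -> list[str]:
    """Naively split a Markdown document into sections at each ATX heading.
    The zero-width lookahead keeps the heading line attached to its section."""
    sections = re.split(r"(?m)^(?=#{1,6} )", md)
    return [s.strip() for s in sections if s.strip()]
```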
Last, I would be happy to talk to you all about docs strategy for your own docs sometime (I'm not looking for work at the moment; I just enjoy helping people with this stuff). You can contact me and learn more about my background via the social links on https://technicalwriting.tools (a blog about technical writing tooling topics that I just spun up).
Good luck!
I get the benefit over Pinecone (which wasn't built with LLMs, etc. in mind).
How does this compare to Chroma? Feels like it has most of what you're talking about, and already has an open source product live.
https://www.trychroma.com/
What do you mean?
Pinecone was specifically made to be used alongside LLMs and other embedding models. That’s how anyone uses Pinecone.
It's easy enough to define a Docker Compose file and deploy it to my environments.
Say I want to answer a question like "Do you have product A, and what is its price?"
That means retrieving the latest price and quantity_available fields.
Is this possible to do with Metal?
Clustering with scikit-learn is… easy. Indexing in FAISS is… easy. Maybe it’s hard if you use Rust, and it was hard to do this in Python 5 years ago. Dilbert’s Boss probably thinks it is hard but he got fired…
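To the commenter's point, clustering embeddings with scikit-learn really is a few lines (assuming the vectors are already in a numpy array):

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster a small set of 2-D "embeddings" into two groups.
embeddings = np.array([
    [0.1, 0.2], [0.0, 0.1], [0.2, 0.0],   # one tight group near the origin
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],   # another group far away
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
```

Of course, what a managed platform adds is everything around this call: storage, scheduling the job, and surfacing results, not the clustering itself.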
A few questions/thoughts:
- What kind of overheads do you have right now with calling this API?
- What scales have you pressure-tested this at? The demo seems to show a few 100s of embeddings. Selfishly, I'd like to see a demo handling 10M+ vectors to be reasonably certain that a company can truly build infrastructure on this. I guess I'm more interested in the out-of-core applications where I can really shove all my data in here and see if the system can handle it.
- (dovetails with the previous one): What kind of access patterns are you seeing today: more indie developers pushing a few 1000s of vectors into a DB, or some heavy users pushing 100K-1M+ vectors?
- Less of a question, but one thought would be to partner with labeling companies to automatically fine-tune embeddings as part of a single embeddings-management platform.
- Would you eventually look to build your own vector DB + metadata / features stores as part of the long-term strategy or try to integrate with existing ones?
Metal looks awesome. I've been comparing vector DB solutions, so your simple, abstracted SDK looks great. One thing I'd mention: with a solution like this that could be so critical to an app's functionality (and therefore so integrated into various parts of the app), I'd love to see your team vow to provide some sort of open-source, self-hosted option. I want to root for any startup that is letting devs move faster in this area, but there's a fear of committing to a solution that may pivot or be acquired/discontinued. Maybe even vow a "safe exit" for customers, like I think RethinkDB did.
Good luck, looks awesome!