This is what we’re using. We already sync database content to a Typesense DB for regular search, so it wasn’t much more work to add embeddings, and now we can do semantic search.
I was using Pinecone before installing pgvector in Postgres. Pinecone works and all, but having the vectors in Postgres resulted in an explosion of use for us. Full relational queries with WHERE clauses, ORDER BY, etc., AND vector embeddings is wicked.
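For anyone who hasn't tried it, a hedged sketch of what that combination looks like from Python; the table, columns, and connection string below are made up for illustration, and a real query vector would match the column's dimensionality:

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")  # assumed connection details
    cur = conn.cursor()

    # Ordinary SQL predicates combined with pgvector's cosine-distance
    # operator (<=>) in the ORDER BY.
    cur.execute(
        """
        SELECT id, title
        FROM documents
        WHERE category = %s
          AND published_at > now() - interval '30 days'
        ORDER BY embedding <=> %s::vector
        LIMIT 10
        """,
        ("news", "[0.011, -0.093, 0.041]"),  # placeholder 3-d vector
    )
    for row in cur.fetchall():
        print(row)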
Why do you use pgvector instead of pgANN? My understanding is pgANN is built with FAISS. When I compared pgvector with FAISS, pgvector was 3-5x slower.
There is certainly a wide variety of problems today for which pgvector is unsuitable due to performance limitations... but fear not! This is an area that is getting significant focus right now.
This hits home; it is a big ask to keep data in sync with yet another store. We already balance MS SQL and Algolia and all the plumbing required to catch updates, deletes, etc., and adding another feels like a bridge too far. Hopefully MS will get on this train at some point and catch up to Postgres.
I honestly hope they use it to improve their documentation. I consider myself a pretty adept developer, but I don't have much background in AI; I was looking for a solution for building out a recommendation engine and ended up at Pinecone.
Maybe I'm not the target audience, but after spending some time poking around I honestly couldn't even figure out how to use it.
Even a simple Google search for "What is a Vector Database" ends up with this page: https://www.pinecone.io/learn/vector-database/#what-is-a-vec...
> Pinecone is a vector database that makes it easy for developers to add vector-search features to their applications
Um okay... what's "vector-search"? For that matter, what the eff is a "vector" to begin with? Finally, about a third of the way down the page, they start defining what a vector is...
Maybe I'm not their target audience but I ended up poking around for about an hour or two before just throwing up my hands and thinking it wasn't for me.
Ended up just sticking with Algolia, since we had them in place for Search anyway...
Respectfully if you don’t know what a vector is, you probably don’t need a vector DB.
When they say “vector-search” they mean semantic search. I.e. “which document is the most semantically similar to the query text”.
So how do we establish semantic similarity?
In a database like Elasticsearch, you store text and the DB indexes the text so you can search.
In a vector DB you don’t just store the raw text, you store a vectorized version of the text.
A vector can be thought of as an array of numbers. To get a vector representation we need some way to take a string and map it to an array while also capturing the notion of semantics.
This is hard, but machine learning models save the day! The first popular model used for this was called “word2vec” while a more modern model is BERT.
These take an input like “fish” and output a vector like [ 3.12 … 4.092 ] (with over a thousand more elements).
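As a rough sketch of that step (the model name here is just an example, not something the comment prescribes); all-MiniLM-L6-v2 happens to produce 384-dimensional vectors:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

    vec = model.encode("fish")
    print(vec.shape)  # (384,)
    print(vec[:5])    # the first few numbers of the embedding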
So let’s say we have a sentence that we vectorized and we want to compare some user input to see how similar it is to that sentence. How?
If we call our sentence vector A and the input vector B, we can compute a number between -1 and 1 (in practice, usually between 0 and 1 for text embeddings) that tells us how similar they are.
This is called cosine similarity and is computed by taking the dot product of the two vectors and dividing by the product of their magnitudes.
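In code that definition is just a couple of lines; a toy sketch with numpy (the vectors are made up):

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # dot product divided by the product of the two magnitudes
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    a = np.array([0.10, 0.30, 0.50])   # "our sentence" A
    b = np.array([0.20, 0.25, 0.55])   # "the user input" B
    print(cosine_similarity(a, b))     # values near 1.0 mean very similar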
When you load a bunch of vectors into a vector DB, the principal operation you will perform is "give me the top K documents most similar to this input". The database's indexing process builds a (usually approximate) nearest-neighbor index over all the vectors in the DB so those top-K lookups are fast at query time.
Without that indexing process there is no real difference between a vector DB and a key-value store.
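To make the no-index case concrete, a brute-force top-K looks like the sketch below; the whole point of an ANN index (IVF, HNSW, etc.) is to avoid this full scan at scale (sizes here are arbitrary):

    import numpy as np

    def top_k_brute_force(query: np.ndarray, vectors: np.ndarray, k: int = 5):
        # Normalize so the dot product equals cosine similarity.
        q = query / np.linalg.norm(query)
        v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        scores = v @ q                    # one similarity score per stored vector
        idx = np.argsort(-scores)[:k]     # indices of the k best matches
        return idx, scores[idx]

    db = np.random.rand(10_000, 384).astype("float32")  # pretend document embeddings
    query = np.random.rand(384).astype("float32")
    print(top_k_brute_force(query, db, k=3))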
> Respectfully if you don’t know what a vector is, you probably don’t need a vector DB.
I wasn't looking for one ;-) I was looking for a recommendation engine; more generally, I'm most often looking for ways to use ML and AI to improve various features and workflows.
Which I guess is my point: I don't know who Pinecone's target market is, but from following this thread it seems like all the folks who know how to do what they do have alternatives that suit them better. If they are targeting folks like me, they're not doing it well.
Pinecone's examples[1] (hat tip to Jurrasic in this thread - I've seen these) all show potential use cases that I might want to leverage, but when you dive into them (for example the Movie Recommender[2] - my use case) I end up with this:
> The user_model and movie_model are trained using Tensorflow Keras. The user_model transforms a given user_id into a 32-dimensional embedding in the same vector space as the movies, representing the user’s movie preference. The movie recommendations are then fetched based on proximity to the user’s location in the multi-dimensional space.
It took me another 5 minutes of googling stuff to parse that sentence. And while I could easily get the examples to run, I was still running back and forth to Google to figure out what the examples were doing; again, the documentation is poor here. I'm not a Python dev, but I could follow it, though I still had to google tqdm to figure out it was a progress bar library.
Also, and this is not unique to Pinecone, I've found that while some things are fairly well documented in the "here's how to build a Movie Recommender based on these datasets" sense, in this space there's frequently very little on how to build a model using your own datasets, i.e., how to take this example and do it with your own data.
[1] https://docs.pinecone.io/docs/examples
[2] https://docs.pinecone.io/docs/movie-recommender
Perfect example of AI gold rush nonsense. Pinecone has zero moat and quite a few free alternatives (Faiss, Weaviate, pgvector). Their biggest selling point is that AI hype train people don’t Google “alternatives to Pinecone” when cloning the newest trending repo (or I guess, ask ChatGPT).
> Pinecone has zero moat and quite a few free alternatives (Faiss, Weaviate, pgvector)
Faiss is a collection of algorithms for in-memory exact and approximate high-dimensional (e.g., > ~30-dimensional) dense-vector k-nearest-neighbor search; it doesn't add or really consider persistence (beyond full index serialization to an in-memory or on-disk binary blob), fault tolerance, replication, domain-specific autotuning, and the like. The "vector database" companies like Pinecone, Weaviate, Zilliz and whatnot add these other features to turn it into a complete service; they're not really the same thing. pgvector at present seems to be roughly the DB-backed equivalent of IndexFlat and IndexIVFFlat (?) from the Faiss library, but is of course not a complete service.
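For reference, a minimal sketch of those two index types in Faiss itself; dimensions and parameters below are illustrative, not recommendations:

    import faiss
    import numpy as np

    d = 128
    xb = np.random.rand(100_000, d).astype("float32")  # database vectors
    xq = np.random.rand(5, d).astype("float32")        # query vectors

    # IndexFlat: exact search, scans every vector.
    flat = faiss.IndexFlatL2(d)
    flat.add(xb)
    D, I = flat.search(xq, 10)

    # IndexIVFFlat: approximate search; cluster the vectors first, then only
    # probe a few clusters per query.
    quantizer = faiss.IndexFlatL2(d)
    ivf = faiss.IndexIVFFlat(quantizer, d, 1024)  # 1024 coarse clusters
    ivf.train(xb)
    ivf.add(xb)
    ivf.nprobe = 16                               # recall/latency trade-off knob
    D, I = ivf.search(xq, 10)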
However, which kind of approximate indexing you want to use very much depends on the data you're indexing, on where in the tradeoff space between latency, throughput, encoding accuracy, NN recall, and memory/disk consumption you want to be (these are the fundamental tradeoffs in the vector-search domain), and on whether you are performing batched queries or not. To access the full range of tradeoffs you'd need to use all of the options available in Faiss or similar low-level libraries, which may be difficult to use or require knowledge of the underlying algorithms.
(I'm the author of the GPU half of Faiss)
I'm shocked.
Spot on. There is zero moat, and the self-hosted alternatives are rapidly improving (if not already better than Pinecone). There are good open-source contributions coming from bigcorp beyond Meta too, e.g., DiskANN (https://github.com/microsoft/DiskANN).
Maybe I am fundamentally missing something, but a "cloud database company" seems like the most boring tech? No one is calling Planetscale or Yugabyte nonsense because there are free alternatives like Postgres.
Is it possible Andreessen is misunderstanding how Pinecone/vector DBs are used? It seems like they are pitching it as "memory for large language models" or something. Are people using vector DBs in some way I'm not aware of? To me it's a database to help you do a semantic search. A multi-token string is converted into a single embedding, like maybe 1000 words into one embedding. This is helpful because you can quickly find the relevant parts of a document to answer a question, and there are token limits on what you can feed an LLM, but the idea that it's helping the LLM keep state or something seems off?
Is it possible they are confusing the use of embeddings across whole swaths of text to do a semantic search with the embeddings that happen on a per token basis as data runs through an LLM? Same word, same basic idea, but used so differently that they may as well be different words?
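Concretely, the retrieval use described above is just chunk-level embeddings plus a similarity lookup before the prompt is assembled; a hedged sketch (model choice and chunk size are arbitrary, not any particular product's API):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

    document = "A long document would go here. " * 200

    # One embedding per multi-token chunk (not per token, as inside the LLM).
    chunks = [document[i:i + 1000] for i in range(0, len(document), 1000)]
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)

    question = "What does the document say about pricing?"
    q_vec = model.encode(question, normalize_embeddings=True)

    # Pull only the most relevant chunks into the token-limited prompt.
    best = np.argsort(-(chunk_vecs @ q_vec))[:3]
    prompt = "\n\n".join(chunks[i] for i in best) + "\n\nQuestion: " + question
    print(prompt[:200])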
I might be mistaken, but my understanding from having played around with LangChain for a couple months is that because you’ve got to keep all your state in the context window, giving the model access to a vectorstore containing the entire chat history allows it to retrieve relevant messages against your query that can then be stuffed or mapreduced into the context window for the next response.
The alternative - and I believe the way the ChatGPT web app currently works - is just to stuff/mapreduce user input and machine response into the context window on each step, which quickly gets quite lossy.
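Stripped of the framework, that vectorstore-memory idea is roughly the sketch below; the names and model are made up for illustration and are not LangChain's actual classes:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

    history = []  # the "memory": (message text, embedding) pairs

    def remember(text: str) -> None:
        history.append((text, model.encode(text, normalize_embeddings=True)))

    def recall(query: str, k: int = 4) -> list:
        # Retrieve the k past messages most similar to the new input,
        # to be stuffed into the context window for the next response.
        q = model.encode(query, normalize_embeddings=True)
        scored = sorted(history, key=lambda item: -float(item[1] @ q))
        return [text for text, _ in scored[:k]]

    remember("User: my favourite colour is teal")
    remember("Assistant: noted!")
    print(recall("what colour does the user like?"))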
You aren't mistaken. Keeping state, or storing memories, is where it's at with prompts. The trick is knowing what to remember and what to forget.
I consider vector engines to be "hot" models, given they are storing the vector representations of text already run through the "frozen" model.
Having written something a while back that indexes documents and enters into discussion with them, I'm pretty sure ChatGPT is using some type of embedding lookup/match/distance on the history in the window. That means not all text is submitted at the next entry, but whatever mostly matches what is entered by the user (in vector space) is likely pulled in and sent over in the final prompt.
https://python.langchain.com/en/latest/modules/memory/types/...
It also has a more basic version that just keeps a log of past messages.
I don't know whether there's a way (or even a need) to combine these approaches. In a long conversation, it might be useful to trust more recent information more than earlier messages, but Langchain's vector memory doesn't care about sequence.
Same with OpenSearch and Elasticsearch, both of which have added vector search as well (with slight differences between their implementations). And since vector search is computationally expensive, there is a lot of value in narrowing down your result set with a regular query before calculating the best matches from that narrowed-down set.
From what I've seen, the big limitation currently is dimensionality. Most of the more advanced models have high dimensionality, and Elasticsearch and Lucene in particular limit vectors to 1024 dimensions; several of the OpenAI models, for example, have much higher dimensionality. OpenSearch works around this by supporting alternative implementations to Lucene for vectors.
Of course it's a sane limitation from a cost and computation point of view; these huge embeddings don't scale that well. But it does limit the quality of the results unless you can train your own models and tailor them to your use case.
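For the filter-then-kNN case, a hedged sketch against the Elasticsearch 8 kNN search API (8.x Python client assumed; the index name, field, and filter are illustrative, and the query vector must match the mapped dense_vector dimensionality):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local cluster

    # Approximate kNN with a filter: the filter narrows the candidate set
    # so the expensive vector comparison runs over fewer documents.
    resp = es.search(
        index="docs",
        knn={
            "field": "embedding",                 # dense_vector field (<= 1024 dims here)
            "query_vector": [0.12, -0.05, 0.33],  # placeholder vector
            "k": 10,
            "num_candidates": 100,
            "filter": {"term": {"lang": "en"}},
        },
    )
    for hit in resp["hits"]["hits"]:
        print(hit["_score"], hit["_source"]["title"])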
If you are curious about how to use this stuff, I invested some time a few weeks ago getting my kt-search Kotlin library to support it and wrote some documentation: https://jillesvangurp.github.io/kt-search/manual/KnnSearch.h.... The quality was underwhelming IMHO, but that might be my complete lack of experience with this stuff.
I have no experience with Pinecone and I'm sure it's great. But I do share the sentiment that they might not come out on top here. There are too many players and it's a fast-moving field. OpenAI just moved the whole field forward enormously in terms of what is possible and feasible.
The key thing is that it's in-memory and allows you to combine attribute-based filtering with nearest-neighbor search.
We're also working on a way to automatically generate embeddings from within Typesense using any ML models of your choice.
So Algolia + Pinecone + Open Source + Self-Hostable with a cloud hosted option = Typesense
https://github.com/netrasys/pgANN
A marqo.ai dev is currently working on adding HNSW-IVF and HNSW support https://news.ycombinator.com/item?id=35551684 and the maintainer has recently noted that they are actively working on an IVFPQ/ScaNN implementation https://github.com/pgvector/pgvector/issues/93
The pgANN creator actually asked about performance a month ago here: https://github.com/pgvector/pgvector/issues/58
Expect to see performance improve dramatically later this year.
Supabase was also asking for sparse vectors https://github.com/pgvector/pgvector/issues/81
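Until those land, what pgvector ships today is the IVFFlat index; a minimal sketch of setting one up (table and column names are illustrative):

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")  # assumed connection details
    cur = conn.cursor()

    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(id bigserial PRIMARY KEY, embedding vector(384))"
    )

    # IVFFlat groups vectors into lists (clusters); queries probe a subset of
    # lists, trading recall for speed. Build the index after loading data.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS documents_embedding_idx "
        "ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)"
    )
    conn.commit()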
Speaking of the repo, they have a number of features they want to add if anyone is interested in contributing; there's lots of room for advancement. Many of these features already have active branches: https://github.com/pgvector/pgvector/issues/27
Recent vector database fundraises:
- Chroma - $18M seed: https://www.trychroma.com/blog/seed
- Weaviate - $50M Series A: https://www.theinformation.com/articles/index-ventures-leads...
- Pinecone - $100M Series B
Harder to set up than wrappers like Chroma, but very powerful.
It quite angers me that people (on HN) will consider the following to be benefits worth mentioning as pros to the consumer:
>Sleek/shiny finish
>Marketing/Branding
>Ability to Monetize
We aren't shareholders; all three of these are bad for the customer.