Readit News
emilfroberg commented on 1400 experiments to find the best RAG configuration   vectorview.ai/blog/optimi... · Posted by u/lukaspetersson
emilfroberg · 2 years ago
Great article Lukas! A key takeaway here is that Llama was the best LLM. I wouldn't have expected that.

(Disclaimer, Lukas and I are co-founders)

emilfroberg commented on Choosing vector database: a side-by-side comparison   benchmark.vectorview.ai/v... · Posted by u/emilfroberg
citruscomputing · 2 years ago
Strongly disagree with PGVector's DX being worse than Chroma's. Installing, configuring, and working with Chroma was infuriating -- it's alpha software and has the bugs and rough edges to prove it. The tools to support and interface with Postgres are battle-tested and so much nicer by comparison; getting Chroma working took over a week, while ripping it out and replacing it with PGVector took a couple of hours.

Also agree with this[0] article that vector search is only one type of search, and even for RAG isn't necessarily the one you want to start with.

[0]: https://colinharman.substack.com/p/beware-tunnel-vision-in-a...

emilfroberg · 2 years ago
Thanks for your input. I've only tried Chroma a little so far and had a pretty good experience. Another thing they have going for them is a big community on Discord that can be helpful.
emilfroberg commented on Choosing vector database: a side-by-side comparison   benchmark.vectorview.ai/v... · Posted by u/emilfroberg
magden · 2 years ago
I don't think we need specialized databases for vectors. Relational databases can easily be expanded by vector data types and operations. They will eventually catch up by supporting what was once a unique feature of the new system: https://medium.com/@magda7817/two-things-to-keep-in-mind-bef...
emilfroberg · 2 years ago
Yeah, maybe they will. But for now, the best options are the purpose-built vector databases, so why not use them?
emilfroberg commented on Choosing vector database: a side-by-side comparison   benchmark.vectorview.ai/v... · Posted by u/emilfroberg
brigadier132 · 2 years ago
None of these vector dbs seem economical outside of enterprise.
emilfroberg · 2 years ago
Many of them are open source and you can host them yourself. That would make it more cost effective. Also someone mentioned https://turbopuffer.com/. That seems like a good alternative if you're looking for something economical.
emilfroberg commented on Choosing vector database: a side-by-side comparison   benchmark.vectorview.ai/v... · Posted by u/emilfroberg
Pandabob · 2 years ago
I've been wondering about Redis as a vector database [0].

[0]: https://twitter.com/sh_reya/status/1661136833848438784

emilfroberg · 2 years ago
I took a quick look at the RediSearch results in ANN Benchmarks, and they seem to stack up against the others in the comparison (more or less the same level as Milvus) when it comes to QPS and latency.
emilfroberg commented on Choosing vector database: a side-by-side comparison   benchmark.vectorview.ai/v... · Posted by u/emilfroberg
Havoc · 2 years ago
16x difference between pg and milvus?

I thought for most use cases this would be quite performance sensitive

emilfroberg · 2 years ago
Yeah, that's the difference we've seen according to the QPS for the ANN Benchmarks. The same story seems to be true for other datasets too. We're looking at a 0.9 recall.
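
For context on the "0.9 recall" figure: QPS numbers for approximate search are only comparable at a fixed recall level. Below is a minimal Python sketch of recall@k under one common definition (the share of the true top-k neighbours that the approximate search returned). The function name and the sample IDs are hypothetical, and ANN Benchmarks' own computation may differ in detail.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours found by the approximate search.

    approx_ids: IDs returned by the approximate (ANN) index, best first.
    exact_ids:  IDs from an exact brute-force search, best first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


# Hypothetical example: the ANN index misses one of the ten true neighbours.
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
print(recall_at_k(approx, exact, k=10))
```

An index is usually tuned (for example through its search-time parameters) until it reaches the target recall, and QPS is then read off at that setting.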
emilfroberg commented on Choosing vector database: a side-by-side comparison   benchmark.vectorview.ai/v... · Posted by u/emilfroberg
dathinab · 2 years ago
Their definition of hybrid search is, I think, wrong.

Though these terms tend not to be consistently defined at all, so "wrong" is maybe the wrong word.

Their definition seems to be about filtering results during (approximate) KNN vector search.

But that is filtering, not hybrid search. It might sometimes be implemented as a form of hybrid search, but that's an internal implementation detail, and you should probably hope it's not implemented that way.

Hybrid search is when you do both a vector search and a more classical text-based search (e.g. BM25) and combine both results in a reasonable way.
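
The distinction drawn above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular vector database implements it: `filtered_knn` is a brute-force stand-in for filtered vector search, and `rrf_fuse` uses reciprocal rank fusion, which is one common way to combine a vector ranking with a BM25 ranking (the comment does not name a specific fusion method). All function names, the document layout, and the constant `k=60` are assumptions for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_knn(query, docs, k, predicate):
    """Filtering: a single vector search, restricted by a metadata predicate."""
    candidates = [d for d in docs if predicate(d)]
    candidates.sort(key=lambda d: cosine(query, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:k]]


def rrf_fuse(rankings, k=60):
    """Hybrid: merge several ranked ID lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from a vector index and a BM25 index.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "c", "d"]
print(rrf_fuse([vector_hits, bm25_hits]))  # ['b', 'c', 'a', 'd']
```

Filtering narrows the candidate set for one ranking signal, while hybrid search merges two independent ranking signals, which is why the two should not be conflated.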

emilfroberg · 2 years ago
The way you explain hybrid search aligns with my understanding. Pinecone has a good article about it here https://www.pinecone.io/learn/hybrid-search-intro/. From my understanding, all vector DBs support this.
emilfroberg commented on Choosing vector database: a side-by-side comparison   benchmark.vectorview.ai/v... · Posted by u/emilfroberg
andre-z · 2 years ago
I'm curious where you got the QPS numbers. They are pretty different from our experience. Reached out on LinkedIn. ;)
emilfroberg · 2 years ago
Happy to connect. The benchmark numbers are mostly from ANN Benchmarks. For my use case, the nytimes-256 dataset was most relevant so I used that for the QPS benchmark. I also took a look at the benchmarks you've made at https://qdrant.tech/benchmarks/ and there qdrant seems to be outperforming many others. If I've gotten something wrong here, I'm glad to update the article :)
emilfroberg commented on Choosing vector database: a side-by-side comparison   benchmark.vectorview.ai/v... · Posted by u/emilfroberg
donretag · 2 years ago
Curious about the lack of Vespa, especially given the thoroughness of the article and Vespa's long-standing reputation. OpenSearch is also missing, but perhaps it is being lumped in with Elasticsearch, since both are based on Lucene. The products are starting to diverge, so it would be nice to see it included, especially since it is open source.

For the performance-based columns, it would also be helpful to see which versions were tested. Vector databases are getting so much attention lately that they are all making great strides forward. The Lucene updates are notable.

emilfroberg · 2 years ago
Someone else also pointed out that Vespa was missing. I'll have to look into it and add it to the article!
emilfroberg commented on Choosing vector database: a side-by-side comparison   benchmark.vectorview.ai/v... · Posted by u/emilfroberg
NicoJuicy · 2 years ago
I'm actually curious how the new vector DB from Cloudflare compares.
emilfroberg · 2 years ago
Me too! Couldn't find a lot of information on it yet, but I might have to try it myself to get some benchmarks

u/emilfroberg

Karma: 95
Cake day: September 11, 2020
About
Co-founder at Vectorview.ai