This is a good article and seems well balanced despite being written by someone with a product that directly competes with Amazon S3. I particularly appreciated their attempt to reverse-engineer how S3 Vectors work, including this detail:
> Filtering looks to be applied after coarse retrieval. That keeps the index unified and simple, but it struggles with complex conditions. In our tests, when we deleted 50% of data, TopK queries requesting 20 results returned only 15—classic signs of a post-filter pipeline.
Things like this are why I'd much prefer if Amazon provided detailed documentation of how their stuff works, rather than leaving it to the development community to poke around and derive those details independently.
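For what it's worth, the shortfall pattern they describe is easy to reproduce with a toy post-filter pipeline. This sketch is purely illustrative of the failure mode, not a claim about S3 Vectors' actual internals (all names here are made up):

```python
import numpy as np

def post_filter_topk(index_vecs, live_mask, query, k=20):
    """Toy post-filter pipeline: coarse retrieval over the whole index,
    then drop deleted rows. Nothing backfills the dropped candidates,
    so the final result set can come up short of k."""
    # Coarse pass: k nearest neighbors from the full index, oblivious
    # to deletions (a real engine would use IVF/HNSW, not brute force).
    dists = np.linalg.norm(index_vecs - query, axis=1)
    candidates = np.argsort(dists)[:k]
    # Post-filter: discard candidates that were deleted.
    return [int(i) for i in candidates if live_mask[i]]

rng = np.random.default_rng(0)
vecs = rng.normal(size=(10_000, 64))
live = rng.random(10_000) > 0.5               # "delete" ~50% of the data
hits = post_filter_topk(vecs, live, rng.normal(size=64), k=20)
print(len(hits))                              # ~10 on average, not 20
```

A pre-filter (or single-stage filtered search) would apply the liveness predicate before or during the nearest-neighbor pass and still return a full 20.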
> Things like this are why I'd much prefer if Amazon provided detailed documentation of how their stuff works, rather than leaving it to the development community to poke around and derive those details independently.
Absolutely this. So much engineering time has been wasted on reverse-engineering internal details of things in AWS that could be easily documented. I once spent a couple days empirically determining how exactly cross-AZ least-outstanding-requests load balancing worked with AWS's ALB because the docs didn't tell me. Reverse-engineering can be fun (or at least I kinda enjoy it) but it's not a good use of our time and is one of those shadow costs of using the Cloud.
It's not like there's some secret sauce here in most of these implementation details (there aren't that many ways to design a load balancer). If there was, I'd understand not telling us. This is probably less an Apple-style culture of secrecy and more laziness and a belief that important details have been abstracted away from us users because "The Cloud" when in fact, these details do really matter for performance and other design decisions we have to make.
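The textbook least-outstanding-requests rule itself fits in a few lines. A generic sketch follows; this is the standard algorithm, not a claim about ALB's actual implementation, and the cross-AZ weighting is exactly the undocumented part:

```python
import random
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    in_flight: int  # outstanding (in-progress) requests on this target

def pick_least_outstanding(targets: list[Target]) -> Target:
    # Route to a target with the fewest in-flight requests,
    # breaking ties at random.
    fewest = min(t.in_flight for t in targets)
    return random.choice([t for t in targets if t.in_flight == fewest])

pool = [Target("i-aaa", 3), Target("i-bbb", 1), Target("i-ccc", 1)]
print(pick_least_outstanding(pool).name)  # "i-bbb" or "i-ccc"
```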
>It's not like there's some secret sauce here in most of these implementation details. If there was, I'd understand not telling us. This is probably less an Apple-style culture of secrecy and more laziness and a belief that important details have been abstracted away from us users because "The Cloud" when in fact, these details do really matter for performance and other design decisions we have to make.
Having worked inside AWS, I can tell you one big reason is the attitude/fear that anything we put in our public docs may end up getting relied on by customers. If customers rely on the implementation working a specific way, then changing that detail requires a LOT more work to avoid breaking customers' workloads. If it is even possible at that point.
Did you have an account manager or support contract with AWS? IME, they're more than willing to set up a call with one of their engineers to disclose implementation details like this after your company signs an NDA.
> This is probably less an Apple-style culture of secrecy and more laziness and a belief that important details have been abstracted away from us users
As someone who has worked in providing infra to third parties, I can say that providing more detail than necessary will hurt your chances with some bigger customers. Giving them more information than they need or ask for makes your product look more complicated.
However sophisticated you think a customer of this product will be, go lower.
I have to assume that at this point it's either intentional (increases profits?) or because AWS doesn't truly understand their own systems due to the culture of the company.
The alternative is to find solutions that can reasonably support different requirements, because business needs change all the time, especially in the current state of our industry. From what I’ve seen, OSS Postgres/pgvector can adequately support a wide variety of requirements for millions to low tens of millions of vectors: low latencies, hybrid search, filtered search, the ability to serve out of memory and disk, and strong-consistency/transactional semantics with operational data. For further scaling/performance (1B+ vectors and even lower latencies), consider a SOTA Postgres system like AlloyDB with AlloyDB ScaNN.
Full disclosure: I founded ScaNN in GCP databases and am the lead for AlloyDB Semantic Search. And all these opinions are my own.
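To make the pgvector option concrete, here's a minimal sketch of filtered vector search in stock Postgres + pgvector via psycopg. The connection string, table, and column names are all hypothetical, and it assumes pgvector >= 0.5 for the HNSW index:

```python
import psycopg  # psycopg 3; assumes a Postgres server with pgvector available

query_vec = "[" + ",".join(["0.01"] * 768) + "]"  # stand-in query embedding

with psycopg.connect("postgresql://localhost/mydb") as conn:  # hypothetical DSN
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id        bigserial PRIMARY KEY,
            tenant_id int,
            body      text,
            embedding vector(768)
        )
    """)
    # Approximate nearest-neighbor index (HNSW, cosine distance).
    conn.execute(
        "CREATE INDEX IF NOT EXISTS docs_hnsw "
        "ON docs USING hnsw (embedding vector_cosine_ops)"
    )
    # Filtered search: an ordinary SQL predicate plus ORDER BY distance.
    # <=> is pgvector's cosine-distance operator.
    rows = conn.execute(
        "SELECT id, body FROM docs WHERE tenant_id = %s "
        "ORDER BY embedding <=> %s::vector LIMIT 20",
        (42, query_vec),
    ).fetchall()
```

Because the embeddings live next to the operational data, the transactional semantics come for free.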
And what if they change their internal implementation and your code depends on the old architecture? It's good practice to clearly think about what to expose to users of your service.
If you can truly abstract away an internal detail, then great. But often there are design decisions that you cannot abstract away because they affect e.g. performance in a major way. For example, I don't care whether some AWS service is written in Java or Go or C++. I do care a bit about how its indexing and retrieval works, because I need to know that to plan my query workloads.
I actually think AWS did a reasonably good job of this with DynamoDB. Most of the performance tradeoffs, indexing, etc., are pretty clear if you read enough docs, without exposing a ton of unnecessary internals.
Detailed documentation would allow for a fair comparison of competing products. Opaque documentation allows AWS to sell "business value" to upper management while proclaiming anyone asking for more detail isn't focused on what's important.
Yes, I’m the founder and maintainer of the Milvus project, and also a big fan of many AWS projects, including S3, Lambda, and Aurora. Personally, I don’t consider S3Vector to be among the best products in the S3 ecosystem, though I was impressed by its excellent latency control. It’s not particularly fast, nor is it feature-rich, but it seems to embody S3’s design philosophy: being “good enough” for certain scenarios.
In contrast, the products I’ve built usually push for extreme scalability and high performance. Beyond Milvus, I’ve also been deeply involved in the development of HBase and Oracle products. I hope more people will dive into the underlying implementation of S3Vector—this kind of discussion could greatly benefit both the search and storage communities and accelerate their growth.
By the way, if you’re not fully satisfied with S3Vector’s write, query, or recall performance, I’d encourage you to take a look at what we’ve built with Zilliz Cloud. It may not always be the lowest-cost option, but it will definitely meet your expectations when it comes to latency and recall.
While your technical analysis is excellent, making judgements about workload suitability based on a Preview release is premature. Preview services have historically had significantly lower performance quotas than GA releases. Lambda for example was limited to 50 concurrent executions during Preview, raised to 100 at GA, and now the default limit is 1,000.
"That gap isn’t just theoretical—it shows up in real bills."
"That’s not linear growth—it’s a quantum leap"
"The performance and recall were fantastic—but the costs were brutal"
"it’s not a one-size-fits-all solution—it’s the right tool for the right job."
"S3 Vectors is excellent for cold, cheap, low-QPS scenarios—but it’s not the engine you want to power a recommendation system"
"S3 Vectors doesn’t spell the end of vector databases—it confirms something many of us have been seeing for a while"
"that’s proof positive that vector storage is a real necessity—not just “indexes wrapped in a database."
"the vector database market isn’t being disrupted—it’s maturing into a tiered ecosystem where different solutions serve different performance and cost needs"
"The golden age of vector databases isn’t over—it’s just beginning."
"The bigger point is that Milvus is evolving into a system that’s not only efficient and scalable, but AI-native at its core—purpose-built for how modern applications actually work."
"I recently spoke with the CTO of a popular AI note-taking app who told me something surprising: they spend twice as much on vector search as they do on OpenAI API calls. Think about that for a second. Running the retrieval layer costs them more than paying for the LLM itself. That flips the usual assumption on its head." Hmm well start sending full documents as part of context see it flip back :).
Sorry, maybe I should've been more clear: it was a sarcastic remark. The whole point of doing vector DB search is to feed the LLM very targeted context so you can save $ on API calls to the LLM.
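The parent's point, in back-of-envelope form. All prices and sizes below are made-up round numbers for illustration only:

```python
# Full documents vs. retrieved chunks in the prompt: hypothetical figures.
PRICE_PER_1K_INPUT_TOKENS = 0.0025   # assumed LLM input price, $ per 1K tokens
QUERIES_PER_MONTH = 1_000_000

def monthly_llm_cost(tokens_per_query: int) -> float:
    return tokens_per_query / 1000 * PRICE_PER_1K_INPUT_TOKENS * QUERIES_PER_MONTH

full_doc = monthly_llm_cost(50_000)  # ship a ~50K-token document every query
rag = monthly_llm_cost(5 * 400)      # 5 retrieved chunks of ~400 tokens each
print(f"full docs: ${full_doc:,.0f}/mo")  # $125,000
print(f"top-k RAG: ${rag:,.0f}/mo")       # $5,000
# The gap is the budget the retrieval layer has to stay under to pay for itself.
```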
Anyone interested in this space should look at https://turbopuffer.com - I think they were first to market with S3 backed vector storage, and a good memory cache in front of it.
Turbopuffer is awesome, really recommend it. They also have extra features like automatic recall tuning based on your data, the option to choose read-after-write guarantees (trading latency for consistency or vice versa), BM25 search, filtering on fields, and many more.
Really recommend checking them out if you need a vector DB. I tried qdrant and Zilliz Cloud solutions, and in terms of operational simplicity turbopuffer is just killing it.
At a glance, it looks like a lightweight vector database running on top of low-cost object storage—at a price point that is clearly attractive compared to many dedicated vector database solutions.
This may be because LanceDB is the most attractive here, with a price point of standard S3 storage ($0.023/GB vs $0.06/GB). I also like that LanceDB works with S3-compatible stores, such as Backblaze B2, which is even cheaper (~70% cheaper).
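A minimal sketch of pointing LanceDB at an S3-compatible endpoint with the Python client. The bucket, endpoint, and credentials are placeholders, and I believe `storage_options` is how recent clients take the endpoint override, but check the LanceDB docs for the exact keys:

```python
import lancedb

# Endpoint/credentials are placeholders; any S3-compatible store should work.
db = lancedb.connect(
    "s3://my-bucket/vectors",
    storage_options={
        "endpoint": "https://s3.us-west-004.backblazeb2.com",  # e.g. Backblaze B2
        "aws_access_key_id": "KEY_ID_HERE",
        "aws_secret_access_key": "SECRET_HERE",
    },
)
tbl = db.create_table(
    "docs",
    data=[{"id": 1, "vector": [0.1] * 768, "text": "hello"}],
)
hits = tbl.search([0.1] * 768).limit(5).to_list()  # nearest neighbors
```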
I love LanceDB. It’s the only way I’ve found to performantly and cheaply serve 50M+ records of 768 dimensions. It runs a bit too slow on S3, but on EFS it can still be a few hundred millis.
Postgres has pgvector. Postgres is where all of my data already lives. It’s all open source and runs anywhere. What am I missing with the specialty vector stores?
latency, actual retrieval performance, integrated pipelines that do more than just vector search to produce better results, the list goes on.
Postgres for vector search is fine for toy products or stuff that's outside the hot loop of your business but for high performance applications it's just inadequate.
For the vast majority of applications, the trade-off of keeping everything in Postgres is worth it vs the operational overhead of some VC-hype data store that won’t be around in 5 years. Most people learned this lesson with Mongo (Postgres jsonb is now good enough for 90% of scenarios).
> Not too long ago, AWS dropped something new: S3 Vectors. It’s their first attempt at a vector storage solution
Nitpick: AWS previously funded pgvector (the slowdown in development suggests to me they have stopped). Their hosted database solutions supported the extension. That means RDS and Aurora were their first vector storage solutions.
IME the implementation of ANN + metadata filtering is often the "secret sauce" behind many vector database implementations.
It feels like this is true for proprietary software in general.
One should only "poke around" an abstraction like this for fun and curiosity, not with the intention of putting the findings to real use.
https://turbopuffer.com/docs/query