Major props to the authors of this library. I re-built https://progscrape.com [1] on top of it last year, replacing an ancient Python2 AppEngine codebase that I had neglected for a while. It's a great library and insanely fast, as in indexing the entire library of 1M stories on a Raspberry Pi in seconds.
I'm able to host a service on a Pi at home with full-text search and a regular peak load of a few rps (not much, admittedly), with a CPU that barely spikes above a few percent. I've load tested searches on the Pi up to ~100rps and it held up. I keep thinking I should write up my experiences with it. It was pretty much a drop-in, super-useful library and the team was very responsive with bug reports, of which there were very few.
If you want to see how responsive the search is on such a small device, try clicking the labels on each story -- it's virtually instantaneous to query, and this is hitting up to 10 years * 12 months of search shards! https://progscrape.com/?search=javascript
I'd recommend looking at it over Lucene for modern projects. I am a big fan, as you might be able to tell. Given how well it scales on a tiny little ARM64, I'd wager your experiences on bigger iron will be even more fantastic.
It is a very nice library. I’m using it for a very much work-in-progress incremental email backup CLI tool for email providers that support JMAP.
I wanted users to be able to search their backups. As I’m using Rust, Tantivy looked like just the right thing for the job. Indexing an email happens so fast that I did not bother to move the work to a separate thread. And search across thousands of emails seems to be no problem.
If anyone wants search for their Rust application they should take a look at Tantivy.
I recently deployed Quickwit (based on Tantivy, from the same team) in production to index a few billion objects and have been very pleased with it. Indexing rates are fantastic. Query latency is competitive.
Perhaps most importantly, separation of compute and storage has proven invaluable. Being able to spin up a new search service over a few billion objects in object storage (complete with complex aggregations) without having to pay for long-running beefy servers has enabled some new use cases that otherwise would have been quite expensive. If/when the use case justifies beefy servers, Quickwit also provides an option to improve performance by caching data on each server.
Huge bonus: the team is very responsive and helpful on Discord.
Another resource is a trigram search index (in Go) used by etsy/hound[0] based on an article (and code) from Russ Cox: Regular Expression Matching with a Trigram Index[1].
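For anyone curious what a trigram index boils down to, here is a minimal sketch in Rust using only the standard library. It is not hound's or Russ Cox's code: it handles only literal substring queries (the article goes further and derives trigram queries from regular expressions), it works on byte trigrams, and every name in it (`TrigramIndex`, `add`, `search`) is made up for illustration.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// All overlapping 3-byte windows of `text` (empty if text is shorter than 3 bytes).
fn trigrams(text: &str) -> HashSet<[u8; 3]> {
    text.as_bytes()
        .windows(3)
        .map(|w| [w[0], w[1], w[2]])
        .collect()
}

/// Hypothetical in-memory index: trigram -> ids of documents containing it.
struct TrigramIndex {
    docs: Vec<String>,
    postings: HashMap<[u8; 3], BTreeSet<usize>>,
}

impl TrigramIndex {
    fn new() -> Self {
        TrigramIndex { docs: Vec::new(), postings: HashMap::new() }
    }

    /// Store a document and record each of its trigrams; returns the document id.
    fn add(&mut self, text: &str) -> usize {
        let id = self.docs.len();
        for gram in trigrams(text) {
            self.postings.entry(gram).or_default().insert(id);
        }
        self.docs.push(text.to_string());
        id
    }

    /// Literal substring search: intersect the posting lists of the needle's
    /// trigrams to get candidates, then verify each candidate with a real scan.
    fn search(&self, needle: &str) -> Vec<usize> {
        let mut candidates: Option<BTreeSet<usize>> = None;
        for gram in trigrams(needle) {
            let posting = match self.postings.get(&gram) {
                Some(p) => p,
                // A trigram that no document contains rules out every document.
                None => return Vec::new(),
            };
            candidates = Some(match candidates {
                None => posting.clone(),
                Some(c) => c.intersection(posting).copied().collect(),
            });
        }
        // Needles shorter than 3 bytes have no trigrams, so scan everything.
        let ids: Vec<usize> = match candidates {
            Some(c) => c.into_iter().collect(),
            None => (0..self.docs.len()).collect(),
        };
        ids.into_iter()
            .filter(|&id| self.docs[id].contains(needle))
            .collect()
    }
}
```

The final `contains` check matters: sharing all of a needle's trigrams does not guarantee the needle appears contiguously, so the index only narrows the set of documents that need a real scan.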
I was searching for a Meilisearch alternative (which sends out telemetry
by default) and found Tantivy. It's more of a search engine builder,
but the setup looks pretty simple [0].
Hm, I am interested, but I would love to use it as a Rust lib and just have Rust types instead of some JSON config...
The Java SDK of Meilisearch was also nice for the same reason: no need for a CLI and manual configuration. I just pointed it to a DB entity and indexed whole tables...
OP is entitled to make political choices when selecting software.
Some of us have specific principles of which things like opt-out telemetry might run afoul.
OP will choose their software, I choose mine and you choose yours; none of us need to call each other petty or otherwise cast such negative judgement; a free market is a free market.
It's a minor complaint, but I'm also evaluating it for a minor project.
I just don't like the fact that I can forget to add a flag once and, oh,
now I'm sending telemetry on my personal medical documents.
Tantivy is also used in an interesting vector database product called LanceDB (https://lancedb.github.io/lancedb/fts/) to provide full-text search capabilities. Last time I looked it was only available through the Python bindings, though I know they're looking to implement the Rust bindings natively to support other platforms.
I started working on a personal project a few years ago, after being insanely frustrated with the resource hog that is Elasticsearch. That is coming from someone whose personal computer has more resources than what a number of generous startups allocate for their product. I opted for Tantivy for two reasons: one was my desire to do the whole thing in Rust, and the second was Tantivy itself: performance is 10/10, documentation is second to none and the library is as ergonomic as they get. Sadly the project was a bite that was way too big for a single guy to handle in his spare time, so I abandoned it. Regardless, Tantivy is absolutely awesome.
[1] https://github.com/progscrape/progscrape
I basically just need a fulltext search.
https://www.postgresql.org/docs/current/textsearch.html
https://www.crunchydata.com/blog/postgres-full-text-search-a...
https://github.com/paradedb/paradedb/blob/dev/pg_search/Carg...
after listening
Extending Postgres for High Performance Analytics (with Philippe Noël) https://www.youtube.com/watch?v=NbOAEJrsbaM
And inside of the main thing - Quickwit (logs, traces, and soon metrics) https://github.com/quickwit-oss/quickwit
Had a surprisingly good experience with the combined power of Quickwit and ClickHouse for a multilingual search pet project. Finally, something usable for Chinese, Japanese, and Korean.
https://quickwit.io/docs/guides/add-full-text-search-to-your...
to_tsvector in PG never worked well for my use cases
SELECT * FROM dump WHERE to_tsvector('english'::regconfig, hh_fullname) @@ to_tsquery('english'::regconfig, 'query');
I wish them success. I'll automatically upvote any post with Tantivy as a keyword.
[0] https://github.com/hound-search/hound
[1] http://swtch.com/~rsc/regexp/regexp4.html
Different use-cases for alternatives to Lucene depending on your needs.
The only way to add fields is to reindex all data into a different search index.
[0]: https://github.com/quickwit-oss/tantivy-cli
Would love that for tantivy
Yes, that's how you normally use Tantivy; not sure which JSON config you mean.
tantivy-cli is more like a showcase, https://github.com/quickwit-oss/tantivy is the actual project.