Readit News
mmastrac · 2 years ago
Major props to the authors of this library. I re-built https://progscrape.com [1] on top of it last year, replacing an ancient Python2 AppEngine codebase that I had neglected for a while. It's a great library and insanely fast, as in indexing the entire library of 1M stories on a Raspberry Pi in seconds.

I'm able to host a service on a Pi at home with full-text search and a regular peak load of a few rps (not much, admittedly), with a CPU that barely spikes above a few percent. I've load tested searches on the Pi up to ~100rps and it held up. I keep thinking I should write up my experiences with it. It was pretty much a drop-in, super-useful library and the team was very responsive with bug reports, of which there were very few.

If you want to see how responsive the search is on such a small device, try clicking the labels on each story -- it's virtually instantaneous to query, and this is hitting up to 10 years * 12 months of search shards! https://progscrape.com/?search=javascript
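For readers wondering what querying "10 years * 12 months of search shards" could look like, here is a minimal sketch using only the Rust standard library. It is not how progscrape or Tantivy actually does it: `ShardedIndex`, `Story`, and the substring match are hypothetical stand-ins, and the newest-first early exit is an assumed design choice rather than something the comment describes.

```rust
use std::collections::BTreeMap;

/// Hypothetical story record; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Story {
    pub title: String,
    pub score: u32,
}

/// One shard per (year, month). A BTreeMap keeps shards in chronological order.
pub struct ShardedIndex {
    shards: BTreeMap<(u16, u8), Vec<Story>>,
}

impl ShardedIndex {
    pub fn new() -> Self {
        ShardedIndex { shards: BTreeMap::new() }
    }

    /// Files a story under the shard for its month.
    pub fn add(&mut self, year: u16, month: u8, story: Story) {
        self.shards.entry((year, month)).or_default().push(story);
    }

    /// Visits shards newest-first and stops once `limit` hits are collected,
    /// so a common term rarely needs to touch the older shards.
    pub fn search(&self, term: &str, limit: usize) -> Vec<&Story> {
        let needle = term.to_lowercase();
        let mut hits: Vec<&Story> = Vec::new();
        for stories in self.shards.values().rev() {
            // Assumption: a case-insensitive substring match stands in for a real query.
            let mut shard_hits: Vec<&Story> = stories
                .iter()
                .filter(|s| s.title.to_lowercase().contains(&needle))
                .collect();
            // Within a shard, the highest-scoring stories come first.
            shard_hits.sort_by(|a, b| b.score.cmp(&a.score));
            hits.extend(shard_hits);
            if hits.len() >= limit {
                break;
            }
        }
        hits.truncate(limit);
        hits
    }
}
```

Calling `search("javascript", 20)` on such an index returns the most recent matches first, which is roughly the shape of result a label click would need.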

I'd recommend looking at it over Lucene for modern projects. I am a big fan, as you might be able to tell. Given how well it scales on a tiny little ARM64, I'd wager your experiences on bigger iron will be even more fantastic.

[1] https://github.com/progscrape/progscrape

snorremd · 2 years ago
It is a very nice library. I’m using it for a very much work-in-progress incremental email backup CLI tool for email providers that use JMAP.

I wanted users to be able to search their backups. As I’m using Rust, Tantivy looked like just the right thing for the job. Indexing an email happens so fast that I did not bother to move the work to a separate thread. And search across thousands of emails seems to be no problem.

If anyone wants search for their Rust application they should take a look at Tantivy.

CaptainOfCoit · 2 years ago
Tiny bug report: https://progscrape.com/?search=grep shows "Error: PersistError(UnexpectedError("Storage fetch panicked"))"
mmastrac · 2 years ago
It looks like there was a bug with certain search queries that wedged a mutex because they failed to parse on my end. Deploying a fix now. Thanks!
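One way this class of bug can arise in Rust: a panic while a `std::sync::Mutex` guard is held poisons the mutex, and every later `lock().unwrap()` then panics too. The sketch below is not the actual progscrape fix. It assumes a hypothetical `Store` with a toy `parse_query`, and shows two defensive habits: parse the query before taking the lock and return an error instead of panicking, and recover the guard from a poisoned lock.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for a search index guarded by a mutex.
pub struct Store {
    docs: Mutex<HashMap<String, Vec<u32>>>,
}

#[derive(Debug, PartialEq)]
pub enum SearchError {
    BadQuery(String),
}

/// Toy parser: accepts a single non-empty alphanumeric term.
/// Malformed input becomes an error value, never a panic.
pub fn parse_query(raw: &str) -> Result<String, SearchError> {
    let term = raw.trim().to_lowercase();
    if term.is_empty() || !term.chars().all(|c| c.is_alphanumeric()) {
        return Err(SearchError::BadQuery(raw.to_string()));
    }
    Ok(term)
}

impl Store {
    pub fn new() -> Self {
        Store { docs: Mutex::new(HashMap::new()) }
    }

    /// Records that `doc_id` contains `term`.
    pub fn insert(&self, term: &str, doc_id: u32) {
        let mut docs = self.docs.lock().unwrap_or_else(|e| e.into_inner());
        docs.entry(term.to_string()).or_default().push(doc_id);
    }

    pub fn search(&self, raw: &str) -> Result<Vec<u32>, SearchError> {
        // Parse first, so a bad query can never unwind while the guard is held.
        let term = parse_query(raw)?;
        // If another thread panicked while holding the lock, take the guard anyway
        // instead of propagating the poison to every later request.
        let docs = self.docs.lock().unwrap_or_else(|e| e.into_inner());
        Ok(docs.get(&term).cloned().unwrap_or_default())
    }
}
```

Recovering with `into_inner()` is only reasonable when the protected data cannot be left half-updated by the panic; otherwise rebuilding the state is the safer choice.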
OtomotO · 2 years ago
Thanks for that! A couple of days ago I used meilisearch for a quick proof of concept, but I'll check out tantivy again via your repo.

I basically just need a fulltext search.

worble · 2 years ago
If you just need full-text search, assuming you're already using Postgres, you can get quite far just using its own primitives:

https://www.postgresql.org/docs/current/textsearch.html

https://www.crunchydata.com/blog/postgres-full-text-search-a...

adeptima · 2 years ago
I recently found Tantivy inside ParadeDB (a Postgres extension aiming to replace Elastic):

https://github.com/paradedb/paradedb/blob/dev/pg_search/Carg...

after listening to

Extending Postgres for High Performance Analytics (with Philippe Noël) https://www.youtube.com/watch?v=NbOAEJrsbaM

And inside the main thing, Quickwit (logs, traces, and soon metrics): https://github.com/quickwit-oss/quickwit

I had a surprisingly good experience with the combined power of Quickwit and ClickHouse for a multilingual search pet project. Finally, something usable for Chinese, Japanese, and Korean.

https://quickwit.io/docs/guides/add-full-text-search-to-your...

to_tsvector in PG never worked well for my use cases:

SELECT * FROM dump WHERE to_tsvector('english'::regconfig, hh_fullname) @@ to_tsquery('english'::regconfig, 'query');

I wish them success. I will automatically upvote any post with Tantivy as a keyword.

fulmicoton · 2 years ago
Thank you so much for sharing!!!
tarasglek · 2 years ago
That's a cool design pattern: combining a URL/REST-based index with doing the search query entirely within SQL. You can do the same thing with a Postgres FDW.
tylerkovacs · 2 years ago
I recently deployed Quickwit (based on Tantivy, from the same team) in production to index a few billion objects and have been very pleased with it. Indexing rates are fantastic. Query latency is competitive.

Perhaps most importantly, separation of compute and storage has proven invaluable. Being able to spin up a new search service over a few billion objects in object storage (complete with complex aggregations) without having to pay for long-running beefy servers has enabled some new use cases that otherwise would have been quite expensive. If/when the use case justifies beefy servers, Quickwit also provides an option to improve performance by caching data on each server.

Huge bonus: the team is very responsive and helpful on Discord.

fulmicoton · 2 years ago
Thank you @tyler!!!
karmakaze · 2 years ago
Another resource is a trigram search index (in Go) used by etsy/hound[0] based on an article (and code) from Russ Cox: Regular Expression Matching with a Trigram Index[1].

[0] https://github.com/hound-search/hound

[1] http://swtch.com/~rsc/regexp/regexp4.html

There are different use cases for Lucene alternatives, depending on your needs.
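To make the trigram idea concrete, here is a minimal sketch in Rust (standard library only) rather than Hound's Go code. It handles literal substrings only; the Russ Cox article goes further and derives trigram queries from regular expressions. `TrigramIndex` and its methods are hypothetical names for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Inverted index from byte trigrams to the IDs of documents that contain them.
pub struct TrigramIndex {
    docs: Vec<String>,
    postings: HashMap<[u8; 3], BTreeSet<usize>>,
}

/// Yields every overlapping three-byte window of `text`.
fn trigrams(text: &str) -> impl Iterator<Item = [u8; 3]> + '_ {
    text.as_bytes().windows(3).map(|w| [w[0], w[1], w[2]])
}

impl TrigramIndex {
    pub fn new() -> Self {
        TrigramIndex { docs: Vec::new(), postings: HashMap::new() }
    }

    /// Adds a document and returns its ID.
    pub fn add(&mut self, text: &str) -> usize {
        let id = self.docs.len();
        for tri in trigrams(text) {
            self.postings.entry(tri).or_default().insert(id);
        }
        self.docs.push(text.to_string());
        id
    }

    /// Returns the IDs of documents containing `needle` as a literal substring.
    pub fn search(&self, needle: &str) -> Vec<usize> {
        let mut candidates: Option<BTreeSet<usize>> = None;
        for tri in trigrams(needle) {
            let posting = match self.postings.get(&tri) {
                Some(p) => p,
                None => return Vec::new(), // a required trigram appears nowhere
            };
            // A match must contain every trigram of the needle: intersect the lists.
            candidates = Some(match candidates {
                None => posting.clone(),
                Some(c) => c.intersection(posting).copied().collect(),
            });
        }
        let ids: Vec<usize> = match candidates {
            Some(c) => c.into_iter().collect(),
            // Needles shorter than three bytes have no trigrams: scan everything.
            None => (0..self.docs.len()).collect(),
        };
        // The trigram filter can produce false positives, so confirm each candidate.
        ids.into_iter()
            .filter(|&id| self.docs[id].contains(needle))
            .collect()
    }
}
```

The index only narrows the set of documents to scan; the final `contains` check is what guarantees correctness.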

yencabulator · 2 years ago
Beware, you still cannot add/remove fields: https://github.com/quickwit-oss/tantivy/issues/470

The only way to add fields is to reindex all data into a different search index.

francoismassot · 2 years ago
One workaround is to use the JSON field; see the docs: https://github.com/quickwit-oss/tantivy/blob/main/doc/src/js...
kaathewise · 2 years ago
I was searching for a Meilisearch alternative (which sends out telemetry by default) and found Tantivy. It's more of a search engine builder, but the setup looks pretty simple [0].

[0]: https://github.com/quickwit-oss/tantivy-cli

ukuina · 2 years ago
Quickwit also sends telemetry by default: https://quickwit.io/docs/telemetry
OtomotO · 2 years ago
Hm, I am interested, but I would love to use it as a rust lib and just have rust types instead of some json config...

The Java SDK of Meilisearch was also nice for the same reason: no need for a CLI and manual configuration. I just pointed it to a DB entity and indexed whole tables...

Would love that for tantivy

PSeitz · 2 years ago
> Hm, I am interested, but I would love to use it as a rust lib and just have rust types instead of some json config...

Yes that's how you use tantivy normally, not sure which json config you mean.

tantivy-cli is more like a showcase, https://github.com/quickwit-oss/tantivy is the actual project.

banish-m4 · 2 years ago
That's a petty objection to usable interactive search when it's easy to opt out by adding a single command-line argument.
soulofmischief · 2 years ago
OP is entitled to make political choices when selecting software.

Some of us have specific principles of which things like opt-out telemetry might run afoul.

OP will choose their software, I choose mine and you choose yours; none of us need to call each other petty or otherwise cast such negative judgement; a free market is a free market.

kaathewise · 2 years ago
It's a minor complaint, but I'm also evaluating it for a minor project. I just don't like the fact that I can forget to add a flag once and, oh, now I'm sending telemetry on my personal medical documents.
Nathanba · 2 years ago
Also, Meilisearch is more comparable to Quickwit, their distributed offering, but Quickwit is AGPL.
kernelsanderz · 2 years ago
Tantivy is also used in an interesting vector database product called LanceDB (https://lancedb.github.io/lancedb/fts/) to provide full-text search capabilities. Last time I looked, it was only available through the Python bindings, though I know they're looking to implement the Rust bindings natively to support other platforms.
axegon_ · 2 years ago
I started working on a personal project a few years ago, after being insanely frustrated with the resource hog that is Elasticsearch. That is coming from someone whose personal computer has more resources than what a number of generous startups allocate for their product. I opted for Tantivy for two reasons: the first was my desire to do the whole thing in Rust, and the second was Tantivy itself: performance is 10/10, documentation is second to none, and the library is as ergonomic as they get. Sadly, the project was a bite way too big for a single guy to handle in his spare time, so I abandoned it. Regardless, Tantivy is absolutely awesome.