qdequelen (u/qdequelen)

qdequelen commented on Tantivy – full-text search engine library inspired by Apache Lucene github.com/quickwit-oss/t... · Posted by u/kaathewise

PSeitz · a year ago

They serve quite different use cases.

Quickwit was built to handle extremely large data volumes; you can ingest and search terabytes and petabytes of logs.

Meilisearch's indexing doesn't scale: it becomes slower the more data you have. For example, I failed to ingest 7 GB of data.

qdequelen · a year ago

Hey PSeitz, Meilisearch CEO here. Sorry to hear that you failed to index such a low volume of data. When did you last try Meilisearch? We have made significant improvements to indexing speed. We have a customer with hundreds of gigabytes of raw data on our cloud, and it scales amazingly well. https://x.com/Kerollmops/status/1772575242885484864

qdequelen commented on Nvidia's Chat with RTX is an AI chatbot that runs locally on your PC theverge.com/2024/2/13/24... · Posted by u/nickthegreek

westurner · 2 years ago

From "Artificial intelligence is ineffective and potentially harmful for fact checking" (2023) https://news.ycombinator.com/item?id=37226233 : pdfgpt, knowledge_gpt, elasticsearch :

> Are LLM tools better or worse than e.g. meilisearch or elasticsearch for searching with snippets over a set of document resources?

> How does search compare to generating things with citations?

pdfGPT: https://github.com/bhaskatripathi/pdfGPT :

> PDF GPT allows you to chat with the contents of your PDF file by using GPT capabilities.

GH "pdfgpt" topic: https://github.com/topics/pdfgpt

knowledge_gpt: https://github.com/mmz-001/knowledge_gpt

From https://news.ycombinator.com/item?id=39112014 : paperai

neuml/paperai: https://github.com/neuml/paperai :

> Semantic search and workflows for medical/scientific papers

RAG: https://news.ycombinator.com/item?id=38370452

Google Desktop (2004-2011): https://en.wikipedia.org/wiki/Google_Desktop :

> Google Desktop was a computer program with desktop search capabilities, created by Google for Linux, Apple Mac OS X, and Microsoft Windows systems. It allowed text searches of a user's email messages, computer files, music, photos, chats, Web pages viewed, and the ability to display "Google Gadgets" on the user's desktop in a Sidebar

GNOME/tracker-miners: https://gitlab.gnome.org/GNOME/tracker-miners

src/miners/fs: https://gitlab.gnome.org/GNOME/tracker-miners/-/tree/master/...

SPARQL + SQLite: https://gitlab.gnome.org/GNOME/tracker-miners/-/blob/master/...

https://news.ycombinator.com/item?id=38355385 : LocalAI, braintrust-proxy; promptfoo, chainforge, mixtral

qdequelen · 2 years ago

> Are LLM tools better or worse than e.g. meilisearch or elasticsearch for searching with snippets over a set of document resources?

Absolutely worse; LLMs are not made for it at all.

qdequelen commented on Meilisearch v1.6 blog.meilisearch.com/meil... · Posted by u/Culonavirus

qdequelen · 2 years ago

Thanks for the highlight!

qdequelen commented on Fly Kubernetes fly.io/blog/fks/... · Posted by u/ferriswil

qdequelen · 2 years ago

Do you handle high-throughput volumes? I would need this to test hosting a database service at scale.

qdequelen commented on Show HN: I scraped 25M Shopify products to build a search engine searchagora.com/... · Posted by u/pencildiver

pencildiver · 2 years ago

The scraper is built in JavaScript with a Mongo database. Probably not the most scalable way to do it, but I found that all Shopify stores have a public JSON file available at [Base URL]/products.json. So I found a list of stores, built a crawler to go store by store, and standardized the data on my end.
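The store-by-store approach described above can be sketched roughly as follows. This is a minimal Python sketch, not the author's JavaScript/Mongo scraper: results go into a plain list instead of a database, and the helper names (`fetch_json`, `products_url`, `standardize`, `crawl_store`, `crawl`) and the normalized output schema are hypothetical. It also assumes the `/products.json` endpoint accepts `limit` and `page` query parameters and returns a `products` array whose items carry `id`, `title`, `vendor`, `handle`, and `variants[].price`; check a live payload before relying on those fields.

```python
import json
import urllib.request
from typing import Iterator
from urllib.parse import urlencode


def fetch_json(url: str) -> dict:
    """Hypothetical helper: GET a URL and decode the JSON body."""
    req = urllib.request.Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def products_url(base_url: str, page: int, limit: int = 250) -> str:
    # Assumption: the endpoint accepts `limit` and `page` query parameters.
    query = urlencode({"limit": limit, "page": page})
    return f"{base_url.rstrip('/')}/products.json?{query}"


def standardize(store: str, product: dict) -> dict:
    # Hypothetical normalized schema; the input field names are assumptions.
    variants = product.get("variants") or []
    prices = [float(v["price"]) for v in variants if v.get("price") is not None]
    return {
        "store": store,
        "id": product.get("id"),
        "title": product.get("title", ""),
        "vendor": product.get("vendor", ""),
        "url": f"{store.rstrip('/')}/products/{product.get('handle', '')}",
        "min_price": min(prices) if prices else None,
    }


def crawl_store(base_url: str, fetch=fetch_json, max_pages: int = 100) -> Iterator[dict]:
    """Walk one store page by page until an empty page comes back."""
    for page in range(1, max_pages + 1):
        products = fetch(products_url(base_url, page)).get("products", [])
        if not products:  # an empty page means we ran past the last product
            break
        for product in products:
            yield standardize(base_url, product)


def crawl(stores: list[str], fetch=fetch_json) -> list[dict]:
    """Go store by store and collect the standardized rows in one list."""
    rows: list[dict] = []
    for store in stores:
        rows.extend(crawl_store(store, fetch=fetch))
    return rows
```

Passing `fetch` in as a parameter keeps the crawl logic testable without network access; in practice you would also want rate limiting and retries per store.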

Here's an example: https://www.wildfox.com/products.json

qdequelen · 2 years ago

Did you only get the schema.json?

qdequelen commented on Show HN: I scraped 25M Shopify products to build a search engine searchagora.com/... · Posted by u/pencildiver

qdequelen · 2 years ago

Hey, I'm the CEO of Meilisearch. If your issue is performance, I would love for you to give Meilisearch a try. You'll be able to create an "as you type" experience with our engine, which responds in under 50 ms!

qdequelen commented on How to deliver the best search results: inside a full text search engine blog.meilisearch.com/how-... · Posted by u/CaroFG

qdequelen · 2 years ago

Thanks @CaroFG, I'm sure that if anyone has questions, the engineering team would love to answer!

qdequelen commented on Algolia New Pricing algolia.com/pricing/... · Posted by u/naiv

naiv · 2 years ago

I think you are correct.

My biggest issues with Algolia were always:

- search requests should be priced separately from the number of documents

-- think of a geonames service with 10 million documents vs. 500k search requests -- this seems to be solved now

- it is crazy to require a new index for each sort direction

-- this is still the case

IMHO they should introduce pricing based on CPU cycles + storage. Until then, self-hosted Typesense, Meilisearch, or Elasticsearch, or hosted Typesense or Elasticsearch, are still superior. I am leaving hosted Meilisearch out here, as its entry level is also nuts at 1.2k/month.

qdequelen · 2 years ago

Hello, I'm the Meilisearch CEO. I think you're also correct, Jabo.

I just want to clarify: Meilisearch's pricing doesn't start at 1.2K/month, but at 0/month. We have usage-based pricing that is basically 0.25 per 1,000 documents and searches. And, funny thing, we are thinking about pricing searches and documents separately, too, but we wanted more data to be sure we select the right unit price for each. :)
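To make the pooled-unit arithmetic concrete, here is a rough sketch that applies the quoted rate of 0.25 per 1,000 units to naiv's geonames example (10 million documents, 500k searches). It assumes, for illustration only, that documents and searches are each billed at that same single rate; the comment gives no currency or billing period, so the numbers are unitless. The function name `cost_breakdown` is hypothetical.

```python
from decimal import Decimal

# Rate quoted in the comment: roughly 0.25 per 1,000 documents and searches.
RATE_PER_1000 = Decimal("0.25")


def cost_breakdown(documents: int, searches: int, rate: Decimal = RATE_PER_1000) -> dict:
    """Hypothetical pooled pricing: both unit types use the same per-1,000 rate."""
    doc_cost = rate * documents / 1000
    search_cost = rate * searches / 1000
    return {"documents": doc_cost, "searches": search_cost, "total": doc_cost + search_cost}


# naiv's geonames example: a document-heavy, search-light workload.
breakdown = cost_breakdown(10_000_000, 500_000)
print(breakdown["documents"], breakdown["searches"], breakdown["total"])
```

Under this assumption the documents account for 2,500 of the 2,625 total, which shows why a workload with many documents but few searches benefits from splitting the two into separately priced units.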

qdequelen commented on Rust Is the Future of JavaScript Infrastructure (2021) leerob.io/blog/rust... · Posted by u/winter_blue

qdequelen · 3 years ago

Make the web faster with Rust.