luizfelberti commented on Building a Simple Search Engine That Works   karboosx.net/post/4eZxhBo... · Posted by u/freediver
marginalia_nu · a month ago
If you held a gun to my head and forced me to make a guess I'd say you could push that approach to order of 100K, maybe 1M documents.

If sqlite had a generic "strictly ascending sequence of integers" type[1] and would optimize around that, you could probably push it farther in terms of implementing efficient inverted indexes.

[1] primary key tables aren't really useful here.

luizfelberti · a month ago
> If sqlite had a generic "strictly ascending sequence of integers" type

Is that not what WITHOUT ROWID does? My understanding is that it's precisely meant to physically cluster data in the underlying B-Tree.

If that is not what you meant, could you elaborate on the "primary key tables aren't really useful here" footnote?
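
For anyone following along, here is a minimal sketch of the kind of table being discussed, using Python's built-in `sqlite3` module. The `postings` table, its columns, and the `posting_list` helper are hypothetical names made up for illustration; this is not the schema from the linked article, and it makes no claim about how well SQLite optimizes this layout.

```python
import sqlite3

# In-memory database, just for illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical inverted-index table. With WITHOUT ROWID, rows are stored
# in the primary-key B-Tree itself, ordered by (term_id, doc_id).
conn.execute("""
    CREATE TABLE postings (
        term_id INTEGER NOT NULL,
        doc_id  INTEGER NOT NULL,
        PRIMARY KEY (term_id, doc_id)
    ) WITHOUT ROWID
""")

# Insert out of order on purpose; the key order decides the storage order.
conn.executemany(
    "INSERT INTO postings (term_id, doc_id) VALUES (?, ?)",
    [(1, 30), (1, 10), (2, 10), (1, 20)],
)

def posting_list(conn, term_id):
    """Return the ascending doc_ids for one term."""
    cur = conn.execute(
        "SELECT doc_id FROM postings WHERE term_id = ? ORDER BY doc_id",
        (term_id,),
    )
    return [row[0] for row in cur]
```

Calling `posting_list(conn, 1)` reads one contiguous key range of the B-Tree, which is the clustering behaviour the question is about.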

luizfelberti commented on 650GB of Data (Delta Lake on S3). Polars vs. DuckDB vs. Daft vs. Spark   dataengineeringcentral.su... · Posted by u/tanelpoder
dukodk · a month ago
c5 is such a bad instance type; m6a would be so much better and even cheaper. I would love to see this on an m8a.2xlarge (7th and 8th generations don’t use SMT), which is even cheaper and has up to 15 Gbps.
luizfelberti · a month ago
Actually, for this kind of workload 15Gbps is still mediocre. What you actually want are the `n` variants of the instance types, which have higher NIC capacity.

In the c6n and m6n and maybe the upper-end 5th gens you can get 100Gbps NICs, and if you look at the 8th gen instances like the c8gn family, you can even get instances with 600Gbps of bandwidth.

luizfelberti commented on 650GB of Data (Delta Lake on S3). Polars vs. DuckDB vs. Daft vs. Spark   dataengineeringcentral.su... · Posted by u/tanelpoder
luizfelberti · a month ago
Honestly this benchmark feels completely dominated by the instance's NIC capacity.

They used a c5.4xlarge that has peak 10Gbps bandwidth, which at a constant 100% saturation would take in the ballpark of 9 minutes to load those 650GB from S3, making those 9 minutes your best case scenario for pulling the data (without even considering writing it back!)
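
The back-of-the-envelope arithmetic behind that figure, as a small sketch. It assumes decimal units (1 GB = 8 gigabits) and a link that stays fully saturated with zero protocol overhead, so it is a lower bound rather than a prediction; `min_transfer_minutes` is a name made up for this example.

```python
def min_transfer_minutes(size_gb: float, link_gbps: float) -> float:
    """Best-case time to move size_gb gigabytes over a link_gbps link.

    Assumes 1 GB = 8 gigabits and 100% link utilization.
    """
    gigabits = size_gb * 8
    seconds = gigabits / link_gbps
    return seconds / 60

# 650 GB over the 10 Gbps NIC of a c5.4xlarge
print(round(min_transfer_minutes(650, 10), 1))   # 8.7

# Same data over a 100 Gbps NIC
print(round(min_transfer_minutes(650, 100), 2))  # 0.87
```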

Minute differences in how these query engines schedule IO would have drastic effects in the benchmark outcomes, and I doubt the query engine itself was constantly fed during this workload, especially when evaluating DuckDB and Polars.

The irony of workloads like this is that it might be cheaper to pay for a gigantic instance to run the query and finish it quicker, than to pay for a cheaper instance taking several times longer.
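
A toy sketch of that trade-off. The hourly prices and runtimes below are made-up placeholders, not real AWS pricing or measured results; the point is only that total cost is price times duration, so a pricier instance can come out ahead if it finishes enough faster.

```python
def run_cost(price_per_hour: float, runtime_minutes: float) -> float:
    """Total cost of one run on an instance billed by time used."""
    return price_per_hour * runtime_minutes / 60

# Hypothetical numbers, for illustration only.
small = run_cost(price_per_hour=1.0, runtime_minutes=180)  # slow, cheap instance
large = run_cost(price_per_hour=4.0, runtime_minutes=30)   # fast, expensive instance

print(small, large)  # 3.0 2.0
```

With these placeholder values the instance that costs 4x more per hour is still cheaper overall, because it finishes 6x sooner.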

luizfelberti commented on Apple will phase out Rosetta 2 in macOS 28   developer.apple.com/docum... · Posted by u/summarity
mxey · 2 months ago
The OP says nothing about Rosetta for Linux.
luizfelberti · 2 months ago
It seems to talk about Rosetta 2 as a whole, which is what the containerization framework depends on to support running amd64 binaries inside Linux VMs (even though the kernel still needs to be arm)

Is there a separate part of Rosetta that is implemented for the VM stuff? I was under the impression Rosetta was some kind of XPC service that would translate executable pages for Hypervisor Framework as they were faulted in. Did I just misunderstand how the thing works under the hood? Are there two Rosettas?

luizfelberti commented on Apple will phase out Rosetta 2 in macOS 28   developer.apple.com/docum... · Posted by u/summarity
luizfelberti · 2 months ago
They only just released the Containerization Framework[0] and the new container[1] tool, and they are already scheduling a kneecapping of this two years down the line.

Realistically, people are still going to be deploying on x64 platforms for a long time, and given that Apple's whole shtick was to serve "professionals", it's really a shame that they're dropping the ball on developers like this. Their new containerization stuff was the best workflow improvement for me in quite a while.

[0] https://github.com/apple/containerization

[1] https://github.com/apple/container

luizfelberti commented on Evaluating Argon2 adoption and effectiveness in real-world software   arxiv.org/abs/2504.17121... · Posted by u/pregnenolone
swiftcoder · 2 months ago
The documentation on this is... uh... intimidating? I come away from this with the sense that I need to learn a whole lot about cryptography to make a good decision here:

https://argon2-cffi.readthedocs.io/en/stable/parameters.html

luizfelberti · 2 months ago
Do not reference these kinds of docs whenever you need practical, actionable advice. They serve their purpose, but are for a completely different kind of audience.

For anyone perusing this thread, your first resource for this kind of security advice should probably be the OWASP cheat sheets, a living set of documents that package current practice into direct recommendations for implementers.

Here's what it says about tuning Argon2:

https://cheatsheetseries.owasp.org/cheatsheets/Password_Stor...

luizfelberti commented on Ask HN: What are you working on? (October 2025)    · Posted by u/david927
azianmike · 2 months ago
A couple of months ago, I saw a tweet from @awilkinson: “I just found out how much we pay for DocuSign and my jaw dropped. What's the best alternative?”

Me being naive, I thought “how hard could it actually be to build a free e-sign tool?”

Turns out not that hard.

In about a weekend, I built a UETA and ESIGN compliant tool. And it was free. And it cost me less than $50. Unlimited free e-sign. https://useinkless.com/

luizfelberti · 2 months ago
Documenso[0] is a pretty cool alternative that is becoming compliant with more and more e-signature standards.

[0] https://documenso.com/

luizfelberti commented on We will no longer be actively supporting KuzuDB   kuzudb.com... · Posted by u/nrjames
scosman · 2 months ago
Oh too bad. Small fast embedded graph DBs are rare. Any good alternatives?
luizfelberti · 2 months ago
There used to be a similarly named one called CozoDB[0], which was pretty awesome, but it looks like its development has slowed down significantly.

[0] https://github.com/cozodb/cozo

luizfelberti commented on Memory access is O(N^[1/3])   vitalik.eth.limo/general/... · Posted by u/jxmorris12
luizfelberti · 2 months ago
Ah yes, pretending we can access infinite amounts of memory instantaneously or in a finite/bounded amount of time is the Achilles' heel of the von Neumann abstract computer model, and is the point where it completely diverges from physical reality.

Acknowledging that memory access is not instantaneous immediately throws you into the realm of distributed systems though and something much closer to an actor model of computation. It's a pretty meaningful theoretical gap, more so than people realize.

luizfelberti commented on Guy running a Google rival from his laundry room   fastcompany.com/91396271/... · Posted by u/coloneltcb
luizfelberti · 3 months ago
I was trying to do this in 2023! The hardest part about building a search engine is not the actual searching though; it is (like others here have pointed out) building your index and crawling the (extremely adversarial) internet, especially when you're running the thing from a single server in your own home without fancy rotating IPs.

I hope this guy succeeds and becomes another reference in the community like the marginalia dude. This makes me want to give my project another go...

u/luizfelberti

Karma: 726 · Cake day: June 12, 2019
About
https://berti.me