joatmon-snoo (u/joatmon-snoo)

joatmon-snoo commented on Flattening Rust’s learning curve corrode.dev/blog/flatteni... · Posted by u/birdculture

baalimago · 4 months ago

>For instance, why do you have to call to_string() on a thing that’s already a string?

It's so hard for me to take Rust seriously when I have to find out answers to unintuitive questions like this.

joatmon-snoo · 4 months ago

Strings are like time objects: most people and languages only ever deal with simplified versions of them that skip a lot of edge cases around how they work.

Unfortunately going from most languages to Rust forces you to speedrun this transition.
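To make the `to_string()` question concrete, here is a minimal sketch of the edge cases Rust surfaces. The function names `greet_owned` and `shout` are hypothetical, chosen only for illustration; the point is that a literal is a borrowed `&str` while `String` is an owned, heap-allocated buffer, and that length is measured in bytes rather than characters.

```rust
/// Takes ownership of heap-allocated, growable text. (Hypothetical helper.)
fn greet_owned(name: String) -> String {
    format!("hello, {}", name)
}

/// Borrows a view into UTF-8 text that lives somewhere else. (Hypothetical helper.)
fn shout(name: &str) -> String {
    name.to_uppercase()
}

fn main() {
    // A literal is a `&'static str`: a borrowed slice stored in the binary.
    // `\u{e9}` is a precomposed "e with acute accent".
    let literal: &str = "h\u{e9}llo";

    // `to_string()` copies the bytes into a new heap allocation that we own.
    let owned: String = literal.to_string();

    // `len()` counts bytes, not characters: the accented letter takes two bytes in UTF-8.
    assert_eq!(owned.len(), 6);
    assert_eq!(owned.chars().count(), 5);

    // A `String` can be lent out as `&str` without copying; going the other way needs a copy.
    assert_eq!(shout(&owned), "H\u{c9}LLO");
    assert_eq!(greet_owned(owned), "hello, h\u{e9}llo");
}
```

Most languages hide the owned/borrowed split and the byte/character split behind one string type; Rust makes both visible in the signatures.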

joatmon-snoo commented on An intro to DeepSeek's distributed file system maknee.github.io/blog/202... · Posted by u/sebg

jamesblonde · 5 months ago

Architecturally, it is a scale-out metadata filesystem [ref]. Other related distributed file systems are Collosus, Tectonic (Meta), ADLSv2 (Microsoft), HopsFS (Hopsworks), and I think PolarFS (Alibaba). They all use different distributed row-oriented DBs for storing metadata. 3FS uses FoundationDB, Collosus uses BigTable, Tectonic some KV store, ADLSv2 (not sure), HopsFS uses RonDB.

What's important here with 3FS is that it supports (1) a FUSE client - it just makes life so much easier - and (2) NVMe storage - so that training pipelines aren't disk I/O bound (you can't always split files small enough and parallelize reading/writing enough against an S3 object store).

Disclaimer: I worked on HopsFS. HopsFS adds tiered storage - NVMe for recent data and S3 for archival.

[ref]: https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...

joatmon-snoo · 5 months ago

nit: Colossus* for Google.

joatmon-snoo commented on A year of Rust in ClickHouse clickhouse.com/blog/rust... · Posted by u/valyala

DeathArrow · 5 months ago

C++ can be safe enough if you proceed with care.

What I dislike about C++ is that it grew to become a monster of a language, containing all programming paradigms and ideas, good or bad, known to mankind.

It's so monstrously huge no human can hold its entire complexity in his head.

C++ allows you to do things in 10000 different ways, and developers will do just that. Often in the same code base.

That being said, I would use a sane subset of C++ every day over Rust. It's not that I hate Rust or that I don't think it is good, technically sound and capable. It just doesn't fit the way I think and like to work.

I like to keep a simple model in mind. For me, the memory is just a huge array from which we copy data to CPU cache, move some to CPU registers, execute instructions and fetch data from the registers and put it again in some part of that huge array, to be used later. Rust adds a lot of complexity over this simple mental model of mine.

joatmon-snoo · 5 months ago

> if you proceed with care

Yes, but that is _incredibly_ time consuming. You have to set up asan, msan, tsan, and valgrind. If you want linting you need to do shenanigans to wire up clang-tidy.

I also like simple mental models. I like not having to figure out the cmake modifications to pull in a new library. I like having a search engine when I need a new library for x. I like when libraries return Result<Ok, Err> instead of ping-ponging between C libraries which indicate errors using retval flags or C++ libraries that throw std::runtime_error(). I like not dealing with void* pointer casting.
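As a rough illustration of the `Result<Ok, Err>` point, here is a small sketch. `parse_port` and `ParseError` are hypothetical names invented for this example, not part of any library; the idea is only that the failure modes are spelled out in the return type, so the caller cannot forget to check a flag or miss an exception.

```rust
use std::fmt;

/// Every way the (hypothetical) parser can fail, visible in the type.
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    NotANumber(String),
    OutOfRange(i64),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::Empty => write!(f, "input was empty"),
            ParseError::NotANumber(s) => write!(f, "{:?} is not a number", s),
            ParseError::OutOfRange(n) => write!(f, "{} does not fit in 0..=65535", n),
        }
    }
}

/// Parses a port number; the signature alone tells the caller it can fail.
fn parse_port(input: &str) -> Result<u16, ParseError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(ParseError::Empty);
    }
    // `?` returns early with the mapped error instead of setting a flag.
    let n: i64 = trimmed
        .parse()
        .map_err(|_| ParseError::NotANumber(trimmed.to_string()))?;
    if n < 0 || n > i64::from(u16::MAX) {
        return Err(ParseError::OutOfRange(n));
    }
    Ok(n as u16)
}

fn main() {
    for input in ["8080", "", "http", "70000"] {
        // The compiler forces both arms to be handled.
        match parse_port(input) {
            Ok(port) => println!("{:?} -> port {}", input, port),
            Err(err) => println!("{:?} -> error: {}", input, err),
        }
    }
}
```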

joatmon-snoo commented on Better Shell History Search tratt.net/laurie/blog/202... · Posted by u/todsacerdoti

joatmon-snoo · 5 months ago

An unfortunate problem with using awk: there are three different versions of awk, and it is frighteningly easy to use a feature that exists in one but not the others.

(source: I have written unit tests against different versions of awk. That was... unpleasant.)

joatmon-snoo commented on Better Shell History Search tratt.net/laurie/blog/202... · Posted by u/todsacerdoti

js2 · 5 months ago

Usually it does de-dupe any submission with "significant attention", but I'm not sure what the threshold for that is.

I'd think the submissions from two days ago (206 points, 93 comments) would have qualified. (It's the exact same URL so that's not why it wasn't de-duped either.)

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

joatmon-snoo · 5 months ago

Pretty sure dedupe is done manually by u/dang.

joatmon-snoo commented on Run structured extraction on documents/images locally with Ollama and Pydantic github.com/vlm-run/vlmrun... · Posted by u/EarlyOom

joatmon-snoo · 6 months ago

Super cool! We at BAML had been thinking about doing something like this for our ecosystem as well - we’d love to add BAML models to this repo!

If you haven’t heard of us, we provide a language and runtime that enable defining your schemas in a simpler syntax, and allow usage with _any_ model, not just those that implement tool calling or JSON mode, by relying on schema-aligned parsing. Check it out! https://github.com/BoundaryML/baml

joatmon-snoo commented on Understanding gRPC, OpenAPI and REST and when to use them in API design (2020) cloud.google.com/blog/pro... · Posted by u/hui-zheng

mvdtnz · 7 months ago

Why do you say that? I'm involved in the planning for bidi streaming for a product that supports over 200M monthly active users. I am genuinely curious what landmines we're about to step on.

joatmon-snoo · 7 months ago

bidi streaming screws with a whole bunch of assumptions you usually rely on in fault-tolerant software:

- there are multiple ways to retry - you can retry establishing the connection (e.g. say DNS resolution fails for a 30s window) _or_ you can retry establishing the stream

- your load-balancer needs to persist the stream to the backend; it can't just re-route per single HTTP request/response

- how long are your timeouts? if you don't receive a message for 1s, OK, the client can probably keep the stream open, but what if you don't receive a message for 30s? this percolates through the entire request path, generally in the form of "how do I detect when a service in the request path has failed"
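The timeout question in the last bullet can be sketched roughly as follows. This is not gRPC code: a `std::sync::mpsc` channel stands in for the stream, and `drain_stream` and `StreamEnd` are hypothetical names. The sketch only shows that "peer closed the stream" and "peer went silent" are different outcomes, and that the reader has to pick an idle limit to tell them apart.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why the reader stopped consuming the stream.
#[derive(Debug, PartialEq)]
enum StreamEnd {
    ClosedByPeer,
    IdleTimeout,
}

/// Reads messages until the peer closes the stream or stays silent longer than `idle_limit`.
fn drain_stream(rx: &Receiver<String>, idle_limit: Duration) -> (Vec<String>, StreamEnd) {
    let mut received = Vec::new();
    loop {
        match rx.recv_timeout(idle_limit) {
            Ok(msg) => received.push(msg),
            // Sender still exists but sent nothing in time: is it slow or dead? We cannot know.
            Err(RecvTimeoutError::Timeout) => return (received, StreamEnd::IdleTimeout),
            // Every sender was dropped: an explicit, unambiguous end of stream.
            Err(RecvTimeoutError::Disconnected) => return (received, StreamEnd::ClosedByPeer),
        }
    }
}

fn main() {
    // Healthy peer: sends two messages, then closes the stream by dropping the sender.
    let (tx, rx) = mpsc::channel();
    tx.send("a".to_string()).unwrap();
    tx.send("b".to_string()).unwrap();
    drop(tx);
    let (msgs, end) = drain_stream(&rx, Duration::from_millis(50));
    assert_eq!(msgs, vec!["a".to_string(), "b".to_string()]);
    assert_eq!(end, StreamEnd::ClosedByPeer);

    // Stalled peer: sends one message, then stays connected but silent.
    let (tx, rx) = mpsc::channel();
    tx.send("a".to_string()).unwrap();
    let (msgs, end) = drain_stream(&rx, Duration::from_millis(50));
    assert_eq!(msgs, vec!["a".to_string()]);
    assert_eq!(end, StreamEnd::IdleTimeout);
    drop(tx);
}
```

After an `IdleTimeout` the caller still has to decide between re-establishing the stream and re-establishing the whole connection, which is the first bullet's problem.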

joatmon-snoo commented on Why is it so hard to buy things that work well? (2022) danluu.com/nothing-works/... · Posted by u/janandonly

georgewfraser · 9 months ago

I have some insight into this because this claim is about my company Fivetran:

“…relies on the data source being able to seek backwards on its changelog. But Postgres throws changelogs away once they're consumed, so the Postgres data source can't support this operation”

Dan’s understanding is incorrect: Postgres logical replication allows each consumer to maintain a bookmark in the WAL, and it will retain the WAL until you acknowledge receipt of a portion and advance the bookmark. Evidently, he tried our product briefly, had an issue or thought he had an issue, investigated the issue briefly and came to the conclusion that he understood the technology better than people who have spent years working on it.

Don’t get me wrong, it is absolutely possible for the experts to be wrong and one smart guy to be right. But at least part of what’s going on in this post is an arrogant guy who thinks he knows better than everyone, coming to snap conclusions that other people’s work is broken.
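For readers unfamiliar with the bookmark behavior described in this comment, here is a toy model of it. This is not how Postgres implements it and none of these names come from Postgres; `RetainedLog`, `register`, `pending`, and `acknowledge` are hypothetical. It only illustrates the claim that entries are retained until every consumer has acknowledged them, so an unacknowledged portion can be re-read.

```rust
use std::collections::{BTreeMap, HashMap};

/// Toy append-only log that keeps entries until every registered consumer acknowledges them.
struct RetainedLog {
    entries: BTreeMap<u64, String>,  // position -> change record
    next_pos: u64,                   // position the next append will get
    bookmarks: HashMap<String, u64>, // consumer -> first position not yet acknowledged
}

impl RetainedLog {
    fn new() -> Self {
        RetainedLog { entries: BTreeMap::new(), next_pos: 0, bookmarks: HashMap::new() }
    }

    /// A new consumer starts at the current end of the log.
    fn register(&mut self, consumer: &str) {
        self.bookmarks.insert(consumer.to_string(), self.next_pos);
    }

    fn append(&mut self, change: &str) -> u64 {
        let pos = self.next_pos;
        self.entries.insert(pos, change.to_string());
        self.next_pos += 1;
        pos
    }

    /// Unacknowledged entries for a consumer; reading does not advance the bookmark.
    fn pending(&self, consumer: &str) -> Vec<(u64, String)> {
        match self.bookmarks.get(consumer) {
            Some(start) => self.entries.range(*start..).map(|(p, c)| (*p, c.clone())).collect(),
            None => Vec::new(),
        }
    }

    /// Acknowledge everything up to and including `pos`, then drop what nobody needs anymore.
    fn acknowledge(&mut self, consumer: &str, pos: u64) {
        if let Some(mark) = self.bookmarks.get_mut(consumer) {
            *mark = (*mark).max(pos + 1);
        }
        let oldest_needed = self.bookmarks.values().copied().min().unwrap_or(self.next_pos);
        // `split_off` keeps the entries at or after `oldest_needed`.
        self.entries = self.entries.split_off(&oldest_needed);
    }

    fn retained(&self) -> usize {
        self.entries.len()
    }
}

fn main() {
    let mut log = RetainedLog::new();
    log.register("sync_a");
    log.register("sync_b");
    for change in ["insert 1", "update 1", "delete 1"] {
        log.append(change);
    }

    // Reading twice returns the same data: nothing is thrown away on consumption.
    assert_eq!(log.pending("sync_a").len(), 3);
    assert_eq!(log.pending("sync_a").len(), 3);

    // sync_a acknowledges positions 0 and 1, but sync_b has not, so everything is retained.
    log.acknowledge("sync_a", 1);
    assert_eq!(log.retained(), 3);

    // Once sync_b acknowledges position 0, only that entry can be dropped.
    log.acknowledge("sync_b", 0);
    assert_eq!(log.retained(), 2);
    assert_eq!(log.pending("sync_a"), vec![(2, "delete 1".to_string())]);
}
```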

joatmon-snoo · 9 months ago

...how is _this_ the insight that you come away from this post with?

This post is a commentary on product quality issues, the underlying cost models (both goods and services), and the interplay with American culture. There's like 20+ company/product anecdotes in there - a mistake about one technical detail of one product is wildly uninteresting.