What's important here with S3FS is that it supports (1) a FUSE client, which just makes life so much easier, and (2) NVMe storage, so that training pipelines aren't disk-I/O-bound (you can't always split files small enough, or parallelize reads and writes enough, against an S3 object store).
Disclaimer: I worked on HopsFS. HopsFS adds tiered storage: NVMe for recent data and S3 for archival.
[ref]: https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...
What I dislike about C++ is that it grew into a monster of a language, containing every programming paradigm and idea, good or bad, known to mankind.
It's so monstrously huge that no human can hold its entire complexity in their head.
C++ lets you do things in 10,000 different ways, and developers do just that, often in the same code base.
That being said, I would use a sane subset of C++ every day over Rust. It's not that I hate Rust or that I don't think it is good, technically sound and capable. It just doesn't fit the way I think and like to work.
I like to keep a simple model in mind. For me, memory is just a huge array from which we copy data into the CPU cache, move some of it into CPU registers, execute instructions, and then write the results from the registers back into some part of that huge array, to be used later. Rust adds a lot of complexity on top of this simple mental model of mine.
Yes, but that is _incredibly_ time-consuming. You have to set up ASan, MSan, TSan, and Valgrind. If you want linting, you need to do shenanigans to wire up clang-tidy.
I also like simple mental models. I like not having to figure out the CMake modifications to pull in a new library. I like having a search engine when I need a new library for x. I like it when libraries return Result<T, E> instead of ping-ponging between C libraries that signal errors through return-value flags and C++ libraries that throw std::runtime_error(). I like not dealing with void* casts.
(source: I have written unit tests against different versions of awk. That was... unpleasant.)
I'd think the submission from two days ago (206 points, 93 comments) would have qualified. (It's the exact same URL, so that's not why it wasn't de-duped either.)
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
If you haven’t heard of us, we provide a language and runtime that let you define your schemas in a simpler syntax and use them with _any_ model, not just those that implement tool calling or JSON mode, by relying on schema-aligned parsing. Check it out! https://github.com/BoundaryML/baml
- there are multiple ways to retry: you can retry establishing the connection (e.g. if DNS resolution fails for a 30s window) _or_ you can retry establishing the stream
- your load balancer needs to keep the stream open to a single backend for its whole lifetime; it can't just re-route each individual HTTP request/response
- how long are your timeouts? If you don't receive a message for 1s, OK, the client can probably keep the stream open, but what if you don't receive a message for 30s? This percolates through the entire request path, generally in the form of "how do I detect when a service in the request path has failed?"
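A minimal Python sketch of the first and third points, assuming a hypothetical `connect()` callable and a hypothetical `open_stream(conn, resume_after)` generator that yields `(arrival_time, sequence_number)` pairs; none of these names come from a real library. Connection failures and stream failures get separate retry budgets, and a gap between messages longer than `idle_timeout` counts as a dead stream. A real client would enforce the idle timeout with a read deadline instead of noticing the gap when the late message finally arrives; the timestamps just keep the sketch deterministic.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple, Type


class ConnectError(Exception):
    """Connection setup failed, e.g. DNS resolution or the TCP/TLS handshake."""


class StreamBroken(Exception):
    """An established stream failed or stayed silent longer than the idle timeout."""


def backoff(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Capped exponential backoff: base, 2*base, 4*base, ... never more than cap."""
    return min(cap, base * (2 ** attempt))


def retry(op: Callable[[], object], retry_on: Type[Exception],
          max_attempts: int, sleep: Callable[[float], None] = time.sleep) -> object:
    """Call op() until it succeeds; only `retry_on` errors are retried."""
    for attempt in range(max_attempts):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the failure to the caller
            sleep(backoff(attempt))
    # Only reached when max_attempts < 1.
    raise ValueError("max_attempts must be >= 1")


# Hypothetical stream shape: open_stream(conn, resume_after) yields
# (arrival_time_seconds, sequence_number) pairs.
StreamFn = Callable[[object, int], Iterator[Tuple[float, int]]]


def consume(connect: Callable[[], object], open_stream: StreamFn,
            idle_timeout: float, conn_attempts: int = 5, stream_attempts: int = 3,
            sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Retry the connection and the stream as two separate layers."""
    # Layer 1: establish the connection (e.g. ride out a DNS outage).
    conn = retry(connect, ConnectError, conn_attempts, sleep)
    received: List[int] = []

    def run_one_stream() -> None:
        # Layer 2: (re)open a stream on the existing connection, resuming
        # after the last sequence number already processed.
        resume_after = received[-1] if received else -1
        last_arrival: Optional[float] = None
        for arrival, seq in open_stream(conn, resume_after):
            if last_arrival is not None and arrival - last_arrival > idle_timeout:
                # Silence longer than the idle timeout: treat the stream as dead.
                raise StreamBroken(f"idle for {arrival - last_arrival:.0f}s")
            received.append(seq)
            last_arrival = arrival

    retry(run_one_stream, StreamBroken, stream_attempts, sleep)
    return received
```

Passing a list's `append` as `sleep` is one way to check the backoff schedule in a test without actually waiting. Keeping the two retry layers separate means a broken stream does not force a fresh DNS lookup and handshake, while a connection that never comes up does not burn the stream-retry budget.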
“…relies on the data source being able to seek backwards on its changelog. But Postgres throws changelogs away once they're consumed, so the Postgres data source can't support this operation”
Dan’s understanding is incorrect: Postgres logical replication lets each consumer maintain a bookmark in the WAL, and Postgres retains the WAL until you acknowledge receipt of a portion and advance the bookmark. Evidently, he tried our product briefly, had an issue or thought he had an issue, investigated it briefly, and concluded that he understood the technology better than people who have spent years working on it.
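To make the retention behavior concrete, here is a toy Python model of a logical replication slot. `ToyWal` and its methods are hypothetical and are not the Postgres API; in real Postgres the slot's acknowledged position shows up as `confirmed_flush_lsn` in the `pg_replication_slots` view. The model captures the two properties described above: a reconnecting consumer is sent everything after its last acknowledged position, and WAL is only discarded once every slot has confirmed past it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyWal:
    """Toy model of WAL retention with replication slots (not the real Postgres API)."""
    records: List[Tuple[int, str]] = field(default_factory=list)  # (lsn, change)
    slots: Dict[str, int] = field(default_factory=dict)  # slot name -> confirmed LSN
    next_lsn: int = 1

    def append(self, change: str) -> int:
        """Write a change to the log and return its LSN."""
        lsn = self.next_lsn
        self.records.append((lsn, change))
        self.next_lsn += 1
        return lsn

    def create_slot(self, name: str) -> None:
        # A new slot starts at the current end of the log: it only sees future changes.
        self.slots[name] = self.next_lsn - 1

    def stream(self, name: str) -> List[Tuple[int, str]]:
        # Every (re)connect starts right after the slot's confirmed position,
        # so anything not yet acknowledged is delivered again.
        confirmed = self.slots[name]
        return [(lsn, c) for lsn, c in self.records if lsn > confirmed]

    def confirm(self, name: str, lsn: int) -> None:
        # The consumer acknowledges everything up to and including `lsn`.
        self.slots[name] = max(self.slots[name], lsn)
        self._recycle()

    def _recycle(self) -> None:
        # Records can only be discarded once *every* slot has confirmed past them.
        horizon = min(self.slots.values(), default=self.next_lsn - 1)
        self.records = [(lsn, c) for lsn, c in self.records if lsn > horizon]
```

One consequence the model also shows: a slot whose consumer never acknowledges anything holds back recycling for everyone, which is why an abandoned slot can make WAL pile up on a real server.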
Don’t get me wrong, it is absolutely possible for the experts to be wrong and one smart guy to be right. But at least part of what’s going on in this post is an arrogant guy who thinks he knows better than everyone, coming to snap conclusions that other people’s work is broken.
This post is a commentary on product quality issues, the underlying cost models (for both goods and services), and their interplay with American culture. There are 20+ company/product anecdotes in there; a mistake about one technical detail of one product is wildly uninteresting.
It's so hard for me to take Rust seriously when I have to find answers to unintuitive questions like this.
Unfortunately going from most languages to Rust forces you to speedrun this transition.