The tip that Nsight can run on Mac over SSH is great, too. I've basically been capturing and viewing data over RDP; I'll have to give it a shot next week.
And even if that did work, I've found it much more reliable to use the actual docker buildx builder disk state than to try to get caching for complex multi-stage builds working reliably. I have a case right now where there's no combination of --cache-to/--cache-from flags that will give me a 100% cached rebuild starting from a fresh builder, using only remote cache. I should probably report it to the Docker team, but I don't have a minimal repro right now and there's a 10% chance it's actually my fault.
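For reference, this is roughly the shape of setup I'd expect to give a fully cached rebuild from a fresh builder (the registry ref is a placeholder), and it's what doesn't quite get there for me:

    # Brand-new builder, no local layer state, so everything must come from the registry:
    docker buildx create --use --name fresh
    docker buildx build \
      --cache-to type=registry,ref=registry.example.com/app:buildcache,mode=max \
      --cache-from type=registry,ref=registry.example.com/app:buildcache \
      -t registry.example.com/app:latest .
    # mode=max exports intermediate stages too; the default mode=min only caches
    # the final stage's layers, which guarantees misses on multi-stage builds.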
Apparently, this is coming in Q3 according to their public roadmap: https://github.com/github/roadmap/issues/1029
[1] https://github.com/rwx-cloud/packages/blob/main/git/clone/bi...
All our builds are defined in GHA; there's no way it's worth swapping us over to another build system, administering it, etc. Our team is small (two at the moment, but hopefully doubling soon!), and there's barely a dozen people in the whole engineering org. The next hit-list item is to move from GH-hosted builders to GCE workers to get a warmer docker cache (a bunch of our build time is spent pulling images that haven't changed). It will also save a chunk of change (GCE workers are 4x cheaper per minute, and the caching will make for faster builds), but the opportunity cost of me tackling that is quite high.
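The workflow side of that migration is small, at least - once the self-hosted runners exist it's mostly a label change (the label set here is a placeholder):

    jobs:
      build:
        # was: runs-on: ubuntu-latest
        runs-on: [self-hosted, linux, x64]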
I like this approach. If I could configure my repos to use something like S3, I would switch away from using LFS. S3 seems like a really good fit for large blobs in a VCS. The intelligent tiering feature can move data into colder tiers of storage as history naturally accumulates and old things are forgotten. I wouldn't mind a historical checkout taking half a day (i.e., restored from a robotic tape library) if I am pulling in stuff from a decade ago.
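Something like this hypothetical bucket config (bucket name and ID are made up) would do it: blobs untouched for 90 days drop to the archive tier and at 180 days to deep archive, whose restore times are what would make that decade-old checkout take half a day:

    aws s3api put-bucket-intelligent-tiering-configuration \
      --bucket example-vcs-blobs --id archive-old-history \
      --intelligent-tiering-configuration '{
        "Id": "archive-old-history",
        "Status": "Enabled",
        "Tierings": [
          {"Days": 90,  "AccessTier": "ARCHIVE_ACCESS"},
          {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"}
        ]
      }'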
I'd initially looked at spinning up an LFS backend, but this solves the main pain point, for now. GitHub was charging us an arm and a leg for pulling LFS files in CI: each checkout is fresh and the caching model is non-ideal (max 10 GB cache, impossible to share between branches), so we end up pulling a bunch of data that is unfortunately in LFS on every commit, possibly multiple times. And they happily charge us for all that bandwidth, because they don't provide tools to make it easy to reduce it (let me pay for more cache size, or warm workers with an entire cache disk, or better cache control, or...).
...and if I want to enable this for developers it's relatively easy: just add a new git hook that does the same set of operations locally.
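A minimal sketch of what I mean, assuming the S3 pull is wrapped in a script (fetch-blobs here is hypothetical, standing in for whatever CI runs):

    #!/bin/sh
    # .git/hooks/post-checkout -- args: $1=old HEAD, $2=new HEAD, $3=1 for branch checkout
    [ "$3" = "1" ] || exit 0                 # ignore single-file checkouts
    exec scripts/fetch-blobs --ref "$2"      # hypothetical: same S3 pull CI does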
As for allocation - it looks like Zenoh might offer the necessary allocation pattern. https://zenoh-cpp.readthedocs.io/en/1.0.0.5/shm.html TBH most of the big wins come from not copying big blocks of sensor data and the like around. A thin header and a reference to a block of shared memory containing an image or point cloud, coming in over UDS, is likely more than performant enough for most use cases. Again, the big wins come from not having to serialize/deserialize the sensor data.
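To make that concrete, here's a bare-POSIX sketch of the producer side (not Zenoh's actual API; the names and header layout are made up): the blob lives in shared memory and only a small descriptor crosses the Unix socket.

    // Sketch of "thin header + shared-memory reference over UDS".
    // Lifecycle/cleanup (shm_unlink, the consumer side) omitted for brevity.
    #include <cstdint>
    #include <cstring>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <unistd.h>

    struct FrameHeader {          // the only thing that crosses the socket
        char     shm_name[32];    // consumer shm_open()s this, e.g. "/cam0_frame_7"
        uint64_t payload_bytes;   // size of the image / point cloud blob
        uint64_t timestamp_ns;    // sensor timestamp
    };

    bool publish_frame(int uds_fd, const char* shm_name,
                       const void* blob, uint64_t blob_size, uint64_t ts_ns) {
        int shm_fd = shm_open(shm_name, O_CREAT | O_RDWR, 0600);
        if (shm_fd < 0 || ftruncate(shm_fd, (off_t)blob_size) != 0) return false;
        void* dst = mmap(nullptr, blob_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, shm_fd, 0);
        close(shm_fd);            // the mapping outlives the fd
        if (dst == MAP_FAILED) return false;
        std::memcpy(dst, blob, blob_size);   // the "one copy on this end"
        munmap(dst, blob_size);

        FrameHeader hdr{};
        std::strncpy(hdr.shm_name, shm_name, sizeof(hdr.shm_name) - 1);
        hdr.payload_bytes = blob_size;
        hdr.timestamp_ns  = ts_ns;
        // sizeof(FrameHeader) bytes go over the socket, not the blob itself.
        return send(uds_fd, &hdr, sizeof(hdr), 0) == (ssize_t)sizeof(hdr);
    }

And the allocator pattern removes even that memcpy: if the transport hands the sensor driver memory that's already inside the shared region, the data is written in place and the copy disappears.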
Another pattern which I haven't really seen anywhere is handling multiple transports - at one point I had the concept of setting up one transport as an allocator (to put data into shared memory or the like): serialize once into shared memory, then hand that serialized buffer to your network transport(s) or your disk writer. It's not quite zero-copy, but in practice most "zero copy" is actually at least one copy on each end.
(Sorry, this post is a little scatterbrained, hopefully some of my points come across)