The tip that Nsight can run on Mac over SSH is great, too. I've basically been capturing and viewing data over RDP; I'll have to give it a shot next week.
And even if that did work, I've found it much more reliable to use the actual docker buildx builder disk state than to try to get caching for complex multi-stage builds working reliably. I have a case right now where there's no combination of --cache-to/--cache-from flags that will give me a 100% cached rebuild starting from a fresh builder, using only remote cache. I should probably report it to the Docker team, but I don't have a minimal repro right now and there's a 10% chance it's actually my fault.
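For reference, this is roughly the shape of setup I'd expect to give a fully cached rebuild from a fresh builder (the registry ref is a placeholder), and it's what doesn't quite get there for me:

    # Brand-new builder, no local layer state, so everything must come from the registry:
    docker buildx create --use --name fresh
    docker buildx build \
      --cache-to type=registry,ref=registry.example.com/app:buildcache,mode=max \
      --cache-from type=registry,ref=registry.example.com/app:buildcache \
      -t registry.example.com/app:latest .
    # mode=max exports intermediate stages too; the default mode=min only caches
    # the final stage's layers, which guarantees misses on multi-stage builds.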
Apparently, this is coming in Q3 according to their public roadmap: https://github.com/github/roadmap/issues/1029
[1] https://github.com/rwx-cloud/packages/blob/main/git/clone/bi...
All our builds are defined in GHA; there's no way it's worth swapping us over to another build system, administering it, etc. Our team is small (two at the moment, but hopefully doubling soon!), and there's barely a dozen people in the whole engineering org. The next hit-list item is to move from GH-hosted builders to GCE workers to get a warmer docker cache (a bunch of our build time is spent pulling images that haven't changed). It will also save a chunk of change (GCE workers are 4x cheaper per minute, and the caching will make for faster builds), but the opportunity cost of me tackling that is quite high.
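The workflow side of that migration is small, at least - once the self-hosted runners exist it's mostly a label change (the label set here is a placeholder):

    jobs:
      build:
        # was: runs-on: ubuntu-latest
        runs-on: [self-hosted, linux, x64]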
I like this approach. If I could configure my repos to use something like S3, I would switch away from using LFS. S3 seems like a really good fit for large blobs in a VCS. The intelligent tiering feature can move data into colder tiers of storage as history naturally accumulates and old things are forgotten. I wouldn't mind a historical checkout taking half a day (i.e., restored from a robotic tape library) if I am pulling in stuff from a decade ago.
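Something like this hypothetical bucket config (bucket name and ID are made up) would do it: blobs untouched for 90 days drop to the archive tier and at 180 days to deep archive, whose restore times are what would make that decade-old checkout take half a day:

    aws s3api put-bucket-intelligent-tiering-configuration \
      --bucket example-vcs-blobs --id archive-old-history \
      --intelligent-tiering-configuration '{
        "Id": "archive-old-history",
        "Status": "Enabled",
        "Tierings": [
          {"Days": 90,  "AccessTier": "ARCHIVE_ACCESS"},
          {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"}
        ]
      }'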
I'd initially looked at spinning up an LFS backend, but this solves the main pain point, for now. GitHub was charging us an arm and a leg for pulling LFS files in CI: each checkout is fresh and the caching model is non-ideal (max 10 GB cache, impossible to share between branches), so we end up pulling a bunch of data that is unfortunately in LFS on every commit, possibly multiple times. And they happily charge us for all that bandwidth, because they don't provide tools to make it easy to reduce it (let me pay for more cache size, or warm workers with an entire cache disk, or better cache control, or...).
...and if I want to enable this for developers it's relatively easy: just add a new git hook that does the same set of operations locally.
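A minimal sketch of what I mean, assuming the S3 pull is wrapped in a script (fetch-blobs here is hypothetical, standing in for whatever CI runs):

    #!/bin/sh
    # .git/hooks/post-checkout -- args: $1=old HEAD, $2=new HEAD, $3=1 for branch checkout
    [ "$3" = "1" ] || exit 0                 # ignore single-file checkouts
    exec scripts/fetch-blobs --ref "$2"      # hypothetical: same S3 pull CI does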
As for allocation - it looks like Zenoh might offer the necessary allocation pattern. https://zenoh-cpp.readthedocs.io/en/1.0.0.5/shm.html TBH most of the big wins come from not copying big blocks of sensor data and the like around. A thin header and a reference to a block of shared memory containing an image or point cloud, coming in over UDS, is likely more than performant enough for most use cases. Again, the big wins come from not having to serialize/deserialize the sensor data.
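To make that concrete, here's a bare-POSIX sketch of the producer side (not Zenoh's actual API; the names and header layout are made up): the blob lives in shared memory and only a small descriptor crosses the Unix socket.

    // Sketch of "thin header + shared-memory reference over UDS".
    // Lifecycle/cleanup (shm_unlink, the consumer side) omitted for brevity.
    #include <cstdint>
    #include <cstring>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <unistd.h>

    struct FrameHeader {          // the only thing that crosses the socket
        char     shm_name[32];    // consumer shm_open()s this, e.g. "/cam0_frame_7"
        uint64_t payload_bytes;   // size of the image / point cloud blob
        uint64_t timestamp_ns;    // sensor timestamp
    };

    bool publish_frame(int uds_fd, const char* shm_name,
                       const void* blob, uint64_t blob_size, uint64_t ts_ns) {
        int shm_fd = shm_open(shm_name, O_CREAT | O_RDWR, 0600);
        if (shm_fd < 0 || ftruncate(shm_fd, (off_t)blob_size) != 0) return false;
        void* dst = mmap(nullptr, blob_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, shm_fd, 0);
        close(shm_fd);            // the mapping outlives the fd
        if (dst == MAP_FAILED) return false;
        std::memcpy(dst, blob, blob_size);   // the "one copy on this end"
        munmap(dst, blob_size);

        FrameHeader hdr{};
        std::strncpy(hdr.shm_name, shm_name, sizeof(hdr.shm_name) - 1);
        hdr.payload_bytes = blob_size;
        hdr.timestamp_ns  = ts_ns;
        // sizeof(FrameHeader) bytes go over the socket, not the blob itself.
        return send(uds_fd, &hdr, sizeof(hdr), 0) == (ssize_t)sizeof(hdr);
    }

And the allocator pattern removes even that memcpy: if the transport hands the sensor driver memory that's already inside the shared region, the data is written in place and the copy disappears.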
Another pattern which I haven't really seen anywhere is handling multiple transports - at one point I had the concept of setting up one transport as an allocator (to put data into shared memory or the like): serialize once into shared memory, then hand that serialized buffer to your network transport(s) or your disk writer. It's not quite zero-copy, but in practice most "zero copy" is actually at least one copy on each end.
(Sorry, this post is a little scatterbrained, hopefully some of my points come across)