Also, my heart rate would sometimes drop below 40 bpm. Then the device would start pacing, which I didn't want and which was extremely uncomfortable.
P.S. The reason the battery ran out is that I found a treatment for my condition that works really well, by talking to experts around the world (I am a computer scientist). I wrote a case study paper about my condition, co-authored by my doctors, to help others: https://www.slideshare.net/slideshow/arvc-and-flecainide-cas... Sixteen years later, the device is still in place, but I will have it removed early next year.
The blog argues that AI workloads are bottlenecked by latency because of 'millions of small files.' But if you are training on millions of loose 4 KB objects read directly from network storage, your data pipeline is the problem, not the storage layer.
Data Formats: Standard practice is to pack small files into large, sequential blobs using formats like WebDataset, Parquet, or TFRecord (see the packing sketch below). This eliminates the need for high-IOPS metadata operations and makes plain S3 throughput the only metric that matters, and throughput is already plentiful.
Caching: Most high-performance training jobs hydrate local NVMe scratch space on the GPU nodes; S3 is just the cold source of truth. We don't need sub-millisecond access to the source of truth; we need it at the edge (local disk/RAM), which the data loader's prefetching handles. A sketch of that hydration step follows this list.
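A minimal sketch of the hydration step, assuming boto3 and hypothetical bucket, prefix, and scratch paths; real jobs usually overlap this with training via the loader's prefetch workers:

    import os
    import boto3

    # Hypothetical locations: cold source of truth in S3, local
    # NVMe scratch on the GPU node. All three names are assumptions.
    BUCKET = "training-data"
    PREFIX = "shards/"
    SCRATCH = "/mnt/nvme/scratch"

    s3 = boto3.client("s3")
    os.makedirs(SCRATCH, exist_ok=True)

    # One-time hydration: copy every shard down to local disk.
    # After this, the training loop only touches local NVMe, so
    # S3 latency never shows up in the step time.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            dest = os.path.join(SCRATCH, os.path.basename(obj["Key"]))
            if not os.path.exists(dest):
                s3.download_file(BUCKET, obj["Key"], dest)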
It seems like they are building a complex distributed system to solve a problem that is better solved by tar -cvf.
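That is only half a joke: a WebDataset shard is literally a tar file. A minimal packing sketch using the webdataset library (the source directory and shard size are illustrative assumptions):

    import os
    import webdataset as wds

    SRC_DIR = "images/"  # hypothetical directory of small .jpg files

    # ShardWriter rolls over to a new tar file every `maxcount`
    # samples, so a million 4 KB files become ~100 sequential
    # ~40 MB blobs that S3 can stream at full throughput.
    with wds.ShardWriter("shard-%06d.tar", maxcount=10000) as sink:
        for name in sorted(os.listdir(SRC_DIR)):
            if not name.endswith(".jpg"):
                continue
            with open(os.path.join(SRC_DIR, name), "rb") as f:
                # "__key__" names the sample; the dict key "jpg"
                # becomes the extension of the tar member.
                sink.write({"__key__": os.path.splitext(name)[0],
                            "jpg": f.read()})

Reading back is then one sequential scan per shard, e.g. wds.WebDataset("shard-{000000..000099}.tar"), instead of millions of metadata round trips.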
We do this with tiered storage over S3 using HopsFS, which exposes an HDFS API with a FUSE client, so training can just read data (from the HopsFS datanodes' NVMe cache) as if it were local, even though it is pulled from NVMe disks over the network. Writes, in contrast, go straight to S3 via HopsFS's write-through NVMe cache.
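From the training side that looks like ordinary local file I/O. A sketch, with the mount point as an assumption:

    import os

    # Hypothetical FUSE mount point for HopsFS. Whether a read is
    # served from the datanode's NVMe cache or pulled over the
    # network is decided by HopsFS, not by this code.
    MOUNT = "/hopsfs/training/shards"

    for name in sorted(os.listdir(MOUNT)):
        with open(os.path.join(MOUNT, name), "rb") as f:
            shard = f.read()  # fast path: NVMe cache hit
            # ... hand `shard` to the data loader ...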