amrb commented on DisTrO – a family of low latency distributed optimizers   github.com/NousResearch/D... · Posted by u/SchwKatze
simonw · a year ago
Most of the information about this is in this PDF (I hate when people publish interesting information exclusively in PDFs): https://raw.githubusercontent.com/NousResearch/DisTrO/main/A...

I converted it to Markdown (using Gemini 1.5 Pro) and pasted it into a Gist here: https://gist.github.com/simonw/46a33d66e069efe5c10b63625fdab...

From the abstract:

> Training large scale neural networks typically involves sharing gradients between all accelerators, which necessitates specialized, high-speed interconnects. To address this, we introduce DisTrO, a family of architecture-agnostic and network-agnostic distributed optimizers that reduces the inter-GPU communication requirements by four to five orders of magnitude without relying on amortized analysis, enabling low-latency training of large neural networks on slow internet bandwidths with heterogeneous networking hardware.

This could be a HUGE deal.

Currently if you want to train giant LLMs you need a big pile of GPUs in the same location as each other due to the amount of information that needs to shuffle between them during training.

If DisTrO works as intended, it will be possible to train models using GPUs in different places - potentially enabling SETI@home style training where thousands of people with gaming PCs at home could donate their GPU time to a large training effort.
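To make the bottleneck concrete, here's a minimal sketch (my own, assuming PyTorch's torch.distributed with an already-initialized process group, not anything from the DisTrO report) of the conventional gradient all-reduce that data-parallel training runs after every backward pass:

```python
import torch
import torch.distributed as dist

def allreduce_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all ranks after loss.backward()."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Every rank ships the full gradient tensor to every other rank.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size
```

For a multi-billion-parameter model that is gigabytes of traffic per step, which is why the GPUs normally need fast local interconnects; DisTrO's claim is that this exchange can be shrunk by several orders of magnitude.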

Their tweet about this has more: https://twitter.com/NousResearch/status/1828121648383566270

> Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of architecture-agnostic and network-agnostic distributed optimizers that reduces the inter-GPU communication requirements by 1000x to 10,000x without relying on amortized analysis, and matches AdamW+All-Reduce in convergence rates. This enables low-latency training of large neural networks on slow internet bandwidths with heterogeneous networking hardware.

> DisTrO can increase the resilience and robustness of training LLMs by minimizing dependency on a single entity for computation. DisTrO is one step towards a more secure and equitable environment for all participants involved in building LLMs.

> Without relying on a single company to manage and control the training process, researchers and institutions can have more freedom to collaborate and experiment with new techniques, algorithms, and models. This increased competition fosters innovation, drives progress, and ultimately benefits society as a whole.

amrb · a year ago
It's a red flag that the 1.2B model still has to fit in GPU memory; happy to be proven wrong when the code drops.
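Back-of-the-envelope (my own rough numbers, not from the report) on why that matters: DisTrO cuts the communication, not the per-GPU footprint, so every participant still has to hold the full model plus optimizer state.

```python
# Rough, assumed figures for mixed-precision AdamW training of a 1.2B model.
params = 1.2e9
weights_fp16 = params * 2                   # fp16 weights only
train_state = params * (2 + 2 + 4 + 4 + 4)  # fp16 weights + grads, fp32 master/m/v
print(f"weights only:   {weights_fp16 / 1e9:.1f} GB")  # ~2.4 GB
print(f"with optimizer: {train_state / 1e9:.1f} GB")   # ~19 GB, before activations
```

Scaling that to frontier-sized models would blow past a home GPU's memory unless some form of model sharding is added on top.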
amrb commented on Tokens are a big reason today's generative AI falls short   techcrunch.com/2024/07/06... · Posted by u/anigbrowl
vessenes · a year ago
T-FREE is interesting, at least, I find it interesting in that I don’t really understand it. They take successive character triples of all words, and then hash them, and then use the hash table slots landed in as destinations to feed into an embedding space? Can I possibly be understanding that chart properly?

Can you explain this any better than the first few pages of the paper? I’d like some intuition about why T-FREE works; there are lots of reasons to prefer different tokenization schemes, but I can’t really get this one into my head from the paper, unfortunately.

amrb · a year ago
Can't say I've mastered the concept either; I'm waiting for the code [0] to be released so I can run some head-to-head tests. My rough reading of the scheme is sketched below.

[0] https://github.com/Aleph-Alpha/trigrams
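Here's that rough reading as a sketch (the slot count, hash count, and hashing scheme are my guesses, not taken from the paper or repo): each word becomes character trigrams, each trigram is hashed into a few slots of a fixed-size table, and the word's embedding is the sum of the embedding rows at those slots.

```python
import hashlib
import torch
import torch.nn as nn

NUM_SLOTS = 8192   # hashed "vocabulary" size (assumed)
NUM_HASHES = 3     # hash functions per trigram (assumed)
DIM = 256          # embedding dimension (assumed)

embedding = nn.Embedding(NUM_SLOTS, DIM)

def trigrams(word: str) -> list[str]:
    padded = f"_{word}_"                       # mark word boundaries
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def slot_ids(word: str) -> list[int]:
    ids = []
    for tri in trigrams(word):
        for k in range(NUM_HASHES):
            digest = hashlib.sha256(f"{k}:{tri}".encode()).digest()
            ids.append(int.from_bytes(digest[:8], "little") % NUM_SLOTS)
    return ids

def embed_word(word: str) -> torch.Tensor:
    idx = torch.tensor(slot_ids(word), dtype=torch.long)
    return embedding(idx).sum(dim=0)           # sum the activated rows

print(embed_word("tokenizer").shape)           # torch.Size([256])
```

If that reading is right, the appeal is that there is no learned vocabulary at all: morphologically similar words share trigrams, so they share embedding rows.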

amrb commented on Tokens are a big reason today's generative AI falls short   techcrunch.com/2024/07/06... · Posted by u/anigbrowl
amrb · a year ago
An alternative approach to BPE tokenization: https://arxiv.org/abs/2406.19223
amrb commented on Petals runs Llama 2 (70B) from Colab at 5 tokens/sec   github.com/bigscience-wor... · Posted by u/borzunov
amrb · 2 years ago
Great project and I'm happy to see it expand to more models!
amrb commented on Reddark: Website to watch subreddits going dark   reddark.netlify.app/... · Posted by u/morjom
amrb · 2 years ago
What's the new reddit to try?
amrb commented on Ask HN: Has an API key issuer ever leaked their own customers’ API keys    · Posted by u/mathewpregasen
amrb · 2 years ago
Anything can end up in logs; from there, a hypothetical breach just depends on getting access to hosted Splunk via employee creds.
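As a hypothetical illustration (the handler and names are made up), the classic path is a catch-all request logger that dumps headers verbatim, Authorization header and all:

```python
import logging

logging.basicConfig(level=logging.INFO)

def handle_request(headers: dict, body: bytes) -> None:
    # Looks harmless, but headers may carry the customer's API key verbatim.
    logging.info("incoming request headers=%r body_len=%d", headers, len(body))

handle_request({"Authorization": "Bearer sk_live_EXAMPLE"}, b"{}")
```

Once that log line lands in a shared Splunk index, the key is only as safe as the weakest employee login.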
amrb commented on 90% of laid-off H1-B visa holders were able to find new work   fortune.com/2023/05/26/90... · Posted by u/rustoo
glitchc · 2 years ago
All that demonstrates is that it is cheaper to hire an H1-B visa holder than a local developer with the same experience.
amrb · 2 years ago
There is a salary requirement, so as not to undercut local workers. Of course, if you're working over 40 hours a week, maybe the company gets its pound of flesh!
amrb commented on Tree of Thoughts   github.com/kyegomez/tree-... · Posted by u/kevinslin
amrb · 2 years ago

u/amrb

Karma: 382 · Cake day: December 1, 2020