Readit News
charleshn commented on How to Think About GPUs   jax-ml.github.io/scaling-... · Posted by u/alphabetting
aschleck · 6 days ago
It's been a while since I thought about this, but isn't the reason providers advertise only 3.2 Tbps simply that it's the limit of a single node's connection to the IB network? DGX is spec'ed to pair each H100 with a ConnectX-7 NIC, and those cap out at 400 Gbps. 8 GPUs * 400 Gbps per GPU = 3.2 Tbps.

Quiz 2 is confusingly worded but is, iiuc, referring to intranode GPU connections rather than internode networking.

charleshn · 6 days ago
Yes, 450 GB/s is the per-GPU bandwidth in the NVLink domain; 3.2 Tbps is the per-host bandwidth in the scale-out IB/Ethernet domain.
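A quick back-of-the-envelope sketch (Python, just restating the figures quoted above: 8 NICs at 400 Gbps per node, 450 GB/s of NVLink per GPU) of how the two domains compare:

    # Back-of-the-envelope comparison of the two bandwidth figures quoted above.
    # Assumes 8 H100s per node with one 400 Gbps ConnectX-7 NIC each (DGX spec)
    # and 450 GB/s of NVLink bandwidth per GPU.
    NICS_PER_NODE = 8
    NIC_GBPS = 400       # per-NIC line rate, gigabits per second
    NVLINK_GBYTES = 450  # per-GPU NVLink bandwidth, gigabytes per second

    # Scale-out (IB/Ethernet) domain: bandwidth leaving the node.
    node_scaleout_tbps = NICS_PER_NODE * NIC_GBPS / 1000    # 3.2 Tbps per host
    per_gpu_scaleout_gbytes = NIC_GBPS / 8                   # ~50 GB/s per GPU

    # Scale-up (NVLink) domain: bandwidth between GPUs inside the node.
    per_gpu_nvlink_tbps = NVLINK_GBYTES * 8 / 1000           # ~3.6 Tbps per GPU

    print(f"per-host scale-out: {node_scaleout_tbps} Tbps")
    print(f"per-GPU scale-out:  {per_gpu_scaleout_gbytes:.0f} GB/s")
    print(f"per-GPU NVLink:     {per_gpu_nvlink_tbps} Tbps "
          f"(~{NVLINK_GBYTES / per_gpu_scaleout_gbytes:.0f}x the NIC)")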
charleshn commented on The Surprising gRPC Client Bottleneck in Low-Latency Networks   blog.ydb.tech/the-surpris... · Posted by u/eivanov89
lacop · a month ago
Yeah, that was my understanding too, hence I filed the bug (actually a duplicate of an older bug that was closed because the poster didn't provide a reproduction).

Still not sure if this is a Linux network configuration issue or a gRPC issue, but something is definitely broken if I can't send a ~1MB request and get a response within roughly network RTT + server processing time.

charleshn · a month ago
Could you check the value of your kernel's net.ipv4.tcp_slow_start_after_idle sysctl, and if it's non-zero, set it to 0?
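For illustration, a minimal Linux-only sketch of checking that knob via procfs (the sysctl name is the real one above; the rest is just plumbing, and setting it normally goes through sysctl -w):

    # Read net.ipv4.tcp_slow_start_after_idle via procfs (Linux). A value of 1
    # means the kernel collapses the congestion window after an idle period, so
    # the next burst on an otherwise warm connection pays slow start again.
    from pathlib import Path

    knob = Path("/proc/sys/net/ipv4/tcp_slow_start_after_idle")
    value = knob.read_text().strip()
    print(f"tcp_slow_start_after_idle = {value}")

    if value != "0":
        # Equivalent to: sysctl -w net.ipv4.tcp_slow_start_after_idle=0
        # (needs root; persist it under /etc/sysctl.d/ to survive reboots)
        try:
            knob.write_text("0\n")
            print("set to 0")
        except PermissionError:
            print("re-run as root (or use sysctl -w) to change it")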
charleshn commented on AI capex is so big that it's affecting economic statistics   paulkedrosky.com/honey-ai... · Posted by u/throw0101c
charleshn · a month ago
I'm always surprised by the number of people posting here who are dismissive of AI and its obvious, unstoppable progress.

Just looking at what happened with chess, Go, strategy games, protein folding, etc., it's obvious that pretty much any field or problem that can be formalised and cheaply verified - e.g. mathematics, algorithms - will be solved, and that it's only a matter of time before we have domain-specific ASI.

I strongly encourage everyone to read about the bitter lesson [0] and verifier's law [1].

[0] http://www.incompleteideas.net/IncIdeas/BitterLesson.html

[1] https://www.jasonwei.net/blog/asymmetry-of-verification-and-...

charleshn · a month ago
You can now add getting gold at IMO [0] to the above list.

[0] https://x.com/alexwei_/status/1946477742855532918

charleshn commented on AI capex is so big that it's affecting economic statistics   paulkedrosky.com/honey-ai... · Posted by u/throw0101c
kadushka · a month ago
I love how people are transitioning from “LLMs can’t reason” to “LLMs can’t reliably reason”.
charleshn · a month ago
Frontier models went from not being able to count the number of 'r's in "strawberry" to getting gold at the IMO in under two years [0], and people keep repeating the same clichés such as "LLMs can't reason" or "they're just next-token predictors".

At this point, I think it can only be explained by ignorance, bad faith, or fear of becoming irrelevant.

[0] https://x.com/alexwei_/status/1946477742855532918

charleshn commented on AI capex is so big that it's affecting economic statistics   paulkedrosky.com/honey-ai... · Posted by u/throw0101c
oytis · a month ago
It's very different from chess etc. If we could formalise and "solve" software engineering precisely, it would be really cool, and probably indeed just lift programming to a new level of abstraction.

I don't mind software jobs moving from writing software to verifying software either, if it makes the whole process more efficient and the software becomes better as a result. Again, that's not what is happening here.

What is happening, at least in the minds of AI-optimist CEOs, is "disruption": drop the quality while cutting costs dramatically.

charleshn · a month ago
I mentioned algorithms, not software engineering, precisely for that reason.

But the next step is obviously increased formalism via formal methods, deterministic simulators, etc., basically so that one can define an environment for an RL agent.
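For illustration, a toy sketch of what such an environment could look like - all names here are hypothetical, the point is only that a cheap, deterministic verifier supplies the reward signal:

    # Toy, hypothetical shape of a verifiable RL environment: the reward is
    # whatever a deterministic checker says about a candidate solution.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class VerifiableEnv:
        problem: str                          # task statement (spec, theorem, ...)
        verifier: Callable[[str, str], bool]  # cheap, deterministic check

        def step(self, candidate: str) -> float:
            # Binary reward from the verifier; no human judgement in the loop.
            return 1.0 if self.verifier(self.problem, candidate) else 0.0

    # Example: a trivially checkable "problem" and its verifier.
    env = VerifiableEnv(
        problem="sort the list [3, 1, 2]",
        verifier=lambda problem, answer: answer == "[1, 2, 3]",
    )
    print(env.step("[1, 2, 3]"))  # 1.0
    print(env.step("[3, 2, 1]"))  # 0.0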

charleshn commented on AI capex is so big that it's affecting economic statistics   paulkedrosky.com/honey-ai... · Posted by u/throw0101c
oytis · a month ago
I just hope that when (if) the hype is over, we can repurpose the capacity for something useful (e.g. drug discovery).
charleshn · a month ago
I'm always surprised by the number of people posting here who are dismissive of AI and its obvious, unstoppable progress.

Just looking at what happened with chess, Go, strategy games, protein folding, etc., it's obvious that pretty much any field or problem that can be formalised and cheaply verified - e.g. mathematics, algorithms - will be solved, and that it's only a matter of time before we have domain-specific ASI.

I strongly encourage everyone to read about the bitter lesson [0] and verifier's law [1].

[0] http://www.incompleteideas.net/IncIdeas/BitterLesson.html

[1] https://www.jasonwei.net/blog/asymmetry-of-verification-and-...

charleshn commented on AI capex is so big that it's affecting economic statistics   paulkedrosky.com/honey-ai... · Posted by u/throw0101c
mikewarot · a month ago
I'm waiting for the shoe to drop when someone comes out with an FPGA optimized for reconfigurable computing and lowers the cost of LLM compute by 90% or better.
charleshn · a month ago
We already have ASICs - see Google's TPUs for some cost estimates.

HBM is also very expensive.

charleshn commented on Aeron: Efficient reliable UDP unicast, UDP multicast, and IPC message transport   github.com/aeron-io/aeron... · Posted by u/todsacerdoti
lll-o-lll · a month ago
> Relative latency savings cross-DC become less interesting the longer the distance, so there's nothing wrong with TCP there.

A long fat pipe sees dramatic throughput drops with TCP under even relatively small packet loss. Possibly we were holding it wrong; I'd love to know if there is a definitive guide to doing it right. We had good success with UDT.

charleshn · a month ago
You might want to look into TCP BBR [0]; it might help, and it's easy to try on Linux with a simple sysctl.

[0] https://en.m.wikipedia.org/wiki/TCP_congestion_control#TCP_B...
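For reference, a small Linux-only sketch of checking whether BBR is available and switching to it (the procfs paths are the standard kernel ones; the one-liner equivalent is sysctl -w net.ipv4.tcp_congestion_control=bbr):

    # Inspect the current and available TCP congestion control algorithms and
    # switch to BBR if the kernel offers it (Linux; writing needs root).
    from pathlib import Path

    ipv4 = Path("/proc/sys/net/ipv4")
    current = (ipv4 / "tcp_congestion_control").read_text().strip()
    available = (ipv4 / "tcp_available_congestion_control").read_text().split()
    print(f"current: {current}, available: {available}")

    if "bbr" not in available:
        # The tcp_bbr module may need loading first: modprobe tcp_bbr
        print("bbr not listed; try loading the tcp_bbr kernel module")
    elif current != "bbr":
        try:
            (ipv4 / "tcp_congestion_control").write_text("bbr\n")
            print("switched to bbr")
        except PermissionError:
            print("re-run as root (or use sysctl -w) to switch")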

charleshn commented on Caching is an abstraction, not an optimization   buttondown.com/jaffray/ar... · Posted by u/samuel246
charleshn · 2 months ago
As can be seen from other comments, people tend to focus on the consistency implications, but something not often discussed in the context of distributed systems is that caches tend to introduce bimodality and metastability [0][1]. See e.g. DynamoDB for an example of a design that takes this into account [2].

[0] https://brooker.co.za/blog/2021/08/27/caches.html

[1] https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s...

[2] https://brooker.co.za/blog/2022/07/12/dynamodb.html
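As a toy illustration of the bimodality (made-up numbers, not taken from any of the references above): latency behind a cache is a mixture of a fast hit path and a slow miss path, so steady-state behaviour says little about what happens when the cache goes cold:

    # Toy model: per-request latency is a mixture of a fast cache-hit path and
    # a slow miss path, so the cold-cache regime looks nothing like steady state.
    # All numbers are made up for illustration.
    import random

    HIT_MS, MISS_MS = 1.0, 50.0

    def mean_latency(hit_rate: float, n: int = 100_000) -> float:
        return sum(HIT_MS if random.random() < hit_rate else MISS_MS
                   for _ in range(n)) / n

    for hit_rate in (0.99, 0.90, 0.0):  # steady state, degraded, cold cache
        print(f"hit rate {hit_rate:>4.0%}: mean latency {mean_latency(hit_rate):5.1f} ms")

    # A backend sized for the 99% hit-rate regime sees ~50x more work per request
    # when the cache goes cold - the metastable failure mode discussed in [0][1].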

u/charleshn · Karma: 154 · Cake day: November 22, 2023