On my older system I had a WD_BLACK SN850X, but it was connected to an M.2 slot that may have been limiting it. That is where I measured the 1-2 ms latency.
Is there a good place to get numbers on what is possible with enterprise hardware today? I've struggled for some time to find a good source.
[1] https://blocksandfiles.com/2023/08/07/kioxias-rocketship-dat...
Update: about 800 µs on a more modern system.
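For what it's worth, a rough way to sanity-check figures like these is to time small synchronous writes yourself. The sketch below is just one possible methodology in Python (the comments above don't say how the 1-2 ms and 800 µs numbers were taken); the file path, block size, and sample count are arbitrary:

    import os
    import time

    PATH = "latency_test.bin"   # hypothetical test file on the drive under test
    N = 1000
    payload = os.urandom(4096)  # one 4 KiB block per write

    samples = []
    fd = os.open(PATH, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        for _ in range(N):
            t0 = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)        # force the write down to stable storage
            samples.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
        os.unlink(PATH)

    samples.sort()
    print(f"p50: {samples[N // 2] * 1e6:.0f} us")
    print(f"p99: {samples[int(N * 0.99)] * 1e6:.0f} us")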
In my last job, we used Rabbit to move about 15k messages per second across about 2000 queues, with 200 producers (which produced to all queues) and 2000 consumers (which each read from their own queue). Any time any of the consumers would slow down or fail, Rabbit would run out of memory and crash, causing a sitewide failure.
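(For anyone who hasn't run a topology like that, here is roughly what it looks like in pika. The exchange type and names are my guesses; a fanout exchange is the simplest way to get every producer's messages into every queue, but the comment doesn't say what was actually used.)

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # One exchange that copies every published message to every bound queue.
    ch.exchange_declare(exchange="events", exchange_type="fanout", durable=True)

    # One queue per consumer; each consumer reads only its own queue.
    for consumer_id in range(2000):
        q = f"consumer.{consumer_id}"
        ch.queue_declare(queue=q, durable=True)
        ch.queue_bind(queue=q, exchange="events")

    # A producer just publishes to the exchange and RabbitMQ fans it out.
    ch.basic_publish(exchange="events", routing_key="", body=b"hello")
    conn.close()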
Additionally, Rabbit would invent network partitions out of thin air, which would cause it to lose messages, since when a partition is healed, all messages on an arbitrarily chosen side of it are discarded. (See https://aphyr.com/posts/315-jepsen-rabbitmq for more details about Rabbit's issues, plus some recommendations for running Rabbit that sound worse to me than just using something else.)
We experimented with "high availability" mode, which caused the cluster to crash more frequently and lose more messages; with "durability", which also caused the cluster to crash more frequently and lose more messages; and with colocating all of our Rabbit nodes on the same rack, which did not fix the constant partitions and caused a total outage when that rack lost power, as you'd expect.
These are not theoretical problems. At one point, I spent an entire night fighting with this stupid thing alongside four other competent infrastructure engineers. The only long-term solution we found was to deprecate our use of Rabbit entirely and use Kafka instead.
To anyone considering Rabbit, please reconsider! If you're OK with losing messages, then simply making an asynchronous fire-and-forget RPC directly to the relevant consumers may be a better solution for you, since at least there isn't more infrastructure to maintain.
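To make that last suggestion concrete, fire-and-forget can be as little as posting the event from a thread pool and ignoring the result. The endpoint URL and payload below are invented, and errors are deliberately swallowed, since accepting message loss is the premise:

    import json
    from concurrent.futures import ThreadPoolExecutor
    from urllib import request

    pool = ThreadPoolExecutor(max_workers=32)

    def fire_and_forget(event, url="http://consumer.internal/events"):
        def _send():
            req = request.Request(
                url,
                data=json.dumps(event).encode(),
                headers={"Content-Type": "application/json"},
            )
            try:
                request.urlopen(req, timeout=2)
            except Exception:
                pass  # dropping the event on any error is the trade-off being accepted
        pool.submit(_send)

    fire_and_forget({"type": "user_signup", "user_id": 123})
    pool.shutdown(wait=True)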
This worked fine until things got behind, and then we couldn't keep up. We were able to work around that by using a hashed exchange that spread messages across 4 queues, hashing on a timestamp inserted by a timestamp plugin. Since all operations for a queue happen in the same event loop, any sort of backup led to pub and sub operations fighting for CPU time; by spreading the load across 4 queues we wound up with 4x the CPU capacity for that particular exchange. With 2000 queues you probably didn't run into that issue very often.
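Roughly, that workaround looks like this with pika and the consistent-hash exchange plugin (exchange type "x-consistent-hash"). The original setup hashed on a timestamp added broker-side by a timestamp plugin; in this sketch the publisher stamps a header itself, which is enough to show the sharding idea, and all names are invented:

    import time
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # Declare a consistent-hash exchange that hashes on a message header.
    ch.exchange_declare(
        exchange="work.hashed",
        exchange_type="x-consistent-hash",
        arguments={"hash-header": "shard-key"},
    )

    # Bind 4 queues; the binding key is a weight, so "1" for each gives a
    # roughly even split of messages across the queues.
    for i in range(4):
        q = f"work.shard.{i}"
        ch.queue_declare(queue=q)
        ch.queue_bind(queue=q, exchange="work.hashed", routing_key="1")

    # Publish with a per-message timestamp header; the exchange routes each
    # message to one of the 4 queues, so each queue's event loop only sees
    # about a quarter of the load.
    ch.basic_publish(
        exchange="work.hashed",
        routing_key="",
        body=b"payload",
        properties=pika.BasicProperties(headers={"shard-key": str(time.time_ns())}),
    )
    conn.close()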