qntty (u/qntty) - Readit News

qntty commented on Claude Advanced Tool Use anthropic.com/engineering... · Posted by u/lebovic

behnamoh · 24 days ago

I cannot believe all these months and years people have been loading all of the tool JSON schemas upfront. This is such a waste of context window and something that was already solved three years ago.

qntty · 24 days ago

Solved how?

qntty commented on Calculus for Mathematicians, Computer Scientists, and Physicists [pdf] mathcs.holycross.edu/~ahw... · Posted by u/o4c

qntty · 25 days ago

Writing a calculus book that's more rigorous than typical books is hard because if you go too hard, people will say that you've written a real analysis book and the point of calculus is to introduce certain concepts without going full analysis. This book seems to have at least avoided the trap of trying to be too rigorous about the concept of convergence and spending more time on introducing vocabulary to talk about functions and talking about intersections with linear algebra.

qntty commented on The half-life of tech skills haraldagterhuis.substack.... · Posted by u/j-a-a-p

qntty · 5 months ago

I tried and failed to find some kind of concrete methodology that they used to get to the number 30 months. I'm still waiting for quadratic algebra to make my knowledge of linear algebra obsolete.

qntty commented on MCP in LM Studio lmstudio.ai/blog/lmstudio... · Posted by u/yags

sixhobbits · 6 months ago

MCP terminology is already super confusing, but this seems to just introduce "MCP Host" randomly in a way that makes no sense to me at all.

> "MCP Host": applications (like LM Studio or Claude Desktop) that can connect to MCP servers, and make their resources available to models.

I think everyone else is calling this an "MCP Client", so I'm not sure why they would want to call themselves a host - makes it sound like they are hosting MCP servers (definitely something that people are doing, even though often the server is run on the same machine as the client), when in fact they are just a client? Or am I confused?

qntty · 6 months ago

It's confusing but you just have to read the official docs

https://modelcontextprotocol.io/specification/2025-03-26/arc...

qntty commented on Find Your People foundersatwork.posthaven.... · Posted by u/jl

qntty · 7 months ago

I like the subway analogy. I'm sure I've heard some version of it before, but maybe because I was younger I didn't really get it. It really is a little strange to tell kids who have never really directed their own lives before to start doing it all of a sudden.

qntty commented on By default, Signal doesn't recall signal.org/blog/signal-do... · Posted by u/feross

ranger_danger · 7 months ago

Yes but as I understand it, "easy to disable" and "difficult to accidentally disable" are opposites.

EDIT: Apparently people have different definitions of easy. Fair enough

qntty · 7 months ago

Child-proof caps are easy to take off but difficult to accidentally take off.

qntty commented on LLM-D: Kubernetes-Native Distributed Inference llm-d.ai/blog/llm-d-annou... · Posted by u/smarterclayton

rdli · 7 months ago

In this analogy, Dynamo is most definitely not like Django. It includes inference aware routing, KV caching, etc. -- all the stuff you would need to run a modern SOTA inference stack.

qntty · 7 months ago

You're right, I was confusing TensorRT with Dynamo. It looks like the relationship between Dynamo and vLLM is actually the opposite of what I was thinking -- Dynamo can use vLLM as a backend rather than vice versa.

qntty commented on LLM-D: Kubernetes-Native Distributed Inference llm-d.ai/blog/llm-d-annou... · Posted by u/smarterclayton

rdli · 7 months ago

This is really interesting. For SOTA inference systems, I've seen two general approaches:

* The "stack-centric" approach such as vLLM production stack, AIBrix, etc. These set up an entire inference stack for you including KV cache, routing, etc.

* The "pipeline-centric" approach such as NVidia Dynamo, Ray, BentoML. These give you more of an SDK so you can define inference pipelines that you can then deploy on your specific hardware.

It seems like LLM-d is the former. Is that right? What prompted you to go down that direction, instead of the direction of Dynamo?

qntty · 7 months ago

It sounds like you might be confusing different parts of the stack. NVIDIA Dynamo for example supports vLLM as the inference engine. I think you should think of something like vLLM as more akin to GUnicorn, and llm-d as an application load balancer. And I guess something like NVIDIA Dynamo would be like Django.

qntty commented on LLM-D: Kubernetes-Native Distributed Inference llm-d.ai/blog/llm-d-annou... · Posted by u/smarterclayton

dzr0001 · 7 months ago

I did a quick scan of the repo and didn't see any reference to Ray. Would this indicate that llm-d lacks support for pipeline parallelism?

qntty · 7 months ago

I believe this is a question you should ask about vLLM, not llm-d. It looks like vLLM does support pipeline parallelism via Ray: https://docs.vllm.ai/en/latest/serving/distributed_serving.h...

This project appears to make use of both vLLM and Inference Gateway (an official Kubernetes extension to the Gateway resource). The contributions of llm-d itself seems to mostly be a scheduling algorithm for load balancing across vLLM instances.

qntty commented on Gandi March 9, 2025 incident postmortem news.gandi.net/en/2025/03... · Posted by u/wilsonfiifi

qntty · 7 months ago

In Hindi, “gandi” means dirty, which I guess is appropriate for marches

u/qntty

KarmaCake day3002December 10, 2014View Original