Readit News
qntty commented on Claude Advanced Tool Use   anthropic.com/engineering... · Posted by u/lebovic
behnamoh · 24 days ago
I cannot believe all these months and years people have been loading all of the tool JSON schemas upfront. This is such a waste of context window and something that was already solved three years ago.
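To be concrete about what "upfront" means here, a toy sketch (a hypothetical helper, not any particular vendor's SDK): every request carries every tool's JSON schema, so a big tool catalog burns context window on every single call.

    # Toy sketch of the "all schemas upfront" pattern (hypothetical helper,
    # not a real vendor SDK): every request carries the full tool catalog.
    TOOLS = [
        {
            "name": "get_weather",  # made-up example tool
            "description": "Look up current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
        # ...imagine hundreds more schemas here, all serialized into the prompt
    ]

    def build_request(prompt: str) -> dict:
        # The entire TOOLS list rides along with every single request,
        # which is the context-window waste being complained about.
        return {"messages": [{"role": "user", "content": prompt}], "tools": TOOLS}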
qntty · 24 days ago
Solved how?
qntty commented on Calculus for Mathematicians, Computer Scientists, and Physicists [pdf]   mathcs.holycross.edu/~ahw... · Posted by u/o4c
qntty · 25 days ago
Writing a calculus book that's more rigorous than typical books is hard: if you go too far, people will say that you've written a real analysis book, and the point of calculus is to introduce certain concepts without going full analysis. This book seems to have at least avoided that trap: it doesn't try to be too rigorous about the concept of convergence, and it spends more time on introducing vocabulary for talking about functions and on the intersections with linear algebra.
qntty commented on The half-life of tech skills   haraldagterhuis.substack.... · Posted by u/j-a-a-p
qntty · 5 months ago
I tried and failed to find some kind of concrete methodology that they used to get to the number 30 months. I'm still waiting for quadratic algebra to make my knowledge of linear algebra obsolete.
qntty commented on MCP in LM Studio   lmstudio.ai/blog/lmstudio... · Posted by u/yags
sixhobbits · 6 months ago
MCP terminology is already super confusing, but this seems to just introduce "MCP Host" randomly in a way that makes no sense to me at all.

> "MCP Host": applications (like LM Studio or Claude Desktop) that can connect to MCP servers, and make their resources available to models.

I think everyone else is calling this an "MCP Client", so I'm not sure why they would want to call themselves a host. It makes it sound like they are hosting MCP servers (definitely something that people are doing, even though often the server runs on the same machine as the client), when in fact they are just a client? Or am I confused?

qntty · 6 months ago
It's confusing, but you just have to read the official docs:

https://modelcontextprotocol.io/specification/2025-03-26/arc...
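
The short version: the host is the application, and it creates one client per server connection; each client talks to exactly one server. A rough sketch with made-up class names (not the official SDK):

    # Conceptual sketch of the spec's terminology (made-up classes, not the SDK):
    # the *host* is the app (LM Studio, Claude Desktop); it owns one *client*
    # per connection, and each client talks to exactly one *server*.
    class MCPClient:
        """One protocol connection to a single MCP server."""
        def __init__(self, server_command: str):
            self.server_command = server_command  # e.g. a stdio subprocess to launch

        def list_tools(self) -> list[dict]:
            return []  # placeholder: would send a tools/list request over the connection

    class MCPHost:
        """The application that creates and manages clients."""
        def __init__(self) -> None:
            self.clients: list[MCPClient] = []

        def connect(self, server_command: str) -> MCPClient:
            client = MCPClient(server_command)
            self.clients.append(client)
            return client

    host = MCPHost()
    host.connect("some-mcp-server --stdio")  # hypothetical server command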

qntty commented on Find Your People   foundersatwork.posthaven.... · Posted by u/jl
qntty · 7 months ago
I like the subway analogy. I'm sure I've heard some version of it before, but maybe because I was younger I didn't really get it. It really is a little strange to tell kids who have never really directed their own lives before to start doing it all of a sudden.
qntty commented on By default, Signal doesn't recall   signal.org/blog/signal-do... · Posted by u/feross
ranger_danger · 7 months ago
Yes but as I understand it, "easy to disable" and "difficult to accidentally disable" are opposites.

EDIT: Apparently people have different definitions of easy. Fair enough

qntty · 7 months ago
Child-proof caps are easy to take off but difficult to accidentally take off.
qntty commented on LLM-D: Kubernetes-Native Distributed Inference   llm-d.ai/blog/llm-d-annou... · Posted by u/smarterclayton
rdli · 7 months ago
In this analogy, Dynamo is most definitely not like Django. It includes inference-aware routing, KV caching, etc. -- all the stuff you would need to run a modern SOTA inference stack.
qntty · 7 months ago
You're right, I was confusing TensorRT with Dynamo. It looks like the relationship between Dynamo and vLLM is actually the opposite of what I was thinking -- Dynamo can use vLLM as a backend rather than vice versa.
qntty commented on LLM-D: Kubernetes-Native Distributed Inference   llm-d.ai/blog/llm-d-annou... · Posted by u/smarterclayton
rdli · 7 months ago
This is really interesting. For SOTA inference systems, I've seen two general approaches:

* The "stack-centric" approach such as vLLM production stack, AIBrix, etc. These set up an entire inference stack for you including KV cache, routing, etc.

* The "pipeline-centric" approach such as NVIDIA Dynamo, Ray, BentoML. These give you more of an SDK so you can define inference pipelines that you can then deploy on your specific hardware.

It seems like LLM-d is the former. Is that right? What prompted you to go down that direction, instead of the direction of Dynamo?

qntty · 7 months ago
It sounds like you might be confusing different parts of the stack. NVIDIA Dynamo, for example, supports vLLM as the inference engine. I think you should think of something like vLLM as more akin to Gunicorn, and llm-d as an application load balancer. And I guess something like NVIDIA Dynamo would be like Django.
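
To make the analogy concrete, a toy sketch of the two layers (made-up classes, not any project's real API): the engine generates on one node, and the router decides which replica gets a request, possibly using engine state like cached prefixes.

    # Toy sketch of the layering in the analogy above (no real project APIs):
    # an "engine" (the vLLM/Gunicorn role) serves requests on one node, and a
    # "router" (the llm-d / application load balancer role) picks a replica,
    # possibly using engine state such as what's already in its KV cache.
    import random

    class Engine:
        """Stands in for one vLLM replica: generates tokens on local GPUs."""
        def __init__(self, name: str):
            self.name = name
            self.cached_prefixes: set[str] = set()  # stand-in for KV-cache state

        def generate(self, prompt: str) -> str:
            self.cached_prefixes.add(prompt[:32])
            return f"[{self.name}] completion for: {prompt!r}"

    class Router:
        """Stands in for the load-balancing tier: routes each request to a replica."""
        def __init__(self, engines: list[Engine]):
            self.engines = engines

        def route(self, prompt: str) -> str:
            # Prefer a replica that already has this prefix cached; else pick randomly.
            warm = [e for e in self.engines if prompt[:32] in e.cached_prefixes]
            engine = warm[0] if warm else random.choice(self.engines)
            return engine.generate(prompt)

    router = Router([Engine("replica-0"), Engine("replica-1")])
    print(router.route("Explain pipeline parallelism"))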
qntty commented on LLM-D: Kubernetes-Native Distributed Inference   llm-d.ai/blog/llm-d-annou... · Posted by u/smarterclayton
dzr0001 · 7 months ago
I did a quick scan of the repo and didn't see any reference to Ray. Would this indicate that llm-d lacks support for pipeline parallelism?
qntty · 7 months ago
I believe this is a question you should ask about vLLM, not llm-d. It looks like vLLM does support pipeline parallelism via Ray: https://docs.vllm.ai/en/latest/serving/distributed_serving.h...

This project appears to make use of both vLLM and Inference Gateway (an official Kubernetes extension to the Gateway resource). The contribution of llm-d itself seems to mostly be a scheduling algorithm for load balancing across vLLM instances.
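
For reference, the pipeline-parallelism knob lives in vLLM itself; a hedged example below (argument names recalled from those docs, so verify against the version you're actually running):

    # Hedged example of pipeline parallelism in vLLM (names recalled from the
    # linked docs; double-check against the version you're running):
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder: any model you can load
        pipeline_parallel_size=2,                   # split model layers into 2 stages
        distributed_executor_backend="ray",         # Ray coordinates the stages/nodes
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
    print(out[0].outputs[0].text)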

qntty commented on Gandi March 9, 2025 incident postmortem   news.gandi.net/en/2025/03... · Posted by u/wilsonfiifi
qntty · 7 months ago
In Hindi, “gandi” means dirty, which I guess is appropriate for marches

u/qntty

Karma: 3002 · Cake day: December 10, 2014