Readit News
ritz_labringue commented on Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?    · Posted by u/superasn
jsnell · 24 days ago
> These days, LLMs are what we call "Mixture of Experts", meaning they only activate a small subset of their weights at a time. This makes them a lot more efficient to run at high batch size.

I don't really understand why you're trying to connect MoE and batching here. Your stated mechanism is not only incorrect but actually the wrong way around.

The efficiency of batching comes from optimally balancing the compute and memory bandwidth, by loading a tile of parameters from the VRAM to cache, applying those weights to all the batched requests, and only then loading in the next tile.

So batching only helps when multiple queries need to access the same weights for the same token. For dense models, that's always the case. But for MoE it isn't, precisely because not all weights are activated every token. And then suddenly your batching becomes a complex scheduling problem, since not all the experts at a given layer will have the same load. Surely a solvable problem, but MoE is not an enabler for batching; it makes batching significantly harder.
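
A rough NumPy sketch of the tiling idea above; the sizes, the per-tile loop, and the bytes/FLOPs accounting are purely illustrative, not real kernel code:

    import numpy as np

    d_in, d_out = 4096, 4096
    W = np.random.randn(d_in, d_out).astype(np.float32)

    def dense_forward(x_batch, W, tile=512):
        """(batch, d_in) @ (d_in, d_out), processing W in column tiles.
        Each tile is loaded once and reused for every request in the batch."""
        out = np.empty((x_batch.shape[0], W.shape[1]), dtype=np.float32)
        for start in range(0, W.shape[1], tile):
            W_tile = W[:, start:start + tile]              # one load per tile
            out[:, start:start + tile] = x_batch @ W_tile  # reused across the batch
        return out

    for batch in (1, 8, 64):
        x = np.random.randn(batch, d_in).astype(np.float32)
        y = dense_forward(x, W)
        flops = 2 * batch * d_in * d_out
        bytes_moved = W.nbytes + x.nbytes + y.nbytes
        # Arithmetic intensity grows with batch size: the same weight bytes
        # serve more and more useful FLOPs.
        print(f"batch={batch:3d}  FLOPs per byte moved ~ {flops / bytes_moved:.1f}")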

ritz_labringue · 24 days ago
You’re right, I conflated two things. MoE improves compute efficiency per token (only a few experts run), but it doesn’t meaningfully reduce memory footprint.

For fast inference you typically keep all experts in memory (or shard them), so VRAM still scales with the total number of experts.

Practically, that’s why home setups are wasteful: you buy a GPU for its VRAM capacity, but an MoE model only exercises a fraction of that compute on each token, and some experts/devices sit idle (because you are the only one using the model).

MoE does not make batching more efficient, but it demands larger batches to maximize compute utilization and to amortize routing. Dense models batch trivially (same weights every token). MoE batches well once the batch is large enough so each expert has work. So the point isn’t that MoE makes batching better, but that MoE needs bigger batches to reach its best utilization.
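
A toy routing sketch of that batch-size effect; the expert count, top-k, and the uniform router below are made-up assumptions, not any particular model's configuration:

    import numpy as np

    rng = np.random.default_rng(0)
    n_experts, top_k = 64, 2

    def expert_utilization(n_tokens):
        # Each token picks top_k distinct experts; real routers are learned,
        # but a uniform choice is enough to show the utilization effect.
        picks = np.array([rng.choice(n_experts, size=top_k, replace=False)
                          for _ in range(n_tokens)])
        counts = np.bincount(picks.ravel(), minlength=n_experts)
        return (counts > 0).sum(), counts.max()

    for n_tokens in (1, 8, 256, 4096):
        active, busiest = expert_utilization(n_tokens)
        print(f"{n_tokens:5d} tokens -> {active:2d}/{n_experts} experts active, "
              f"busiest expert gets {busiest} token-slots")

With a handful of tokens most experts sit idle even though their weights are resident in VRAM; only at large batch sizes does every expert get enough work to pay for itself.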

ritz_labringue commented on Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?    · Posted by u/superasn
ritz_labringue · 25 days ago
The short answer is "batch size". These days, LLMs are what we call "Mixture of Experts", meaning they only activate a small subset of their weights at a time. This makes them a lot more efficient to run at high batch size.

If you try to run GPT-4 at home, you'll still need enough VRAM to load the entire model, which means you'll need several H100s (each one costs like $40k). But you will be under-utilizing those cards by a huge amount for personal use.
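
A back-of-envelope version of that VRAM point; the parameter counts below are placeholders (GPT-4's real size is not public), the point being that total weights, not active weights, set the memory bill:

    total_params    = 1.0e12   # assumed total parameters (illustrative)
    active_params   = 0.15e12  # assumed active parameters per token (illustrative)
    bytes_per_param = 2        # fp16/bf16 weights

    weights_gb = total_params * bytes_per_param / 1e9
    h100s = weights_gb / 80    # 80 GB per H100, ignoring KV cache and overhead

    print(f"~{weights_gb:.0f} GB of weights -> at least {h100s:.0f} H100s,")
    print(f"even though only {active_params / total_params:.0%} of the params run per token")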

It's a bit like saying, "How come Apple can make iPhones for billions of people but I can't even build a single one in my garage?"

ritz_labringue commented on My "Are you presuming most people are stupid?" test   andymasley.substack.com/p... · Posted by u/jger15
stephenmac98 · 2 months ago
How many people drive their car daily or near daily? How many people are good drivers?

The ratio of those two values shows, in my experience, that a lot of people are not very good at things they spend a lot of time doing, and are generally unaware of their own shortcomings

The average American spends 4.2 hours a week in the car. A typical 40-year-old American has driven around 50,000 miles. For someone to continue to be bad at driving after that much experience, it must be a fundamental limitation on their capabilities for learning, thinking, or understanding. Drive to work any given day in Denver and you will see that a large number of people suffer from those fundamental limitations.

This article seems to present a world where most people the author interacts with can think critically about a complex topic, and are interested in learning or improving themselves. I wish I lived where the author lives, because I have had multiple jobs across multiple countries and never encountered an average population like the author describes.

ritz_labringue · 2 months ago
I don't think I agree with the premise. Sure, there are lots of car accidents in absolute terms, but given how many people drive and how error-prone driving inherently is, most people are actually pretty decent drivers.
ritz_labringue commented on Generative AI coding tools and agents do not work for me   blog.miguelgrinberg.com/p... · Posted by u/nomdep
ritz_labringue · 3 months ago
AI is really useful when you already know what code needs to be written. If you can explain it properly, the AI will write it faster than you can and you'll save time because it is quick to check that this is actually the code you wanted to write. So "programming with AI" means programming in your mind and then using the AI to materialize it in the codebase.
ritz_labringue commented on Meta's Llama 3.1 can recall 42 percent of the first Harry Potter book   understandingai.org/p/met... · Posted by u/aspenmayer
TGower · 3 months ago
People aren't buying Harry Potter action figures as a substitute for buying the book either, but copyright protects creators from other people swooping in and using their work in other mediums. There is obviously a huge market demand for high quality data for training LLMs; Meta just spent 15 billion on a data labeling company. Companies training LLMs on copyrighted material without permission are doing that as a substitute for obtaining a license from the creator to do so, in the same way that a pirate downloading a torrent is a substitute for getting an ebook license.
ritz_labringue · 3 months ago
Harry Potter action figures trade almost entirely on J. K. Rowling’s expressive choices. Every unlicensed toy competes head‑to‑head with the licensed one and slices off a share of a finite pot of fandom spending. Copyright law treats that as classic market substitution and rightfully lets the author police it.

Dropping the novels into a machine‑learning corpus is a fundamentally different act. The text is not being resold, and the resulting model is not advertised as “official Harry Potter.” The books are just statistical nutrition. One ingredient among millions. Much like a human writer who reads widely before producing new work. No consumer is choosing between “Rowling’s novel” and “the tokens her novel contributed to an LLM,” so there’s no comparable displacement of demand.

In economic terms, the merch market is rivalrous and zero‑sum; the training market is non‑rivalrous and produces no direct substitute good. That asymmetry is why copyright doctrine (and fair‑use case law) treats toy knock‑offs and corpus building very differently.

ritz_labringue commented on I think I'm done thinking about GenAI for now   blog.glyph.im/2025/06/i-t... · Posted by u/kaycebasques
CuriouslyC · 3 months ago
"Despite this plethora of negative experiences, executives are aggressively mandating the use of AI6. It looks like without such mandates, most people will not bother to use such tools, so the executives will need muscular policies to enforce its use.

Being forced to sit and argue with a robot while it struggles and fails to produce a working output, while you have to rewrite the code at the end anyway, is incredibly demoralizing. This is the kind of activity that activates every single major cause of burnout at once.

But, at least in that scenario, the thing ultimately doesn’t work, so there’s a hope that after a very stressful six month pilot program, you can go to management with a pile of meticulously collected evidence, and shut the whole thing down."

The counterpoint to this is that _SOME_ people are able to achieve force multiplication (even at the highest levels of skill, it's not just a juniors-only phenomenon), and _THAT_ is what is driving management adoption mandates. They see that 2-4x increases in productivity are possible under the correct circumstances, and they're basically passing down mandates for the rank and file to get with the program and figure out how to reproduce those results, or find another job.

ritz_labringue · 3 months ago
I very much agree, and I think people who are in denial about the usefulness of these tools are in for a bad time.

I've seen this firsthand multiple times: people who really don't want it to work will (unconsciously or not) sabotage themselves by writing vague prompts or withholding context/tips they'd naturally give a human colleague.

Then when the LLM inevitably fails, they get their "gotcha!" moment.

ritz_labringue commented on Building AI agents to query your databases   blog.dust.tt/spreadsheets... · Posted by u/vortex_ape
bob1029 · 5 months ago
> This abstraction shields users from the complexity of the underlying systems and allows us to add new data sources without changing the user experience.

Cursed mission. These sorts of things do work amazingly well for toy problem domains. But, once you get into more complex business involving 4-way+ joins, things go sideways fast.

I think it might be possible to have a human in the loop during the SQL authoring phase, but there's no way you can do it cleanly without outside interaction in all cases.

95% correct might sound amazing at first, but it might as well be 0% in practice. You need to be perfectly correct when working with data in bulk with SQL operations.

ritz_labringue · 5 months ago
It does require writing good instructions for the LLM to properly use the tables, and it works best if you carefully pick the tables that your agent is allowed to use beforehand. We have many users who use it for everyday work with real data (definitely not toy problems).
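
For a flavor of what those instructions can look like, here is a hypothetical table allowlist plus the prompt it gets rendered into; the field names and wording are made up, not Dust's actual configuration format:

    ALLOWED_TABLES = {
        "orders": {
            "description": "One row per customer order; amounts are in cents.",
            "columns": {"id": "primary key",
                        "customer_id": "FK to customers.id",
                        "total_cents": "integer, never NULL",
                        "created_at": "UTC timestamp"},
            "caveats": "Cancelled orders have status = 'cancelled'; exclude them by default.",
        },
        "customers": {
            "description": "One row per customer account.",
            "columns": {"id": "primary key", "country": "ISO 3166-1 alpha-2"},
            "caveats": "Test accounts have emails ending in '@example.com'.",
        },
    }

    def build_schema_prompt(tables: dict) -> str:
        """Render the allowlisted tables into the system prompt given to the agent."""
        blocks = []
        for name, spec in tables.items():
            cols = ", ".join(f"{col} ({desc})" for col, desc in spec["columns"].items())
            blocks.append(f"Table {name}: {spec['description']}\n"
                          f"  Columns: {cols}\n"
                          f"  Note: {spec['caveats']}")
        return "\n\n".join(blocks)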
ritz_labringue commented on Grok3 Launch [video]   x.com/xai/status/18916997... · Posted by u/travelhead
yodsanklai · 6 months ago
Naive question from a bystander, but since DeepSeek is open source and on par with o1-pro (is it?), shouldn't we expect that anybody with the computing power is capable of competing with o1-pro?
ritz_labringue · 6 months ago
It's not on par with o1, let alone o1-pro
ritz_labringue commented on Setting up PostgreSQL for running integration tests   gajus.com/blog/setting-up... · Posted by u/mooreds
ritz_labringue · a year ago
Why can't you just create a pool of databases and truncate all tables after each run? That should be pretty fast.
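
A minimal sketch of that setup with psycopg (the pool size and DSNs are assumptions): each test worker leases one database from a pre-created pool, and teardown truncates every table instead of dropping and recreating the schema.

    import psycopg

    # e.g. a pool of pre-created databases handed out to test workers
    POOL = [f"postgresql://localhost/test_db_{i}" for i in range(8)]

    def truncate_all_tables(dsn: str) -> None:
        """Reset one pooled database between test runs."""
        with psycopg.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute("SELECT tablename FROM pg_tables WHERE schemaname = 'public'")
            tables = [row[0] for row in cur.fetchall()]
            if tables:
                # RESTART IDENTITY resets sequences, CASCADE handles FK ordering.
                cur.execute("TRUNCATE " + ", ".join(f'"{t}"' for t in tables)
                            + " RESTART IDENTITY CASCADE")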
ritz_labringue commented on Sam Altman, Greg Brockman and others to join Microsoft   twitter.com/satyanadella/... · Posted by u/JimDabell
dartos · 2 years ago
They’re still different careers, not “levels” or whatever.

A PhD scientist may not be a good fit for an engineering job. Their degree doesn't matter.

A PhD-having engineer might not be a good fit for a research job either… because it's a different job.

ritz_labringue · 2 years ago
Researchers are paid 2x what engineers are paid at OAI; even if it's not the same job, one is still "higher level" than the other.

u/ritz_labringue

Karma: 67 · Cake day: April 27, 2017