Feel free to challenge these numbers; they're just a starting point. What's not accounted for is the cost of training (compute time, but also salaries and everything else), which has to be amortized over the length of time a model is used. That pushes ChatGPT's costs up significantly, though OpenAI does have the advantage that the hardware is shared across many users.
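To make the amortization point concrete, here's a trivial back-of-the-envelope sketch in Python. Every number in it (training cost, serving lifetime, token volume) is an invented placeholder for illustration, not an estimate of what any provider actually spends or serves.

```python
# Back-of-the-envelope amortization of training cost over a model's serving life.
# All numbers are made-up placeholders; plug in your own estimates.

training_cost_usd = 100e6        # hypothetical: compute + staff + overhead for one model
serving_lifetime_days = 365      # hypothetical: how long the model stays in production
tokens_served_per_day = 1e12     # hypothetical: aggregate tokens generated per day

amortized_per_token = training_cost_usd / (serving_lifetime_days * tokens_served_per_day)
print(f"Amortized training cost: ${amortized_per_token:.2e} per token")
# With these placeholders: ~$2.7e-7 per token, i.e. roughly $0.27 per million tokens,
# on top of the raw inference (hardware + power) cost.
```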
Look at vLLM. It's the leading open-source implementation of this kind of serving. The idea is that a single server can service 5000 or so users in parallel.
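For a sense of what that looks like in practice, here's a minimal sketch using vLLM's offline batch interface; the model name, prompt count, and sampling parameters are just placeholders. You hand it a pile of prompts and it does the batching and scheduling internally.

```python
from vllm import LLM, SamplingParams

# Many prompts submitted at once; vLLM batches them across the GPU internally.
prompts = [f"Explain topic number {i} in one sentence." for i in range(64)]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Model name is only an example; any model vLLM supports works here.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```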
You take roughly a 1.5-2x hit on per-token speed for each user, but the server's aggregate throughput goes up by something like 2000x-3000x.
The main insight is that memory bandwidth is the main bottleneck: if you batch requests and manage the KV cache cleverly alongside the batching, you can drastically increase parallel throughput.
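As a rough illustration of that insight, here's a toy roofline-style calculation. This is not vLLM's actual scheduler, and all the hardware and model numbers are assumptions; the point is that if each decode step is limited by bytes moved from GPU memory, the weights only need to be read once per step no matter how many requests are in the batch.

```python
# Toy roofline-style model of batched decoding, where each step is limited by
# bytes moved from GPU memory. All hardware/model numbers are assumptions
# chosen for illustration, not measurements.

WEIGHT_BYTES = 2 * 70e9       # assume a 70B-parameter model in 16-bit weights (~140 GB)
HBM_BANDWIDTH = 3.0e12        # assume ~3 TB/s of GPU memory bandwidth
KV_BYTES_PER_TOKEN = 320e3    # assume ~320 KB of KV cache per token of context
CONTEXT_LEN = 2048            # assume every request carries ~2K tokens of context

def decode_throughput(batch_size: int) -> tuple[float, float]:
    """Return (total tok/s, per-user tok/s) if decoding is memory-bandwidth bound."""
    # Per decode step: the full weights are read once and shared by the whole batch,
    # but each sequence must also stream its own KV cache.
    bytes_per_step = WEIGHT_BYTES + batch_size * CONTEXT_LEN * KV_BYTES_PER_TOKEN
    step_seconds = bytes_per_step / HBM_BANDWIDTH
    total = batch_size / step_seconds
    return total, total / batch_size

for bs in (1, 8, 64, 256):
    total, per_user = decode_throughput(bs)
    print(f"batch {bs:3d}: ~{total:6.0f} tok/s total, ~{per_user:4.1f} tok/s per user")
# Weight reads dominate the traffic, so total throughput scales almost linearly with
# batch size at first; per-user speed only degrades once the KV-cache reads start to
# rival the weight reads. Paged KV-cache management is what lets the server actually
# keep that many caches resident at once.
```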