wcallahan commented on Nvidia Nemotron 3 Family of Models   research.nvidia.com/labs/... · Posted by u/ewt-nv
selfhoster11 · 6 days ago
You may want to use the new "derestricted" variants of gpt-oss. While the ostensible goal of these variants is to de-censor them, it ends up removing the models' obsession with policy and wasting thinking tokens that could be used towards actually reasoning through a problem.
wcallahan · 4 days ago
Great advice. Have you observed any other differences? I’ve been wondering if there are any specialized variants of GPT-OSS models yet that outperform on specific tasks (similar to the countless Llama 3 variants we’ve seen).
wcallahan commented on Nvidia Nemotron 3 Family of Models   research.nvidia.com/labs/... · Posted by u/ewt-nv
woodson · 6 days ago
Not OP, but perhaps they mean not putting too much faith in common benchmarks (thanks to benchmaxxing).
wcallahan · 4 days ago
Yes to both comments. I said that to:

1. disclose that my method isn’t quantifiably measurable, since that’s not what’s important to me; speed of action/development outcomes matters more to me, and because

2. I’ve observed a large gap between benchmark toppers and my own results

But make no mistake, I like having the terminals scrolling live across multiple monitors so I can glance at them periodically and watch their response quality, so I do care about and notice which give better/worse results.

My biggest goal right now after accuracy is achieving more natural human-like English for technical writing.

wcallahan commented on Nvidia Nemotron 3 Family of Models   research.nvidia.com/labs/... · Posted by u/ewt-nv
btown · 7 days ago
Would you mind sharing what hardware/card(s) you're using? And is https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B... one of the ones you've tested?
wcallahan · 4 days ago
Yes, I run it locally on 3 different AMD Strix Halo machines (a Framework Desktop and 2 GMKTec machines; 128GB x 2, 96GB x 1) and a Mac Studio M2 Ultra with 128GB of unified memory.

I’ve used several runtimes, including vLLM. Works great! Speedy. I got the best results with Ubuntu after trying a few different distributions, and after trying both Vulkan and ROCm drivers.

wcallahan commented on Nvidia Nemotron 3 Family of Models   research.nvidia.com/labs/... · Posted by u/ewt-nv
wcallahan · 8 days ago
I don’t do ‘evals’, but I do process billions of tokens every month, and I’ve found these small Nvidia models to be the best by far for their size currently.

As someone else mentioned, the GPT-OSS models are also quite good (though I haven’t figured out how to make them great yet; I think they might age well like the Llama 3 models did and get better with time!).

But for a defined task, I’ve found task compliance, understanding, and tool call success rates to be some of the highest on these Nvidia models.

For example, I have a continuous job that evaluates whether the data for a startup company on aVenture.vc may have conflated two similar but unrelated companies across news articles, research details, investment rounds, etc., which is a token-hungry ETL task! I recently retested this workflow on the top 15 or so models with <125b parameters, and the Nvidia models were among the best performing for this type of work, particularly at avoiding hallucination when given adequate grounding.

Also, re: cost: I run local inference on several machines that run continuously, in addition to routing through OpenRouter and the frontier providers, and was pleasantly surprised to find that, as an otherwise paying OpenRouter customer, the free Nvidia variant there has quite generous limits, too.

wcallahan commented on Show HN: Real-time system that tracks how news spreads across 200k websites   yandori.io/news-flow/... · Posted by u/antiochIst
65 · 23 days ago
I think you will need to filter out wire services like AP and Reuters, as I'm seeing stories that are mostly republished wire stories on random websites.
wcallahan · 18 days ago
Rather than filtering them out, I’d imagine you’d want to establish their equivalency? Then they can be made available as equal/similar alternatives to the same article (i.e., from your outlet of choice).
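A minimal sketch of that equivalency idea (all names and the normalization scheme here are hypothetical, not from the project): hash a normalized version of the body text so republished wire copies collapse into one equivalence group, with each outlet kept as an alternative source.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(text: str) -> str:
    """Lowercase and collapse whitespace before hashing, so lightly
    reformatted syndicated copies of the same wire story collide."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode()).hexdigest()

def group_equivalents(articles):
    """Map fingerprint -> list of (outlet, url) alternatives for the
    same underlying story."""
    groups = defaultdict(list)
    for art in articles:
        groups[fingerprint(art["body"])].append((art["outlet"], art["url"]))
    return groups
```

Real syndicated copies are often lightly edited, so in practice you’d likely swap the exact hash for a near-duplicate measure (shingling/MinHash), but the grouping structure stays the same.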
wcallahan commented on Show HN: Real-time system that tracks how news spreads across 200k websites   yandori.io/news-flow/... · Posted by u/antiochIst
masterphai · a month ago
Interesting project - it’s rare to see news-flow tracking done in real time at this scale. One thing you may want to stress-test is how stable the clustering remains when stories evolve semantically over a few hours. Embeddings tend to drift as outlets rewrite or localize a piece, and HNSW can sometimes over-merge when the centroid shifts.

A trick that helped in a similar system I built was doing a second-pass “temporal coherence” check: if two articles are close in embedding space but far apart in publish time or share no common entities, keep them in adjacent clusters rather than forcing a merge. It reduced false positives significantly.

Also curious how you handle deduping syndicated content - AP/Reuters can dominate the embedding space unless you weight publisher identity or canonical URLs.

Overall, really nice work. The propagation timeline is especially useful.

wcallahan · 18 days ago
Bad bot.

‘masterphai’ is evidence of how effective a good LLM and a better prompt can now be at evading detection of AI authorship… but there’s no way this author’s comments are written by a sane human.

From the comment history it appears it has tricked quite a few humans to-date. Interesting!

wcallahan commented on You can see a working Quantum Computer in IBM's London office   ianvisits.co.uk/articles/... · Posted by u/thinkingemote
wcallahan · a month ago
I suspect I’m not alone in pausing around the statement:

> "It’s not likely to be something you’ll ever have at home"

I’m curious… what would need to be true to make this statement wrong?

wcallahan commented on DBCrust – A modern database CLI   github.com/clement-tourri... · Posted by u/kelem
wcallahan · 4 months ago
It would be great to have Convex Database support
wcallahan commented on Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs   baseten.co/blog/sota-perf... · Posted by u/philipkiely
wcallahan · 5 months ago
I just used GPT-OSS-120B on a cross Atlantic flight on my MacBook Pro (M4, 128GB RAM).

A few things I noticed:

- it’s only fast with small context windows and small total token counts; once past ~10k tokens you’re basically queueing everything for a long time
- MCPs/web search/URL fetch have already become a very important part of interacting with LLMs; when they’re not available, the LLM’s utility is greatly diminished
- a lot of CLI/TUI coding tools (e.g., opencode) were not working reliably offline at this time with the model, despite being set up prior to going offline

That’s in addition to the other quirks others have noted with the OSS models.

wcallahan commented on Gemini 2.5 Flash   developers.googleblog.com... · Posted by u/meetpateltech
vladmdgolam · 8 months ago
There are at least 10 projects currently aiming to recreate Claude Code, but for Gemini. For example, geminicodes.co by NotebookLM’s founding PM Raiza Martin
wcallahan · 8 months ago
Tried Gemini Codes yesterday, as well as anon-kode and anon-codex. Gemini Codes is already broken and appears to be rather brittle (she discloses as much), and the other two appear to still need some prompt improvements, or someone adding vector embeddings, to be useful?

Perhaps someone can merge the best of Aider and codex/claude code now. Looking forward to it.

u/wcallahan

Karma: 44 · Cake day: April 8, 2022
About
Founder and CEO of aVenture (aventure.vc), a venture capital research platform. San Francisco-based - let's grab a coffee if you're local!

Personal website: williamcallahan.com
