Readit News
sanxiyn commented on Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs   baseten.co/blog/sota-perf... · Posted by u/philipkiely
isoprophlex · 20 days ago
but... do you get any validation during the forward pass? the small model could just as well have generated "is Berlin." or whatever. do these models somehow give you a likelihood for the next token when you're prefilling, that you can compare against? if so why not just... use that always?

or is this a scenario where computation is expensive but validation is cheap?

EDIT: thanks, people, for educating me! very insightful :)

sanxiyn · 20 days ago
Yes, models give likelihoods you can compare against. No, you can't do that without drafting, because the likelihood of token N+2 depends on token N+1. That is, you get P(is | The capital of France) and P(Berlin | The capital of France is), but for the latter you need to feed "is" as input; you can't compute P(Berlin | The capital of France _).
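The sequential-dependency point above can be sketched as a toy verification loop. Everything below is a made-up stand-in (the probability table is not a real model), but it shows why acceptance must proceed left to right even though the target model scores the whole draft in one forward pass:

```python
# Toy sketch of speculative-decoding verification. `target_probs` is a
# hypothetical stand-in for the target model's next-token distribution.

def target_probs(prefix):
    table = {
        ("The", "capital", "of", "France"): {"is": 0.9, "was": 0.1},
        ("The", "capital", "of", "France", "is"): {"Paris": 0.95, "Berlin": 0.05},
    }
    return table.get(tuple(prefix), {})

def verify_draft(prefix, draft, threshold=0.5):
    """Accept drafted tokens left to right; stop at the first token the
    target model assigns low probability. One target forward pass over the
    whole draft yields all these probabilities at once (the speedup), but
    acceptance is still sequential: the probability of token N+2 is
    conditioned on token N+1 actually having been accepted."""
    accepted = []
    for tok in draft:
        p = target_probs(prefix + accepted).get(tok, 0.0)
        if p < threshold:
            break
        accepted.append(tok)
    return accepted

print(verify_draft(["The", "capital", "of", "France"], ["is", "Paris"]))
# → ['is', 'Paris']
print(verify_draft(["The", "capital", "of", "France"], ["is", "Berlin"]))
# → ['is']  (rejected at "Berlin", so only "is" survives)
```

Production systems use a probabilistic acceptance rule rather than a fixed threshold, but the control flow is the same.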
sanxiyn commented on Launch HN: Gecko Security (YC F24) – AI That Finds Vulnerabilities in Code    · Posted by u/jjjutla
dnsbty · a month ago
This is one area I expect LLMs to really shine. I've tried a few static analysis tools for security, but it feels like the cookie-cutter checks aren't effective for catching anything but the most basic vulnerabilities. Having context on the actual purpose of the code seems like a great way to provide better scans without needing to hire a researcher for a deeper pentest.

I just started a scan on an open source project I was looking at, but I would love to see you add Elixir to the list of supported languages so that I can use this for my team's codebase!

sanxiyn · a month ago
Terence Tao wrote about "blue team" vs "red team" roles in cybersecurity and how "unreliable" AI is better suited to the red-team side. I found it very insightful.

https://news.ycombinator.com/item?id=44711306

sanxiyn commented on Launch HN: Gecko Security (YC F24) – AI That Finds Vulnerabilities in Code    · Posted by u/jjjutla
jjjutla · a month ago
We’ve limited the free tier to one scan per user, so deleting a scan and starting a new one won’t work because of that restriction.

And yes, we don’t support C or C++ yet. Our focus is on detecting business logic vulnerabilities (auth bypasses, privilege escalations, IDORs) that traditional SAST tools often miss. The types of exploitable security issues typically found in C/C++ (mainly memory corruption type issues) are better found through fuzzing and dynamic testing rather than static analysis.

sanxiyn · a month ago
I understand it is not your focus, but fuzzing still falls short, and there is a lot AI can help with. For example, when there is a checksum, fuzzers typically can't progress past it, and this is "solved" by disabling the checks when building for fuzzing. AI can instead look at the source code computing the checksum and write code to fill it in, or use its world knowledge to recognize that the function is named sha_256 and import Python's hashlib, etc.

Hint: we are working on this, and it can easily expand coverage in OSS-Fuzz, even for targets that have been fuzzed for a long time with an enormous amount of compute.
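The checksum-fixup idea can be sketched in a few lines. Assume a hypothetical target format of payload || sha256(payload), so blind mutations always fail validation; a harness helper (the kind of code an AI could write after reading the target) recomputes the trailer after each mutation:

```python
# Hypothetical checksum fixup for a fuzz harness. The format here
# (payload followed by a raw SHA-256 digest) is an illustrative assumption,
# not any specific target's wire format.
import hashlib

def fix_checksum(mutated_payload: bytes) -> bytes:
    # Re-append a valid SHA-256 trailer so the input passes validation
    # and the fuzzer can reach the code behind the check.
    return mutated_payload + hashlib.sha256(mutated_payload).digest()

def target_accepts(data: bytes) -> bool:
    # Stand-in for the fuzz target's validation logic.
    payload, digest = data[:-32], data[-32:]
    return hashlib.sha256(payload).digest() == digest

assert target_accepts(fix_checksum(b"\x00\xffmutated input"))
```

The same pattern applies to CRCs, length fields, and magic bytes: mutate freely, then repair the derived fields before handing the input to the target.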

sanxiyn commented on Cloudflare 1.1.1.1 Incident on July 14, 2025   blog.cloudflare.com/cloud... · Posted by u/nomaxx117
bagels · a month ago
Why? Some were injecting ads, blocking services, degrading video and other wrongdoings.
sanxiyn · a month ago
Maybe their ISPs don't do that. There are many ISPs on Earth.
sanxiyn commented on Ubuntu 25.10 Raises RISC-V Profile Requirements   omgubuntu.co.uk/2025/06/u... · Posted by u/bundie
saurik · 2 months ago
The original two generations of iPhone were armv6 with hardware floating point, so that always felt to me like the sane baseline. Android wasn't using hardware floating point on armv6, but I think that was only because the compilers they had sucked (an issue that didn't apply to Apple), and many/most of the devices in fact shipped with the same hardware. I dunno... like, I don't know exactly what went into Debian's decision there, but I could see it having been made for the wrong reasons: looking at what software had been deployed rather than what hardware was common?
sanxiyn · 2 months ago
You can look at Debian's reasoning here: https://wiki.debian.org/ArmHardFloatPort. As I understand, the decision was mostly based on hardware.
sanxiyn commented on Ubuntu 25.10 Raises RISC-V Profile Requirements   omgubuntu.co.uk/2025/06/u... · Posted by u/bundie
snvzz · 2 months ago
>Does this line up with what riscv android will also require?

AIUI both Google and Microsoft selected RVA23 as baseline.

sanxiyn · 2 months ago
Google quote from https://riscv.org/riscv-news/2024/10/risc-v-announces-ratifi...

> "Google is delighted to see the ratification of the RVA23 Profile," said Lars Bergstrom, Director of Engineering, Google. "This profile has been the result of a broad industry collaboration, and is now the baseline requirement for the Android RISC-V Application Binary Interface (ABI)."

sanxiyn commented on Project Vend: Can Claude run a small shop? (And why does that matter?)   anthropic.com/research/pr... · Posted by u/gk1
andy99 · 2 months ago
This sounds like they have an LLM running with a context window that just gets longer and longer and contains all the past interactions of the store.

The normal way you'd build something like this is to have a way to store the state and have an LLM in the loop that makes a decision on what to do next based on the state. (With a fresh call to an LLM each time and no accumulating context)

If I understand correctly, this is an experiment to see what happens with the long-context approach, which is interesting but not super practical, as it's known that LLMs have a harder time with this. Point being, I wouldn't extrapolate this to how a properly built commercial system doing something similar would perform.

sanxiyn · 2 months ago
In my experience the long-context approach flatly doesn't work, so I don't think this is it. The post does mention "tools for keeping notes and preserving important information to be checked later".
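The "fresh call + external state" loop described above can be sketched as follows. `llm_decide` is a hypothetical stand-in for any chat-completion call (no real vendor API or shop logic is assumed); the point is that each iteration serializes the current state and no transcript accumulates:

```python
# Minimal sketch of a stateful agent loop with no accumulating context.

def llm_decide(state: dict) -> dict:
    # Hypothetical stand-in policy: restock any item below the threshold.
    # In a real system this would be one fresh LLM call on the serialized state.
    for item, qty in state["inventory"].items():
        if qty < state["restock_below"]:
            return {"action": "restock", "item": item}
    return {"action": "wait"}

def run_step(state: dict) -> dict:
    """One iteration: read current state, get a decision, apply it.
    Notes persist in `state`, not in a growing prompt."""
    decision = llm_decide(state)
    if decision["action"] == "restock":
        state["inventory"][decision["item"]] += 10
        state["notes"].append(f"restocked {decision['item']}")
    return state

state = {"inventory": {"cola": 2, "chips": 9}, "restock_below": 5, "notes": []}
state = run_step(state)
print(state["inventory"])  # → {'cola': 12, 'chips': 9}
```

The "tools for keeping notes" mentioned in the post play the role of the `notes` list here: durable memory outside the context window.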
sanxiyn commented on JWST reveals its first direct image discovery of an exoplanet   smithsonianmag.com/smart-... · Posted by u/divbzero
cryptoz · 2 months ago
That will be done with a solar gravitational lens - there's a recent-ish NASA paper about it. Basically you send your probe to > 550 AU in the opposite direction of your target exoplanet, point it at the Sun and you will get a warped high-res photo of the planet around the Sun. You can then algorithmically decode it into a regular photo.

I think the transit time is likely decades, and the build time is a long time as well. But in maybe 40-100 years we could have plentiful HD images of 'nearby' exoplanets. If I'm still around when it happens I will be beyond hyped.

sanxiyn · 2 months ago
FYI: Direct Multipixel Imaging and Spectroscopy of an Exoplanet with a Solar Gravity Lens Mission. https://arxiv.org/abs/2002.11871
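The "> 550 AU" figure above can be sanity-checked with a back-of-envelope calculation: light grazing the Sun's surface is focused at roughly R² / (2·r_s), where r_s is the Sun's Schwarzschild radius. Constants below are standard rounded values:

```python
# Minimum focal distance of the solar gravity lens.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8        # speed of light, m/s
M_sun = 1.989e30   # solar mass, kg
R_sun = 6.957e8    # solar radius, m
AU = 1.496e11      # astronomical unit, m

r_s = 2 * G * M_sun / c**2        # Schwarzschild radius, ~2.95 km
focal_m = R_sun**2 / (2 * r_s)    # focus for rays grazing the solar limb
print(focal_m / AU)               # ~547 AU; the usable region starts past this
```

Rays passing farther from the Sun focus even farther out, which is why mission proposals target 550+ AU rather than exactly this minimum.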
sanxiyn commented on MCP Specification – version 2025-06-18 changes   modelcontextprotocol.io/s... · Posted by u/owebmaster
fooster · 2 months ago
Have you actually used this stuff at scale? The replies are often not valid.
sanxiyn · 2 months ago
Yes I have.
sanxiyn commented on MCP Specification – version 2025-06-18 changes   modelcontextprotocol.io/s... · Posted by u/owebmaster
surfingdino · 2 months ago
> structured tool output

Yeah, let's pretend it works. So far structured output from an LLM is an exercise in programmers' ability to code defensively against responses that may or may not be valid JSON, may not conform to the schema, or may just be null. There's a new cottage industry of modules that automate dealing with this crap.

sanxiyn · 2 months ago
No? With structured outputs you get valid JSON 100% of the time. This is a non-problem now. (If you understand how it works, it really can't be otherwise.)

https://openai.com/index/introducing-structured-outputs-in-t...

https://platform.openai.com/docs/guides/structured-outputs
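The "it really can't be otherwise" claim follows from how constrained decoding works: at every step the sampler may only pick tokens that keep the output consistent with the grammar, so invalid JSON is unreachable by construction. A toy illustration (a hand-rolled character-level "grammar" for the schema {"answer": <integer>}, not any vendor's implementation):

```python
# Toy grammar-constrained sampler: the "model" is a uniform random choice,
# but a real LLM would simply rank the allowed tokens; it can never pick
# a disallowed one.
import json
import random

def allowed_next(prefix: str) -> list:
    """Characters the toy grammar permits after `prefix`."""
    fixed = '{"answer": '
    if len(prefix) < len(fixed):
        return [fixed[len(prefix)]]      # forced structural character
    body = prefix[len(fixed):]
    if body.endswith("}"):
        return []                        # document complete
    if not body:
        return list("0123456789")        # first digit of the integer
    if body == "0":
        return ["}"]                     # JSON forbids leading zeros
    if len(body) < 4:
        return list("0123456789") + ["}"]
    return ["}"]                         # cap the length, force the close

def generate(seed=0) -> str:
    random.seed(seed)
    out = ""
    while True:
        opts = allowed_next(out)
        if not opts:
            return out
        out += random.choice(opts)

doc = generate()
json.loads(doc)  # parses by construction: bad characters were never samplable
```

Production implementations compile a JSON Schema into a token-level automaton and mask logits, but the guarantee comes from the same masking step shown here.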

u/sanxiyn

Karma: 14976 · Cake day: July 12, 2009
About
Seo Sanghyeon <sanxiyn@gmail.com>.