https://arxiv.org/abs/2501.00663
https://arxiv.org/pdf/2504.13173
Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.
"The Space Jam website is simple: a single HTML page, absolute positioning for every element, and a tiling starfield GIF background.".
This is not true: the site is built with tables, not absolute positioning. CSS wasn't a thing back then...
Here was its one-shot attempt at building the same type of layout (table-based) with a screenshot and assets as input: https://i.imgur.com/fhdOLwP.png
> it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open.
I think Oxide didn't include this in the RFD because they exclusively hire senior engineers, but in an organization that contains junior engineers I'd add specific guidance to help them understand how they should approach LLM use.
Bryan has 30+ years of challenging software (and now hardware) engineering experience. He memorably said that he's worked on and completed a "hard program" (an OS), which he defines as a program you doubt you can actually get working.
The way Bryan approaches an LLM is super different from how a 2025 junior engineer does. That junior engineer has possibly never programmed without the tantalizing, even desperately tempting, option of being assisted by an LLM.
Where it struggles: problems requiring taste or judgment without clear right answers. The LLM wants to satisfy you, which works great for 'make this exploit work' but less great for 'is this the right architectural approach?'
The craftsman answer might be: use LLMs for the systematic/tedious parts (code generation, pattern matching, boilerplate) while keeping human judgment for the parts that matter. Let the tool handle what it's good at, you handle what requires actual thinking.
I also wasn't familiar with this terminology:
> You hand it a function; it tries to match it, and you move on.
In decompilation, "matching" means you've found a function block in the machine code, written some C, and confirmed that the C produces the exact same binary machine code once compiled.
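To make that concrete, here's a minimal sketch of the byte-comparison step, assuming you've already extracted the target function's bytes to a file. The file names and flags are hypothetical, and real matching projects use the game's original compiler with its exact build flags (plus relocation-aware diffing) rather than a stock gcc:

```python
#!/usr/bin/env python3
"""Sketch of a decompilation "match" check under the assumptions above."""
import subprocess

TARGET_BYTES_FILE = "target_func.bin"  # bytes extracted from the original binary (assumed)
CANDIDATE_SOURCE = "candidate.c"       # your hand-written C reconstruction (assumed)

def compile_candidate() -> bytes:
    # Compile just this translation unit to an object file.
    subprocess.run(["gcc", "-c", "-O2", CANDIDATE_SOURCE, "-o", "candidate.o"],
                   check=True)
    # Extract the raw .text section so we can compare machine code byte-for-byte.
    subprocess.run(["objcopy", "-O", "binary", "--only-section=.text",
                    "candidate.o", "candidate.bin"], check=True)
    with open("candidate.bin", "rb") as f:
        return f.read()

def main() -> None:
    with open(TARGET_BYTES_FILE, "rb") as f:
        target = f.read()
    candidate = compile_candidate()
    if candidate == target:
        print("MATCH: compiled C reproduces the original machine code")
    else:
        print(f"no match: {len(candidate)} vs {len(target)} bytes; keep iterating")

if __name__ == "__main__":
    main()
```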
The author's previous post explains this all in a bunch more detail: https://blog.chrislewis.au/using-coding-agents-to-decompile-...
The interesting part: the model consistently underestimates its own speed. We built a complete bug bounty submission pipeline - target research, vulnerability scanning, POC development - in hours when it estimated days. The '10 attempts' heuristic resonates - there's definitely a point where iteration stops being productive.
For decompilation specifically, the 1M context window helps enormously. We can feed in entire codebases and ask 'trace this user input to potential sinks', which would be tedious to do manually. Not perfect, but genuinely useful when combined with human validation.
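A rough sketch of what that looks like in practice. Here `ask_model` is a hypothetical stand-in for whatever long-context client you use, and the repo path and sink list are made up for illustration:

```python
"""Stuff a codebase into one long-context prompt for a taint-style question."""
from pathlib import Path

def build_prompt(repo: Path, question: str) -> str:
    parts = [question, ""]
    # Concatenate every C source file, with a header so the model can cite paths.
    for path in sorted(repo.rglob("*.c")):
        parts.append(f"=== {path} ===")
        parts.append(path.read_text(errors="replace"))
    return "\n".join(parts)

prompt = build_prompt(
    Path("decomp-project"),  # hypothetical repo path
    "Trace every use of user-controlled input to potential sinks "
    "(memcpy, sprintf, system) and list the call chains.",
)
# ask_model(prompt)  # stand-in for your 1M-context model call
```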
The key seems to be: narrow scope + clear validation criteria + iterative refinement. Same as this decompilation work.
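Put together, that loop might look something like this sketch, with `propose_fix` and `validate` as hypothetical stand-ins: for decompilation the validator would be "the compiled bytes match", for exploit work "the POC triggers the bug":

```python
"""Sketch of the narrow-scope / clear-validation / bounded-iteration loop."""

MAX_ATTEMPTS = 10  # past this point, iteration stops being productive

def refine(task: str, propose_fix, validate) -> str | None:
    feedback = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        candidate = propose_fix(task, feedback)  # e.g. one LLM call
        ok, feedback = validate(candidate)       # clear pass/fail criterion
        if ok:
            print(f"validated on attempt {attempt}")
            return candidate
    return None  # give up: rescope the task or do it by hand
```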