Readit News
akrauss commented on The port I couldn't ship   ammil.industries/the-port... · Posted by u/cjlm
akrauss · 4 days ago
It is really important that such posts exist. There is the risk that we only hear about the wild successes and never the failures. But from the failures we learn much more.

One difference between this story and the various success stories is that the latter all had comprehensive test suites as part of the source material that agents could use to gain feedback without human intervention. This doesn’t seem to exist in this case, which may simply be the deal breaker.

akrauss commented on AI will make formal verification go mainstream   martin.kleppmann.com/2025... · Posted by u/evakhoury
planckscnst · 12 days ago
LLMs are very good at looking at a change set and finding untested paths. As a standard part of my workflow, I always pass the LLM's work through a "reviewer", which is a fresh LLM session with instructions to review the uncommitted changes. I include instructions for reviewing test coverage.

I've also found that LLMs typically just partially implement a given task/story/spec/whatever. The reviewer stage will also notice a mismatch between the spec and the implementation.

I have an orchestrator bounce the flow back and forth between developing and reviewing until the review comes back clean, and only then do I bother to review its work. It saves so much time and frustration.
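
A minimal sketch of the develop/review loop described above, not the commenter's actual tooling. The names `orchestrate`, `develop`, `review` and `Review` are hypothetical; the two callables stand in for separate LLM sessions, and the round limit is an added assumption to keep the loop from running forever.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Result of one reviewer pass; an empty findings list means 'clean'."""
    findings: List[str] = field(default_factory=list)

    @property
    def clean(self) -> bool:
        return not self.findings


def orchestrate(
    task: str,
    develop: Callable[[str, List[str]], str],
    review: Callable[[str, str], Review],
    max_rounds: int = 5,
) -> str:
    """Alternate between developing and reviewing until the review is clean.

    develop(task, findings) returns a new change set (hypothetical LLM call).
    review(task, change) inspects it in a fresh context (hypothetical LLM call).
    """
    findings: List[str] = []
    for _ in range(max_rounds):
        change = develop(task, findings)
        result = review(task, change)
        if result.clean:
            return change  # only now does a human need to look at it
        findings = result.findings  # feed the findings back to the developer
    raise RuntimeError(f"review still not clean after {max_rounds} rounds")
```

The reviewer would get instructions covering both test coverage and spec/implementation mismatches, so both kinds of findings flow back into the next development round.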

akrauss · 12 days ago
What tooling are you using for the orchestration?
akrauss commented on AI will make formal verification go mainstream   martin.kleppmann.com/2025... · Posted by u/evakhoury
thomasfromcdnjs · 12 days ago
shameless plug: I'm working on an open source project https://blocksai.dev/ to attempt to solve this. (and just added a note for me to add formal verification)

Elevator pitch: "Blocks is a semantic linter for human-AI collaboration. Define your domain in YAML, let anyone (humans or AI) write code freely, then validate for drift. Update the code or update the spec, up to human or agent."

(you can add traditional linters to the process if you want but not necessary)

The gist is that you define a bunch of validators for a collection of modules you're building (with agentic coding), with a focus on qualifying semantic things:

- domain / business rules/measures

- branding

- data flow invariants — "user data never touches analytics without anonymization"

- accessibility

- anything you can think of

Then you just tell your agentic coder to use the cli tool before committing, so it keeps the code in line with your engineering/business/philosophical values.

(boring) example of it detecting if blog posts have humour in them, running in Claude Code -> https://imgur.com/diKDZ8W

akrauss · 12 days ago
Quick feedback: both the "learn more" link at the very top and the "Explore all examples" link lead to a 404.
akrauss commented on Gemini CLI   blog.google/technology/de... · Posted by u/sync
indigodaddy · 6 months ago
Would Gemini non-interactive mode be a stop gap if they don't have sub-agent equivalent yet?

https://github.com/google-gemini/gemini-cli/blob/main/docs/c...

akrauss · 6 months ago
Possibly. One could think about hooking this in as a tool or simple shell command. But then there is no management when multiple tools modify the codebase simultaneously.

But it is still worth a try and may be possible with some prompting and duct tape.

akrauss commented on Gemini CLI   blog.google/technology/de... · Posted by u/sync
cperry · 6 months ago
Hi - I work on this. Uptake is a steep curve right now, spare a thought for the TPUs today.

Appreciate all the takes so far, the team is reading this thread for feedback. Feel free to pile on with bugs or feature requests; we'll all be reading.

akrauss · 6 months ago
There is one feature in Claude Code that is often overlooked, and I haven't seen it in any of the other agentic tools: a tool called "sub-agent", which creates a fresh context window in which the model can independently work on a clearly defined sub-task. This effectively turns Claude Code from a single-agent model into a hierarchical multi-agent model (I am not sure if the hierarchy goes to depths >2).

I wonder if it is a conscious decision not to include this (I imagine it opens a lot of possibilities of going crazy, but it also seems to be the source of a great amount of Claude Code's power). I would very much like to play with this if it appears in gemini-cli.

The next step would be the possibility to define custom prompts, toolsets and contexts for specific recurring tasks, and to have these appear as tools to the main agent. An example of such a thing: create_new_page. The prompt could describe the steps needed to create the page. Then the main agent could simply delegate this as a well-defined task, without cluttering its own context with the operational details.
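
A rough sketch of how such custom sub-agents could be registered and exposed as tools. This is not an API of Claude Code or gemini-cli; `SubAgentSpec`, `SubAgentRegistry` and the `run_agent` callable are all hypothetical names, with `run_agent` standing in for whatever starts a model in a fresh context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SubAgentSpec:
    """Everything a sub-agent needs, kept out of the main agent's context."""
    name: str                       # tool name the main agent sees
    prompt: str                     # operational steps for the sub-task
    tools: Tuple[str, ...] = ()     # toolset the sub-agent may use
    context_files: Tuple[str, ...] = ()  # files preloaded into its context


class SubAgentRegistry:
    """Maps tool names to sub-agent specs and dispatches delegated tasks."""

    def __init__(self, run_agent: Callable[[SubAgentSpec, str], str]):
        # run_agent is a hypothetical hook that runs a model in a fresh context
        self._run_agent = run_agent
        self._specs: Dict[str, SubAgentSpec] = {}

    def register(self, spec: SubAgentSpec) -> None:
        if spec.name in self._specs:
            raise ValueError(f"sub-agent {spec.name!r} already registered")
        self._specs[spec.name] = spec

    def tool_names(self) -> List[str]:
        """Names advertised to the main agent; the prompts stay hidden."""
        return sorted(self._specs)

    def call(self, name: str, task: str) -> str:
        """Delegate a well-defined task; only the result comes back."""
        return self._run_agent(self._specs[name], task)
```

The main agent would only ever see the tool name and the returned result, so the step-by-step prompt for create_new_page never enters its own context.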

akrauss commented on Gemini CLI   blog.google/technology/de... · Posted by u/sync
cperry · 6 months ago
Hi - I work on this. Uptake is a steep curve right now, spare a thought for the TPUs today.

Appreciate all the takes so far, the team is reading this thread for feedback. Feel free to pile on with bugs or feature requests; we'll all be reading.

akrauss · 6 months ago
One thing I'd really like to see in coding agents is this: As an architect, I want to formally define module boundaries in my software, in order to have AI agents adhere to and profit from my modular architecture.

Even with 1M context, for large projects, it makes sense to define boundaries. These will typically be present in some form, but they are not available to the coding agent in a precise form. Imagine there was a simple YAML format where I could specify modules, where they can be found in the source tree, and the APIs of the other modules they interact with. Then it would be trivial to turn this into a context that would very often fit into 1M tokens. When an agent decides something needs to be done in the context of a specific module, it could then create a new context window containing exactly that module, effectively turning a large codebase into a small codebase, for which Gemini is extraordinarily effective.
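
To make the idea concrete, here is a small sketch of selecting a per-module context. No such format exists in gemini-cli as far as I know; the manifest keys (`path`, `api`, `uses`), the module names and the `module_context` function are invented for illustration, and a plain dict stands in for what the YAML file would contain.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical manifest: where each module lives, which files form its
# public API, and which other modules it interacts with.
MANIFEST: Dict[str, dict] = {
    "billing": {
        "path": "src/billing",
        "api": ["src/billing/api.py"],
        "uses": ["users"],
    },
    "users": {
        "path": "src/users",
        "api": ["src/users/api.py"],
        "uses": [],
    },
}


def module_context(manifest: Dict[str, dict], module: str,
                   source_files: List[str]) -> List[str]:
    """Return the files an agent working on `module` would load.

    That is every file under the module's own path, plus only the API
    files of the modules it declares in `uses`.
    """
    spec = manifest[module]
    root = PurePosixPath(spec["path"])
    # A file belongs to the module if the module root is one of its parents.
    own = [f for f in source_files if root in PurePosixPath(f).parents]
    dep_apis = [api for dep in spec["uses"] for api in manifest[dep]["api"]]
    return sorted(set(own) | set(dep_apis))
```

For the billing module this would pull in all of src/billing but only the API file of users, leaving the internals of every other module out of the context window.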

akrauss commented on It's the end of observability as we know it (and I feel fine)   honeycomb.io/blog/its-the... · Posted by u/gpi
akrauss · 7 months ago
I would be interested in reading what tools are made available to the LLM, and how everything is wired together to form an effective analysis loop. It seems like this is a key ingredient here.
akrauss commented on Mathesar – an intuitive spreadsheet-like interface to Postgres data   github.com/mathesar-found... · Posted by u/gjvc
akrauss · a year ago
I can see this being very useful for many admin interfaces where some basic data must be managed by domain experts and UX is not a priority. Many enterprise applications have such parts.

I wonder what the GPLv3 licensing means for such scenarios: Could people run Mathesar as one microservice in an ensemble with proprietary services? Companies that don't want to open source their whole product might still be willing to upstream their fixes and improvements to the Mathesar component.

u/akrauss

Karma: 34 · Cake day: September 26, 2018
About
[ my public key: https://keybase.io/alex_krauss; my proof: https://keybase.io/alex_krauss/sigs/-mMyP5rHa0wW_lF-uofH9KPxXHz1_1X1jYjreihGHAc ]