skp1995 (u/skp1995) - Readit News

skp1995 commented on VSCode’s SSH agent is bananas fly.io/blog/vscode-ssh-wt... · Posted by u/zdyxry

bravura · a year ago

I am super interested in sidecar. When will it support o3-mini-high?

I also need an SDK to script it, tbh. What I actually want is to have a few different agents that interact with each other. Do you expose a good SDK?

skp1995 · a year ago

It does support o3-mini-high already; we use it for a few flows in the agent.

What kind of SDK support are you looking for?

skp1995 commented on VSCode’s SSH agent is bananas fly.io/blog/vscode-ssh-wt... · Posted by u/zdyxry

bravura · a year ago

"[LLM-generated code] is ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates."

Okay, so who's doing this today and how?

This question came up recently in the Aider discord and not many had a good answer.

Aider is great but the SDK is weak and second-class, so interacting with the repl frustrates agent-dev.

Sidecar (which can run independently of the Aide IDE, not to be confused with Aider), https://github.com/codestoryai/sidecar/, is one agent that was mentioned. Many of that project's issues are auto-responded to by a PR-creating agent.

Anything else I'm missing?

In general, I know how I would build an agentic dev-loop, I'm just looking for a good SDK that handles prompting and diff merging etc. i.e. Aider as an SDK or similar.

skp1995 · a year ago

Hey, I am the core dev on sidecar. The reason you see autogenerated PRs is cause I am using our agents to write the code for the agent lol

The big difference is the complete loop: each PR gets its own VM with the toolchains installed, so the agent can run cargo check or cargo test etc.

We do find the LLMs of today are not the best elite engineers but very very competent junior engineers. It's been a weird but eye opening workflow to use.
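
The generate, run, feed-back loop quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Model` and `Runner` traits, `agent_loop`, and the toy implementations are hypothetical names, not Sidecar's or Aider's API, and the toy runner merely stands in for running something like cargo check in a sandbox.

```rust
// Hypothetical sketch of a generate -> run -> feed-back loop.
// `Model` and `Runner` are illustrative traits, not a real agent SDK.

/// Anything that can propose code given a task and the last error, if any.
trait Model {
    fn propose(&mut self, task: &str, last_error: Option<&str>) -> String;
}

/// Anything that can execute candidate code and report Ok or an error message.
trait Runner {
    fn run(&self, code: &str) -> Result<(), String>;
}

/// Iterate until the runner accepts the code or the attempt budget is spent.
/// Returns the accepted code together with the number of attempts used.
fn agent_loop<M: Model, R: Runner>(
    model: &mut M,
    runner: &R,
    task: &str,
    max_attempts: usize,
) -> Option<(String, usize)> {
    let mut last_error: Option<String> = None;
    for attempt in 1..=max_attempts {
        let code = model.propose(task, last_error.as_deref());
        match runner.run(&code) {
            Ok(()) => return Some((code, attempt)),
            // Feed the failure back into the next proposal.
            Err(e) => last_error = Some(e),
        }
    }
    None
}

/// Toy model: the first answer is wrong; it "fixes" it once it sees an error.
struct ToyModel;

impl Model for ToyModel {
    fn propose(&mut self, _task: &str, last_error: Option<&str>) -> String {
        match last_error {
            None => "fn add(a: i32, b: i32) -> i32 { a - b }".to_string(),
            Some(_) => "fn add(a: i32, b: i32) -> i32 { a + b }".to_string(),
        }
    }
}

/// Toy runner: a stand-in for compiling and testing the code in a sandbox.
struct ToyRunner;

impl Runner for ToyRunner {
    fn run(&self, code: &str) -> Result<(), String> {
        if code.contains("a + b") {
            Ok(())
        } else {
            Err("test failed: add(2, 2) returned 0".to_string())
        }
    }
}
```

Calling `agent_loop(&mut ToyModel, &ToyRunner, "write add", 3)` converges on the second attempt, because the first proposal fails the toy check and the error is passed back to the model. A real setup would replace `ToyRunner` with an isolated environment, which is the part the per-PR VM provides.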

skp1995 commented on SOTA on swebench-verified: relearning the bitter lesson aide.dev/blog/sota-bitter... · Posted by u/mcflem007

alach11 · a year ago

This is a very impressive result. OpenAI was able to achieve 72% with o3, but that's at a very high compute cost at inference-time.

I'd be interested for Aide to release more metrics on token counts, total expenditure, etc. to better understand exactly how much test-time compute is involved here. They allude to it being a lot, but it would be nice to compare with OpenAI's o3.

skp1995 · a year ago

Hey! One of the creators of Aide here.

ngl the total expenditure was around $10k. In terms of test-time compute, we ran up to 20X agents on the same problem to first understand if the bitter lesson paradigm of "scale is the answer" really holds true.

The final submission ran 5X agents, and the decider was based on the mean score of the rewards. Per problem, the cost was around $20.

We are going to push this scaling paradigm a bit more. My honest gut feeling is that swe-bench as a benchmark is ripe for saturation real soon:

1. These problem statements are in the training data for the LLMs

2. Brute-forcing the answer the way we are doing works and we just proved it, so someone is going to take a better stab at it real soon
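
The "run several agents, then let a decider pick by mean reward" step mentioned above might look something like this. It is a minimal sketch, not Aide's actual decider: the `Candidate` type, the reward values, and the tie-breaking rule are all assumptions made for illustration.

```rust
// Hypothetical sketch: pick one of N candidate patches by mean reward.
// `Candidate` and its reward scores are illustrative assumptions.

struct Candidate {
    patch: String,
    rewards: Vec<f64>, // scores from some reward model or checker
}

/// Arithmetic mean, or None for an empty slice.
fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        None
    } else {
        Some(xs.iter().sum::<f64>() / xs.len() as f64)
    }
}

/// Return the candidate with the highest mean reward.
/// Candidates without rewards are skipped; ties keep the earlier candidate.
fn pick_best(candidates: &[Candidate]) -> Option<&Candidate> {
    let mut best: Option<(&Candidate, f64)> = None;
    for c in candidates {
        if let Some(m) = mean(&c.rewards) {
            match best {
                Some((_, best_m)) if m <= best_m => {}
                _ => best = Some((c, m)),
            }
        }
    }
    best.map(|(c, _)| c)
}
```

With 5 candidates per problem, `pick_best` would be called once per problem on the 5 trajectories' reward lists. How the rewards themselves are produced is the hard part and is not covered here.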

skp1995 commented on Stop making me memorize the borrow checker erikmcclure.com/blog/stop... · Posted by u/signa11

FridgeSeal · a year ago

Can't say I agree, or that this matches my experience of writing Rust.

I don't memorise how it works; I've just learnt what it rejects and why, and in turn it becomes clear why it has rejected something. Very rarely do I find myself going "oh bother, now I suddenly need to `Rc` or `Arc` this". I suspect that's because I've gotten into the habit of anticipating when things will run afoul of it and structuring things from the get-go to avoid that. Admittedly, I'm not writing absurdly low-level code.

I wonder if the author's grounding in C++ is making life harder for them? Often when I've had to teach people Rust, getting them to stop writing {C/C#/Java}-but-in-Rust is the first stop on the trail to "stop fighting and actually enjoy the language". Every language has its idioms; just because you can doesn't mean you should.

skp1995 · a year ago

I do have to ask, I have worked in codebases which used lifetimes and didn't lean into Rc/Arc and vice-versa.

I used to think Arc/Rc was a shortcut to avoiding the borrow checker shenanigans, but have evolved that thinking over time.

You do mention it in your comment, so I'm wondering if you have anything to share about it.
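
For readers less familiar with the trade-off being discussed, here is a small sketch of the two styles side by side. The type names `BorrowedView` and `SharedView` are made up for illustration and do not come from any codebase mentioned in the thread.

```rust
use std::rc::Rc;

// Borrowing version: the struct holds a reference, so it carries a lifetime
// parameter and cannot outlive the data it points at.
struct BorrowedView<'a> {
    name: &'a str,
}

// Shared-ownership version: no lifetime parameter, at the cost of a
// reference count that is updated on clone and drop.
struct SharedView {
    name: Rc<str>,
}

fn borrowed_len(v: &BorrowedView<'_>) -> usize {
    v.name.len()
}

/// Build two views that share one heap allocation.
fn make_shared(name: &str) -> (SharedView, SharedView) {
    let rc: Rc<str> = Rc::from(name);
    (SharedView { name: Rc::clone(&rc) }, SharedView { name: rc })
}
```

The borrowed form pushes the "who owns this?" question to the caller and to the type signature; the `Rc` form answers it at runtime. Neither is a shortcut in general, and `Arc` plays the same role as `Rc` when values cross threads.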

skp1995 commented on Stop making me memorize the borrow checker erikmcclure.com/blog/stop... · Posted by u/signa11

estebank · a year ago

> I do hope rustc gets better at lifetime compilation errors cause some of them can be very very gnarly.

When this happens, file tickets! We do our best to improve diagnostics over time, but the best improvements have been reactive, by fixing a case that we never encountered but our users did.

skp1995 · a year ago

Will keep that in mind going forward! The most recent ones which I have been hitting are around "higher-ranked lifetime error".

I know my way around this now, which is to literally binary search over the timeline of my edits (commenting out code and then reintroducing it) to see what causes the compiler to trip up (there might be better ways to debug this, and I am all ears).

Most of the time this error is several layers deep in my application, so even tho I want to ticket it up, not being able to create a minimal repro for anyone to iterate against feels like a bit of wasted energy on all sides. Do let me know if I should change this way of thinking and I can promise myself to start being more proactive.
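
The binary search over edits described above can be written down as a small helper. This is a sketch under two assumptions that are not stated in the comment: the code compiles with zero edits applied, and once an edit introduces the error, every later prefix of edits also fails. `first_bad_edit` and its `fails` callback are hypothetical names; in practice `fails(n)` would apply the first `n` edits and run the compiler.

```rust
/// Find the smallest `n` in 1..=total such that applying the first `n` edits
/// makes `fails(n)` return true.
/// Assumes zero edits pass and that failure persists once it appears.
fn first_bad_edit(total: usize, fails: impl Fn(usize) -> bool) -> Option<usize> {
    if total == 0 || !fails(total) {
        return None; // nothing to bisect: the full set of edits passes
    }
    // Invariant: the first `lo` edits pass, the first `hi` edits fail.
    let (mut lo, mut hi) = (0, total);
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if fails(mid) {
            hi = mid;
        } else {
            lo = mid;
        }
    }
    Some(hi)
}
```

For 10 edits this needs at most 5 compiler runs (one for the full set, then about log2(10) probes) instead of up to 10. The edit it returns is also a reasonable starting point for shrinking the code into something that can be attached to a ticket.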

skp1995 commented on Stop making me memorize the borrow checker erikmcclure.com/blog/stop... · Posted by u/signa11

skp1995 · a year ago

Rust can be hard to get right because of the borrow checker. I had a similar situation happen to me where I went about refactoring the code to make the borrow checker happy ... until the last bit when things stopped compiling and I realized my approach was completely wrong (in the Rust world, I had a self-reference in the structs).

Having said this, the benefits of the borrow checker outweigh the shortcomings. I can feel myself writing better code in other languages (I tend to think about the layout and the mutability and lifetimes upfront more now).

My rust code now is very functional, which seems to work best with lifetimes.

I would love to know more about the author's pain. I do hope rustc gets better at lifetime compilation errors cause some of them can be very very gnarly.
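
As an illustration of the self-reference problem mentioned above: a struct that stores a buffer and also a `&str` pointing into that same buffer cannot be expressed in safe Rust, because moving the struct would invalidate the reference. One common way around it is to store offsets instead. The `Document` type below is a hypothetical example, not the code from the comment.

```rust
use std::ops::Range;

// Instead of `first_word: &str` borrowing from `text` (self-referential),
// keep byte offsets into `text`. The struct then needs no lifetime.
struct Document {
    text: String,
    first_word: Range<usize>, // byte offsets into `text`, not a reference
}

impl Document {
    fn new(text: String) -> Self {
        // The borrow of `text` ends once `end` (a plain usize) is computed,
        // so `text` can be moved into the struct afterwards.
        let end = text.find(' ').unwrap_or(text.len());
        Document { text, first_word: 0..end }
    }

    /// Re-create the slice on demand; the borrow is tied to `&self`.
    fn first_word(&self) -> &str {
        &self.text[self.first_word.clone()]
    }
}
```

The value can be moved freely, since the offsets stay valid wherever the `String` ends up. The trade-off is that the offsets must be kept in sync if `text` is ever mutated.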

skp1995 commented on IMG_0416 ben-mini.github.io/2024/i... · Posted by u/bewal416

skp1995 · a year ago

I am missing the link to the thread, but diffusion models also give a very consistent output when prompted with `IMG_{number}`. Part of the reason could be the training data distribution.

skp1995 commented on Show HN: Aide, an open-source AI native IDE aide.dev/... · Posted by u/skp1995

jadbox · a year ago

I'd much, much prefer Aide to continue as a CLI tool or as a VSCode plugin. Every fork of VSCode ends up with IDE maintenance bugs that never get addressed and slowly the effort implodes as the bug surface becomes too wide.

Do you want to spend 90% of your time on AI or on troubleshooting odd Linux VSCode bugs in your fork? I'd highly recommend the team evaluate a different direction to maximize sustainable future growth.

skp1995 · a year ago

That's a fair point. A significant part of our 4-person team had to skill up on the VSCode codebase to be able to meaningfully make changes to it.

I would love to know your workflow. You mention a CLI tool or a VSCode plugin; which one of them works for you? What's missing from them where Aide can fill in the gap?