sudb (u/sudb) - Readit News

simvirdi · 9 months ago

Looks cool - do you have any benchmarks? How do you compare to other products out there?

sudb · 9 months ago

We last submitted a SWE-bench Verified result in November 2024 - at the time I believe we were in the top 5 entrants.

We expect Engine to be as good as the other code-writing agents out there at the moment - we understand almost everyone in the space to be using very similar base models and agent scaffolding.

sudb commented on Show HN: The Best Terminal Inspired Portfolio on the Internet™ kuber.studio/... · Posted by u/kuberwastaken

sudb · 9 months ago

The closest I could get to getting your LLM to identify itself was as LaMDA, which makes me think this is probably a Gemma model - am I close?

sudb commented on Run GitHub Actions locally github.com/nektos/act... · Posted by u/flashblaze

Aurornis · 9 months ago

Same experience here. Edge cases everywhere, though most can be worked around.

You can specify different runners to use. The default images are a compromise to keep size down. There is a very large image that tries to include everything you might want. I would suggest trying that if you don’t mind the very large (15GB IIRC) image.

sudb · 9 months ago

I definitely remember considering the larger images - I think we ended up not using them since my work's use case for act is running user GitHub workflows on demand on temporary VMs. The hope was that most usage would be covered by the smaller images - and in fairness that has been true so far.

sudb commented on Show HN: Cyberdesk, API for computer agents to control a desktop (open source) github.com/cyberdesk-hq/c... · Posted by u/sgtwompwomp

sudb · 9 months ago

Looks cool! If you're able to say - where/how do you run these virtual desktop instances?

sudb commented on Show HN: A MCP server to evaluate Python code in WASM VM using RustPython github.com/tuananh/hyper-... · Posted by u/tuananh

digdugdirk · 9 months ago

Is there a list of these "code sandboxes" floating around somewhere? It seems like it's going to be more and more important with LLMs playing more of a factor in development moving forward.

sudb · 9 months ago

I know of https://modal.com/, which I believe is used by Codegen and Cognition.

Anecdotally, I hear that many companies in the LLM agent space roll their own sandbox solutions - I've heard of both Firecracker- and Kubernetes-based implementations.

sudb commented on Run GitHub Actions locally github.com/nektos/act... · Posted by u/flashblaze

sudb · 9 months ago

I use this for work - but there are edge cases all over the place that I keep running into (e.g. Yarn being installed on GitHub-hosted runners, but not on self-hosted ones or in act - https://github.com/actions/setup-node/issues/182).

Apart from that it's been quite good!

sudb commented on Show HN: Engine – A multi-LLM alternative to Codex enginelabs.ai/... · Posted by u/sdspurrier

sudb · 9 months ago

I worked on this! Happy to answer any questions anyone has.

sudb commented on Writing "/etc/hosts" breaks the Substack editor scalewithlee.substack.com... · Posted by u/scalewithlee

sudb · 10 months ago

I had a problem recently trying to send LLM-generated text between two web servers under my control, from AWS to Render - I was getting 403s for command injection from Render's Cloudflare protection, which is opaque and not configurable by users.

The hacky workaround, which has been working reliably for a while now, was to encode the offending request body and decode it on the destination server.
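A minimal sketch of that kind of workaround, assuming base64 as the encoding (the comment doesn't say which encoding was actually used). The `encode_body` and `decode_body` helpers and the JSON envelope are hypothetical names for illustration, not the real implementation:

```python
import base64
import json


def encode_body(text: str) -> str:
    """Sender side: wrap the text in a JSON envelope, base64-encoded.

    URL-safe base64 only emits A-Z, a-z, 0-9, '-', '_' and '=', so
    strings like "/etc/hosts" never appear literally on the wire.
    """
    encoded = base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({"encoding": "base64", "body": encoded})


def decode_body(payload: str) -> str:
    """Receiver side: unwrap the envelope and restore the original text."""
    envelope = json.loads(payload)
    if envelope.get("encoding") != "base64":
        raise ValueError("unexpected encoding in request body")
    return base64.urlsafe_b64decode(envelope["body"]).decode("utf-8")


# Hypothetical LLM output that a WAF rule might flag as command injection.
original = "Run `cat /etc/hosts; rm -rf /tmp/x` to inspect the file."
wire = encode_body(original)
restored = decode_body(wire)
```

Whether this gets past a given WAF depends on its rules; it only hides the literal strings from pattern matching, and the receiving server still has to treat the decoded text as untrusted input.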

u/sudb

Karma: 25 · Cake day: February 14, 2023