eborgnia (u/eborgnia)

eborgnia commented on A Year of Fast Apply – Our Path to 10k Tokens per Second relace.ai/blog/relace-app... · Posted by u/eborgnia

bn-l · 2 months ago

You guys have to bring the cost down

eborgnia · 2 months ago

How do you expect it to be priced? We do give discounts for high volume users.

eborgnia commented on A Year of Fast Apply – Our Path to 10k Tokens per Second relace.ai/blog/relace-app... · Posted by u/eborgnia

swyx · 2 months ago

> To streamline the process while maintaining quality, we built our own internal evaluation tool: a Git-style diff viewer with annotation tools for categorizing merge outcomes.

vibecoding internal eval tools is the single best use case of ai accelerating ai i know of! nice to see

(sorry if this gets asked a lot) - any philsophical/methodology differences to MorphLLM that you'd call out since you seem to be a direct alternative?

eborgnia · 2 months ago

Hey, happy to answer! The manual evals we did showed that both morph-v3-fast and morph-v3-large had significantly more smoothing and hallucination behaviors.

It's hard to know for sure because their methods aren't public, but my guess is the dataset they constructed pushes the Fast Apply model to more aggressively fix mistakes introduced by the frontier model in the edit snippet.

This aligns with the fact that their flagship model (morph-v3-large) is 4x slower than ours -- the smoothings/hallucinations are not in the initial code or the edit snippet so they break speculative continuations more frequently. Their 2x faster model (morph-v3-fast) is likely quantized more aggressively (maybe fp4? and run on B200s?) because it exhibits very strange behaviors like hallucinating invalid characters at random points that make the code non-compilable.

From an accuracy POV, auto-smoothing is helpful for fixing obvious mistakes in the edit snippet like missed imports from well known packages. However, it does increase the frequency of code breaking hallucinations like invalid local imports among other functional changes that you might not want a small apply model to perform.

eborgnia commented on Launch HN: Relace (YC W23) – Models for fast and reliable codegen · Posted by u/eborgnia

blef · 7 months ago

We are using Relace in production to apply code for a month and this is crazy how easy it has been to integrate it (less than 30 minutes). The most impressive thing when you come from a general purpose LLM is the speed and the accuracy relace brings.

In the past we were using o4-mini which had an annoying issue at adding newline when not needed and was slow (5s+), relace fixed all these issues.

eborgnia · 7 months ago

Glad it's working out -- thanks for the support :)

eborgnia commented on Launch HN: Relace (YC W23) – Models for fast and reliable codegen · Posted by u/eborgnia

darkteflon · 7 months ago

This is very interesting. There was also an article and discussion a couple of days ago on using diffusion models for edit/apply tasks at ~2k tps[1].

If I understand correctly, the ‘apply’ model takes the original code, an edit snippet, and produces a patch. If the original code has a lot of surrounding context (e.g., let’s say you pass it the entire file rather than trying to assess which bits are relevant in advance), are speed and/or performance materially affected (assuming the input code contains no duplication of the code to be amended)?

Does / how well does any of this generalise to non-code editing? Could I use Relace Apply to create patches for, e.g., plain English markdown documents? If Apply is not a good fit, is anyone aware of something suitable in the plain English space?

[1] https://news.ycombinator.com/item?id=44057820

eborgnia · 7 months ago

The diffusion approach is really interesting -- it's something we haven't checked out for applying edits just yet. It could work quite well though!

You can definitely use it for markdown, but we haven't seen anyone test it for plaintext yet. I'm sure it would work though, let us know if you end up trying it!

eborgnia commented on Launch HN: Relace (YC W23) – Models for fast and reliable codegen · Posted by u/eborgnia

threeseed · 7 months ago

> For both vibe-coded and enterprise codebases

What in god’s name does this even mean ?

eborgnia · 7 months ago

Haha, we think of "vibe-coded codebases" as codebases produced by nontechnical users that are using an AI tool

eborgnia commented on Launch HN: Relace (YC W23) – Models for fast and reliable codegen · Posted by u/eborgnia

bcyn · 7 months ago

Very interested to see what the next steps are to evolve the "retrieval" model - I strongly believe that this is where we'll see the next stepwise improvement in coding models.

Just thinking about how a human engineer approaches a problem. You don't just ingest entire relevant source files into your head's "context" -- well, maybe if your code is broken into very granular files, but often files contain a lot of irrelevant context.

Between architecture diagrams, class relationship diagrams, ASTs, and tracing codepaths through a codebase, there should intuitively be some model of "all relevant context needed to make a code change" - exciting that you all are searching for it.

eborgnia · 7 months ago

Adding extra structural information about the codebase is an avenue we're actively exploring. Agentic exploration is a structure-aware system where you're using a frontier model (Claude 4 Sonnet or equivalent) that gives you an implicit binary relevance score based on whatever you're putting into context -- filenames, graph structures, etc.

If a file is "relevant" the agent looks at it and decides if it should keep it in context or not. This process repeats until there's satisfactory context to make changes to the codebase.

The question is whether we actually need a 200b+ parameter model to do this or if we can distill the functionality onto a much smaller, more economical model. A lot of people are already choosing to do it with Gemeni (due to the 1m context window), and they write the code with Claude 4 Sonnet.

Ideally, we want to be able to run this process cheaply in parallel to get really fast generations. That's the ultimate goal we're aiming towards

eborgnia commented on Launch HN: Relace (YC W23) – Models for fast and reliable codegen · Posted by u/eborgnia

harrisreynolds · 7 months ago

Nice! I am currently writing a new version of my no-code platform, WeBase [1], to use AI to generate and edit applications.

Currently just using foundation models from OpenAI and Gemini but will be very interested to try this out.

My current approach is to just completely overwrite files with new updated version but I am guessing using something like Relace will make the whole process more efficient... is that correct?

I'll watch your video later but I would love to learn more about common use cases. It could even be fun to write a blog post for your blog comparing my "brut force" approach to something more intelligent using Relace.

[1] https://www.webase.com (still points to the old "manual" version)

eborgnia · 7 months ago

Happy to collaborate, shoot us an email at info@relace.ai :)

eborgnia commented on Launch HN: Relace (YC W23) – Models for fast and reliable codegen · Posted by u/eborgnia

ramoz · 7 months ago

I can import my entire codebase to Gemini and get more than a nuanced similarity score in terms of agent guidance.

What’s the differentiator or plan for arbitrary query matching?

Latency? If you think about it - not really a huge issue. Spend 20s-1M mapping an entire plan with Gemini for a feature.

Pass that to Claude Code.

At this point you want non-disruptive context moving forward and presumably any new findings would only be redundant with what is in long context already.

Agentic discovery is fairly powerful even without any augmentations. I think Claude Code devs abandoned early embedding architectures.

eborgnia · 7 months ago

Hey, these are really interesting points. The question of agentic discovery vs. one-shot retrieval is really dependent on the type of product.

For Cline or Claude Code where there's a dev in the loop, it makes sense to spend more money on Gemeni ranking or more latency on agentic discovery. Prompt-to-app companies (like Lovable) have a flood of impatient non-technical users coming in, so latency and cost become a big consideration.

That's when using a more traditional retrieval approach can be relevant. Our retrieval models are meant to work really well with non-technical queries on these vibe-coded codebases. They are more of a supplement to the agentic discovery approaches, and we're still figuring out how to integrate them in a sensible way.