Posted by u/bhaktatejas922 2 months ago
Launch HN: Morph (YC S23) – Apply AI code edits at 4,500 tokens/sec
Hey HN, I’m Tejas at Morph. We’ve built a blazing-fast model for applying AI-generated code edits directly into your files at 4,500+ tokens/sec. No more slow full-file rewrites or brittle search-and-replace hacks.

Here's a demo video: https://www.youtube.com/watch?v=LdT8epGHJPk.

Why? AI models spit out code that can’t reliably be merged into existing files. Full-file rewrites and brittle search-and-replace hacks are too slow, expensive, or error-prone.

Morph's approach:

- Your agent outputs edits “lazily”, referencing unmodified lines in the existing file with markers like // ...existing code... (see the sketch just below this list)

- Morph instantly applies these edits to a file using our Fast Apply model + speculative decoding against the original file, making AI patches fast, reliable, and production-ready.
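
To make the lazy edit format concrete, here's a toy before/after (the file and function names are hypothetical):

  // Existing file (userService.ts):
  import { db } from "./db";

  export async function getUser(id: string) {
    return db.users.find(id);
  }

  export async function deleteUser(id: string) {
    return db.users.remove(id);
  }

  // What the agent emits -- only the changed function, with markers standing
  // in for everything it wants left untouched:
  // ...existing code...
  export async function getUser(id: string) {
    const user = await db.users.find(id);
    if (!user) throw new Error("user not found");
    return user;
  }
  // ...existing code...

Fast Apply merges the snippet back into the full file, so the agent never has to re-emit (or exactly quote) the lines it isn't changing.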

This approach was pioneered by Cursor last year, but their models aren’t available as APIs—so we built Morph for developers everywhere (with a large free tier!)

Live demo (no signup): https://morphllm.com/dashboard and docs: https://docs.morphllm.com/quickstart

We have 2 Fast Apply models: morph-v3-fast - 4500+ tok/sec, and morph-v3-large - 2500+ tok/sec. These models power Fast Apply at create.xyz, databutton, continue.dev, and more!
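
If you'd rather hit the API than the dashboard, a minimal call looks roughly like this (the base URL and prompt format shown are illustrative assumptions - check the quickstart for the exact details):

  // Sketch in TypeScript using the OpenAI SDK pointed at Morph. The endpoint
  // and the <code>/<update> message convention are assumptions for
  // illustration, not taken from this post.
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.MORPH_API_KEY,
    baseURL: "https://api.morphllm.com/v1", // assumed base URL
  });

  const originalFile = "..."; // full contents of the file being edited
  const lazyEdit = "...";     // agent snippet with // ...existing code... markers

  const response = await client.chat.completions.create({
    model: "morph-v3-fast", // or "morph-v3-large" for trickier edits
    messages: [
      {
        role: "user",
        content: `<code>${originalFile}</code>\n<update>${lazyEdit}</update>`,
      },
    ],
  });

  const mergedFile = response.choices[0].message.content;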

We also provide retrieval models for embedding + reranking. Next Up: Inline Edit Model (Cmd-K): Extremely fast inline edits - keep dev flow state; and Morph Tab API: Our Next Edit Prediction model guesses your next code edit + action with sub-500ms latency. It's currently in private beta, but you can request early access here: https://morphllm.com/tab

Hot takes:

1) Raw inference speed matters more than incremental accuracy gains for dev UX—agree or disagree?

2) Full-file rewrites by frontier models are legacy—Fast Apply edits win on speed, cost, reliability.

3) As benchmarks on narrow tasks saturate to 99%+, complexity is shifting from single frontier models to specialized, inference-optimized models. As frontier models move upmarket, they'll leave simple tasks behind and be reserved for the work only they can do.

We’d love to hear your ideas and experiences with coding agents!

deepdarkforest · 2 months ago
> 1) Raw inference speed matters more than incremental accuracy gains for dev UX—agree or disagree?

I know you are trying to generate some controversy/visibility, but i think if we are being transparent here, you know this is wrong. People prefer using larger (or reasoning) models, with a much bigger difference in tok/sec, just for quality in coding - it comes first. Even if i have a big edit to apply, like 5k tokens, 200-300ms of difference in edit time is nothing. Edit speed is definitely not a bottleneck for dev UX, quality is. A dev who wants to save 200ms every code change over quality is someone who, well, i cannot relate to. If im using 1-2 agents in parallel, most of the time the edits are already applied while im reviewing code from the other agents. But again maybe that's just me.

Speaking of quality, how do you measure it? Do you have any benchmarks? How big is the difference in error rate between the fast and large model?

ashwindharne · 2 months ago
I do find that having inference happen ~50% faster is much more valuable to my workflow than a single digit accuracy increase. If I'm going to have to check that the changes are correct anyways, getting more iterations in faster feels much better than incremental accuracy.

There's definitely a tipping point though. If the accuracy gains are so high that I can check its work less carefully or less often, the benefits of inference speed are effectively nil.

godot · 2 months ago
I've been using Cursor pretty extensively in the past few months and I use it to code pretty hard problems sometimes, and a while ago when the options were between claude 3.5 sonnet vs gemini 2.5 pro, there was such a significant difference in quality that claude 3.5 often straight up failed -- the code it wrote wouldn't work, even after retrying over and over again, and gemini 2.5 pro often was able to solve it correctly. In a particular project I even had to almost exclusively use gemini 2.5 pro to continue to make any progress despite having to wait out the thinking process every time (gemini was generally slower to begin with, and then the thinking process often took 30-90 seconds).
bhaktatejas922 · 2 months ago
exactly. The point is that none of the users even realize a model is doing the apply - it should be so accurate and fast that it feels like it's not there
walthamstow · 2 months ago
Agreed. Sonnet 4 is supposedly better than Sonnet 3.5, but in Cursor 3.5 is much faster so that's what I use
Cort3z · 2 months ago
As far as i understand, this is not +-300ms. It is 300ms vs. 10 sec or something. That is a huge difference. I personally find the time to wait for these larger models a limiting factor. It’s also probably a resource waste for a fairly simple task like this. (Compared to the general function approximation of the llms)

But I honestly feel like the task of smartly applying edits falls somewhat within traditional coding tasks. What about it is so difficult it could not be done with a smart diffing algorithm?

bhaktatejas922 · 2 months ago
it's a bit unclear why a model works best here. in short - smart diffing is edge case hell and you'll never capture all of them
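
to give a taste of what I mean, here's a toy sketch (illustrative only, not how Morph works under the hood) of one classic failure mode - trivial whitespace drift making an exact-match patch silently no-op. Duplicate anchor lines, reordered code, and comment drift are the same story:

  // The "old" text the LLM quotes differs from the file by one extra space,
  // so an exact-match search finds nothing and the edit is silently dropped.
  const file = [
    "function total(cart)  {", // note the doubled space before the brace
    "  return cart.reduce((sum, item) => sum + item.price, 0);",
    "}",
  ].join("\n");

  const search = "function total(cart) {"; // what the model thinks is in the file
  const replace = "function total(cart, taxRate) {";

  console.log(file.includes(search));                  // false
  console.log(file.replace(search, replace) === file); // true -- nothing changed
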
deepdarkforest · 2 months ago
you misunderstood. it's 300ms just for the apply model, the model that takes your coding model's output (e.g. sonnet) and figures out where the code should be changed in the file. Cursor has its own, and claude uses a different technique with strings as well. So it's 10 sec vs 10 sec + 300ms using your analogy
asam-0 · 2 months ago
Fully agree.

The very 1st thing you do after you get a proposed set of changes from an AI model is to review them carefully before applying them. Most of the time it duplicates code because it skipped specific tokens or context that was out of its window and the user didn't include it in their prompt.

Batch applying any changes is just a way to create even harder code to debug, and accumulating such bulk code injections will definitely break your code much earlier than you think.

B, Sam

bhaktatejas922 · 2 months ago
if you've used cursor, you've probably felt how seamless fast apply can be - it's accurate and fast to the point where most don't even realize it's a model
bhaktatejas922 · 2 months ago
I think it depends - the actual thing to measure is whether you keep a developer in flow state. Both errors and latency break this. To be brief: yes, accuracy comes first.

Quality is measured 2 main ways:

1) End-to-end: User query -> task resolution. These are aider-style benchmarks answering the question of actual task completion

2) Apply Quality: Syntax correctness, character diff, etc..

The error rate for large vs fast is around 2%. If you're doing code edits that are extremely complex or on obscure languages - large is the better option. There's also an auto option to route to the model we think is best for a task

candiddevmike · 2 months ago
I don't believe anyone can be in some kind of "flow state" while waiting on LLM responses. I think it's funny that we complained for years about C and others being slow to compile and now folks are fine waiting seconds++ every time they want to change something.
Aurornis · 2 months ago
> the actual thing to measure it to keep a developer in flow state.

Personally, I find flow state hard to achieve when I constantly have to switch modes to debugging LLM output or an edit error that I missed.

When the majority of time is spent waiting for the main LLM to think, I will always wait a few extra seconds for a better edit than risk having to spend multiple cycles playing find-the-bug because something didn't get applied correctly somewhere.

deepdarkforest · 2 months ago
Glad to hear quality comes first! Then I assume you have some public benchmarks like the ones you mention that are reproducible? I could only find this graph https://docs.morphllm.com/guides/apply but there is no mention of what it refers to, what data it used etc.
k__ · 2 months ago
I have to admit that using slow models is unbearable once I've used a fast one before.

I don't know if the quality and speed are linearly related, though.

AirMax98 · 2 months ago
Seriously agree — try using something like Sonnet 3.7 and then switching to Gemini 2.5 Pro. The code that both output is fine enough — especially given that I mostly use LLMs as a fancy autocomplete. Generally a better prompt is going to get me closer to what I want than a more robust model. The speed hit with Gemini 2.5 Pro is just too substantial for me to use it as a daily driver.

I imagine the speed difference might not matter so much if you are performing seismic updates across a codebase though.

Darmani · 2 months ago
Sounds like review time is the bottleneck for you.

I'm currently working on something that makes people much faster at reviewing the output of coding agents. If you have some time, I'm very interested in interviewing you about your workflows. Just reply here, or find my contact information in my profile.

-- Jimmy Koppel, Ph. D.

smrtinsert · 2 months ago
Slow is smooth and smooth is fast.
bhaktatejas922 · 2 months ago
and speculative edits is faster
paulddraper · 2 months ago
I do not use Opus for coding, I much prefer Sonnet.

Many tasks work better with iteration/supervision and Sonnet makes that feasible.

bhaktatejas922 · 2 months ago
yeah same. I feel like Opus tends to lean slightly more toward sycophancy on technical topics
bigyabai · 2 months ago
The marketing language seems to suggest they're insecure over quality and want to promote quantity. But I'm in the same boat as you - I would happily take 10 tok/sec of a correct answer instead of wasting an hour curating 4500 tok/sec throwaway answers. Benchmark performance matters 100x more than your latency.

If these "hot takes" extend into Morph's own development philosophy, then I can be glad to not be a user.

bhaktatejas922 · 2 months ago
There's no amount of error rate that's acceptable to us - edits should always be correct. We've just found anecdotally that saving users time is also very important for churn, retention, and keeping developers in flow state - right after accuracy.
IanCal · 2 months ago
This is a code editing model. 10 tokens per second editing may as well not exist for any interactive use case.
johnfn · 2 months ago
Anyone can get 10 tok/sec - just tell the model to output the entire file with changes, rather than just the delta.

Whatever LLM you're using will have a baseline error rate a lot higher than 2%, so you're going to be reviewing all the code it outputs regardless.

laborcontract · 2 months ago
Really like this. I've been trying microsoft's copilot and it's so clunky, particularly when applying edits. One would assume they have the resources to train the model..

Request: please provide a system prompt in the docs to help the llm generate the diff format that performs best w/ your models. LLMs frequently change the way they present diffs on upgrades and I don't want to be guessing which format is best.

EDIT: Please clarify your privacy policy. If my interpretation is correct, paying users will have their data retained and trained on? Is there any way to pay to use the service (w/o picking up the phone) and not have my data trained on?

  4.1 Use of Service Data

  Depending on your subscription tier:

  Free Tier: We may use your submitted code data to train our models, improve our Services, and develop new features.
  Engineer Tier: We may use your submitted code data to train our models, improve our Services, and develop new features, subject to the confidentiality provisions in your service agreement.
  Enterprise Tier: We do not use your submitted code data for any purpose other than processing your immediate request. Your code data is never used for model training or service improvement.

[0] https://morphllm.com/privacy

bhaktatejas922 · 2 months ago
done! Yeah we have ZDR options as well - just email us at info@morphllm.com to enable it

Morph via OpenRouter is always zero data retention

laborcontract · 2 months ago
Good to know. Thanks a lot!
fastball · 2 months ago
This whole "don't train on my data" thing is so silly. Do you know how these models were created? By training them on code.

Very selfish / tragedy of the commons for you to want to use tools that were trained on the code of others but not your own. That is how these models get better.

laborcontract · 2 months ago
I care much less about the training, much more about the data retention. I don't think it's wrong to not want my data retained, especially if the counterparty is receiving remuneration for the service. For free services, I agree with you. I've used the free Gemini liberally.

I do appreciate the transparency on their privacy page and that they provide the ability to opt out. Seems like they've given it some thought.

weird-eye-issue · 2 months ago
Seems completely broken.

I used the provided HTML example on https://morphllm.com/dashboard/playground/apply. Without editing anything at all, I pressed apply.

Your model added a bunch of CSS even though that wasn't in the update instructions at all. It also added a contact section, which again, wasn't in the update instructions that your demo provided.

bhaktatejas922 · 2 months ago
nice catch. the html example was using a hardcoded snippet we forgot to uncomment. fixed
weird-eye-issue · 2 months ago
Thanks I will give it another try because our use case is HTML/Markdown documents, not code, and this could be interesting. I'm just hesitant to trust an LLM to do replacements and your broken example really didn't help with my confidence. Even a 1% error rate wouldn't be worth it because if find/replace doesn't work you know it doesn't work and can feed that error back into the agent to fix it (like how Claude Code recovers from its editing errors)

edit: The example is still broken. I've inspected the network request and it's definitely your backend that is broken not something being hardcoded... The CSS is not present in the request at all, but in the response it's being inserted.

Workaccount2 · 2 months ago
Just for clarification here because I am a bit confused,

Morph is a tool for integrating the output of other LLMs and not an LLM itself? It doesn't generate at 4,500 tok/sec, it can edit at 4,500 tok/sec?

bhaktatejas922 · 2 months ago
Correct, but Morph is an LLM as well. In practice it's basically Big LLM using small LLM as a tool call
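
Roughly, it looks like this - the tool name and fields below are invented for illustration, not any particular agent's actual schema:

  // The "big" model is handed an edit tool; the handler behind it is where the
  // "small" apply model (e.g. morph-v3-fast) merges the snippet into the file.
  const editFileTool = {
    type: "function" as const,
    function: {
      name: "edit_file", // illustrative name
      description:
        "Edit a file by providing only the changed lines, using " +
        "// ...existing code... for everything that should stay the same.",
      parameters: {
        type: "object",
        properties: {
          target_file: { type: "string" },
          code_edit: { type: "string" },
        },
        required: ["target_file", "code_edit"],
      },
    },
  };

  // When the big model calls edit_file, the agent reads the original file,
  // sends it plus code_edit to the apply model, and writes back the merged
  // result. If that step is fast and accurate, the user never notices it.
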
Workaccount2 · 2 months ago
I see. How is this not going to get run over immediately by big players? Google's diffusion model is already in the wings, and it's both wicked fast and ~flash-lite intelligent.
furyofantares · 2 months ago
Big LLM and small LLM, very Starbucks sizing vibes here.
Kamshak · 2 months ago
It's more expensive than Gemini Flash, which can actually write pretty decent code (not just apply a diff). Fast AI edit application is definitely great, but that's pretty expensive.

Morph v3 fast: Input: $1.20 / M tokens, Output $2.70 / M tokens

Gemini 2.5 Flash: Input: $0.30 / M tokens, Output $2.50 / M tokens

(Source: OpenRouter)

bhaktatejas922 · 2 months ago
That's for zero data retention - on the Morph website it's $0.80 / 1M input tokens, $1.20 / 1M output tokens. We have discounts for large volumes/reserved instances as well
seanw265 · 2 months ago
Last time I looked into Morph, I noticed you weren’t yet on OpenRouter. I see that’s changed, but it looks like only an older model is listed. Any plans to be more active there?

Also, are there any benchmarks comparing your fast apply models to others like Relace or even Llama via Cerebras? I’m particularly interested in output accuracy.

bhaktatejas922 · 2 months ago
the power of hacker news! New models are listed there now
bhaktatejas922 · 2 months ago
the v2 model listed currently points to morph-v3-large. We're working with them to get v3-large and v3-fast listed
bijection · 2 months ago
How does this compare to relace, which I believe is also a YC company? They seem to have very similar functionality [0]

[0] https://www.relace.ai/

Kamshak · 2 months ago
Good question, they also list the same customers (create.xyz, continue.dev)
fazkan · 2 months ago
I think both maybe using customers very loosely :)
nico · 2 months ago
Would be awesome to have a browser extension that could create a bridge between ChatGPT and VSCode, applying Morph in between (or Claude instead of ChatGPT). Essentially use the web interface, instead of the APIs for agentic coding
bhaktatejas922 · 2 months ago
I think an MCP would do the job. We're shipping one out as we speak
sidgarimella · 2 months ago
+1 hyped for an mcp that I might be able to plug zed into