Here's a demo video: https://www.youtube.com/watch?v=LdT8epGHJPk.
Why? AI spits out code that can't reliably be inserted into existing code. Full-file rewrites and brittle search-and-replace hacks are too slow, expensive, or error-prone.
Morph's approach:
- Your agent outputs edits “lazily”, referencing unmodified lines in the existing file (ex: // ...existing code...)
- Morph instantly applies these edits to a file using our Fast Apply model + speculative decoding against the original file, making AI patches fast, reliable, and production-ready.
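For concreteness, here's roughly what a lazy edit looks like (illustrative TypeScript; only the marker comment convention comes from the description above):

    // original.ts (simplified)
    export function add(a: number, b: number) {
      return a + b;
    }
    export function subtract(a: number, b: number) {
      return a - b;
    }

    // Lazy edit emitted by the agent: only the changed function,
    // with unmodified regions referenced by the marker comment.
    // ...existing code...
    export function subtract(a: number, b: number) {
      if (Number.isNaN(a) || Number.isNaN(b)) throw new Error("NaN input");
      return a - b;
    }

    // Fast Apply merges this edit against original.ts and returns
    // the complete updated file.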
This approach was pioneered by Cursor last year, but their models aren’t available as APIs—so we built Morph for developers everywhere (with a large free tier!)
Live demo (no signup): https://morphllm.com/dashboard and docs: https://docs.morphllm.com/quickstart
We have two Fast Apply models: morph-v3-fast (4500+ tok/sec) and morph-v3-large (2500+ tok/sec). These models power Fast Apply at create.xyz, databutton, continue.dev, and more!
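A rough sketch of what calling Fast Apply can look like, assuming an OpenAI-compatible client; the base URL and the <code>/<update> message tags here are illustrative assumptions, so see the quickstart docs for the exact format:

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.MORPH_API_KEY,
      baseURL: "https://api.morphllm.com/v1", // assumed; check the quickstart
    });

    async function fastApply(originalFile: string, lazyEdit: string) {
      const response = await client.chat.completions.create({
        model: "morph-v3-fast", // or "morph-v3-large" for trickier edits
        messages: [
          {
            role: "user",
            // Assumed convention: send the original file plus the lazy edit together.
            content: `<code>${originalFile}</code>\n<update>${lazyEdit}</update>`,
          },
        ],
      });
      // The model returns the full merged file.
      return response.choices[0].message.content;
    }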
We also provide retrieval models for embedding + reranking. Next up:
- Inline Edit Model (Cmd-K): extremely fast inline edits that keep you in dev flow state.
- Morph Tab API: our Next Edit Prediction model guesses your next code edit + action with sub-500ms latency. It's currently in private beta, but you can request early access here: https://morphllm.com/tab
Hot takes:
1) Raw inference speed matters more than incremental accuracy gains for dev UX—agree or disagree?
2) Full-file rewrites by frontier models are legacy—Fast Apply edits win on speed, cost, reliability.
3) As benchmarks on narrow tasks saturate to 99%+, complexity is shifting from single frontier models to specialized, inference-optimized models. As frontier models move upmarket, they'll leave simple tasks behind and be reserved for the tasks only frontier models can do.
We’d love to hear your ideas and experiences with coding agents!
I know you are trying to generate some controversy/visibility, but I think if we are being transparent here, you know this is wrong. People prefer larger (or reasoning) models, with a much bigger difference in tok/sec, just for quality in coding; quality comes first. Even if I have a big edit to apply, like 5k tokens, 200-300ms of difference in edit time is nothing. Edit speed is definitely not a bottleneck for dev UX, quality is. A dev who wants to save 200ms on every code change at the expense of quality is someone I, well, cannot relate to. If I'm using 1-2 agents in parallel, most of the time the edits are already applied while I'm reviewing code from the other agents. But again, maybe that's just me.
Speaking of quality, how do you measure it? Do you have any benchmarks? How big is the difference in error rate between the fast and large model?
There's definitely a tipping point though. If the accuracy gains are so high that I can check its work less carefully or less often, the benefits of inference speed are effectively nil.
But I honestly feel like the task of smartly applying edits falls somewhat within traditional coding tasks. What about it is so difficult it could not be done with a smart diffing algorithm?
The very first thing you do after you get a proposed set of changes from an AI model is review them carefully before applying them. Most of the time it duplicates code because it skipped specific tokens or context that was outside its window and the user didn't include it in their prompt.
Batch-applying changes is just a way to create code that's even harder to debug, and accumulating such bulk code injections will definitely break your code much earlier than you think.
B, Sam
Quality is measured in 2 main ways:
1) End-to-end: user query -> task resolution. These are Aider-style benchmarks answering the question of actual task completion.
2) Apply quality: syntax correctness, character diff, etc.
The error rate for large vs fast is around 2%. If you're doing code edits that are extremely complex or in obscure languages, large is the better option. There's also an auto option that routes to the model we think is best for a task.
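For the character-diff metric, think of something like one minus normalized edit distance between the applied output and the reference file. A rough sketch (the exact normalization used in the benchmarks may differ):

    // Character-level accuracy as 1 - normalized Levenshtein distance
    // between the applied file and the reference file.
    function levenshtein(a: string, b: string): number {
      const dp = Array.from({ length: a.length + 1 }, (_, i) =>
        Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
      );
      for (let i = 1; i <= a.length; i++) {
        for (let j = 1; j <= b.length; j++) {
          dp[i][j] = Math.min(
            dp[i - 1][j] + 1,                                    // deletion
            dp[i][j - 1] + 1,                                    // insertion
            dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
          );
        }
      }
      return dp[a.length][b.length];
    }

    function charDiffAccuracy(applied: string, reference: string): number {
      const dist = levenshtein(applied, reference);
      return 1 - dist / Math.max(applied.length, reference.length, 1);
    }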
Personally, I find flow state hard to achieve when I constantly have to switch modes to debugging LLM output or an edit error that I missed.
When the majority of time is spent waiting for the main LLM to think, I will always wait a few extra seconds for a better edit than risk having to spend multiple cycles playing find-the-bug because something didn't get applied correctly somewhere.
I don't know if the quality and speed are linearly related, though.
I imagine the speed difference might not matter so much if you are performing seismic updates across a codebase though.
I'm currently working on something that makes people much faster at reviewing the output of coding agents. If you have some time, I'm very interested in interviewing you about your workflows. Just reply here, or find my contact information in my profile.
-- Jimmy Koppel, Ph.D.
Many tasks work better with iteration/supervision and Sonnet makes that feasible.
If these "hot takes" extend into Morph's own development philosophy, then I can be glad to not be a user.
Whatever LLM you're using will have a baseline error rate a lot higher than 2%, so you're going to be reviewing all the code it outputs regardless.
Request: please provide a system prompt in the docs to help the LLM generate the diff format that performs best with your models. LLMs frequently change the way they present diffs on upgrades, and I don't want to be guessing which format is best.
EDIT: Please clarify your privacy policy. If my interpretation is correct, paying users will have their data retained and trained on? Is there any way to pay to use the service (w/o picking up the phone) and not have my data trained on?
[0] https://morphllm.com/privacy

Morph via OpenRouter is always zero data retention.
Very selfish / tragedy of the commons for you to want to use tools that were trained on the code of others but not your own. That is how these models get better.
I do appreciate the transparency on their privacy page and their providing the ability to opt out. Seems like they've given it some thought.
I used the provided HTML example on https://morphllm.com/dashboard/playground/apply. Without editing anything at all, I pressed apply.
Your model added a bunch of CSS even though that wasn't in the update instructions at all. It also added a contact section, which again, wasn't in the update instructions that your demo provided.
edit: The example is still broken. I've inspected the network request and it's definitely your backend that is broken, not something being hardcoded... The CSS is not present in the request at all, but in the response it's being inserted.
Morph is a tool for integrating the output of other LLMs and not an LLM itself? It doesn't generate 4500 tok/sec, it can edit 4500 tok/sec?
Morph v3 fast: Input: $1.20 / M tokens, Output: $2.70 / M tokens
Gemini 2.5 Flash: Input: $0.30 / M tokens, Output: $2.50 / M tokens
(Source: OpenRouter)
Also, are there any benchmarks comparing your fast apply models to others like Relace or even Llama via Cerebras? I’m particularly interested in output accuracy.
[0] https://www.relace.ai/