Cloudlflare builds OAuth with Claude and publishes all the prompts

The commits are revealing.

Look at this one:

> Ask Claude to remove the "backup" encryption key. Clearly it is still important to security-review Claude's code!

> prompt: I noticed you are storing a "backup" of the encryption key as `encryptionKeyJwk`. Doesn't this backup defeat the end-to-end encryption, because the key is available in the grant record without needing any token to unwrap it?

I don’t think a non-expert would even know what this means, let alone spot the issue and direct the model to fix it.

victorbjorklund · 3 months ago

That is how LLM:s should be used today. An expert prompts it and checks the code. Still saves a lot of time vs typing everything from scratch. Just the other day I was working on a prototype and let claude write code for a auth flow. Everything was good until the last step where it was just sending the user id as a string with the valid token. So if you got a valid token you could just pass in any user id and become that user. Still saved me a lot of time vs doing it from scratch.

Vinnl · 3 months ago

At least for me, I'm fairly sure that I'm better at not adding security flaws to my code (which I'm already not perfect at!) than I am at spotting them in code that I didn't write, unfortunately.

otabdeveloper4 · 3 months ago

> Still saves a lot of time vs typing everything from scratch

No it doesn't. Typing speed is never the bottleneck for an expert.

As an offline database of Google-tier knowledge, LLM's are useful. Though current LLM tech is half-baked, we need:

a) Cheap commodity hardware for running your own models locally. (And by "locally" I mean separate dedicated devices, not something that fights over your desktop's or laptop's resources.)

b) Standard bulletproof ways to fine-tune models on your own data. (Inference is already there mostly with things like llama.cpp, finetuning isn't.)

XCSme · 3 months ago

> Still saves a lot of time vs typing everything from scratch.

In my experience, it takes longer to debug/instruct the LLM than to write it from scratch.

zx8080 · 3 months ago

> An expert prompts it and checks the code. Still saves a lot of time vs typing everything from scratch.

It's a lie. The marketing one, to be specific, which makes it even worse.

noone_youknow · 3 months ago

For me, it’s not the typing - it’s the understanding. If I’m typing code, I have a mental model already or am building one as I type, whereas if I have an LLM generate the code then it’s “somebody else’s code” and I have to take the time to understand it anyway in order to usefully review it. Given that’s the case, I find it’s often quicker for me to just key the code myself, and come away with a better intuition for how it works at the end.

827a · 3 months ago

I tend to disagree, but I don't know what my disagreement means for the future of being able to use AI when writing software. This workers-oauth-provider project is 1200 lines of code. An expert should be able to write that on the scale of an hour.

The main value I've gotten out of AI writing software comes from the two extremes; not from the middle-ground you present. Vibe coding can be great and seriously productive; but if I have to check it or manually maintain it in nearly any capacity more complicated than changing one string, productivity plummets. Conversely; delegating highly complex, isolated function writing to an AI can also be super productive, because it can (sometimes) showcase intelligence beyond mine and arrive at solutions which would take me 10x longer; but definitionally I am not the right person to check its code output; outside of maybe writing some unit tests for it (a third thing AI tends to be quite good at)

0points · 3 months ago

I really don't agree with the idea that expert time would just be spent typing, and I'd be really surprised if that's the common sentiment around here.

An expert reasons, plans ahead, thinks and reasons a little bit more before even thinking about writing code.

If you are measuring productivity by lines of code per hour then you don't understand what being a dev is.

dismalaf · 3 months ago

> Still saves a lot of time vs typing everything from scratch

Probably very language specific. I use a lot of Ruby, typing things takes no time it's so terse. Instead I get to spend 95% of my time pondering my problems (or prompting the LLM)...

signa11 · 3 months ago

> ... Still saves a lot of time vs typing everything from scratch ...

how ? the prompts have still to be typed right ? and then the output examined in earnest.

blinded · 3 months ago

Sure! But over half the fun of coding is writing and learning.

i5heu · 3 months ago

Revealing against what?

If you look at the README it is completely revealed... so i would argue there is nothing to "reveal" in the first place.

> I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh... the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked.

> To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.

JW_00000 · 3 months ago

I think OP meant "revealing" as in "enlightening", not as "uncovering something that was hidden intentionally".

rienbdj · 3 months ago

> Revealing against what?

Revealing of what it is like working with an LLM in this way.

kortilla · 3 months ago

Revealing the types of critical mistakes LLMs make. In particular someone that didn’t already understand OAuth likely would not have caught this and ended up with a vulnerable system.

risyachka · 3 months ago

If the guy knew how to properly implement oauth - did he save any time though by prompting or just tried to prove a point that if you actually already know all details of impl you can guide llm to do it?

Thats the biggest issue I see. In most cases I don't use llm because DIYing it takes less time than prompting/waiting/checking every line.

throwaway2037 · 3 months ago

While I think this is a cool (public) experiment by Claude, asking an LLM to write security-sensitive code seems crazy at this point. Ad absurdum: Can you imagine asking Claude to implement new functionality in OpenSSL libs!?

PeterStuer · 3 months ago

Which is exactly why AI coding assistants work with your expertise rather than replace it. Most people I see fail at AI assisted development are either non-technical people expecting the AI will solve it all, or technical people playing gotcha with the machine rather than collaborating with it.

bootsmann · 3 months ago

There is also one quite early in the repo where the dev has to tell Claude to store only the hashes of secrets

kentonv · 3 months ago

Yeah I was disappointed in that one.

I hate to say, though, but I have reviewed a lot of human code in my time, and I've definitely caught many humans making similar-magnitude mistakes. :/

hn_throwaway_99 · 3 months ago

I just wanted to say thanks so much publishing this, and especially your comments here - I found them really helpful and insightful. I think it's interesting (though not unexpected) that many of the other commenters' comments here show what a Rorschach test this is. I think that's kind of unfortunate, because your experience clearly showed some of the benefits and limitations/pitfalls of coding like this in an objective manner.

I am curious, did you find the work of reviewing Claude's output more mentally tiring/draining than writing it yourself? Like some other folks mentioned, I generally find reviewing code more mentally tiring than writing it, but I get a lot of personal satisfaction by mentoring junior developers and collaborating with my (human) colleagues (most of them anyway...) Since I don't get that feeling when reviewing AI code, I find it more draining. I'm curious how you felt reviewing this code.

jjcm · 3 months ago

Most interesting aspect of this is it likely learned this pattern from human-written code!

Deleted Comment

jofzar · 3 months ago

I know I'm preaching to the masses here, but isn't this why PR are so important?

bananapub · 3 months ago

this seems like a true but pointless observation? if you're producing security-sensitive code then experts need to be involved, whether that's me unwisely getting a junior to do something, or receiving a PR from my cat, or using an LLM.

removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforces / former-NFT fanboys keep pushing, just letting an LLM generate code for a human to review then send out for more review is really pretty boring and already very effective for simple to medium-hard things.

toofy · 3 months ago

> …removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforce…

this is completely expected behavior by them. departments with well paid experts will be one of the first they’ll want to cut. in every field. experts cost money.

we’re a long, long, long way off from a bot that can go into random houses and fix under the sink plumbing, or diagnose and then fix an electrical socket. however, those who do most of their work on a computer, they’re pretty close to a point where they can cut these departments.

in every industry in every field, those will be jobs cut first. move fast and break things.

hn_throwaway_99 · 3 months ago

I think it's a critically important observation.

I thought this experience was so helpful as it gave an objective, evidence-based sample on both the pros and cons of AI-assisted coding, where so many of the loudest voices on this topic are so one-sided ("AI is useless" or "developers will be obsolete in a year"). You say "removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforces / former-NFT fanboys keep pushing", but the fact is many people with the power to push AI onto their workers are going to be more receptive to actual data and evidence than developers just complaining that AI is stupid.

ActionHank · 3 months ago

But AIbros will be running around telling everyone that Claude invented OAuth for Cloudflare all on its own and then opensourced it.

october8140 · 3 months ago

It's a Jr Developer that you have to check all their code over. To some people that is useful. But you're still going to have to train Jr Developers so they can turn into Sr Developers.

PeterStuer · 3 months ago

I don't like the jr dev analogy. It neither has the same weaknesses nor the same strenghts.

It's more like the genious coworker that has an overassertive ego and sometimes shows up drunk, but if you know how to work with them and see past their flaws, can be a real asset.

Cthulhu_ · 3 months ago

I don't really agree; a junior developer, if they're curious enough, wouldn't just write insecure code, they would do self-study and find out best practices etc before writing code, including not storing plaintext passwords and the like.

This is exactly the direction I expect AI-assisted coding to go in. Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?

kentonv · 3 months ago

It took me a few days to build the library with AI.

I estimate it would have taken a few weeks, maybe months to write by hand.

That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a clear API spec.

In my attempts to make changes to the Workers Runtime itself using AI, I've generally not felt like it saved much time. Though, people who don't know the codebase as well as I do have reported it helped them a lot.

I have found AI incredibly useful when I jump into other people's complex codebases, that I'm not familiar with. I now feel like I'm comfortable doing that, since AI can help me find my way around very quickly, whereas previously I generally shied away from jumping in and would instead try to get someone on the team to make whatever change I needed.

srhtftw · 3 months ago

> It took me a few days to build the library with AI. ... > I estimate it would have taken a few weeks, maybe months to write by hand.

I don't think this is a fair assessment give the summary of the commit history https://pastebin.com/bG0j2ube shows your work started on 2025-02-27 and started trailing off at 2025-03-20 as others joined in. Minor changes continue to present.

> That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a clear API spec.

Still, this allowed you to complete in a month what may have taken two. That's a remarkable feat considering the time and value of someone of your caliber.

michelsedgh · 3 months ago

The fascinating part is that each person is finding their own way of using these tools from kids to elders and everyone in between no matter what your background or language or whatever is

9dev · 3 months ago

Funny thing. I have built something similar recently, that is a 2.1-compliant authorisation server in TypeScript[0]. I did it by hand, with some LLM help on the documentation. I think it took me about two weeks full time, give or take, and there’s still work to do, especially on the testing side of things, so I would agree with your estimate.

I’m going to take a very close look at your code base :)

[0] https://github.com/colibri-hq/colibri/blob/next/packages/oau...

nipah · 2 months ago

Your estimation maybe right, but maybe also there is a point on why it is right: https://neilmadden.blog/2025/06/06/a-look-at-cloudflares-ai-...

Maybe because (and I'm quoting that article) it is still lacking in what it should have that you managed to accomplish this task in "few days" instead of "a few weeks, maybe months".

Maybe the bottleneck was not your typing speed, but the [specific knowledge] to build that system. Because if you know something well enough, you can build it way faster, like rebuilding something from scratch, you will be faster as you already know the paths. In which case, my question would be: would not be writing this as fast, or maybe at least more secure and reasonable, if you had the complete knowledge of the system first.

Because contrary to LLMs, humans can actually improve and learn when they do things, and they don't whey they don't do things. Not knowing the code to the full extent is worth the time "gained" by using the LLM to write it?

I think it's very hard to estimate those other aspects of the thing.

upstairs-war · 3 months ago

Thanks kentonv. I picked up where you left off, supported with oauth2.1 rfc, and integrated ms oauth to our internal mcp server. Cool to have Claude be business aware

jdbohrman · 3 months ago

YES!!!! I've actually been thinking about starting a studio specifically geared to turning complex RFPs and protocols into usable tools with AI-assisted coding. I built these using Cursor just to test how for it could go. I think the potential of doing that as a service is huge:

https://github.com/jdbohrman-tech/hermetic-mls https://github.com/jdbohrman-tech/roselite

I think it's funny that Roselite caused a huge meltdown to the Veilid team simply because they have a weird adamancy to no AI assistance. They even called it "plagiarism"

graeme · 3 months ago

>I have found AI incredibly useful when I jump into other people's complex codebases, that I'm not familiar with. I now feel like I'm comfortable doing that

This makes sense. Are there codebases where you find this doesn't work as well, either from the codebase's min required context size or the code patterns not being in the training data?

aprilthird2021 · 3 months ago

Matches my experiences well. Making changes to large, complex codebases I know well? Teaching the AI to get up to speed with me takes too much time.

Code I know nothing about? AI is very helpful there

philipwhiuk · 3 months ago

> Though, people who don't know the codebase as well as I do have reported it helped them a lot.

My problem I guess is that maybe this is just Dunning-Kruger esq. When you don't know what you don't know you get the impression it's smart. When you do, you think it's rubbish.

Like when you see a media report on a subject you know about and you see it's inaccurate but then somehow still trust the media on a subject you're a non-expert on.

gokhan · 3 months ago

> Not software engineers being kicked out ... but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks that will occupy the other 18? I think that's the question.

And the author is implementing a fairly technical project in this case. How about routine LoB app development?

thewebguyd · 3 months ago

> But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks that will occupy the other 18? I think that's the question.

This is likely where all this will end up. I have doubts that AI will replace all engineers, but I have no doubt in my mind that we'll certainly need a lot less engineers.

A not so dissimilar thing happened in the sysadmin world (my career) when everything transitioned from ClickOps to the cloud & Infrastructure as Code. Infrastructure that needed 10 sysadmins to manage now only needed 1 or 2 infrastructure folks.

The role still exists, but the quantity needed is drastically reduced. The work that I do now by myself would have needed an entire team before AWS/Ansible/Terraform, etc.

paxys · 3 months ago

Increased productivity means increased opportuntity. There isn't going to be a time (at least not anytime soon) when we can all sit back and say "yup, we have accomplished everything there is to do with software and don't need more engineers".

simonw · 3 months ago

I guess I have trouble emphasizing with "But what if you only need 2 kentonv's instead of 20 at the end?" because I'm an open source oriented developer.

What's open source for if not allowing 2 developers to achieve projects that previously would have taken 20?

bigstrat2003 · 3 months ago

> but rather experienced engineers using AI to generate bits of code and then meticulously testing and reviewing them.

My problem is that (in my experience anyways) this is slower than me just writing the code myself. That's why AI is not a useful tool right now. They only get it right sometimes so it winds up being easier to just do it yourself in the first place. As the saying goes: bad help is worse than no help at all, and AI is bad help right now.

motorest · 3 months ago

> My problem is that (in my experience anyways) this is slower than me just writing the code myself.

In my experience, the only times LLMs slow down your task is when you don't use them effectively. For example, if you provide barely any context or feedback and you prompt a LLM to write you the world, of course it will output unusable results, primarily because it will be forced to interpolate and extrapolate through the missing context.

If you take the time to learn how to gently prompt a LLM into doing what you need, you'll find out it makes you far more productive.

JimDabell · 3 months ago

> My problem is that (in my experience anyways) this is slower than me just writing the code myself.

How much experience do you have writing code vs how much experience do you have prompting using AI though? You have to factor in that these tools are new and everybody is still figuring out how to use them effectively.

uludag · 3 months ago

I feel this is on point. So not only is there the time lost correcting and testing AI generated code, but there's also the mental model you build of the code when you write it yourself.

Assuming you want a strong mental model of what the code does and how it works (which you'd use in conversations with stakeholders and architecture discussions for example), writing the code manually, with perhaps minor completion-like AI assistance, may be the optimal approach.

dkdcio · 3 months ago

> The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?

I *think* the answer to this is clearly no: or at least, given what we can accomplish today with the tools we have now, and that we are still collectively learning how to effectively use this, there's no way it won't be faster (with effective use) in another 3-6 months to fully-code new solutions with AI. I think it requires a lot of work: well-documented, well-structured codebases with fast built-in feedback loops (good linting/unit tests etc.), but we're heading there no

motorest · 3 months ago

> I think the answer to this is clearly no: or at least, given what we can accomplish today with the tools we have now, and that we are still collectively learning how to effectively use this, there's no way it won't be faster (with effective use) in another 3-6 months to fully-code new solutions with AI.

I think these discussions need to start from another point. The techniques changed radically, and so did the way problems are tackled. It's not that a software engineer is/was unable to deliver a project with/without LLMs. That's a red herring. The key aspects are things like the overall quality of the work being delivered vs how much time it took to reach that level of quality.

For example, one of the primary ways a LLM is used is not to write code at all: it's to explain to you what you are looking at. Whether it's used as a Google substitute or a rubber duck, developers are able to reason with existing projects and even explore approaches and strategies to tackle problem like they were never able to do so. You no longer need to book meetings with a principal engineer to as questions: you just drop a line in Copilot Chat and ask away.

Another critical aspect is that LLMs help you explore options faster, and iterate over them. This allows you to figure out what approach works best for your scenario and adapt to emerging requirements without having to even chat with anyone. This means that, within the timeframe you would deliver the first iteration of a MVP, you can very easily deliver a much more stable project.

necovek · 3 months ago

In a "well-documented, well-structured codebase with fast built-in feedback loops", a human programmer is really empowered to make changes fast. This is exactly what's needed for fast iteration, including in unfamiliar codebases.

When you are not introducing a new pattern in the code structure, it's mostly copy-paste and then edit.

But it's also extremely rare, so a pretty high bar to be able to benefit from tools like AI.

0xbadcafebee · 3 months ago

That's not the million dollar question; anyone who's done any kind of AI coding will tell you it's ridiculously faster. I haven't touched JavaScript, CSS & HTML in like a decade. But I got a whole website created with complex UI interactions in 20 minutes - and no frameworks - by just asking ChatGPT to write stuff for me. And that's the crappy, inefficient way of doing this work. Would have taken me a week to figure out all that. If I'd known how to do it already, and I was very good, perhaps it would have taken the same amount of time? But clearly there is a force-multiplier at work here.

The million dollar question is, what are the unintended, unpredicted consequences of developing this way?

If AI allows me to write code 10x faster, I might end up with 10x more code. Has our ability to review it gotten equally fast? Will the number of bugs multiply? Will there be new classes of bugs? Will we now hire 1 person where we hired 5 before? If that happens, will the 1 person leaving the company become a disaster? How will hiring work (cuz we have such a stellar track record at that...)? Will the changing economics of creating software now make SaaS no longer viable? Or will it make traditional commercial software companies no longer viable? Will the entire global economy change, the way it did with the rise of the first tech industry? Are we seeing a rebirth?

We won't know for sure what the consequences are for a while. But there will be consequences.

stackskipton · 3 months ago

>experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them

And where are supposed to get experienced engineers if replaced all Jr Devs with AI? There is a ton of benefit from drudgery of writing classes even if seems like grunt work at the time.

motorest · 3 months ago

> This is exactly the direction I expect AI-assisted coding to go in. Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

There is a middle ground: software engineers being kicked out because now some business person can hand over the task of building the entire OAuth infrastructure to a single inexperienced developer with a Claude account.

petersellers · 3 months ago

I'm not so sure that would work well in practice. How would the inexperienced developer know that the code created by the AI was correct? What if subtle bugs are introduced that the inexperienced developer didn't catch until it went out into production? What if the developer didn't even know how to debug those problems correctly? Would they know that the code they are writing is maintainable and extensible, or are they just going to generate a new layer of code on top of the old one any time they need a new feature?

belter · 3 months ago

The million-dollar question is not whether you can review at the speed the model is coding. It is whether you can trust review alone to catch everything.

If a robot assembles cars at lightning speed... but occasionally misaligns a bolt, and your only safeguard is a visual inspection afterward, some defects will roll off the assembly line. Human coders prevent many bugs by thinking during assembly.

pton_xd · 3 months ago

> Human coders prevent many bugs by thinking during assembly.

I'm far from an AI true believer but come on -- human coders write bugs, tons and tons of bugs. According to Peopleware, software has "an average defect density of one to three defects per hundred lines of code"!

chrisweekly · 3 months ago

THIS.

IMHO more rigorous test automation (including fuzzing and related techniques) is needed. Actually that holds whether AI is involved or not, but probably more so if it is.

Shorn · 3 months ago

And yet, doors still fall off airplanes without any AI in sight.

jstummbillig · 3 months ago

This is not where AI-assisted coding is going. Where it is going is: The AI will quickly become better at avoiding these types of mistakes than humans ever were (and are ever going to be), because they can and thus will be RL'ed away. What will be left standing longest is providing the vision wrt what the actual problem is, you want to solve.

kypro · 3 months ago

> Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

Why would a human review the code in a few years when AI is far better than the average senior developer? Wouldn't that be as stupid as a human reviewing stockfish's moves in Chess?

danans · 3 months ago

> Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X)

The theory of enshittification says that "business person pressing a few buttons" approach will be pursued, even if it lowers quality, to save costs, at least until that approach undermines quality so much that it undermines the business model. However, nobody knows how much quality tradeoff tolerance is there to mine.

hooverd · 3 months ago

AI is great for undifferentiated heavy lifting and surfacing knowledge, but by the time I've made all the decisions, I can just write the code that matters myself there.

tkiolp4 · 3 months ago

Why is speed important in this context? If the code is published one week/month later, would that affect what exactly? It’s open source.

kentonv · 3 months ago

As it happens, if this were released a month later, it would have been a huge loss for us.

This OAuth library is a core component of the Workers Remote MCP framework, which we managed to ship the day before the Remote MCP standard dropped.

And because we were there and ready for customers right at the beginning, a whole lot of people ended up building their MCP servers on us, including some big names:

https://blog.cloudflare.com/mcp-demo-day/

(Also if I had spent a month on this instead of a few days, that would be a month I wasn't spending on other things, and I have kind of a lot to do...)

Deleted Comment