Background: I’m a startup founder turned investor. I taught myself (bad) PHP in 2000, and picked up Ruby on Rails in 2011. I’d guess 2015 was the last time I wrote a line of Ruby professionally. Last month, I decided to use Windsurf to build a Rails 8 API backend and React front-end app, using OpenAI's realtime API for voice-to-voice responses. Over the last few days, I also used Claude Code and Gemini 2.5 Pro for some of the trickier features. 35,000 LoC later, this is what I built!
The site uses function-calling to navigate the site in realtime as you chat with the voice assistant, which I think is pretty neat.
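For the curious, here's a rough sketch of what that navigation tool could look like. This is illustrative only: the function name, fields, and the exact schema the Realtime API expects are assumptions, not the app's actual code.

```ruby
require "json"

# Hypothetical "navigate_to" tool the voice assistant can call.
# The exact tool-schema shape differs between OpenAI's Chat Completions
# and Realtime APIs, so treat the field names here as illustrative.
NAVIGATE_TOOL = {
  type: "function",
  name: "navigate_to",                 # assumed name, not from the real app
  description: "Navigate the user to a page on the site",
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "Client-side route, e.g. /recipes/r_abc123" }
    },
    required: ["path"]
  }
}.freeze

# When the model emits a navigate_to call, the front end parses the
# arguments and pushes the route; here we just decode and print it.
def handle_function_call(name, arguments_json)
  return unless name == "navigate_to"
  args = JSON.parse(arguments_json)
  puts "assistant navigation -> #{args["path"]}"
end
```

In practice the React client listens for the function-call event, changes the route, and reports the result back to the model so the conversation keeps flowing.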
For the long version, see https://tomblomfield.com/post/778601470234918912/vibecoding-...
I'd love any feedback you have!
Demo video of the voice assistant: https://www.youtube.com/watch?v=kRhVc9D5kcg
Generate and edit new recipes: https://www.youtube.com/watch?v=VwwZF6dHcHg
35 kLOC is quite a bit. I wonder how straightforward and maintainable this app ended up being. That would require taking a look at the sources. While good Rails code tends to be very terse, the frontend may be quite voluminous.
> I believe within a couple of months, when things like log tailing and automated testing and native version control get implemented
This sounds a bit too optimistic, especially around automated testing, but yes, eventually all of this will be there.
> an extremely powerful tool for even non-technical people to write production-quality apps
But why would non-technical people even think in terms of log tailing and version control, any more than they think about the gauge of the wiring in their walls, or the kind of modulation their Wi-Fi devices use? For a really non-technical audience to make good use of such tools, it won't be enough for the AI to be a competent coder. The AI will have to become a competent architect and a competent senior SWE, translating from product-management language to software-development language without even surfacing it when not explicitly asked. It's going to be quite a challenge to make that work, and work about as reliably as a human team.
I have entire codebases of embedded software in C, without the shortcuts of modern programming languages, in way fewer than 35k lines.
I think people will have to recalibrate on this. The extra LOC goes toward features and details that you otherwise would not build, because they are too code- and time-intensive for most projects. It just won't matter anymore.
> But why would non-technical people even think in terms of log tailing and version control
They won't! They won't have to. The obvious good stuff that everyone thinks the AI tool should be able to do will just work, because the people building the tools will, obviously, focus on making it work.
I can't really imagine producing that much code in that short amount of time and holding any of it in my head. I'd bet money there's code in there that does the same thing in two different ways, leading to all kinds of little inconsistencies that make this code worthless in any serious context.
Probably the main value engineers bring to a maintenance project is context. I wonder what happens when we fully cede context to the machines...
Today, I got a request at work for a feature ("let's offer coupons!") that I thought would take a week. That was until I found out that another engineer wrote most of the code last year, and it'd take him a day to dust off.
I'm totally onboard with, and grateful for, larger-scale experiments like this...thanks for putting the effort in. I wonder how well Cursor (or similar) would handle a situation in which large amounts of code are _almost_ being used. What if 3k LOC accidentally get duplicated? Can our automated systems understand that and fix it? Because if they can't, a human is going to spend a _long_ time trying to figure out what happened.
Over the next 18 months, I expect we'll hear a few stories of the LLM accidentally reimplementing an entire feature in a separate code path. It's a whole new class of bugs! :D
I think in the end AI will be a more advanced tool, but a tool nonetheless. Like methodologies, principles, good practices, etc., it only works if you use it with care, added thought, and adaptation to your case. DRY is a great principle, but sometimes it's better to repeat yourself, for one reason or another. And these are the tradeoffs that a human in the loop should be making, imho.
I agree. When I read these articles on vibe coding, I can't help thinking that these guys are basking in the glory of the impressive maze they've built around themselves. Of course, running these things in production and having them reach the state of legacy code is an entirely different matter. Building a maze is one thing; having to run around in it is an entirely different challenge.
It's like one of those world expos: everything looks fantastic, but the moment the event ends everything just crumbles.
The app literally exposes his OpenAI key.
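For what it's worth, the usual fix is to keep the real key server-side and hand the browser a short-lived client secret instead. A minimal sketch of what that could look like in a Rails controller, assuming OpenAI's ephemeral Realtime session endpoint and a standard Rails setup (check the current API docs for the exact endpoint and response shape):

```ruby
require "net/http"
require "json"

# Sketch only: exchanges the server-side API key for a short-lived
# client secret the browser can use, so the real key never ships.
class RealtimeSessionsController < ApplicationController
  def create
    uri = URI("https://api.openai.com/v1/realtime/sessions")
    req = Net::HTTP::Post.new(uri)
    req["Authorization"] = "Bearer #{ENV.fetch("OPENAI_API_KEY")}" # stays on the server
    req["Content-Type"]  = "application/json"
    req.body = { model: "gpt-4o-realtime-preview" }.to_json

    res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }

    # Only the ephemeral client_secret is forwarded to the browser.
    render json: JSON.parse(res.body).slice("client_secret"), status: res.code.to_i
  end
end
```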
I believe I'm going to need a new oven...
https://www.recipeninja.ai/recipe/r_ttOB5xyqpOLXCL/gluten-fr...
LLMs are super useful, but currently the primary use case is teaching, not doing. For this reason, I think ChatGPT is really just as good as an AI-enabled editor (or use both if you don't mind paying for two subscriptions).
Also, vibe coding has a parallel-review aspect: while the code is being generated, you are doing live review and steering it in the right direction. So, depending on your experience, the end product can be a bad mess or a wonderful, maintainable piece of work.
The issue with seasoned SWEs is that the moment a mistake (or bad pattern) is made, the baby gets thrown out with the bathwater.
For a tiered app like the one presented, 35k LOC is not really that impressive if you think about it. A generic React-based front end will easily need a large number of lines due to the modular component structure, the various hooks, and tests (which easily make up 25-40% of the LOC). A business layer will also have many layers of abstraction and numerous implementations to move data between layers.
Vibe coding shines when you let it build one block at a time, limit the scope well, and focus. Also, 2-3 weeks is a lot of time to write 35k LOC: at the start of any new project the LOC generation rate is very high, but in the maintenance phase it falls significantly as smaller changes become more common.
I'm just being honest. For my use case, I would be much better off if LLMs could just do everything.
Lots of apps are quite repetitive: for building APIs, for example, you generate one controller and then ask the tool to generate more using the first one as a pattern. For the frontend you do the same for forms or lists.
Tests are often quite good, but I think they were already great even back in the first ChatGPT release.
With this strategy, and the fact that some patterns are quite verbose (albeit understandable for an AI or a reader), it is quite easy to get to a big LoC count while still maintaining consistency.
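To make the "one controller as a pattern" point concrete, the kind of Rails API controller you'd write once and then ask the tool to clone for other resources might look roughly like this (the resource and field names are made up for illustration, not taken from the app):

```ruby
# Illustrative "pattern" controller that an LLM can copy for other resources.
class Api::RecipesController < ApplicationController
  before_action :set_recipe, only: %i[show update destroy]

  def index
    render json: Recipe.order(created_at: :desc).limit(50)
  end

  def show
    render json: @recipe
  end

  def create
    recipe = Recipe.new(recipe_params)
    if recipe.save
      render json: recipe, status: :created
    else
      render json: { errors: recipe.errors.full_messages }, status: :unprocessable_entity
    end
  end

  def update
    if @recipe.update(recipe_params)
      render json: @recipe
    else
      render json: { errors: @recipe.errors.full_messages }, status: :unprocessable_entity
    end
  end

  def destroy
    @recipe.destroy
    head :no_content
  end

  private

  def set_recipe
    @recipe = Recipe.find(params[:id])
  end

  def recipe_params
    params.require(:recipe).permit(:title, :instructions)
  end
end
```

Multiply that by a dozen resources, plus matching specs and the corresponding React forms and lists, and you reach tens of thousands of lines without anything architecturally interesting happening.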
For code? Autocomplete on steroids is the killer-app.
The other things the LLMs give me tend to be over-engineered or overly verbose code, or similar.
I went through a lot of "Why are you also doing $FOO then $BAR? Doesn't seem necessary if we skip them and do $BAZ, which will make one or both of those redundant", with it responding "You're right! Let's use $BAZ instead".
And giving them code to make a small change to was pointless - they would often, but not always, make an incidental change far from the point where you asked for the change.
But autocomplete? That works just great and because I've already got context of the code I am writing I can check it in (at most) two seconds and move on.
Depending on the situation this can be invaluable. If you're experienced in the domain you probably know, broadly, what you need to do, but you might get a better result by reasoning through the best solution given the constraints and requirements you have. Or maybe you catch something obvious you'd missed when you write out the full context, which is a required step for getting a good output from the chatbot, and generally isn't a step you take when you aren't explaining your approach to someone else and don't want to be rigorous.
After seeing how people like Andrej Karpathy used vibe coding to generate applications https://x.com/karpathy/status/1903671737780498883?s=61 I realized that
you need to be clear about what you want the LLM to do: break the work down and give the LLM bite-sized tasks that each do one specific thing. Sometimes I had to tell it not to go and change random files just because it felt the need to refactor them.
> I struggle to find much utility in terms of actually writing code.
I personally feel you need to give up some control and just let the LLM do its thing if you want to use it to help you build. It honestly does a lot of things in a more verbose way and I've come to the conclusion that it is an LLM writing code for another LLM. As long as I can debug it, I'm okay with the code, as I can develop at a pace that is truly unreal.
I finished my "Recent" contexts feature in half a day today. Without the LLM, this would have taken me a week, I think. I would say 98% of my code in the past few months has been AI-generated. You can see a real-life workflow here:
https://app.gitsense.com/?chat=eece40e2-6064-46d2-9bf1-d868c...
I truly believe that if you provide an LLM with the right context, it can meet your functional specs 90% of the time. Note the emphasis on functional and not necessarily style. And if *YOU* architect your code properly, it should be 100% maintainable.
I do want to make it clear that what I am doing right now is not novel, but I believe most problems are not. If the problem is not well understood, it can be a challenge, like my chat bridge feature. That feature lets you import Git repos for chatting, but I will probably need to rewrite 50% of the LLM-generated code since the solution it built is not scalable.
Do you come across issues like this too or am I not prompting it correctly?
Does that mean it uses this expensive OpenAI audio model in the app? Aren't you worried this will bankrupt you if the app goes viral and isn't monetised?
Can you share your strategy here? Is it something like topping up a $2,000 OpenAI account as a kind of marketing expense so users can try it for free? Genuine question, since I'm planning to use the OpenAI audio API for another project, and this kind of pricing worries me a lot, even with a switch to the new mini transcribe and mini TTS models.
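For what it's worth, this is the back-of-envelope math I'd do before shipping something like this. The per-minute rates below are placeholders, not OpenAI's actual prices; plug in the current numbers from their pricing page:

```ruby
# Rough per-session cost estimate for realtime voice.
# NOTE: these rates are made-up placeholders for illustration only;
# substitute the current audio input/output prices before relying on this.
AUDIO_IN_PER_MIN  = 0.06  # USD per minute of user audio (placeholder)
AUDIO_OUT_PER_MIN = 0.24  # USD per minute of assistant audio (placeholder)

def session_cost(user_minutes:, assistant_minutes:)
  user_minutes * AUDIO_IN_PER_MIN + assistant_minutes * AUDIO_OUT_PER_MIN
end

# e.g. a 5-minute chat where the assistant talks for 2 of those minutes:
per_session = session_cost(user_minutes: 5, assistant_minutes: 2) # => 0.78
puts "per session:     $%.2f" % per_session
puts "1,000 users/day: $%.2f" % (per_session * 1_000) # adds up fast if it goes viral
```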
On a more serious note: I've found that for debugging difficult issues, o1 Pro is in a league of its own.
Claude Code's eagerness to do work will often fix things given enough time, especially for self-contained pieces of software, but I still find myself going to o1 Pro more often than I'd expect.
A coworker and I did a comparison the other day, where we fired up o1 Pro and Claude Code with the same refactor. o1 Pro one-shotted it, while Claude Code took a few iterations.
Interestingly enough, the _thinking_ time of o1 Pro led us to just commit the Claude Code changes, as they were both finished in around the same time (1 min 37s vs. 2+ minutes), however we did end up using some feedback from o1 to fix an issue Claude hadn't caught. YMMV
Or, was this mostly just an exercise in engineering/testing AI?
A second, minor problem with your website is that the images illustrating the recipes are AI-generated and of poor quality.
You can't solve those issues by throwing more AI at them... well, maybe the second one you can (AI images from later models are generally OK).