No, a lack of monitoring cost you $10K. Your app was throwing a database exception and nobody was alerted that this was not only happening, but happening continuously and in large volumes. Such an alert would have made this a 5-minute investigation rather than 5 days.
If you haven't fixed that alerting deficiency, then you haven't really fixed anything.
This is getting more and more common: companies and founders don't think about infrastructure because they believe their cloud provider of choice is going to magically handle it for them.
As soon as you expect paying customers in your system, you need someone with the knowledge and experience to deal with infrastructure. That means logging, monitoring, alerting, security, etc.
Sure, not having tests is bad. Doing things with AI without triple-checking is dangerous.
But not having error logging/alerts on your db? That's the crazy part.
This is a new product, not legacy code from 20 years ago, when people thought it was a neat idea to just throw stuff at the db raw and check for db errors to do data validation, which is what makes alerts hard because there are so many expected errors.
I think deploying and then going to sleep is the red flag here. They should have deployed the change at 9am or something and had the workday to monitor issues.
While I agree, it wouldn't have helped. The first n sign-ups per commit worked. So the problem didn't manifest during the day, only when they stopped committing.
Now granted, if they'd carried on doing stuff (but not redeployed production) it may have shown up in, say, mid-afternoon, or not, depending on volume.
And of course the general guideline of not deploying anything to production late in the day, or on Friday, is still valid.
You can deploy and go to sleep if you have monitoring and alerting and someone getting paged. It shouldn't be a human monitoring for issues anyway, so the only reason to choose 9am over bedtime should be that you don't want to risk a late night page, not that someone will actually be checking up actively on the deployment.
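To make the "someone getting paged" part concrete, here's a minimal sketch of wiring unhandled errors to an on-call alert in a FastAPI app (the stack this team was migrating to); the webhook URL and payload are placeholders, and a real setup would lean on an existing pager or error-tracking service rather than hand-rolled middleware:

    import json
    import logging
    import urllib.request

    from fastapi import FastAPI, Request
    from fastapi.responses import JSONResponse

    log = logging.getLogger("app")
    app = FastAPI()
    PAGER_WEBHOOK = "https://example.invalid/hooks/oncall"  # placeholder, not a real endpoint

    @app.middleware("http")
    async def alert_on_unhandled_errors(request: Request, call_next):
        try:
            return await call_next(request)
        except Exception:
            # Log the full traceback, page whoever is on call, then return a 500.
            log.exception("unhandled error on %s", request.url.path)
            try:
                urllib.request.urlopen(
                    PAGER_WEBHOOK,
                    data=json.dumps({"path": request.url.path}).encode(),
                )
            except Exception:
                log.exception("failed to notify pager")
            return JSONResponse({"detail": "internal error"}, status_code=500)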
Yeah, if one instance created one ID, then any integration test creating more than one user would have failed. There was no testing or logging on a system with live users while they were porting between two dynamic languages.
TBH, if the backend were written in Go, this probably wouldn’t have happened to the extent it did. Somewhere in a log a descriptive error would have shown up.
One of the reasons I use Go whenever possible is that it removes a lot of the classic Python footguns. If you are going to rewrite your backend from Javascript, why would you rewrite it in another untyped, error-prone language?
> I want to preface this by saying yes the practices here are very bad and embarrassing (and we've since added robust unit/integration tests and alerting/logging), could/should have been avoided, were human errors beyond anything, and very obvious in hindsight.
>
> This was from a different time under large time constraints at the very earliest stages (first few weeks) of a company. I'm mostly just sharing this as a funny story with unique circumstances surrounding bug reproducibility in prod (due again to our own stupidity) Please read with that in mind
If they had made that mistake by writing the code themselves and just misunderstood something or overlooked the problem, fine. But making this mistake by copy-pasting from ChatGPT without proper review is just terrible.
If you code for a hobby/fun, yeah, sure, it's a silly mistake.
If you're earning past six figures, are part of a team of programmers, call yourself a professional / engineer, and have technical management above you like a VP of Engineering, yadda yadda... then it's closer to a systematic failure of the company's engineering practices than a "mistake."
There is a reason we call it software engineering, not software fuckarounding (or, cough, "DevOps Engineer").
Software engineering practices assume people are going to make mistakes, and implement procedures to reduce the chances of those mistakes making it into production, and to reduce their impact if they do.
I spotted the error instantly. With all due respect to your team - this has nothing to do with ChatGPT and everything to do with using a programming model that your team does not have sufficient expertise in. Even if this error managed to slip by code review, it would have been caught with virtually any monitoring solution, many of which take less than 5 minutes to set up.
To be fair, if I wasn't looking for this bug I never would have spotted it. That being said, you're entirely right that any monitoring or even the most basic manual testing should have instantly caught this.
This. And it seems the team wasn't able to do basic troubleshooting from either the database or the application logs. This was a simple error - what will happen when transient errors (such as implicit locks on tables, etc.) occur? These guys shouldn't be writing code - at all.
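For what it's worth, the "less than 5 minutes to set up" claim is roughly true for hosted error trackers. A sketch using Sentry as one arbitrary example (the thread doesn't name a tool, and the DSN below is the docs placeholder); once this runs at startup, every unhandled IntegrityError shows up with a stack trace, and an alert rule can notify on the first occurrence:

    import sentry_sdk

    # One call at process startup; the FastAPI/Starlette and SQLAlchemy
    # integrations are picked up automatically by recent SDK versions.
    sentry_sdk.init(
        dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
        traces_sample_rate=0.1,
    )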
It's not some innocent mistake. The title is purposefully clickbait / keyword-y, implying that it was chatgpt that made the 'mistake' for SEO and to generate panicked clicks.
"We made a programming error in our use of an LLM, didn't do any QA, and it cost us $10k" doesn't generate the C-suite "oh shit what if ChatGPT fucks up, what's our exposure!?" reaction. There's a million middle and upper management posting this article on LinkedIn, guaranteed.
It's like the Mr. Beast open-mouth-surprised expression thumbnail nonsense; you feel incredibly compelled to click it.
While we're on the subject: LLMs can't make "mistakes." They are not deterministic.
They cannot reason, think, or do logic.
They are very fancy word salad generators that use a lot of statistical probabilities. By definition they're not capable of "mistakes" because nothing they generate is remotely guaranteed to be correct or accurate.
Edit: The mods boosted the post; it got downvoted into oblivion, for obvious reasons, and then skyrocketed instantly in rank, which means they boosted it: https://hnrankings.info/40627558/
Hilarious that a post which is insanely clickbait (which the rules say should result in a title rewrite) got boosted by the mods.
>By definition they're not capable of "mistakes" because nothing they generate is remotely guaranteed to be correct or accurate.
This makes no sense. Only things that are guaranteed to be correct or accurate can make mistakes? Everyone knows what "mistake" means in this context. Nobody cares what your preferred definition of mistake is.
> By definition they're not capable of "mistakes" because nothing they generate is remotely guaranteed to be correct or accurate.
By that logic, nothing is capable of making mistakes :D.
> Hilarious that a post which is insanely clickbait (which the rules say should result in a title rewrite) got boosted by the mods.
You have a distorted view of what clickbait is and the rules of this site. I suggest you go calm down and try to stop hating on a technology which is just that: a technology! Like any other, it can be misused, but think about why exactly you feel so passionate about this particular technology.
Interestingly, you know who else spotted the error? ChatGPT-4o. Annoyingly you can't share a chat with an image in it, but pasting in the image of the bad code, and prompting "whats wrong with the code" got ChatGPT to tell me that:
* UUID Generation in Primary Key: The default parameter should use the callable uuid.uuid4 directly instead of str(uuid.uuid4()). SQLAlchemy will call the function to generate the value.
* Date Default Value: server_default=text("(now())") might not work as expected. Use func.now() for server-side defaults in SQLAlchemy.
* Import Statements: Ensure uuid and text from sqlalchemy are imported.
* Column Definitions: Consider using DateTime(timezone=True) for datetime columns to handle time zones.
It then provided me with corrected code that does
id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()), unique=True, nullable=False)

where the addition of lambda: fixes the problem.
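For anyone skimming, a self-contained sketch of the failure mode under discussion, with in-memory SQLite standing in for the production database; the non-callable default is evaluated once at class-definition (i.e. deploy) time, so every insert after the first reuses the same id:

    import uuid
    from sqlalchemy import Column, String, create_engine
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        # BUG: str(uuid.uuid4()) runs once when the class is defined, so the
        # "default" is a fixed string shared by every insert from this process.
        id = Column(String, primary_key=True, default=str(uuid.uuid4()))
        # FIX: pass a callable instead, e.g. default=lambda: str(uuid.uuid4())
        # or default=uuid.uuid4, which SQLAlchemy calls once per row.

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(User())
        session.commit()   # first sign-up works
        session.add(User())
        session.commit()   # IntegrityError: UNIQUE constraint failed: users.id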
ChatGPT-4o might spot it when asked about the code directly, but this was a conversion from JS to Python, and errors where ChatGPT/Copilot or any other AI hallucinates or makes mistakes while trying to stay as close as possible to the original code are very common in my experience.
The other common issue is that if the original code has things ChatGPT doesn't like (a misspelling, slightly wrong formatting) it will fix them automatically, or if it really thinks you should have added a particular field you didn't add, it will add it for you.
Having no real experience with Python I would have assumed uuid.uuid4() was some schema definition (like in Prisma), so honestly the fact that this bug exists is not surprising at all and I would have made the same mistake myself. But yeah, one kubectl logs would have caught it immediately.
...also from next.js and prisma to python? ...what?
I understand how the mistake was made, it seems relatively easy to slip by even when writing code without ChatGPT.
But what I don't understand is how this wasn't caught after the first failure? Does this company not have any logging? Shouldn't the fact the backend is attempting to reuse UUIDs be immediately obvious from observing the error?
They didn't even know there was an error until the customers came ringing. You always want to know about errors before your customers do; logging, alerting, any monitoring at all would have helped them here.
I guarantee you that they _will_ have another production bug like this sometime in the future (every fast-paced project will). You'd hope the next one won't take 5 days to identify.
It wasn't 5 days of working only on this one problem, though; it was 5 calendar days. Even if something is going to take me one day to implement, it takes three days to get enough focus time between all the other meetings and fires to put out in order to actually get a day's worth of coding done.
Yeah, I agree with this here. I think it's totally reasonable that something as specific as multiple Stripe subscriptions wouldn't be exercised by normal unit testing; as mentioned in the post, this wouldn't have been an easy error to reproduce via an acceptance test; and I think the focus on ChatGPT is overblown (by both the OP and everyone else) and mistakenly passing a String instead of a Callable to a function that accepts either happens all the time. My gut instinct is that not using an ORM would have prevented this particular issue, but that may just be my bias against ORMs speaking; one could easily imagine a similar bug occurring in a non-database context. My real conclusion is that all the folks crowing that they would have definitely caught this bug are either much better engineers than I am, or (more likely) are just a bit deluded about their own abilities.
I am also very confused about the apparent lack of logging or recourse to logging. It's been a while, but if I recall correctly ECS should automatically propagate the resulting Duplicate Key exceptions which were presumably occurring to CloudWatch without a bunch of additional configuration - was that not happening? If it was happening, did no one think to go check what types of Exceptions were happening overnight?
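For reference, checking that after the fact is a few lines of boto3, assuming the ECS task's stdout/stderr does land in a CloudWatch log group via the awslogs driver (the group name below is made up):

    import time
    import boto3

    logs = boto3.client("logs")
    resp = logs.filter_log_events(
        logGroupName="/ecs/backend",          # hypothetical log group name
        filterPattern="IntegrityError",       # or "UniqueViolation", etc.
        startTime=int((time.time() - 24 * 3600) * 1000),  # last 24h, in ms
    )
    for event in resp["events"]:
        print(event["timestamp"], event["message"][:200])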
I have seen the same mistake made in code created by humans. Many times, especially in React/TypeScript/JavaScript, someone will forget to use a lambda.
I felt the blog post failed to articulate the root cause of the issue and went straight to blaming ChatGPT.
When you rush and push large, non-peer-reviewed commits to main, this is going to happen.
The real issue is that when you rush, take shortcuts, and don't adequately test or peer review the code, errors will occur.
I would have imagined that a test that tried a few different signup options would have found the issue immediately.
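A sketch of that kind of test, assuming the article's SQLAlchemy models (the module path, and the assumption that User's other columns have defaults, are mine); under the buggy default it fails on its very first run:

    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session

    from app.models import Base, User  # hypothetical module layout

    def test_two_signups_get_distinct_ids():
        engine = create_engine("sqlite://")
        Base.metadata.create_all(engine)
        with Session(engine) as session:
            session.add_all([User(), User()])
            session.commit()  # raises IntegrityError under default=str(uuid.uuid4())
            ids = [user.id for user in session.query(User).all()]
            assert len(ids) == len(set(ids))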
My mental model for ChatGPT is that it’s an entry-level engineer that will never be promoted to a terminal level and will eventually be let go.
However, this engineer can type infinitely fast, which means it might be useful if used very carefully.
Anyway, letting such a person near financially important code would lead to similar issues, and in both cases, I’d question the judgment of the person that decided to deploy the code at all, let alone without much testing.
This is kind of how it works with Real Engineering™ and other licensed professions - it's the non-licensed people doing most of the grunt work, the PE/architect/licensed professional reviews and signs off on it. But then, by virtue of their signature, they're still on the hook for any problems.
This sort of issue seems common in a few places. E.g. Vue's component props can have defaults, and woe betide you if you use a literal object or array as a default, instead of a function that returns an object or array.
I'm surprised there was no lint rule for this case.
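The closest Python analogue of that trap is the mutable-default-argument footgun, which linters do catch (e.g. flake8-bugbear's B006/B008); the ORM-column version unfortunately looks like perfectly ordinary code, which is presumably why no stock lint rule flags it:

    def add_item(item, bucket=[]):   # flagged by flake8-bugbear as B006
        # The [] is evaluated once, at function-definition time, and then shared
        # by every call - the same evaluated-once behaviour as the Column default.
        bucket.append(item)
        return bucket

    print(add_item("a"))  # ['a']
    print(add_item("b"))  # ['a', 'b']  <- surprise: same list as before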
Dear god, I thought I was taking crazy pills. After saying the same thing (a rewrite this early is insane) I was scanning the comments and no one else was pointing this out. I have no clue what would drive someone to rewrite this early (with or without customers) for what is effectively a lateral move (node to python). If you had hundreds of customers and wanted to rewrite in Go or similar then maybe (I still question even that).
> This is the eye opener for me, how is a startup justifying a re-write when they don't even have customers?
In my case (with a real project I'm working on now), it'd be due to realizing that C# is a great language with a good runtime and web frameworks, but at the same time it drags down development velocity and has pain points which just keep mounting: needing to create bunches of different DTO objects while AutoMapper refuses to work with my particular versions of everything and project configuration, and both Entity Framework and the JSON serializer/deserializer giving me more trouble than they're worth.
Could the pain points be addressed through gradual work, which oftentimes involves various hacks and deep dives in the docs, as well as upgrading a bunch of packages and rewriting configuration along the way? Sure. But I'm human and the human desire is to grab a metaphorical can of gasoline, burn everything down and make the second system better (of course, it might not actually be better, just have different pain points, while not even doing everything the first system did, nor do it correctly).
Then again, even in my professional career, I get the same feeling whenever I look at any "legacy" or just cumbersome system and it does take an active, persistent effort on my part to not give in to the part of my brain that is screaming for a rewrite. Sometimes rewrites actually go great (or architectural changes, such as introducing containers), more often than not everything goes down in a ball of flames and/or endless amounts of work.
I'm glad that I don't give in, outside of the cases where I know with a high degree of confidence that it would improve things for people, either how the system runs, or the developer experience for others.
You don't need to make DTOs when you don't have to, and using AutoMapper is considered bad practice and is heavily discouraged (if you do have to use a tool like that, there are alternatives like Mapperly which are zero-cost and give you build-time information on what doesn't map, without having to run the application).
Hell, most simple applications could do with just a single layer (schema registration in EF Core is your mapping), or at most two: one for the DB and one for response contracts.
Just do it the simplest way you can. I understand that culture in some companies might be a problem, and it's been historically an issue plaguing .NET, spilling over, originally, from Java enterprise world. But I promise you there are teams which do not do this kind of nonsense.
Things really have improved since the .NET Framework days; productivity-wise, EF Core, while similar in its strong areas, is pretty much an entirely new solution everywhere else.
No, ChatGPT made you the money that your app generated since you had no ability to implement it otherwise/without ChatGPT. Your inability to code, debug, log, monitor cost you the $10k. ChatGPT is net positive in this story.
Looking at this team's project at github.com/reworkd, it clearly shows the maturity of the product as well as the team. Emoji-driven development. Emojis for all commit messages. Monkeys, bananas, rockets, fireworks, you name it, they have it in their commit messages.
Just wondering what the end game of these AI startups is. YC rejects many other reasonably sound ideas I hear, but this space is highly speculative and I don't see any such downstream-ChatGPT-wrapper startup reaching a billion-dollar IPO.
It was already implemented; seems like they had the ability there:
> Our project was originally full stack NextJS but we wanted to first migrate everything to Python/FastAPI.
> What happened was that as part of our backend migration, we were translating database models from Prisma/Typescript into Python/SQLAlchemy. This was really tedious. We found that ChatGPT did a pretty exceptional job doing this translation and so we used it for almost the entire migration.
ChatGPT wasn't a net positive if they wouldn't have tried to do this migration up-front without it.
Possibly they had better error logging in the other stack, possibly they didn't, possibly they needed it less because they were actually writing the code for it themselves and knew how it worked.
("Write all the code a second time before turning on monetization" is itself an interesting decision, of course.)
"Note: I want to preface this by saying yes the practices here are bad and could have been avoided. This was from a different time under large time constraints. Please read with that in mind"
These "constraints" are why I'm terrified of subscribing to software
I have immense respect for the OP for writing up the story, and even more so for giving this preface. It's really useful to know what mistakes other people make, but can be quite embarrassing to tell others about mistakes you've made. Thanks, OP.
A local cinema's shitty website charged me and, immediately after, my wifi disconnected for a few seconds. This somehow crashed their server entirely for several minutes, failing to send me the tickets even when it got back up.
It took me several threatening emails to make them understand they had already taken my money and I wasn't going to try again until I got a refund. Now I'm paranoid any time I purchase online at mediocre shops.
I tried Subway's online ordering once, almost a year ago. The final screen crashed and deleted the entire order from my cart. Anyway, you're probably wondering whether the order actually went through even though Subway's UI was shit and nearly tricked me into placing the same order a second time: yes.
Out of curiosity, what's the technical solution? My guess: add an idempotency key to the request and a message queue? Then when you try to consume it, you check whether that request was made previously.
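Roughly, yes: the client generates a key per purchase attempt, the server stores the key alongside the outcome, and a retry with the same key returns the stored outcome instead of charging again (payment APIs such as Stripe accept such a key directly). A toy sketch, with an in-memory dict standing in for a real table and the endpoint/fields invented for illustration:

    from fastapi import FastAPI, Header

    app = FastAPI()
    _seen: dict[str, dict] = {}  # idempotency key -> stored response (use a DB table for real)

    @app.post("/tickets")
    def buy_ticket(idempotency_key: str = Header(...)):
        if idempotency_key in _seen:
            # Retry of a request we already processed: return the same result,
            # without charging the card a second time.
            return _seen[idempotency_key]
        result = {"status": "charged", "ticket": "row-12"}  # placeholder for the real charge
        _seen[idempotency_key] = result
        return result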
> If you haven't fixed that alerting deficiency, then you haven't really fixed anything.
Programming when everything works is easy, it's handling the problems that makes it hard.
"Under construction "
Looks like the OP removed the post?
EDIT: Found archived copy of post: http://web.archive.org/web/20240609213809/https://asim.bearb...
> As soon as you expect paying customers in your system, you need someone with the knowledge and experience to deal with infrastructure. That means logging, monitoring, alerting, security, etc.
DevOps.. amateurs.
> One of the reasons I use Go whenever possible is that it removes a lot of the classic Python footguns. If you are going to rewrite your backend from Javascript, why would you rewrite it in another untyped, error-prone language?
In Go, I've definitely seen error values silently discarded with the blank identifier, which would have masked this error so it never got logged by the application. In Python, if you ignore an exception entirely, you instead get the exception logged by default.
Python's exceptions also include line numbers, whereas Go errors by default wouldn't show you _which_ object has a conflict, even if you logged them.
In general, Python's logs are way better than Go's, and exceptions make it much harder to ignore errors entirely than Go's strategy does.
https://web.archive.org/web/20240610032818/https://asim.bear...
The author has added an important edit:
> I want to preface this by saying yes the practices here are very bad and embarrassing (and we've since added robust unit/integration tests and alerting/logging), could/should have been avoided, were human errors beyond anything, and very obvious in hindsight.
>
> This was from a different time under large time constraints at the very earliest stages (first few weeks) of a company. I'm mostly just sharing this as a funny story with unique circumstances surrounding bug reproducibility in prod (due again to our own stupidity) Please read with that in mind
They did make a silly mistake, but we are humans, and humans, be it individually or collectively, do make silly mistakes.
https://0912i390129ionkjan.bearblog.dev/how-a-single-chatgpt...
but google cache still serves a copy...
https://webcache.googleusercontent.com/search?q=cache%3Ahttp...
I did step into that particular trap more than once (passing the result, rather than the function)
"We made a programming error in our use of an LLM, didn't do any QA, and it cost us $10k" doesn't generate the C-suite "oh shit what if ChatGPT fucks up, what's our exposure!?" reaction. There's a million middle and upper management posting this article on LinkedIn, guaranteed.
It's like the Mr. Beast open-mouth-surprised expression thumbnail nonsense; you feel incredibly compelled to click it.
While we're on the subject: LLMs can't make "mistakes." They are not deterministic.
They cannot reason, think, or do logic.
They are very fancy word salad generators that use a lot of statistical probabilities. By definition they're not capable of "mistakes" because nothing they generate is remotely guaranteed to be correct or accurate.
Edit: The mods boosted the post; it got downvoted into oblivion, for obvious reasons, and then skyrocketed instantly in rank, which means they boosted it: https://hnrankings.info/40627558/
Hilarious that a post which is insanely clickbait (which the rules say should result in a title rewrite) got boosted by the mods.
I'm sure it's a complete coincidence that the story was apparently authored by someone at a Ycombinator company: https://news.ycombinator.com/item?id=40629998
> It's like the Mr. Beast open-mouth-surprised expression thumbnail nonsense; you feel incredibly compelled to click it.
I feel incredibly compelled to ignore it.
Sponsorblock is great for combating that. (Although I consciously avoid channels that mostly do clickbait anyway.)
> I guarantee you that they _will_ have another production bug like this sometime in the future (every fast-paced project will). You'd hope the next one won't take 5 days to identify.
Specifically asking why it took so long to detect and why it took so long to diagnose is useful in these situations. In my experience there are two types of debuggers:
Type 1 tries to find the error message and figure out what it really, really means by breaking down the error message and the system.
Type 2 does trial and error on random related things until the problem goes away.
I hate to say that I've seen way more type 2 engineers than type 1, but maybe I'm working at the wrong companies.
Here we are talking about a YC-backed company with $1.65 million CAD in funding.
> I'm surprised there was no lint rule for this case.
/s
(I hope)
Perhaps also the tooling because any remotely decent IDE should show an error there, let alone the potential warnings of some code analysis software.
> Emojis for all commit messages.

Instead of `bug: fix blah`, it's `:bug:: fix blah`, which honestly seems clearer and easier to parse at a glance.
Edit: Hacker News doesn't support unicode emojis.
https://grook.ai/share?id=e269e88a7b1a71eff4f176c864b30161&x...
These "constraints" are why I'm terrified of subscribing to software
The previous world, where you bought per-user seat licenses for hundreds and hundreds of dollars, wasn't great.
We had race conditions where we would charge users twice
This has made me paranoid: any time I see a timeout or error related to money, I assume it went through and come back later.