Are Americans not embarrassed by the way these tech bros operate? As a European it’s obvious that the US gone from an allied to an enemy. I would feel like a traitor if I picked US tech these days.
I will report OpenAI to the data protection agency in my country and I encourage others to do the same. They can not blame Mixpanel when they sprinkle others personal information around like this. NOT OK.
But the case for Cloudflare here is complicated. Every engineer is very free to make a better system though.
Cloudflare builds a global scale system, not an iphone app. Please act like it.
Is that an overreaction?
Name me global, redundant systems that have not (yet) failed.
And if you used cloudflare to protect against botnet and now go off cloudflare... you are vulnerable and may experience more downtime if you cannot swallow the traffic.
I mean no service have 100% uptime - just that some have more nines than others.
But at the same time, what value do they add if they:
* Took down the the customers sites due to their bug.
* Never protected against an attack that our infra could not have handled by itself.
* Don't think that they will be able to handle the "next big ddos" attack.
It's just an extra layer of complexity for us. I'm sure there are attacks that could help our customers with, that's why we're using them in the first place. But until the customers are hit with multiple ddos attacks that we can not handle ourself then it's just not worth it.
1. Their bot management system is designed to push a configuration out to their entire network rapidly. This is necessary so they can rapidly respond to attacks, but it creates risk as compared to systems that roll out changes gradually.
2. Despite the elevated risk of system wide rapid config propagation, it took them 2 hours to identify the config as the proximate cause, and another hour to roll it back.
SOP for stuff breaking is you roll back to a known good state. If you roll out gradually and your canaries break, you have a clear signal to roll back. Here was a special case where they needed their system to rapidly propagate changes everywhere, which is a huge risk, but didn’t quite have the visibility and rapid rollback capability in place to match that risk.
While it’s certainly useful to examine the root cause in the code, you’re never going to have defect free code. Reliability isn’t just about avoiding bugs. It’s about understanding how to give yourself clear visibility into the relationship between changes and behavior and the rollback capability to quickly revert to a known good state.
Cloudflare has done an amazing job with availability for many years and their Rust code now powers 20% of internet traffic. Truly a great team.
How can you write the proxy without handling the config containing more than the maximum features limit you set yourself?
How can the database export query not have a limit set if there is a hard limit on number of features?
Why do they do non-critical changes in production before testing in a stage environment?
Why did they think this was a cyberattack and only after two hours realize it was the config file?
Why are they that afraid of a botnet? Does not leave me confident that they will handle the next Aisuru attack.
I'm migrating my customers off Cloudflare. I don't think they can swallow the next botnet attacks and everyone on Cloudflare go down with the ship, so it will be safer to not be behind Cloudflare when it hits.
Dead Comment
Best on-call I’ve had.
It's really easy to make people whole for this, so whether that happens or not is the difference between the apologies being real or just them just backpedaling because employees got upset.
Edit: Looks like they're doing the right thing here:
> Altman’s initial statement was criticized for doing too little to make things right for former employees, but in an emailed statement, OpenAI told me that “we are identifying and reaching out to former employees who signed a standard exit agreement to make it clear that OpenAI has not and will not cancel their vested equity and releases them from nondisparagement obligations” — which goes much further toward fixing their mistake.
Looks like they’re doing that.
By the time the report was originally sent the feature was just released, and while we never deployed a code change to directly address it, it wouldn't be the first time that we receive something that I believe it was genuinely a security issue and stopped being reproducible due to an seemingly unrelated change around the same time.