Not a yearly cadence because back then they only released a new OS version when it was done and had features worth releasing, but even every two years that wasn't a cheap update.
> "The order of RRs in a set is not significant, and need not be preserved by name servers, resolvers, or other parts of the DNS." [from RFC]
> However, RFC 1034 doesn’t clearly specify how message sections relate to RRsets.
The developer(s) were assuming order didn't matter in general, because the RFC said it didn't for one aspect, and intentionally made a change to the order for performance reasons. But it turned out that change did matter.
Mistakes of this kind seem unavoidable. This one doesn't necessarily say to me that the developers made a mistake I never could or something.
I think the real conclusion is they probably need tests using actual live network stacks with common components, and why didn't they have those? Not just unit tests or tests with mocks, but tests that would have actually used the real getaddrinfo function in glibc and shown it failing?
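To make the order point concrete, here is a toy sketch of a client that walks the answer section once, front to back, following CNAMEs as it meets them. It is a hypothetical model, not glibc's actual code; `RR`, `single_pass_resolve`, and the example names and addresses are all made up for illustration.

```python
from typing import List, NamedTuple, Optional


class RR(NamedTuple):
    """A drastically simplified resource record (illustrative only)."""
    name: str
    rtype: str  # "CNAME" or "A"
    data: str


def single_pass_resolve(qname: str, answers: List[RR]) -> Optional[str]:
    """Scan the answers once, in order, following CNAMEs as they appear.

    Hypothetical model of an order-sensitive client, not any real resolver.
    """
    expected = qname
    for rr in answers:
        if rr.name != expected:
            continue  # record is for a name we are not looking for (yet)
        if rr.rtype == "CNAME":
            expected = rr.data  # from here on, look for the alias target
        elif rr.rtype == "A":
            return rr.data
    return None


cname_first = [
    RR("www.example.com", "CNAME", "edge.example.net"),
    RR("edge.example.net", "A", "192.0.2.1"),
]
a_first = list(reversed(cname_first))

print(single_pass_resolve("www.example.com", cname_first))
print(single_pass_resolve("www.example.com", a_first))
```

With the CNAME first the scan finds the address; with the A record first it has already skipped that record by the time it learns the alias target. A model like this is no substitute for the live-stack test, but it shows the kind of assertion that would have pinned the order.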
But also.. the programmers working on the software running one of the most important (end-user) DNS servers in the world:
1. Changes logic in how CNAME responses are formed
2. I assume some tests at least broke that meant they needed to be "fixed up" (y'know - "when a CNAME is queried, I expect this response")
3. No one saw these changes in test behavior and thought "I wonder if this order is important". Or "We should research more into this", or "Are other DNS servers changing order", or "This should be flagged for a very gradual release".
4. Ends up in the test environment for, what, a month.. and either nothing using getaddrinfo from glibc is exercising that environment, or no one noticed that it was broken
Cloudflare seem to be getting into the swing of breaking things and then being transparent. But this really reads as a fun "did you know", not a "we broke things again - please still use us".
There's no real RCA except to blame an RFC - but honestly, for a large-scale operation like theirs, this seems very big to slip through the cracks.
I would make a joke about South Park's oil "I'm sorry".. but they don't even seem to be.
OP said:
"However, we did not have any tests asserting the behavior remains consistent due to the ambiguous language in the RFC."
One could guess it's something like -- back when we wrote the tests, years ago, whoever did it missed that this was required, not helped by the fact that the spec preceded RFC 2119 standardizing the all-caps "MUST" "SHOULD" etc. language, which would have helped us translate specs to tests more completely.
Personally, I prefer the same db unless I were at a traffic scale where splitting them is necessary for load.
One advantage of the same db is that you can use db transaction control over enqueuing jobs and app logic together, when they are dependent. But that's not the main advantage to me; I don't actually need that. I just prefer the simplicity, and as someone else said above, prefer not having to reconcile app db state with queue state if they are separate and only ONE goes down. Fewer moving parts are better in the apps I work on, which are relatively small-scale, often "enterprise", etc.
That being said, I regret that we have switched from good_job (https://github.com/bensheldon/good_job). The thing is - Basecamp is a MySQL shop and their policy is not to accept RDBMS engine specific queries. You can see in their issues on GitHub that they try to stick to "universal" SQL and are personally mostly concerned with how it performs in MySQL (https://github.com/rails/solid_queue/issues/567#issuecomment... , https://github.com/rails/solid_queue/issues/508#issuecomment...). They also still have no support for batch jobs: https://github.com/rails/solid_queue/pull/142 .
I am (and have been for a while, not in a hurry) considering them each as a move off resque.
The main blocker for me with GoodJob is that it uses certain pg-specific features in a way that makes it incompatible with transaction mode in pgbouncer -- that is, it requires persistent sessions. Which is annoying, and is done to get some upper-end performance improvements that I don't think matter for my or most scales. Otherwise, I much prefer GoodJob's development model, trust the maintainer's judgement more, find the code more readable, etc. -- but that's a big But for me.