1. AI is making unit tests nearly free. It's a no-brainer to ask Copilot/Cursor/insert-your-tool-here to include tests with your code. The bonus is that it forces better habits like dependency injection just to make the AI's job possible (quick sketch after this list). This craters the "cost" side of the equation for basic coverage.
2. At the same time, software is increasingly complex: a system made up of a frontend, a backend, 3rd-party APIs, mobile clients, etc. A million passing unit tests and 100% test coverage mean nothing in a world where a tiny contract change breaks the whole app. In our experience the thing that gives us the most confidence is black-box, end-to-end testing that exercises the product exactly as a real user would see it (rough sketch below).
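To make the dependency-injection point concrete, here's a tiny Python sketch (the names and classes are made up) of the shape of code an AI assistant can actually generate a unit test for:

  # Hypothetical example: injecting the HTTP client makes the function
  # unit-testable without touching the network, which is exactly the kind
  # of structure an AI test generator needs.
  class HttpClient:                        # the "real" dependency
      def get_json(self, url: str) -> dict:
          raise NotImplementedError        # would hit the network in production

  def fetch_username(client: HttpClient, user_id: int) -> str:
      data = client.get_json(f"/users/{user_id}")
      return data["name"]

  class FakeClient(HttpClient):            # what a generated test injects
      def get_json(self, url: str) -> dict:
          return {"name": "alice"}

  def test_fetch_username():
      assert fetch_username(FakeClient(), 42) == "alice"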
See the book "How Google Tests Software" (James A. Whittaker, 2012); the Pragmatic Engineer blog also has a good post on how big tech does QA: https://newsletter.pragmaticengineer.com/p/qa-across-tech
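To illustrate what "exactly as a real user would see it" means in practice, here's a minimal black-box sketch using Playwright's Python API (the URL and selectors are hypothetical):

  # Drive the real app through a browser and assert on what the user sees.
  from playwright.sync_api import sync_playwright

  def test_user_can_log_in_and_see_dashboard():
      with sync_playwright() as p:
          browser = p.chromium.launch()
          page = browser.new_page()
          page.goto("https://staging.example.com/login")
          page.fill("#email", "e2e-user@example.com")
          page.fill("#password", "not-a-real-password")
          page.click("button[type=submit]")
          # This only passes if the frontend, backend, and any 3rd-party auth
          # still agree on their contracts, which is the whole point.
          page.wait_for_selector("text=Dashboard")
          browser.close()

No amount of mocked unit tests catches the kind of contract drift a check like this does.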
The funny thing is that the parsing library was correct and it was the test property that was wrong—but I still learned about an edge case I had never considered!
This has been a common pattern for "simpler" property-based tests I've written: I write a test, it fails right away, and it turns out that my code is fine, but my property was wrong. And this is almost always useful; if I write an incorrect property, it means that some assumption I had about my code is incorrect, so the exercise directly improves my conceptual model of whatever I'm doing.
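For anyone curious what that workflow looks like, here's a small Hypothesis sketch (the parser is a stand-in, not the actual library from the story):

  from hypothesis import given, strategies as st

  def parse_int(s: str) -> int:            # stand-in for the real parsing code
      return int(s, 10)

  @given(st.integers())
  def test_roundtrip(n):
      # A correct property: parse(str(n)) == n.
      assert parse_int(str(n)) == n

  @given(st.from_regex(r"-?[0-9]{1,6}", fullmatch=True))
  def test_roundtrip_text(s):
      # A tempting but wrong property: Hypothesis quickly finds inputs like
      # "-0" or "007" where the parser is fine and the assumption isn't.
      assert str(parse_int(s)) == s

The failing example is the lesson: the bug is in the property, and fixing it means fixing your mental model.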
1) lightweight, because most of our test suites run against production infrastructure and we can't afford to run them constantly
2) "creative", to find bugs we hadn’t considered before
Probabilistic test scenarios allow us to increase the surface we're testing without needing to exhaustively test every scenario.
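A rough sketch of the idea (the step names are invented, not our actual suite): each run samples a few random paths through the product instead of enumerating all of them, and logs the seed so a failure can be replayed.

  import random

  # Hypothetical user-journey steps; the full matrix would be 3*3*3 = 27 runs.
  STEPS = {
      "signup":      ["email", "oauth_google", "oauth_apple"],
      "add_to_cart": ["search", "category_page", "deep_link"],
      "checkout":    ["card", "paypal", "gift_card"],
  }

  def sample_scenario(rng: random.Random) -> dict:
      return {step: rng.choice(options) for step, options in STEPS.items()}

  def run_scenario(scenario: dict) -> None:
      print("running", scenario)            # here we'd drive the real system

  if __name__ == "__main__":
      seed = random.randrange(1_000_000)
      rng = random.Random(seed)
      print("seed:", seed)                   # log the seed to reproduce failures
      for _ in range(5):                     # 5 sampled journeys, not 27
          run_scenario(sample_scenario(rng))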
...not to mention that automated tests are, by definition, bot traffic, and websites do/should have protections against spam. Cloudflare or AWS WAF tends to filter out some of our AWS DeviceFarm tests, and running automated tests directly from EC2 instances is pretty much guaranteed to get caught by a CAPTCHA. Which is not a complaint: this is literally what they were designed to do.
One way to mitigate this is to implement "test-only" user agents or tokens so that synthetic requests are distinguishable from real ones, but that means our code does something in testing that it doesn't do in "real life". (The full Volkswagen effect.)
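One way to keep that test-only branch as thin as possible is to tag synthetic traffic with a signed header and use the tag only for filtering (analytics, rate limits, WAF allow-lists), never for product behavior. A sketch with a hypothetical header name and secret:

  import hmac, hashlib

  TEST_TRAFFIC_SECRET = b"rotate-me-regularly"   # shared only with the test runner

  def is_synthetic(headers: dict) -> bool:
      token = headers.get("X-Synthetic-Token", "")
      expected = hmac.new(TEST_TRAFFIC_SECRET, b"e2e", hashlib.sha256).hexdigest()
      return hmac.compare_digest(token, expected)

  def record_analytics(headers: dict, event: str) -> None:
      if is_synthetic(headers):
          return                                 # keep bot traffic out of metrics
      print("analytics:", event)

  # Everything except the analytics filter behaves identically for real and
  # synthetic users, which limits how much "Volkswagen" can creep in.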
Using the Lindy Effect for guidance, I've built a stack/framework that works across 20 years of different versions of these languages, which increases the chances of it continuing to work without breaking changes for another 20 years.
From my side I think it's more useful to focus on surfacing issues early. We want to know about bugs, slowdowns, and regressions before they hit users, so everything we write is built with TDD. But because unit tests are coupled to the environment, they "rot" together with it. So we usually set up monitoring, integration, and black-box tests super early on and keep them running for as long as the project is online.
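Concretely, the long-lived black-box checks don't need to be fancy. Something like this (hypothetical endpoint and thresholds), run on a schedule and wired to alerting, catches a surprising amount:

  import time, urllib.request

  ENDPOINT = "https://example.com/health"        # made-up public endpoint
  MAX_LATENCY_S = 2.0

  def check_once() -> None:
      start = time.monotonic()
      with urllib.request.urlopen(ENDPOINT, timeout=10) as resp:
          assert resp.status == 200, f"unexpected status {resp.status}"
      elapsed = time.monotonic() - start
      assert elapsed < MAX_LATENCY_S, f"too slow: {elapsed:.2f}s"

  if __name__ == "__main__":
      check_once()     # run from cron/CI every few minutes, page on failure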