This is a social problem as much as a technical one: even if you have LaunchDarkly, DataDog, etc. making it very clear that a flag isn't used, getting a team to prioritise cleanup is difficult. Especially if their PM leaned on engineers to make the experiment "quick 'n' dirty" and therefore hard to clean up.
At The Guardian we had a pretty direct way to fix this: experiments were associated with expiry dates, and if your team's experiments expired the build system simply wouldn't process your jobs without outside intervention. Seems harsh, but I've found that in many orgs the only way to fix negative externalities in a shared codebase is a tool that says "you broke your promises, now we break your builds".
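The mechanics of such a gate are simple. A minimal sketch, assuming a registry that maps each experiment to an owner and an expiry date (the names and dates here are illustrative, not The Guardian's actual config):

```python
import datetime

# Hypothetical experiment registry; in practice this metadata would live
# alongside the experiment/flag definitions in the repo.
EXPERIMENTS = {
    "new-checkout-flow": {"owner": "commerce", "expires": "2024-03-01"},
    "headline-ab-test": {"owner": "editorial", "expires": "2026-01-01"},
}

def expired_experiments(experiments, today=None):
    """Return the names of experiments whose expiry date has passed."""
    today = today or datetime.date.today()
    return [
        name
        for name, meta in experiments.items()
        if datetime.date.fromisoformat(meta["expires"]) < today
    ]

# A CI wrapper would fail the owning team's builds whenever this is non-empty.
stale = expired_experiments(EXPERIMENTS, today=datetime.date(2025, 6, 1))
```

The important design choice is that the check runs in the build system, not in a dashboard nobody reads: the cost of an expired experiment lands on the team that owns it, at the moment they try to ship anything else.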
e.g. "Test was successful so it's rolling out to all users, minus a 0.5% holdback population for the next 2 years"
This then forces the team to maintain the two paths for the long term; meanwhile the team might get re-orged or re-prioritise its projects a year later, making the cleanup really hard to eventually enforce.
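For context on what a holdback like that looks like in code: it's usually deterministic bucketing, so a user stays in (or out of) the holdback across sessions. A sketch, with the hash scheme and percentage as assumptions rather than any particular vendor's implementation:

```python
import hashlib

HOLDBACK_PERCENT = 0.5  # 0.5% of users stay on the old code path

def in_holdback(user_id: str, experiment: str) -> bool:
    """Deterministically assign a user to the holdback population.

    Hashing (experiment, user) keeps assignment stable across sessions
    and independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000  # buckets 0..9999, 0.01% granularity
    return bucket < HOLDBACK_PERCENT * 100  # buckets 0..49 -> holdback

# Roughly 0.5% of a large population lands in the holdback.
population = [f"user-{i}" for i in range(100_000)]
held = sum(in_holdback(u, "new-checkout-flow") for u in population)
```

The two-year holdback means both branches of every `in_holdback` call site have to keep working, which is exactly the maintenance burden the parent comment is pointing at.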
* maintained a stable version of python within google, and made sure that everything in the monorepo worked with it. in my time on the team we moved from 2.7 to 3.6, then incrementally to 3.11, each update taking months to over a year because the rule at google is if you check any code in, you are responsible for every single breakage it causes
* maintained tools to keep thousands of third party packages constantly updated from their open source versions, with patch queues for the ones that needed google-specific changes
* had highly customised versions of tools like pylint and black, targeted to google's style guide and overall codebase
* contributed to pybind11, and maintained tools for c++ integration
* developed and maintained build system rules for python, including a large effort to move python rules to pure starlark code rather than having them entangled in the blaze/bazel core engine
* developed and maintained a typechecker (pytype) that would do inference on code without type annotations, and work over very large projects with a one-file-at-a-time architecture (this was my primary job at google, ama)
* performed automated refactorings across hundreds of millions of lines of code
and that was just the dev portion of our jobs. we also acted as a help desk of sorts for python users at google, helping troubleshoot tricky issues and pointing newcomers in the right direction. plus we worked with a lot of other teams, including the machine learning and AI teams, the colaboratory and IDE teams, teams like protobuf that integrated with and generated python bindings, teams like google cloud who wanted to offer python runtimes to their customers, and teams like youtube who had an unusually large system built in python and needed to do extraordinary things to keep it performant and maintainable.
and we did all this for years with fewer than 10 people, most of whom loved the work and the team so much that we just stayed on it for years. also, despite the understaffing, we had managers who were extremely good about maintaining work/life balance and the "marathon, not sprint" approach to work. as i said in another comment, it's the best job i've ever had, and i'll miss it deeply.