I'm currently running a shared RDS cluster, for instance, that serves as a platform. Workloads are isolated from each other; there's a clear pipeline for my team and others to update it; and the eventual goal is to build out monitoring that makes it evident which workloads are putting stress on the system, so that the owners can respond. We chose this architecture to save on infrastructure costs at the expense of marginally higher operational costs, as we explicitly wanted to avoid a proliferation of tiny RDS clusters, one per new service. The expectation is that the added operational cost will ultimately be lower than the infrastructure cost of all those separate clusters would have been.
Our operational costs are more distributed. They are higher in total, but they make the cost of each product visible, and our data stays very well separated.
Teams can do their own migrations, and we can prevent some rogue service from violating the data.
We are currently in the middle of a very large migration away from the kind of setup you have. But I'm sure we just did it wrong.
2. Try a mutex
3. If that doesn't work, try adding a condition variable.
4. If that still doesn't work, try an atomic in the default sequentially consistent mode or equivalent (e.g. Java volatile, InterlockedAdd, and the like). Warning: atomics are very subtle. Definitely have a review with an expert if you are here.
5. If that still doesn't work, consider lock-free paradigms. That is, combinations of atomics and memory barriers.
6. If that still doesn't work, publish a paper on your problem lol.
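For what it's worth, the mutex and condition variable steps usually look something like the sketch below. This is only an illustration in Python: `WorkQueue` and `run_demo` are made-up names, and the sentinel-based shutdown is an assumption, not anything from the list above. The lock protects shared state, and the condition variable lets a consumer sleep until there is something to do.

```python
import threading
from collections import deque


class WorkQueue:
    """Tiny queue guarded by a condition variable (hypothetical example)."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()  # owns its own lock (the mutex)

    def put(self, item):
        with self._cond:              # take the mutex
            self._items.append(item)
            self._cond.notify()       # wake one waiting consumer

    def get(self):
        with self._cond:
            # Wait in a loop: another thread may take the item first.
            while not self._items:
                self._cond.wait()     # releases the mutex while sleeping
            return self._items.popleft()


def run_demo(n_items=100, n_workers=4):
    """Sum 0..n_items-1 across worker threads; returns the total."""
    q = WorkQueue()
    total = 0
    total_lock = threading.Lock()     # plain mutex for the shared counter

    def worker():
        nonlocal total
        while True:
            item = q.get()
            if item is None:          # sentinel: no more work (assumed convention)
                return
            with total_lock:
                total += item

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i in range(n_items):
        q.put(i)
    for _ in threads:
        q.put(None)                   # one sentinel per worker
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(run_demo())
```

Note that this never touches atomics or memory barriers. Most problems stop here.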
---------
#1 is my most important piece of advice. There was a Blender render I was doing a few years ago, on 2.6 or some similarly old version. Blender's parallelism wasn't too good and only utilized 25% of my computer.
So I ran 4 instances of headless Blender. Bam, 100% utilization. Done.
Don't overthink parallelism. It's stupid easy sometimes, as easy as a & on the end of your shell command.
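If you want the same trick from a script instead of typing & four times, a rough sketch is below. It assumes the job splits into independent frame ranges (I didn't say above how the work was divided, so treat that as an assumption), and `placeholder_command` is a hypothetical stand-in that you would replace with your real renderer command line.

```python
import subprocess
import sys


def split_range(start, end, parts):
    """Split the inclusive range [start, end] into up to `parts` contiguous chunks."""
    total = end - start + 1
    base, extra = divmod(total, parts)
    chunks = []
    lo = start
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue                  # more parts than frames: skip empty chunks
        chunks.append((lo, lo + size - 1))
        lo += size
    return chunks


def run_parallel(commands):
    """Start every command at once, then wait for all of them; return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


def placeholder_command(first, last):
    # Hypothetical stand-in for the real job; swap in your own command line.
    return [sys.executable, "-c", f"print('rendering frames {first}-{last}')"]


if __name__ == "__main__":
    chunks = split_range(1, 200, 4)   # four independent processes
    codes = run_parallel([placeholder_command(a, b) for a, b in chunks])
    print(codes)
```

No locks, no shared memory: the operating system does the scheduling, and each process is as simple as the single-process version.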