One thing I’d add is that usefulness also seems to depend on whether the task can be broken down. When you can split work into small pieces that are easy to check, generative models tend to work really well. But once those pieces start depending heavily on each other and the design constraints pile up, the pattern you describe shows up pretty clearly.