I would love to get answers to questions like "which users have access to resource X, including implicitly through one or more assume-role jumps, across these N accounts, including stuff like iam:PassRole, even including tag-based policies?". Add a time dimension too, like "who had access to X between Jun and Aug 2020?", and you'd have a winner. Would such queries be possible here?
I'm also missing a discussion about designing for interruption, either by not keeping state, or by being able to shed state quickly, to be picked up by other instances.
Also, if you set up EC2 spot with a launch template or ASG with very differently-sized instance types (to reduce risk of running out), is there a way to even out the load coming through an ALB? The least-connections scheduling can help in some cases, but a connection might not map 1:1 to one unit of load. The ALB can use weighted balancing, but on the target group level. Dunno how easy it would be to allocate different instance sizes to different target groups and weigh them accordingly.