Dead Comment
If you need to call out of the AZ for other data or API sources, either figure out which AZs that service is using and configure your calls to stay in them, or make sure to go all the way back out to a well-balanced (for resilience) endpoint.
You can survive a datacenter (AZ) outage IF you have separate stacks per AZ and don't mix traffic. If you have a Kafka cluster spread across 3 AZs, don't be surprised if you just LOWERED your availability, because any issue in one AZ makes your stack unstable. And issues in a single AZ are quite common.
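To put rough numbers on that, here's a toy calculation. The assumptions are mine and purely illustrative: AZ problems are independent, each AZ is healthy 99% of the time, and a cross-AZ stack counts as degraded whenever any one of its AZs has an issue. The function names are made up for this sketch.

```python
# Toy availability model. All numbers and names are illustrative assumptions.

def coupled_availability(az_availability: float, zones: int) -> float:
    """Stack spread across zones that is degraded if ANY zone has an issue."""
    return az_availability ** zones


def isolated_availability(az_availability: float, zones: int) -> float:
    """Separate stack per zone, no mixed traffic: up if AT LEAST ONE zone is healthy."""
    return 1 - (1 - az_availability) ** zones


if __name__ == "__main__":
    a = 0.99  # assumed per-AZ availability
    print(f"single AZ:            {a:.6f}")
    print(f"coupled across 3 AZs: {coupled_availability(a, 3):.6f}")
    print(f"isolated per-AZ x3:   {isolated_availability(a, 3):.6f}")
```

Under these assumptions the coupled setup comes out worse than a single AZ, while the isolated setup comes out better. Real failures are correlated and a Kafka cluster can often tolerate some broker loss, so treat this as intuition only, not a prediction.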
PromQL support (with extensions) and clustered / HA mode. Great storage efficiency. Plays well for monitoring multiple k8s clusters, works great with Grafana, pretty easily deployed on k8s.
No affiliation, just a happy user.
If you're looking at scaling your Prometheus setup, also check out VictoriaMetrics.
Operational simplicity and scalability/robustness are what drive me to it.
I used it to collect metrics from multiple Kubernetes clusters with Prometheus - each cluster running Prometheus with a remote_write directive to send metrics to a central VictoriaMetrics service.
That way my "edge" Prometheus installations are practically "stateless" and easily set up using prometheus-operator. You don't even need to add persistent storage to them.
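For anyone wanting to try this, a minimal sketch of the edge-side config might look like the following. The hostname and the cluster label value are placeholders I made up; 8428 is the default port for single-node VictoriaMetrics, so adjust the URL for a clustered setup.

```yaml
# prometheus.yml on each edge cluster (sketch; host and label values are placeholders)
global:
  external_labels:
    cluster: edge-cluster-1   # hypothetical label so the central store can tell clusters apart

remote_write:
  - url: "http://victoriametrics.example.internal:8428/api/v1/write"   # placeholder host
```

With prometheus-operator you would normally set the equivalent `remoteWrite` and `externalLabels` fields on the Prometheus custom resource instead of editing this file by hand.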
I work on spacecraft flight software and exactly none of it is formally verified. As far as I know, it's not practical to do so. If we could do it (without ballooning cost and schedule 10x), we would.