We had a lot of critical infra running on them, and then tech debt slowly started accumulating. Clusters had to be updated, older DNS versions in k8s were slow, and networking suffered (older Weave versions were bursting at the seams when traffic exploded as many applications were onboarded). SRE teams got overwhelmed, and constant requests for adding PVCs (Kafka & C* were on k8s) took a toll. Sanity prevailed in the end and there was a decision to move to hosted PaaS infra. I no longer work there; I was just reminiscing about what we went through.
Though a "cloud-independent" solution will save pennies, it will definitely cost dollars in personnel and in uptime/SLA.
History repeats itself because we don't learn from mistakes (ours or others').
In the v1.4 release, Uptrace got built-in alerting capabilities that let you monitor metrics, manage alerts, and receive notifications without using AlertManager. AlertManager is still supported.
While OpenTelemetry remains the main source of telemetry data, Uptrace also provides integrations for Prometheus metrics, Vector logs, FluentBit, Sentry, and CloudWatch.
You can quickly try Uptrace by:
- Running a Docker example: https://github.com/uptrace/uptrace/tree/master/example/docke...
- Checking the Uptrace Cloud demo: https://app.uptrace.dev/play
If you have any feedback or questions, let me know in the comments!