For the latter, I have a very hard time believing we’ve squeezed most of the juice out of compression already. Surely there’s an absolutely massive amount of low-rank structure in all that redundant data. Yeah, I know these companies already use inverted indices and various sorts of trees, but I would have thought there are more research-y approaches (e.g. low-rank tensor decomposition) that, if we could figure out how to perform them efficiently, would blow the existing methods out of the water. But IDK, I’m not in that industry, so maybe I’m overlooking something.
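To make the hunch concrete, here’s a toy numpy sketch of the kind of structure I mean: if a block of numeric telemetry is really a few underlying signals plus noise, a truncated SVD keeps only a tiny fraction of the values. Purely illustrative, with made-up data; real log/trace data is mostly structured text, and I’m not claiming any vendor actually works this way.

```python
# Toy sketch of the low-rank idea: compress a highly redundant numeric
# "metrics" matrix with a truncated SVD and check the size/error tradeoff.
# Illustrative only -- production systems use columnar encodings,
# dictionaries, etc., and most telemetry isn't a dense float matrix.
import numpy as np

rng = np.random.default_rng(0)

# Fake data: 10k timestamps x 200 series that are really just noisy
# mixtures of 5 underlying signals (lots of hidden low-rank structure).
t = np.linspace(0, 50, 10_000)
basis = np.stack([np.sin(t * f) for f in (0.1, 0.5, 1.0, 2.0, 3.0)], axis=1)
mixing = rng.normal(size=(5, 200))
data = basis @ mixing + 0.01 * rng.normal(size=(10_000, 200))

# Truncated SVD: keep only the top-k singular triplets.
k = 5
U, s, Vt = np.linalg.svd(data, full_matrices=False)
approx = U[:, :k] * s[:k] @ Vt[:k, :]

stored = U[:, :k].size + k + Vt[:k, :].size   # floats we'd actually keep
original = data.size
rel_err = np.linalg.norm(data - approx) / np.linalg.norm(data)
print(f"kept {stored / original:.1%} of the floats, relative error {rel_err:.2%}")
```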
100PB is the total volume of the raw, uncompressed data for the full retention period (180 days). Compression is what makes it cost-efficient. On this dataset, we see ~15x compression, so we only store around 6.5PB at rest.
This is a tricky one that's come up recently. How do you quantify the value of a $$$ observability platform? Anecdotally, I know robust tracing data can help me find problems in 5-15 minutes that would have taken hours or days with manual probing and scouring logs.
Even then you have the additional challenge of quantifying the impact of the original issue.
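Even a crude back-of-envelope helps frame the conversation, though. Every number below is a hypothetical placeholder, not a measurement from any real team or vendor:

```python
# Back-of-envelope ROI sketch -- all inputs are made-up placeholders.
incidents_per_month = 20
hours_saved_per_incident = 4      # e.g. 15 minutes with traces vs ~4h of log spelunking
engineers_per_incident = 2
loaded_cost_per_hour = 150        # USD, fully loaded

monthly_savings = (incidents_per_month
                   * hours_saved_per_incident
                   * engineers_per_incident
                   * loaded_cost_per_hour)
print(f"~${monthly_savings:,.0f}/month in engineer time alone")  # ~$24,000/month

# ...and this still ignores the harder-to-price part: the revenue or trust
# lost while the original issue stays undiagnosed.
```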
Exactly. High-cardinality, wide structured events are the way.