OpenData Timeseries: Prometheus-compatible metrics on object storage
Posted by apurvamehta 1 day ago
Comments
Comment by hagen1778 1 day ago
But if you do compare: VictoriaMetrics Cloud, for 3M active series and twice the ingestion rate (100K samples/s, i.e. a 30s scrape interval), will cost you ~$1k/month plus storage costs.
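A quick sanity check of the arithmetic in the comment above, showing that the quoted series count and scrape interval do work out to the quoted ingestion rate:

```python
# 3M active series, each scraped every 30s, gives the ingestion rate
# quoted in the comment above.
active_series = 3_000_000
scrape_interval_s = 30
samples_per_s = active_series / scrape_interval_s
print(samples_per_s)  # 100000.0
```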
Comment by apurvamehta 1 day ago
It meaningfully changes the calculus of running it yourself vs. paying someone to do it for you, IMO.
Comment by mdwaud 1 day ago
> None of these numbers are exact, but the structural gap is clear: a handful of nodes costing roughly $560/month versus $10,000-20,000/month for a managed service at the same scale. As we explained earlier, it’s practical to operate OpenData Timeseries yourself and fully realize these massive cost savings since it isn’t a traditional distributed database that manages partitioned and replicated state.
It doesn't look 100% turn-key, but those are compelling numbers.
Comment by agavra 1 day ago
It's definitely not quite turn-key just yet, but we've been dogfooding it in production against a moderate metrics use case (~30k samples/s) and have it hooked up to Grafana (you just configure a Prometheus data source and point it at your deployed URL). We run it on a single node with no replicas ;)
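For reference, pointing Grafana at a Prometheus-compatible endpoint can be done declaratively with Grafana's datasource provisioning; a minimal sketch, where the datasource name and URL are placeholders rather than anything from the project:

```yaml
# grafana/provisioning/datasources/opendata.yaml
apiVersion: 1
datasources:
  - name: OpenData Timeseries       # placeholder name
    type: prometheus                # the endpoint speaks the Prometheus API
    access: proxy
    url: https://metrics.example.com  # your deployed URL (placeholder)
    isDefault: false
```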
Comment by apurvamehta 1 day ago
1. The reason it's slow as you select more series over longer periods of time is that the series has to be pulled for each time bucket in the range, and then the samples have to be pulled for each bucket. By compacting older buckets and merging samples together, historical queries should be pretty comparable to 'more recent' cold queries.

2. We don't pre-cache all the metadata today. If we did that, then we could parallelize sample loads much more efficiently, lowering latency.

3. There is a lot of room to do better batching and to tune the parallelism of cold reads.
We've only been at this for a couple of months. The techniques for improving latency on object storage are well known; we just have to implement them.
Another benefit: since all the data is on S3, spinning up more optimized readers to transform older data for more detailed analysis is also an option with this architecture.
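The bucket-compaction idea in point 1 can be sketched roughly as follows. This is a hypothetical model, not the project's actual storage layout: assume one object fetch per (series, time bucket), so merging adjacent buckets directly reduces the number of fetches a historical range query needs.

```python
from bisect import bisect_left

# Hypothetical sketch of per-bucket reads plus compaction: each bucket is a
# time-ordered list of (timestamp, value) samples, and each bucket touched
# by a query costs one object fetch.

def compact(buckets, factor):
    """Merge every `factor` adjacent buckets into one larger bucket."""
    merged = []
    for i in range(0, len(buckets), factor):
        group = buckets[i:i + factor]
        # Buckets are adjacent and time-ordered, so concatenation stays sorted.
        merged.append([s for b in group for s in b])
    return merged

def query(buckets, t0, t1):
    """Return samples in [t0, t1) and the number of 'object fetches' made."""
    fetches, out = 0, []
    for b in buckets:
        if not b or b[-1][0] < t0 or b[0][0] >= t1:
            continue  # bucket entirely outside the range: no fetch needed
        fetches += 1
        times = [t for t, _ in b]
        out.extend(b[bisect_left(times, t0):bisect_left(times, t1)])
    return out, fetches
```

With 24 one-hour buckets compacted by a factor of 6, a full-day query drops from 24 fetches to 4 while returning the same samples, which is the shape of the win described for historical queries.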
Comment by hagen1778 1 day ago
Especially nowadays, when metrics from k8s ramp churn rates up to hundreds of thousands and even millions of series.
Comment by agavra 1 day ago
I have some prototypes of vectorized compute that take that same query from 2s to ~800ms, and it's just early days. If you want to help make it better, the query engine part of it is begging for contributions!
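A hedged illustration of why vectorized compute pays off for this kind of query: computing a PromQL-style per-interval rate with array operations instead of a per-sample Python loop. This is a generic sketch, not the project's actual query engine.

```python
import numpy as np

def rate_loop(ts, vals):
    """Row-at-a-time rate: one Python-level iteration per sample."""
    out = []
    for i in range(1, len(ts)):
        out.append((vals[i] - vals[i - 1]) / (ts[i] - ts[i - 1]))
    return out

def rate_vectorized(ts, vals):
    """Same computation as a pair of whole-array ops over contiguous memory."""
    ts = np.asarray(ts, dtype=float)
    vals = np.asarray(vals, dtype=float)
    return np.diff(vals) / np.diff(ts)
```

Both produce identical results; the vectorized form eliminates per-sample interpreter overhead, which is where speedups of the rough magnitude mentioned above (2s to ~800ms) typically come from.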