Not much I agree with in this article. It seems to be based on little operational experience with the product, particularly indicated by a couple of major mistakes and assumptions (compacting does happen; the manual on deployment configurations apparently wasn't read closely).
Loki has its idiosyncrasies, but they are there for a good reason. Anyone who has sat there waiting hours for a Kibana or Splunk query to run to get some information out will know what I'm referring to. You don't dragnet your entire log stream unless your logs are terrible, which needs to be fixed, or you don't know when something happened, which also needs fixing. I regularly watch people run queries that scan terabytes of data with gay abandon on older platforms and still never get what they need out.
The structured metadata distinction is important because when you query against it you are not using an index, just parsed-out data. That means you're explicitly not filtering, you're scanning, and that is expensive.
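To make that concrete, here is a hedged LogQL sketch (all label and field names are made up):

```
# Indexed: the stream selector narrows the search via the label index
# before any chunk is opened.
{app="api", env="prod"} |= "timeout"

# Not indexed: trace_id lives in structured metadata, so Loki still has
# to open and scan every chunk in the selected streams to evaluate it.
{app="api", env="prod"} | trace_id="0af7651916cd43dd8448eb211c80319c"
```

The second query is still cheaper than a full parse, but it is a scan, not an index lookup.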
If you have a problem with finding things, then it's not the logging engine, it's the logs!
Has anyone used both Grafana Loki and Kibana? Does Loki have any advantages over Kibana? I am mostly interested in resource usage and the versatility of filtering.
In Kibana, if something is there I will find it with ease, and it doesn't take a lot of time to investigate issues in a microservice-based application. It is also quite fast.
Compared to Kibana, we experience:
- 3x reduced costs
- no more index corruption because a key changed type
- slower performance for queries over 1 day, especially unoptimized queries without any filtering
- non-intuitive UI/UX
So good, but not perfect! When we have the time we'll look for alternatives.
> 2. It has a convenient and simple query language
IMHO, Loki's query language is the most inconvenient language for logs I've seen (see the sketch after this list):
- It doesn't support calculating multiple stats in a single query. For example, it cannot calculate the number of logs and the average request duration in one query.
- Its syntax for aggregate functions is unintuitive and hard to use, especially if you aren't familiar with PromQL.
- It requires putting an annoying "|=" separator between words and phrases you are searching for in logs.
- You need a JSON-parsing hack whenever you want to filter on log fields or calculate stats over them.
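A hedged sketch of what those points look like in practice (all label and field names invented):

```
# Matching three words means chaining three separate |= filters.
{app="api"} |= "user" |= "login" |= "failed"

# Filtering on a log field first requires the | json "hack", and stats
# are expressed as PromQL-style wrappers over a range vector.
sum by (path) (
  count_over_time({app="api"} | json | status_code >= 500 [5m])
)
```

And getting the average request duration on top of that count means issuing a second query, e.g. with `avg_over_time` and `unwrap`.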
Kibana + ElasticSearch was a mess for us. I was glad to get rid of it. It cost a fortune to run and was time-consuming. Loki, conversely, doesn't even show up on our costs report (other than the S3 bucket) and requires very little, if any, maintenance!
Also, the out-of-the-box configuration sinks 1TB/hr quite happily in microservices mode.
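For the unfamiliar, a rough sketch of what microservices mode means (the `-target` flag is real Loki CLI; exactly which components you split out is up to you):

```
# Same binary and config file; each component runs as its own deployment.
loki -config.file=loki.yaml -target=distributor
loki -config.file=loki.yaml -target=ingester
loki -config.file=loki.yaml -target=querier
loki -config.file=loki.yaml -target=compactor
```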
ELK could never deal with my logs which are sometimes-JSON. Loki can ingest and query it just fine. Also the query/extraction language makes a lot more sense to me.
Elasticsearch can store arbitrary text in log fields, including JSON-encoded strings. It can also tokenize a JSON-encoded string and provide fast full-text search over it, in the same way as it does for regular plaintext.
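For instance, a minimal Kibana Dev Tools sketch (index and field names assumed); a `match` query against a text field runs the same analyzed full-text search whether the content is plain text or a JSON-encoded string:

```
GET logs-2025.01/_search
{
  "query": {
    "match": { "message": "connection refused" }
  }
}
```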
Why do you need to store a JSON-encoded string inside a log field? It is much better to parse the JSON into separate fields at the log shipper and store the parsed fields in Elasticsearch. This gives better query performance and may also reduce disk space usage, since the values for every parsed field are stored separately (this usually improves the compression ratio and reduces disk read IO during queries if column-oriented storage is used for per-field data).
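For example, with Logstash as the shipper, a single `json` filter does the parsing (the source field name is assumed):

```
filter {
  # Parse the JSON payload stored in the "message" field into
  # top-level fields before the event is sent to Elasticsearch.
  json {
    source => "message"
  }
}
```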
If your source emits logs in OpenTelemetry format, you could use an OTel Collector in between to do the sometimes-JSON parsing of log content before it reaches the backend.
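A hedged sketch using the Collector's transform processor (OTTL function names are taken from upstream docs; verify against your Collector version):

```
processors:
  transform/sometimes_json:
    error_mode: ignore  # plain-text bodies simply fail the parse and pass through
    log_statements:
      - context: log
        statements:
          # Only attempt JSON parsing when the body looks like a JSON object.
          - merge_maps(attributes, ParseJSON(body), "upsert") where IsMatch(body, "^\\s*\\{")
```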
Yes, we switched metrics and logs from an Elastic stack to Prometheus/Thanos/Loki/Grafana about two years ago. On the logs side specifically, resource usage is WAY lower (300eps is like 1.5 cores and 4gb of memory), not to mention going from persistent volumes (disks) to blob storage / S3 is far cheaper and doesn't require any maintenance. Queries are slower, however, because Elastic pre-indexes while Loki searches on-demand, so it really comes down to query volume and your need for query performance (does it matter if your search takes 300ms vs 3s?). I've also found running Elastic yourself requires constant maintenance, while Loki has been very hands-off. Strongly recommend.
From the enterprise perspective, at least for my use cases (fine-grained permissions using an extra id), Elasticsearch with Kibana always had a solution available.
For Grafana Cloud and Loki you can get close to good usability with LBAC (label-based access control), but you still need to have many data sources to map onto each "team view" to make it user-friendly.
What is missing for me is, like in Elastic, a single data source for all logs that every team member across all teams can see, with the visibility level scoped via LBAC.
@valyala, as others have noted, you are the CEO of VictoriaMetrics and have written (most of?) VictoriaLogs. How is VictoriaLogs coming along? This is an older blog post.
I switched our team over to VictoriaLogs from ELK when VL 1.0 was released a few months back, and we've been very happy with it. Nowhere near as much finicky performance tuning, no more logs failing to ingest because a string looked a bit too numeric, and the query language has fewer weird gotchas.
At the end of the day ELK was throwing us a bunch of roadblocks in order to solve problems we didn't need solved. Maybe if we were trying to build some big analysis layer on top of our logs that would've been nice. VL has worked great for our use case of needing to centralize and view logs.
VictoriaLogs is free from the issues mentioned in the referenced article. It supports log fields with a big number of unique values (such as user_id, trace_id, ip, etc.) out of the box, and it doesn't need any configuration to work with such fields. It automatically indexes all the log fields and provides fast full-text search over all of them.
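For example, a LogsQL sketch (field names assumed) combining a time filter, a word filter, and a field filter, all served by the automatic per-field indexes without any schema setup:

```
_time:1h error user_id:12345
```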
It's also not ideal to have a different query language for each Grafana datastore (LogQL, PromQL, TraceQL). Are there any plans for a unified Grafana query language?
There is an effort in OpenTelemetry to create a standard query language for observability. There have been a lot of discussions with a lot of opinions; there were even several talks about it during KubeCon EU:
https://sched.co/1tcyx
https://sched.co/1txI1
Why not just use SQL? With LLMs evolving to do sophisticated text-to-SQL, the case for a custom language for the sake of simplicity is diminishing.
I think that expressiveness, performance, and the level of fluency of base language models (i.e., the number of examples in the training set) are the key differentiators for query languages going forward. SQL ticks all those boxes.
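It also sidesteps the single-stat limitation mentioned upthread; a sketch against a hypothetical logs table:

```
-- One query returns both stats that LogQL would need two queries for.
SELECT
  count(*)         AS requests,
  avg(duration_ms) AS avg_duration_ms
FROM logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
  AND message LIKE '%checkout%';
```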
I think I’m probably not interested in this. PromQL is already relatively dense to learn, but it is reasonably well fit to the domain model and internally consistent, unlike most other metric-querying tools I’ve tried over the years.
Maybe that would work as well with traces and logs, but IMO the problem spaces are quite different, and I'm not sure how much value we'd get from a unified language where some subsets only apply to parts (i.e. traces vs. logs vs. metrics), as opposed to spiritually similar but distinct languages.
No, the OP of the HN thread is from VictoriaMetrics (open source), he’s not Chris Siebenmann, unix systems administrator at the University of Toronto’s Computer Science Labs.
danluu.com mentioned this approach (or just 'big data' systems) for traces and metrics, IIRC. Not sure if for logs too.
Aren’t all of those relatively tabular? What would you be looking for from those tools to help with logs?
2. It has a convenient and simple query language.
3. It works very well with traces and metrics.
The pain part:
1. It struggles to query logs over a wide time range.
2. Its indexing (or labeling) capabilities are very limited, similar to Prometheus.
3. Due to 1 and 2, it is difficult to configure and use correctly without hitting errors related to usage limits (e.g., maximum series limits); see the sketch below.
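The sketch mentioned in 3 (invented labels):

```
# Fine: a handful of low-cardinality labels per stream.
{app="checkout", env="prod"} |= "timeout"

# Risky: a per-user label mints a new stream for every user and quickly
# trips max-streams / max-series limits, much like cardinality in Prometheus.
{app="checkout", user_id="12345"}
```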
Modern columnar SQL engines such as ClickHouse are 10+ times more efficient in real-world use cases.
I'm the CEO and founder of Quesma, which lets you use Kibana with ClickHouse: https://quesma.com/
Forever free, source-available license.
> Also, the out-of-the-box configuration sinks 1TB/hr quite happily in microservices mode.
Could you share a Loki config that can deal with a 1TB/hr volume of logs?
> Why do you need to store a JSON-encoded string inside a log field?
I tried explaining this at https://itnext.io/why-victorialogs-is-a-better-alternative-t...
Elastic was kind of a resource hog and much more expensive for the same amount of data.
That might be dependent on your use case though.
> At the end of the day ELK was throwing us a bunch of roadblocks in order to solve problems we didn't need solved.
This is explained in more detail at https://itnext.io/why-victorialogs-is-a-better-alternative-t...
important because the title includes _new_
We are still waiting for a compelling implementation that will show the way.