I've been annoyed these last few releases by high memory usage. Every day, my backups fail due to ENOMEM.
I have adjusted unicorn-worker-killer to undo the memory-limit increase you made a few releases ago:
unicorn['worker_memory_limit_min'] = "300*(1024**2)"
unicorn['worker_memory_limit_max'] = "330*(1024**2)"
(8.9.0's default is a totally absurd 400-650MB.) While that stops my server from falling over in the first 10 minutes, it still consistently falls over once a day.
Here is a screenshot of my htop; I'm not sure where to look next. 25% free sounds okay, but it's not enough to run a backup, or even run `gitlab-rake`.
http://weblinksdca.s3.amazonaws.com/Screen%20Shot%202016-07-...
How many unicorn workers are you using? For 2GB systems, I'd recommend at most 2 rather than the default 3. Some discussion about this is here: https://gitlab.com/gitlab-org/omnibus-gitlab/issues/1279
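If you haven't changed it already, the worker count lives in /etc/gitlab/gitlab.rb, e.g.:

    # /etc/gitlab/gitlab.rb
    unicorn['worker_processes'] = 2

followed by a `sudo gitlab-ctl reconfigure` to apply it.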
We'll be doing more work in the coming months to profile and reduce GitLab's memory usage so that all your tools can run comfortably in the 2GB range.
Currently, with Fluentd all I have to do is set the log-driver and tags in DOCKER_OPTS, point fluentd to ES, and I have all my container logs.
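For reference, the Docker side of my setup is roughly this (the fluentd address and tag template are whatever suits your instance):

    # in DOCKER_OPTS
    --log-driver=fluentd --log-opt fluentd-address=localhost:24224 --log-opt tag="docker.{{.Name}}"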
Does LogZoom work this seamlessly with Docker? I know that at the very least I would need https://github.com/docker/docker/issues/20363 in order to implement any LogZoom plugin, so is this really much of a benefit if I don't have hundreds of containers running on a host? The only concern I had after reading this was whether Fluentd would use as many resources as they mention. For my use case, I think not.
Regarding security with Redis: did you read the docs here? https://www.elastic.co/guide/en/logstash/current/plugins-out... Logstash does support Redis password auth (as does Filebeat). Regarding the encryption point, seeing as Redis doesn't support SSL itself, are you using spiped as the official Redis docs recommend?
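For anyone unfamiliar, the spiped arrangement the Redis docs describe looks roughly like this (ports, hostnames, and the key path are placeholders):

    # on the Redis server: decrypt traffic arriving on 6380, forward to local Redis
    spiped -d -s '[0.0.0.0]:6380' -t '[127.0.0.1]:6379' -k /etc/spiped/redis.key

    # on each client: encrypt local connections to 6379, send them to the server
    spiped -e -s '[127.0.0.1]:6379' -t '[redis.example.com]:6380' -k /etc/spiped/redis.key

Both ends share the same key file, and clients keep talking to localhost:6379 as usual.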
Regarding the two queues, I would like to clarify that you can do this with the
Filebeat -> Logstash -> Redis -> Logstash -> (outputs)
technique. If you declare two Redis outputs in the first 'shipper' Logstash, you can write to two separate queues and have the second 'indexer' read from both.
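A minimal sketch of both configs (the host and key names are just placeholders):

    # shipper: fan out to two independent Redis lists
    output {
      redis { host => "queue.example.com" data_type => "list" key => "logs-es" }
      redis { host => "queue.example.com" data_type => "list" key => "logs-s3" }
    }

    # indexer: consume both lists
    input {
      redis { host => "queue.example.com" data_type => "list" key => "logs-es" }
      redis { host => "queue.example.com" data_type => "list" key => "logs-s3" }
    }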
It is true that if one output is down we will pause processing, but you can use multiple processes for that. It is possible that in the near future we will support multiple pipelines in a single process (which we already do internally in our master branch for metrics, just not in a publicly exposed way yet).
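In the meantime, running one indexer per queue gives you that isolation today (hypothetical config file names; `-f` just points Logstash at a config):

    logstash -f indexer-es.conf   # reads logs-es, writes to Elasticsearch
    logstash -f indexer-s3.conf   # reads logs-s3, writes to S3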
Regarding JVM overhead: that's a fair point about memory. The JVM does have a cost. That said, memory and VMs are cheap these days, and that cost is fixed. One thing to be careful of is that we often see people surprised to find a stray 100MB event going through their pipeline due to an application bug. Having that extra memory is a good idea regardless. We have many users increasing their heap size far beyond what the JVM requires simply to handle weird bursts of jumbo logs.
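If you need that extra headroom, the knob in Logstash 2.x is the LS_HEAP_SIZE environment variable (the exact mechanism varies by version):

    LS_HEAP_SIZE=1g bin/logstash -f pipeline.conf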
In the past, we've also been burned by many Big Data solutions running out of heap space, so adding more processes that relied on tuning JVM parameters yet again did not appeal to us.
I'm a bit confused, as the assertion "This worked for a while, but when we wanted to make our pipeline more fault-tolerant, Logstash required us to run multiple processes." is no more true for Logstash than it is for any other piece of software. Single processes can fail, so it can be nice to run multiples. It would be great if the author of the piece had clarified that further; if you're around, I'd love to hear specifically what you mean by this. Internally, Logstash is very thread-friendly; we only recommend multiple processes when you want either greater isolation or greater fault tolerance.
I don't personally see what the difference is between:
Filebeat -> LogZoom -> Redis -> Logstash -> (Backends)
and
Filebeat -> LogStash -> Redis -> Logstash -> (Backends)
or even better
Filebeat -> Redis -> Logstash -> (Backends)
You can read more about the filebeat Redis output here: https://www.elastic.co/guide/en/beats/filebeat/current/redis...
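Getting Filebeat to write straight to Redis is only a few lines, e.g. (Filebeat 5.x-style YAML; host, key, and password are placeholders):

    output.redis:
      hosts: ["redis.example.com:6379"]
      key: "filebeat"
      password: "secret"

and password auth is covered by that `password` option.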
Right, we considered using multiple Logstash processes, but we really didn't want to run three instances of Logstash requiring three relatively heavyweight Java VMs. The total memory consumption of even a single JVM running Logstash is higher than that of three separate LogZoom instances.
We looked at the Filebeat Redis output as well. First, it didn't seem to support encryption or client authentication out of the box. But what we really wanted was a way to make Logstash duplicate the data into two independent queues so that Elasticsearch and S3 outputs could work independently.
We use it heavily to stream all kinds of data into Kafka: access and error logs, application logs, etc. Did you consider it as well?