loloquwowndueo · 4 years ago
Despite the emphasis on goaccess’s visual mode, keep in mind that “While the terminal output is the default output, it has the capability to generate a complete, self-contained real-time HTML report (great for analytics, monitoring and data visualization)”. It was a great replacement for my aging webalizer setup, which had in turn been replaced by Google Analytics.

This is for a personal site, and at some point I realized I cared less about having those very detailed Google metrics (which I never checked anyway) than about making my site more responsive and less invasive. I nixed Google Analytics and haven’t looked back; I still have basic metrics thanks to goaccess.
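
For reference, the live report really is a one-liner; the log path and the COMBINED format below are assumptions for a typical nginx setup:

    # live, self-contained HTML report; goaccess pushes updates to it over a WebSocket
    goaccess /var/log/nginx/access.log --log-format=COMBINED \
      -o /var/www/html/report.html --real-time-html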

tombrossman · 4 years ago
The HTML reports are my preferred way of looking at stats, but to make them more useful it's worth taking some additional steps to filter all the garbage traffic.

What works for me is:

- Use ipset to drop all traffic from certain countries (you pick which works best for you)

- Configure fail2ban to 'automagically' drop all IPs requesting .php and wp-admin URLs for a few days (a rough sketch of that jail follows the list)

- Integrate Piwik/Matomo's 'referrer spam' blocklist into your list of ignored referrers.

- Use per-site logging and, for a static site, only log .html hits to see page views.
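
For the fail2ban item above, something along these lines; the filter name, regex, and paths here are just illustrative, so adapt them to your setup:

    # /etc/fail2ban/filter.d/nginx-probing.conf (hypothetical filter)
    [Definition]
    failregex = ^<HOST> .* "(GET|POST|HEAD) [^"]*(\.php|wp-admin)[^"]*"

    # /etc/fail2ban/jail.local (excerpt)
    [nginx-probing]
    enabled  = true
    port     = http,https
    filter   = nginx-probing
    logpath  = /var/log/nginx/access.log
    maxretry = 1
    bantime  = 259200   ; three days, in seconds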

This approach won't work for everyone and it takes extra sysadmin & Bash scripting skills to achieve, but it works really well with my Jekyll site.

I don't receive much traffic on my personal website but my stats page is public and updates hourly with a cronjob. https://www.tombrossman.com/stats/
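
The hourly refresh can be as small as a crontab entry along these lines (paths and log format are assumptions, and you may need the full path to goaccess under cron's minimal PATH):

    # rebuild the public stats page at the top of every hour
    0 * * * * goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/site/stats/index.html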

smnscu · 4 years ago
Nice photos, thank you for piquing my interest in Jersey https://www.tom.je/

Also, pretty good advice in this post, bookmarked it.

noxvilleza · 4 years ago
What's quite nice with the HTML output view is that it retains stats even if the underlying log files are rotated/deleted. However, if the goaccess process ends (say, your server needs to restart), you lose all the historic context.
elboulangero · 4 years ago
I run goaccess once a day to analyze log files. There's an option to store the results in a goaccess database, so nothing is ever lost and I accumulate stats for as long as I want.

I detailed that in a (lengthy) blog post if you're interested: https://arnaudr.io/2020/08/10/goaccess-14-a-detailed-tutoria...
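
For GoAccess 1.4+ the relevant flags are --persist and --restore; the db path and the daily-rotated log name below are assumptions:

    # daily sketch: fold yesterday's rotated log into the on-disk dataset, then rebuild the report
    goaccess /var/log/nginx/access.log.1 --log-format=COMBINED \
      --restore --persist --db-path=/var/lib/goaccess/ \
      -o /var/www/html/report.html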

legrande · 4 years ago
Sadly I imagine most of the traffic to a server would be bots or bad actors scanning for common vulns. I used AWStats (another log file analyzer) for many years and had to slice roughly 70% of my traffic away because most of it was automated.

Most bots were courteous enough to state they were bots, typically using a user agent with `-bot` somewhere in the string. Some used generic browser user agents but were scanning for things like `wp-admin` etc.

Most of the genuine human traffic was from people on mobile phones, and that was the only heuristic I looked at to determine how many people visited my site. Very few desktop users were present.

npilk · 4 years ago
The --ignore-crawlers flag in GoAccess filters out a decent amount of bot traffic. It's not perfect, but it's good enough for a rough estimate of 'real' traffic.
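
For example (log path and format are assumptions):

    # terminal dashboard with most known crawlers filtered out
    goaccess /var/log/nginx/access.log --log-format=COMBINED --ignore-crawlers
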
jgalt212 · 4 years ago
Here's a tricky one. Common Crawl runs its bots from AWS. AWS has like a jillion IP addresses. How do you tell which traffic is from legit Common Crawl bots and which is from imposters?
ComputerGuru · 4 years ago
I mean, they’re all bots one way or the other. The only exception would be a personal VPN running off of AWS, but that’s a bad idea given how many sites block that range.
jonatron · 4 years ago
You could add $ssl_cipher to your log_format configuration (if nginx), and use that as a TLS fingerprint to find more bots.
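
Something along these lines, assuming nginx; the format name and the extra $ssl_protocol field are my additions, and you'd extend your GoAccess --log-format to match (e.g. with %^ for the extra fields):

    # in the http{} block: combined log format plus TLS details
    log_format tls_combined '$remote_addr - $remote_user [$time_local] '
                            '"$request" $status $body_bytes_sent '
                            '"$http_referer" "$http_user_agent" '
                            '$ssl_protocol $ssl_cipher';

    access_log /var/log/nginx/access.log tls_combined;
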
loloquwowndueo · 4 years ago
Indeed - about 39% of hits to my site come from crawlers.

How do I know? Thanks to GoAccess!! ;)

Indy9000 · 4 years ago
If anyone is wondering, this is not a Go project. It's built with C.

Awesome work nonetheless.

fasteo · 4 years ago
Worth mentioning that it comes with an embedded WebSocket server, also available as a standalone tool in the classic Unix spirit (write programs that do one thing and do it well):

"Very simple, just redirect the output from your application (stdout) to a file (named pipe) and let gwsocket transfer the data to the browser — That's it."

[1] https://gwsocket.io/

sdevonoes · 4 years ago
Great work. I guess I'm already spoiled by tools like DataDog, because I kept clicking on the dashboards expecting to be able to jump to the logs that actually generated them. For example, if I see a huge spike in the Requests dashboard at 3am, I would like to be able to go to the logs that generated that spike by just clicking on the spike. Does GoAccess provide access to the logs themselves?
Neil44 · 4 years ago
I like GoAccess because the reports work well with lots of vhosts in the same logs, which many similar tools don't. It lets you see which sites are busy, which are taking resources, which have unusual patterns, etc.
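
If the shared log already carries the vhost (Apache's vhost_combined style), GoAccess ships a predefined format for it that feeds the Virtual Hosts panel; the log path below is an assumption:

    # per-vhost breakdown from one shared log with a leading "vhost:port" field
    goaccess /var/log/apache2/access.log --log-format=VCOMBINED -o report.html
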
denton-scratch · 4 years ago
So it won't do my syslog.

I like that it'll run on a terminal. But it would be more useful to me if it had pluggable backends for arbitrary log formats.

victor106 · 4 years ago
This looks great.

Anyone here know of a way to tail a log file that's exposed over HTTP?