Using goaccess to parse & display traffic for multiple sites

Posted on Apr 21, 2019

//deprecated

Long story short, I host a lot of websites at my lab. Everything runs through a central nginx reverse proxy. For a while I’ve thought about using something like loggly to parse these logs and throw up some neat data on where the traffic is going, when it’s busy, etc. I stumbled upon goaccess last week and it revived my interest in the idea.

The good news is it’s super easy to set up and you can 100% automate it. I set mine up on the reverse proxy VM so I didn’t need to faff about with rsyncing master logs around. It takes 15-17 seconds to parse all my logs (around 750 MB of log files), run them through the MaxMind GeoIP database and output the HTML, so it’s pretty efficient. I’m on Ubuntu 16.04, for reference.

getting started

Installing goaccess;

echo "deb http://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/goaccess.list
wget -O - https://deb.goaccess.io/gnugpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install goaccess

There’s a tonne of changes you can make in the /etc/goaccess.conf config file. Here’s what I did - just added these three lines to the top of the config;

time-format %T
date-format %d/%b/%Y
log-format %h - %^ [%d:%t %^] "%r" %s %b "%R" "%u"
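
For reference, here’s the sort of line that format picks apart - a made-up entry in nginx’s default combined log format, where %h is the client IP, %d:%t the date and time, %r the request, %s the status, %b the bytes sent, %R the referer and %u the user agent;

203.0.113.5 - - [21/Apr/2019:13:37:42 +0000] "GET /index.html HTTP/1.1" 200 612 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"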

And edited these lines;

html-report-title honk.ie stats - updated every 15 minutes
exclude-ip 10.0.0.0-10.0.0.255

With that done, let’s create our folder to set up the scripts and automate this. I used /opt/goaccess/ and will use that for this example - feel free to choose whatever you want.

mkdir /opt/goaccess && cd /opt/goaccess && wget https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz
tar -xzf GeoLite2-City.tar.gz && mv GeoLite2-City_*/GeoLite2-City.mmdb . && rm -rf GeoLite2-City_* GeoLite2-City.tar.gz

This makes the folder, downloads the MaxMind GeoIP database, extracts it, moves the database into the working directory and cleans up the leftovers.

Let’s grab a master version of the current logs too - zcat -f decompresses the rotated .gz logs and passes the plain ones straight through, so one command covers everything;

zcat -f /var/log/nginx/*access.log* > /opt/goaccess/master.log

scripts

nano build.sh

#!/bin/bash

# Paths - adjust if your logs or webroot live elsewhere.
# SERVER_LOG is deliberately left unquoted below so the glob expands.
SERVER_LOG='/var/log/nginx/*.access.log'
MASTER_LOG='/opt/goaccess/master.log'
HTML_OUT='/var/www/stats.honk.ie/index.html'
BLACKLIST='/opt/goaccess/spammers.txt'

# Append the current logs to the master log, then dedupe it in place
# (the inplace extension needs gawk, not mawk).
cat $SERVER_LOG >> $MASTER_LOG
gawk -i inplace '!seen[$0]++' $MASTER_LOG

# Parse the master log, skipping any referer on the spam blacklist,
# and write the HTML report out to the stats webroot.
goaccess -f $MASTER_LOG $(printf -- "--ignore-referer=%s " $(<$BLACKLIST)) --geoip-database /opt/goaccess/GeoLite2-City.mmdb --agent-list --no-progress -o $HTML_OUT
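
That printf substitution turns each line of spammers.txt into its own --ignore-referer flag before goaccess sees the command. With a hypothetical two-entry blacklist, that chunk of the command expands to;

--ignore-referer=semalt.com --ignore-referer=buttons-for-website.com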

NOTE: If your logs live somewhere else, or you’re using apache instead, update these variables. Also make sure HTML_OUT points to wherever you want the index.html placed for your site. Before running this, make that folder and chown it so nginx can read it;

mkdir /var/www/yoursite.tld/ && touch /var/www/yoursite.tld/index.html && chown -R www-data:www-data /var/www/yoursite.tld

nano spam.sh

#!/bin/bash

# Work out of the goaccess directory; bail out if it's missing.
cd /opt/goaccess/ || exit 1

# Refresh spammers.txt, only re-downloading if the remote copy is newer.
# wget's own chatter goes to wget-cron.log so cron stays quiet.
wget --timestamping --output-file=wget-cron.log https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt
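
Before wiring these into cron, make both scripts executable and give them a test run - spam.sh first, so spammers.txt exists by the time build.sh reads it;

chmod +x /opt/goaccess/build.sh /opt/goaccess/spam.sh
/opt/goaccess/spam.sh && /opt/goaccess/build.sh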

crontab

Here’s all I’m using for crontab entries;

*/15 * * * * /opt/goaccess/build.sh > /dev/null
0 3 * * 1 /opt/goaccess/spam.sh > /dev/null

So it updates the HTML every 15 minutes, and every Monday morning at 3am it refreshes spammers.txt. All that’s left to do is set up how you want these stats displayed, point nginx at it, and you’re all set.
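
If you want a starting point for that last step, here’s a minimal sketch of the nginx server block I mean - it assumes the stats.honk.ie hostname and webroot from build.sh, so swap in your own;

server {
    listen 80;
    server_name stats.honk.ie;   # your stats hostname
    root /var/www/stats.honk.ie; # matches HTML_OUT in build.sh
    index index.html;
}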