Long story short, I host a lot of websites at my lab, and everything runs through a central nginx reverse proxy. For a while I've thought about using something like Loggly to parse these logs and throw up some neat data on where the traffic is going, when it's busy, etc. I stumbled upon GoAccess last week and it revived my interest.
The good news is it's super easy to set up and you can 100% automate it. I set mine up on the reverse proxy VM so I didn't need to faff about rsyncing master logs around. It takes 15-17 seconds to parse all my logs (around 750 MB of log files), run them through the MaxMind GeoIP database and output the HTML, so it's pretty efficient. I'm using Ubuntu 16.04.
echo "deb http://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/goaccess.list
wget -O - https://deb.goaccess.io/gnugpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install goaccess
There's a tonne of changes you can make in the /etc/goaccess.conf config file. Here's what I did: I added these three lines to the top of the config:
time-format %T
date-format %d/%b/%Y
log-format %h - %^ [%d:%t %^] "%r" %s %b "%R" "%u"
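Before pointing GoAccess at your logs, it's worth checking that nginx really writes lines in the shape that log-format describes: %h is the client IP, %^ skips a field (the remote user, then the timezone), %d and %t are the date and time, %r the request, %s the status, %b the bytes sent, %R the referer and %u the user agent. Here's a rough sketch of such a check, assuming nginx's default "combined" format; the function name is hypothetical and the regex is an approximation, not GoAccess's own parser.

```shell
#!/bin/bash
# Rough sanity check, assuming nginx's default "combined" log format: does a line
# look like  IP - user [dd/Mon/yyyy:HH:MM:SS zone] "request" status bytes "referer" "agent" ?
is_combined_line() {
  local re='^[^ ]+ - [^ ]+ \[[0-9]{2}/[A-Za-z]{3}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}] "[^"]*" [0-9]{3} [0-9]+ "[^"]*" "[^"]*"'
  [[ $1 =~ $re ]]
}

line='203.0.113.5 - - [10/Oct/2017:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"'
is_combined_line "$line" && echo "matches the log-format line"
```

To try it on your own logs: is_combined_line "$(head -n 1 /var/log/nginx/access.log)" && echo ok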
And edited these lines:
html-report-title honk.ie stats - updated every 15 minutes
exclude-ip 10.0.0.0-10.0.0.255
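GoAccess's exclude-ip option takes a single address or a start-end range (two addresses joined by a dash) rather than CIDR notation, so a subnet like 10.0.0.0/24 has to be written out as its first and last address. If you think in CIDR, a small helper can do the conversion; this is a minimal sketch for IPv4 only, and cidr_to_range is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: convert an IPv4 CIDR block (e.g. 10.0.0.0/24) into the
# "start-end" range form used by GoAccess's exclude-ip option. IPv4 only.
cidr_to_range() {
  local ip=${1%/*} bits=${1#*/}
  local -a o
  IFS=. read -r -a o <<< "$ip"                   # split the address into four octets
  local n=$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  local start=$(( n & mask ))                    # network address
  local end=$(( start | (~mask & 0xFFFFFFFF) ))  # broadcast address
  printf '%d.%d.%d.%d-%d.%d.%d.%d\n' \
    $(( (start >> 24) & 255 )) $(( (start >> 16) & 255 )) $(( (start >> 8) & 255 )) $(( start & 255 )) \
    $(( (end >> 24) & 255 )) $(( (end >> 16) & 255 )) $(( (end >> 8) & 255 )) $(( end & 255 ))
}

echo "exclude-ip $(cidr_to_range 10.0.0.0/24)"   # prints: exclude-ip 10.0.0.0-10.0.0.255
```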
With that done, let's create a folder for the scripts that automate all this. I used /opt/goaccess/ and will use it throughout this example; feel free to choose whatever you want.
mkdir /opt/goaccess && cd /opt/goaccess
wget https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz
tar -xzf GeoLite2-City.tar.gz
mv GeoLite2-City_*/GeoLite2-City.mmdb .
rm -rf GeoLite2-City_* GeoLite2-City.tar.gz
This creates the folder, downloads the MaxMind GeoLite2 City database, extracts it, moves the .mmdb file into the working directory and cleans up the leftovers.
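The tarball unpacks into a dated folder (GeoLite2-City_ followed by the release date), which is why the mv and rm steps above use a glob. If you'd rather pull only the database file out in one step, for example when refreshing it later, GNU tar can strip that folder for you. This is a sketch assuming GNU tar; extract_mmdb is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (assumes GNU tar): pull just GeoLite2-City.mmdb out of the
# MaxMind tarball into DEST, whatever the dated top-level folder is called.
extract_mmdb() {
  local tarball=$1 dest=${2:-.}
  # --wildcards lets the member pattern match the unknown folder name;
  # --strip-components=1 drops that folder so the file lands directly in DEST.
  tar -xzf "$tarball" -C "$dest" --wildcards --strip-components=1 '*/GeoLite2-City.mmdb'
}

# Usage, from /opt/goaccess:
# extract_mmdb GeoLite2-City.tar.gz /opt/goaccess && rm -f GeoLite2-City.tar.gz
```

The licence and copyright files in the archive are simply not extracted, so there is nothing left over to clean up.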
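Let's also grab a master copy of the current logs, including the rotated and gzipped ones. zcat -f decompresses the .gz files and passes the plain-text logs through unchanged, so a single glob covers both without adding anything twice:
zcat -f /var/log/nginx/*access.log* > /opt/goaccess/master.log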
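Next, save the following as /opt/goaccess/build.sh. Each run appends the live logs to master.log, drops duplicate lines and regenerates the report:
#!/bin/bash
SERVER_LOG='/var/log/nginx/*.access.log'
MASTER_LOG='/opt/goaccess/master.log'
HTML_OUT='/var/www/stats.honk.ie/index.html'
BLACKLIST='/opt/goaccess/spammers.txt'
# Append the current logs ($SERVER_LOG is left unquoted so the glob expands)
cat $SERVER_LOG >> "$MASTER_LOG"
# Drop duplicate lines. Ubuntu's default awk is mawk, which has no -i inplace,
# so write to a temp file and move it back instead
awk '!seen[$0]++' "$MASTER_LOG" > "$MASTER_LOG.tmp" && mv "$MASTER_LOG.tmp" "$MASTER_LOG"
# Pass one --ignore-referer flag per blacklisted domain and build the HTML report
goaccess -f "$MASTER_LOG" $(printf -- "--ignore-referer=%s " $(<"$BLACKLIST")) \
  --geoip-database /opt/goaccess/GeoLite2-City.mmdb \
  --agent-list --no-progress -o "$HTML_OUT"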
NOTE: If your logs live somewhere else, or you're using Apache instead of nginx, update these variables. Also make sure HTML_OUT points to where you want the index.html for your site to go. Before running the script, create that folder and chown it so nginx can read it:
mkdir -p /var/www/yoursite.tld/ && touch /var/www/yoursite.tld/index.html && chown -R www-data:www-data /var/www/yoursite.tld
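The merge step in build.sh (cat the live logs onto master.log, then keep the first copy of each line with awk '!seen[$0]++') is what stops the 15-minute re-runs from double counting. If you want to test that logic on its own before trusting it with hundreds of megabytes of logs, it can be pulled into a function. This is a sketch; merge_into_master is a hypothetical name, and it writes to a temp file because gawk's -i inplace isn't available in mawk.

```shell
#!/bin/bash
# Hypothetical helper mirroring the merge step in build.sh: append the live
# log(s) to the master log, then keep only the first copy of each line.
merge_into_master() {
  local master=$1
  shift
  cat "$@" >> "$master" || return 1
  # Works with mawk or gawk: write the de-duplicated copy to a temp file, then
  # move it over the master log (order of first appearance is preserved).
  awk '!seen[$0]++' "$master" > "$master.tmp" && mv "$master.tmp" "$master"
}

# Usage (the glob stays unquoted so the shell expands it):
# merge_into_master /opt/goaccess/master.log /var/log/nginx/*.access.log
```

Two side effects of de-duplicating by whole line: two genuinely identical hits (same client, same second, same request and user agent) are counted once, and awk keeps every unique line in memory, so RAM use grows with master.log.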
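And save this as /opt/goaccess/spam.sh. It fetches the referrer spam blacklist that build.sh turns into --ignore-referer flags; run it once by hand before the first build so spammers.txt exists:
#!/bin/bash
cd /opt/goaccess/ || exit 1
wget --timestamping --output-file=wget-cron.log https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt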
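build.sh expands $(<$BLACKLIST) unquoted inside printf. If spammers.txt doesn't exist yet, that expansion is empty and printf still emits a lone, empty --ignore-referer= flag. A slightly more defensive way to build the flags is sketched below; referer_flags and REFERER_FLAGS are hypothetical names, and mapfile needs bash 4 or later.

```shell
#!/bin/bash
# Hypothetical helper: print one --ignore-referer=<domain> flag per non-blank
# line of the blacklist, and nothing at all if the file is missing.
referer_flags() {
  local list=$1 domain
  [ -r "$list" ] || return 0
  while IFS= read -r domain || [ -n "$domain" ]; do
    domain=${domain%$'\r'}              # tolerate CRLF line endings
    if [ -n "$domain" ]; then
      printf -- '--ignore-referer=%s\n' "$domain"
    fi
  done < "$list"
}

# Usage inside build.sh:
# mapfile -t REFERER_FLAGS < <(referer_flags "$BLACKLIST")
# goaccess -f "$MASTER_LOG" "${REFERER_FLAGS[@]}" --geoip-database /opt/goaccess/GeoLite2-City.mmdb \
#   --agent-list --no-progress -o "$HTML_OUT"
```

Keeping the flags in an array also means each domain reaches goaccess as exactly one argument, with no reliance on word splitting.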
Here are the only crontab entries I'm using (make both scripts executable first with chmod +x /opt/goaccess/build.sh /opt/goaccess/spam.sh):
*/15 * * * * /opt/goaccess/build.sh > /dev/null
0 3 * * 1 /opt/goaccess/spam.sh > /dev/null
So the HTML is regenerated every 15 minutes, and spammers.txt is refreshed every Monday at 3am. All that's left is to set up how you want these stats displayed, point nginx at the folder, and you're all set.