Keep in mind that many factors can influence parsing time, including the processor, RAM, and the log itself. In general, however, the following figures apply:
|Benchmark (full features & metrics enabled, >=v0.9.5)|Throughput|
|Default Hash Tables|87,816 lines per second|
|On-Disk B+ Tree|23,000 lines per second|
|In-memory hash table|46,000 lines per second|
Note: A dataset of about 52M hits (12 GB) is parsed in 20 minutes with in-memory storage and in 60 minutes with on-disk storage.
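As a rough illustration of how a throughput figure translates into wall-clock time, the sketch below divides a line count by a rate. The function name `estimate_seconds` is hypothetical, and real runs will vary with hardware, log content, and enabled features.

```shell
# Estimate parsing time in whole seconds from a line count and a throughput.
# Both arguments are positive integers; the result is rounded down.
estimate_seconds() {
    lines=$1
    rate=$2
    echo $(( lines / rate ))
}

# 52M lines at the in-memory hash table rate quoted above (46,000 lines/s).
secs=$(estimate_seconds 52000000 46000)
echo "about $(( secs / 60 )) minutes"
```

With these inputs the estimate comes out at roughly 18 minutes, which is in the same range as the in-memory figure in the note.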
If you are using the standard log format that comes with Apache or Nginx, configuring GoAccess should be straightforward.
There are two ways to configure the log format. If you are outputting to a
terminal (ncurses), the easiest is to run GoAccess with
prompt a configuration window. However, this won't make it permanent; for that,
you will need to specify the format in the configuration file.
The configuration file is located under
%sysconfdir% is either
In the configuration file you need to uncomment
The following should work for the standard Apache or Nginx formats.
time-format %T
date-format %d/%b/%Y
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
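To see which part of a log line each specifier is meant to capture, the following sketch pulls the corresponding fields out of a made-up request with `awk` and `sed`. This is not how GoAccess parses logs; the sample line and variable names are assumptions chosen for illustration.

```shell
# Made-up request in the standard combined format (not taken from a real log).
line='192.0.2.10 - - [10/Oct/2015:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"'

# Host is the first whitespace-separated field.
host=$(printf '%s\n' "$line" | awk '{ print $1 }')                               # %h
# Date and time sit between "[" and the first space inside the brackets.
day=$(printf '%s\n' "$line" | sed -E 's/^[^[]*\[([^:]+):.*$/\1/')               # %d
clock=$(printf '%s\n' "$line" | sed -E 's/^[^[]*\[[^:]+:([^ ]+) .*$/\1/')       # %t
# Splitting on double quotes: field 2 is the request, field 3 holds status and size.
request=$(printf '%s\n' "$line" | awk -F'"' '{ print $2 }')                      # %r
status=$(printf '%s\n' "$line" | awk -F'"' '{ split($3, f, " "); print f[1] }')  # %s
bytes=$(printf '%s\n' "$line" | awk -F'"' '{ split($3, f, " "); print f[2] }')   # %b

echo "host=$host date=$day time=$clock status=$status bytes=$bytes"
echo "request=$request"
```

The date piece has the shape described by `date-format %d/%b/%Y` and the time piece the shape described by `time-format %T`, while the two `%^` specifiers skip the identity fields and the timezone offset.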
For more information, please check GoAccess' man page
Running GoAccess is fairly easy. Once it has been installed, no configuration is needed; just run it against your web log file (-a is optional):
# goaccess -f /var/log/apache2/access.log -a
Filtering can be done through the use of pipes. For instance, you can use grep to filter specific data and then pipe the output into GoAccess. This adds a great amount of flexibility to what GoAccess can display. For example:
# zcat -f access.log* | goaccess
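A minimal sketch of the grep-based filtering mentioned above might look like the following. The helper name `filter_log` and the ` 404 ` pattern are illustrative assumptions, and `gzip -cdf` is used here as a portable equivalent of `zcat -f`.

```shell
# Hypothetical helper: print only the log lines that match a pattern.
# First argument is an extended regular expression; the rest are log files,
# which may be plain text or gzip-compressed.
filter_log() {
    pattern=$1
    shift
    gzip -cdf "$@" | grep -E -- "$pattern"
}

# Usage (requires GoAccess and real log files), keeping only 404 responses:
#   filter_log ' 404 ' access.log* | goaccess
```

Because the filter runs before GoAccess sees the data, any tool that reads and writes lines can be placed in the pipeline the same way.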
For more examples, please check GoAccess' man page
To generate an HTML report, run GoAccess against your web log file and redirect the output to a file (-a is optional):
# goaccess -f /var/log/apache2/access.log -a > report.html
OR
# zcat -f /var/log/apache2/access.log* | goaccess -a > report.html
Note: You can run GoAccess via
cat /var/log/apache2/access.log | goaccess -a > report.html
GoAccess should not leak any memory, (tested with
so mostly it will depend on the log size and features enabled.
For 247,834 parsed lines, memory usage is ~41.1 MiB (full features enabled).
Note: Removing the query string with
-q can greatly decrease memory consumption, especially on timestamped requests.
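The effect on timestamped requests can be illustrated outside GoAccess: when every request carries a unique query string, each one counts as a distinct entry, whereas stripping the query string collapses them. The sketch below uses made-up paths and hypothetical helper names, and it only mimics the idea; it is not GoAccess's implementation.

```shell
# Made-up requested paths; the ts parameter makes each request unique.
requests='/index.html?ts=1001
/index.html?ts=1002
/index.html?ts=1003
/about.html'

# Count distinct lines read from standard input.
count_unique() { sort -u | wc -l | tr -d ' '; }
# Drop everything from the first "?" onward.
strip_query() { sed 's/?.*$//'; }

with_query=$(printf '%s\n' "$requests" | count_unique)
without_query=$(printf '%s\n' "$requests" | strip_query | count_unique)

echo "distinct requests with query strings: $with_query"
echo "distinct requests without query strings: $without_query"
```

Fewer distinct requests means fewer entries to keep in storage, which is why the saving grows with the share of timestamped URLs in the log.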
GoAccess has a generic predefined log format option in the config file &
However, this script can automatically extract the proper format from IIS log files.
Here's an extensible Amazon S3 and CloudFront
log parser in Python that uses GoAccess.
(Thanks to Viktor Nagy)
This section describes how to install Tokyo Cabinet from the source package.
$ wget http://fallabs.com/tokyocabinet/tokyocabinet-1.4.48.tar.gz
$ tar -zxvf tokyocabinet-1.4.48.tar.gz
$ cd tokyocabinet-1.4.48
$ ./configure --prefix=/usr --enable-off64 --enable-fastest
$ make
# make install
If you have a large dataset that won't fit in physical memory or you want data persistence, then you want to use the B+ Tree on-disk database.
$ ./configure --enable-utf8 --enable-geoip --enable-tcb=btree
$ make
# make install
Note: Tokyo Cabinet must be installed before you configure GoAccess. You can install it with your package management tool (see dependencies table) or from source (see question above). You may also choose to disable compression; see configuration options for more details.
Here are some of the top features to add:
Please see GitHub for more details.
If you would like to be notified of new releases of GoAccess then please follow the project on Twitter. Feel free to share it with others too :)