Just like I said in a previous blog post: the pgBadger project is moving fast! Version 3 was released a few days ago with a major improvement: with the new parallel mode, you can now launch multiple pgBadger processes at once!
In other words, pgBadger is no longer bound to a single CPU. If you have multiple cores, you can now use them to cut the processing time. All you have to do is add the "-j N" option on the command line, where N is the number of cores you want to use. For instance:
$ pgbadger -j 4 /var/log/postgresql/postgresql-2013-02-*
Quick benchmark: we had 5 log files with a total volume of 9.5 GB. Here are the results:
- -j 1 : 1h41m18s
- -j 2 : 50m25s
- -j 4 : 25m39s
- -j 8 : 15m58s
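To put these timings in perspective, here is a small sketch that turns them into speedup and parallel efficiency figures. It is only an illustration: the helper names (to_seconds, speedups) are mine and not part of pgBadger, and the only data used are the four timings listed above.

```python
import re

# Timings copied from the benchmark above (5 log files, 9.5 GB in total).
TIMINGS = {1: "1h41m18s", 2: "50m25s", 4: "25m39s", 8: "15m58s"}


def to_seconds(duration):
    """Convert a string such as '1h41m18s' or '50m25s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def speedups(timings):
    """Return {jobs: speedup relative to the -j 1 run}."""
    base = to_seconds(timings[1])
    return {jobs: base / to_seconds(t) for jobs, t in timings.items()}


if __name__ == "__main__":
    for jobs, s in speedups(TIMINGS).items():
        # Efficiency is the speedup divided by the number of jobs.
        print(f"-j {jobs}: speedup x{s:.2f}, efficiency {s / jobs:.0%}")
```

On these numbers the speedup is about 2.01 with 2 jobs, 3.95 with 4 jobs and 6.34 with 8 jobs, so scaling is close to linear up to 4 cores and flattens somewhat at 8.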
However, it's important to note that parallel mode has a small drawback: some queries may be truncated. If you use N cores, the results may differ by at most N queries per log file.
That said, this is a minor issue: parallel mode is mostly useful when you have millions of queries to analyze. And if you have millions of queries in a log file, you can afford to lose a few, as it's quite unlikely that the lost queries would have changed the overall results.
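To get a feel for how small that bound is, here is a quick back-of-the-envelope sketch. The function names are made up for the illustration, and the total of one million queries is a hypothetical figure: the benchmark above does not say how many queries the logs contained.

```python
def max_differing_queries(jobs, log_files):
    """Upper bound described above: at most `jobs` queries per log file."""
    return jobs * log_files


def worst_case_share(jobs, log_files, total_queries):
    """Fraction of all queries that could be affected in the worst case."""
    return max_differing_queries(jobs, log_files) / total_queries


if __name__ == "__main__":
    jobs, log_files = 8, 5        # -j 8 on the 5 files of the benchmark
    total_queries = 1_000_000     # hypothetical, not measured
    bound = max_differing_queries(jobs, log_files)
    share = worst_case_share(jobs, log_files, total_queries)
    print(f"at most {bound} queries affected ({share:.4%} of the total)")
```

With 8 jobs and 5 files the bound is 40 queries, which would be 0.004% of a million-query workload.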
Still, if you want to avoid this problem, you can use the pgBadger "per-file parallel mode" to analyze your logs, at the cost of lower performance than the standard parallel mode. To enable it, use the "-J N" option instead of "-j N". In per-file mode, the gains become really worthwhile when there are hundreds of small log files (e.g. 10 MB rotation size limit) and at least 8 cores.
And by the way, this is not the only big feature in pgBadger v3! For the complete list of changes, please check out the release notes on GitHub.
And of course stay tuned... because version 4 is coming fast :-)