Monitoring Puppet – Part 3

Ever been concerned about the health of your Puppet agents? You know they are running, but you don’t know whether their runs actually succeed? I’d like to show how to find out by monitoring the syslog of the Puppet machines. If any errors occur during a Puppet run, the corresponding error messages are logged, most probably to /var/log/messages. So all that is left to do is to scan this file for fishy patterns.

The best tool I know of for this job is the check_logfiles plugin, which I run via NRPE. It’s a powerful Perl plugin with many configuration options. I will not give an introduction to the plugin here; the plugin’s website covers that well. Instead, I will simply show two examples of how I monitor my Puppet agents and my Puppet master machines.

For my agents, I have this command configuration inside my nrpe.cfg:

command[check_puppet_errors_agent]=/usr/lib64/nagios/plugins/check_logfiles -config /usr/lib64/nagios/plugins/check_puppet_errors_agent.conf

Almost the same for the masters:

command[check_puppet_errors_master]=/usr/lib64/nagios/plugins/check_logfiles -config /usr/lib64/nagios/plugins/check_puppet_errors_master.conf
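
Both command lines hand all the real work to the file named after -config. When several such commands accumulate in nrpe.cfg, a quick way to see which configuration file each one uses is to parse the command[...] lines. The Python sketch below is only an illustration; parse_nrpe_commands and config_file_of are hypothetical helpers, not part of NRPE or check_logfiles, and they ignore other nrpe.cfg features such as include directives.

```python
import re
import shlex

# Matches NRPE command definitions of the form: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<cmdline>.+)$")


def parse_nrpe_commands(text):
    """Return {command name: argv list} for every command[...] line in text."""
    commands = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = COMMAND_RE.match(line)
        if match:
            # Split the command line the way a POSIX shell would tokenize it.
            commands[match.group("name")] = shlex.split(match.group("cmdline"))
    return commands


def config_file_of(argv):
    """Return the argument following -config, or None if there is none."""
    if "-config" in argv:
        idx = argv.index("-config")
        if idx + 1 < len(argv):
            return argv[idx + 1]
    return None
```

Fed with the two lines above, config_file_of returns /usr/lib64/nagios/plugins/check_puppet_errors_agent.conf for the agent command and the corresponding master file for the master command.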

As you can see, all the real configuration happens in the file passed via -config. Let’s start with the agent’s configuration, check_puppet_errors_agent.conf:

$scriptpath = '/usr/bin';
$prescript = 'sudo';
$prescriptparams = 'setfacl -m u:$CL_USERNAME$:r-- /var/log/messages*';
@searches = ({
  tag => 'puppet-failing',
  rotation => 'loglog0log1',
  logfile => '/var/log/messages',
  criticalpatterns => [
    '.*Too many open files.*',
    '.*Could not retrieve catalog.*',
    '.*Retrieved certificate does not match private key.*',],
  okpatterns => ['.*puppetd.*Finished catalog run in.*'],
  options => 'noperfdata',
});

Let’s look at the details:

  • Line 1: Defines the directory in which the scripts to be executed are found.
  • Line 2: The script to be executed before the actual check takes place.
  • Line 3: The parameters handed over to the prescript. This grants read access (via setfacl) on all files matching /var/log/messages* to the user the plugin runs as; in my case this is nrpe. This is needed because on many systems these files are readable by root only. Note that $CL_USERNAME$ is replaced with the actual user name, so you may want to check which user is configured in your nrpe.cfg.
  • Line 6: The log rotation method. With it, the plugin can also scan rotated logs.
  • Line 7: The actual log file to scan.
  • Lines 8-11: The patterns indicating failed Puppet runs. They are simply all the error messages I ran into while setting up my agents; of course you may add or remove patterns as needed.
  • Line 12: A run may fail while the next one succeeds again. In that case we no longer want to see a critical error, and that is what the OK pattern is for: a successful catalog run clears the earlier errors.
  • Line 13: Disables performance data. (Admittedly, I never took a closer look at what performance data would actually be generated here.)
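
The interplay of critical and OK patterns can be sketched in a few lines of Python. This is a simplified model of the behaviour described above, not check_logfiles’ actual implementation: every line matching a critical pattern is remembered, a line matching the OK pattern discards everything remembered so far, and the result is reported with the usual Nagios plugin exit codes (0 for OK, 2 for CRITICAL). The function name evaluate is hypothetical, and the plugin’s position tracking and rotation handling are left out.

```python
import re

# Patterns copied from check_puppet_errors_agent.conf above.
CRITICAL_PATTERNS = [
    r".*Too many open files.*",
    r".*Could not retrieve catalog.*",
    r".*Retrieved certificate does not match private key.*",
]
OK_PATTERNS = [r".*puppetd.*Finished catalog run in.*"]

# Standard Nagios plugin exit codes for the two states used here.
EXIT_CODES = {"OK": 0, "CRITICAL": 2}


def evaluate(lines):
    """Return (status, exit_code, last_error) for a sequence of new log lines.

    Simplified model: critical matches are collected, and an OK match
    clears them, so only errors after the last successful run count.
    """
    critical = [re.compile(p) for p in CRITICAL_PATTERNS]
    ok = [re.compile(p) for p in OK_PATTERNS]
    errors = []
    for line in lines:
        if any(p.search(line) for p in ok):
            errors.clear()  # a successful catalog run clears earlier failures
        elif any(p.search(line) for p in critical):
            errors.append(line)
    if errors:
        return "CRITICAL", EXIT_CODES["CRITICAL"], errors[-1]
    return "OK", EXIT_CODES["OK"], None
```

With a hypothetical failing line such as "puppetd[123]: Could not retrieve catalog from remote server: Connection refused" alone, evaluate reports CRITICAL; once a later "puppetd[123]: Finished catalog run in 4.21 seconds" line follows, it reports OK again.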

The Puppet master is configured in pretty much the same way. However, many more errors can occur on a master, so I will only show the patterns here:

  criticalpatterns => [
    '.*Too many open files.*',
    '.*Could not retrieve catalog.*',
    '.*Syntax error at.*of file.*',
    '.*Mysql::Error.*',
    '.*MySQL server has gone away.*',
    '.*Could not read YAML data.*',
    '.*Could not run Puppet configuration client.*',
    '.*Parameter group failed.*Invalid group name.*'],
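
Since this list grew out of errors observed in practice, it can pay to dry-run a new pattern against a sample line before rolling it out. The Python sketch below uses re.search, assuming that for simple literal patterns like these Python’s re behaves the same as a Perl regex match; for more complex patterns that assumption is worth double-checking. The helper matching_patterns is hypothetical.

```python
import re

# Critical patterns from the master configuration above.
MASTER_CRITICAL = [
    r".*Too many open files.*",
    r".*Could not retrieve catalog.*",
    r".*Syntax error at.*of file.*",
    r".*Mysql::Error.*",
    r".*MySQL server has gone away.*",
    r".*Could not read YAML data.*",
    r".*Could not run Puppet configuration client.*",
    r".*Parameter group failed.*Invalid group name.*",
]


def matching_patterns(line, patterns=MASTER_CRITICAL):
    """Return every pattern (in list order) that matches somewhere in line."""
    return [p for p in patterns if re.search(p, line)]
```

For example, a line containing both "Mysql::Error" and "MySQL server has gone away" matches two patterns, while a line in which "Invalid group name" comes before "Parameter group failed" matches none, because the last pattern requires the two fragments in that order.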

It is essential to allow the nrpe user to run setfacl via sudo without a password, which is why I have this line in my /etc/sudoers:

nrpe    ALL=NOPASSWD:/usr/bin/setfacl
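
Because this sudoers rule permits only /usr/bin/setfacl, it helps to expand the prescript by hand and confirm that the command sudo receives really is that binary. The Python sketch below performs the $CL_USERNAME$ substitution described earlier. The helpers expand_prescript and allowed_by_sudoers are hypothetical, and the way the sketch joins $scriptpath, $prescript and $prescriptparams is an assumption, not check_logfiles’ documented behaviour. It also leaves the /var/log/messages* glob unexpanded, since whether a shell expands it depends on how the prescript is started.

```python
import shlex

# Values from check_puppet_errors_agent.conf above.
SCRIPTPATH = "/usr/bin"
PRESCRIPT = "sudo"
PRESCRIPT_PARAMS = "setfacl -m u:$CL_USERNAME$:r-- /var/log/messages*"

# The only command the sudoers rule above allows for nrpe.
ALLOWED_COMMAND = "/usr/bin/setfacl"


def expand_prescript(user):
    """Build the prescript argv with the $CL_USERNAME$ macro filled in.

    Assumption: the prescript is looked up in SCRIPTPATH and its
    parameters follow it; this mirrors the config, not the plugin's code.
    shlex.split does not expand the trailing glob.
    """
    params = PRESCRIPT_PARAMS.replace("$CL_USERNAME$", user)
    return [f"{SCRIPTPATH}/{PRESCRIPT}"] + shlex.split(params)


def allowed_by_sudoers(argv, search_dir="/usr/bin"):
    """Check that the command handed to sudo is the permitted binary.

    Simplified: argv[0] is assumed to be sudo, a bare command name is
    assumed to resolve inside search_dir, and host/runas parts are ignored.
    """
    command = argv[1]
    if "/" not in command:
        command = f"{search_dir}/{command}"
    return command == ALLOWED_COMMAND
```

For the user nrpe, expand_prescript("nrpe") yields ['/usr/bin/sudo', 'setfacl', '-m', 'u:nrpe:r--', '/var/log/messages*'], which allowed_by_sudoers accepts.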

Of course, all the configuration files as well as the check_logfiles plugin itself are maintained by Puppet. Everything is done using stored configurations, so every new Puppet agent attached to the system sets up its own monitoring. At the time of writing, these checks have been running for several months, and they give me a very reliable overview of my agents’ health whenever I look at Icinga’s front end.
