Thursday 25 February 2010

Nagios Agents (NRPE)

In an earlier post, I mentioned Nagios as a system monitoring tool. It's simple, it's flexible, and out of the box you can monitor network services without installing any software on the monitored systems.

Now if you want to monitor other aspects of a system, like its disk usage, you can either make that information generically visible on the network (say with SNMP) or you can install an agent for Nagios. The most common agent is NRPE (Nagios Remote Plugin Executor).

Like everything else in Nagios, you first need a plugin so that Nagios can talk to NRPE, and there's a standard one available called, well, check_nrpe. Use your package manager of choice to install this plugin (nagios-plugins-nrpe in Fedora). I found that although this installed the Nagios plugin, it did not create a command definition, so I created one myself. First run the check_nrpe command manually to see what arguments it takes, and then add your command definition to your Nagios configuration. It should look something like this:

# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ $ARG2$
        }


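The command definition specifies the name of the command and then simply its invocation. The macros given are pretty generic ($USER1$ is normally the plugin directory, $HOSTADDRESS$ is the address of the host being checked, and $ARG1$, $ARG2$ are the arguments passed in from the service's check_command), and it's easy to work from existing command definitions or the Nagios documentation.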

Now once you get NRPE installed on a client, the service definition is going to look something like this:

define service{
        use                     generic-service
        host_name               Hudson
        service_description     DISK_ROOT
        check_command           check_nrpe!check_root
        }
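
To make the link between the two definitions concrete, here is a rough shell sketch of the substitution Nagios performs when it runs this service check. It is only an illustration, not how Nagios is implemented: the plugin path and the host address are assumed example values, and the script simply splits the check_command on "!" and fills the macros into the command_line shown earlier.

```shell
#!/bin/sh
# Illustrative values only; Nagios fills these in from its own configuration.
USER1=/usr/lib/nagios/plugins        # $USER1$: plugin directory (assumed path)
HOSTADDRESS=192.0.2.20               # $HOSTADDRESS$: example address, not from the post
check_command='check_nrpe!check_root'
command_line='$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ $ARG2$'

# Everything after the first '!' is the argument list, itself split on '!'.
args=${check_command#*!}
ARG1=${args%%!*}
case $args in
  *!*) ARG2=${args#*!} ;;
  *)   ARG2= ;;
esac

# Substitute each macro; an empty $ARG2$ is dropped along with its leading
# space so the result has no trailing blank (a tidy-up for this sketch).
expanded=$(printf '%s\n' "$command_line" | sed \
  -e 's|\$USER1\$|'"$USER1"'|' \
  -e 's|\$HOSTADDRESS\$|'"$HOSTADDRESS"'|' \
  -e 's|\$ARG1\$|'"$ARG1"'|' \
  -e 's| *\$ARG2\$|'"${ARG2:+ $ARG2}"'|')
echo "$expanded"
```

With the values above, the service check ends up as a check_nrpe call asking the agent on that host to run its check_root command.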


You should be able to install the NRPE agent from the package manager on most Linux distros. The agent can run either under inetd or xinetd (preferred) or as a stand-alone daemon. If you are using xinetd (which you should), make sure you list the Nagios server in the only_from line, enable the service, and then restart xinetd. With xinetd, basically all of the service configuration lives there, leaving really only the command definitions in NRPE's main config file (/etc/nagios/nrpe.cfg). In the main config file, you specify the commands that can be run. Here's the definition for the check_root command:

command[check_root]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /


As you can see, the command definition provides all the arguments needed (here, warn when free space on / drops below 20% and go critical below 10%), so the Nagios server should never have to pass any arguments to NRPE. This is for both safety and simplicity.
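
Going back to the xinetd side for a moment, the service file usually lives at /etc/xinetd.d/nrpe and looks roughly like the sketch below. Treat it as an example rather than a copy of any particular distro's file: the server path, user, and group vary by package, and 192.0.2.10 is a placeholder for your Nagios server's address.

```
# /etc/xinetd.d/nrpe (example; paths and user depend on your package)
service nrpe
{
        flags           = REUSE
        socket_type     = stream
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/sbin/nrpe
        server_args     = -c /etc/nagios/nrpe.cfg --inetd
        log_on_failure  += USERID
        # set to "no" to enable the service
        disable         = no
        # placeholder address: replace with your Nagios server
        only_from       = 127.0.0.1 192.0.2.10
}
```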

Now you're done! Reload your NRPE and Nagios processes and check back in a few minutes to ensure your service check is working. If it's not, typical issues are that the port is firewalled (TCP 5666 by default) or the Nagios host was not specified correctly in the only_from line (or the allowed_hosts line if you are not using xinetd).
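
If you run NRPE as a stand-alone daemon instead, the equivalent settings go in nrpe.cfg itself. A minimal sketch, again with 192.0.2.10 as a placeholder for the Nagios server:

```
# /etc/nagios/nrpe.cfg (stand-alone daemon mode, example values)
server_port=5666
# comma-separated list; 192.0.2.10 is a placeholder for the Nagios server
allowed_hosts=127.0.0.1,192.0.2.10
# 0 means the server may not pass arguments to commands
dont_blame_nrpe=0
```

Leaving dont_blame_nrpe at 0 matches the approach above, where every argument is baked into the command definition on the client.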

Next up is to monitor a Windows host. Since Microsoft doesn't have a convenient software repository of third-party applications, you get to go download and install an agent yourself. There are a handful of choices, but generally NSClient++ (NSCP) will be the one you want. It supports a variety of protocols including NRPE and NSCA (NSCA is for submitting passive checks). When you install NSCP, the installer will let you enable NRPE and should handle setting up NRPE as a service and opening the firewall for it. The one thing you have to do is either enable external scripts (preferred) or enable arguments. There are a handful of stock scripts and aliases provided which get you most of the basic functionality, like checking disk usage.
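
On the Nagios side, a Windows check can reuse the same check_nrpe command definition. The sketch below is an assumption-laden example: the host name winbox is made up, and alias_disk is assumed to be one of the stock aliases in your NSCP install, so check the alias list in your agent's config file before relying on it.

```
define service{
        use                     generic-service
        host_name               winbox          ; hypothetical Windows host
        service_description     DISK_ALL
        check_command           check_nrpe!alias_disk   ; alias name assumed, verify in your NSCP config
        }
```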

One last note: you can always quickly check whether the NRPE (or NSCP) process is talking to the server okay by running the check_nrpe plugin manually and giving it only the host. It will report the NRPE version if NRPE is working or an error if it is not:

[root@alma nagios]# /usr/lib/nagios/plugins/check_nrpe -H hudson
Connection refused by host
[root@alma nagios]# /usr/lib/nagios/plugins/check_nrpe -H hudson
NRPE v2.12


- Arch
