Nagios (previously NetSaint) is an open-source monitoring and alerting system that's widely used to monitor systems, networks and infrastructure. Objects monitored by Nagios are split into two categories: hosts (physical machines) and services (particular functionalities). Nagios does not perform any host or service checks on its own and relies on plugins to do this. This makes it a very modular and flexible solution.
Extending this "monitoring by plugin" paradigm of Nagios, Solace provides a set of plugin scripts that perform the event broker and VPN monitoring. Following Nagios convention, the plugins add host (event broker) and service (VPN resources) categories.
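The plugin contract that Nagios relies on is small: a plugin prints one status line, optionally followed by `|` and performance data, and reports the state through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The Python sketch below illustrates only that convention. It is not one of the Solace plugins, and the `QUEUE` label, the `depth` metric and the thresholds are hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warning, critical):
    """Map a measured value to a Nagios state using simple upper thresholds."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def format_result(label, metric, value, warning, critical):
    """Return (exit_code, status_line) in the 'text | perfdata' plugin format."""
    state = evaluate(value, warning, critical)
    text = f"{label} {STATE_NAMES[state]} - {metric}={value}"
    perfdata = f"{metric}={value};{warning};{critical}"
    return state, f"{text} | {perfdata}"


# Hypothetical reading: a queue holding 150 messages, warn at 100, critical at 1000.
code, line = format_result("QUEUE", "depth", 150, 100, 1000)
print(line)  # a real plugin would then call sys.exit(code)
```

The part after the `|` is the performance data that graphing tools consume; the part before it is what appears in the Nagios status column.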
Nagiosgraph parses output and performance data from Nagios plugins and generates graphs and HTML pages for reporting. Nagiosgraph stores the data as time series in RRD (Round Robin Database) files. RRDtool is an open-source tool for storing and graphing time-series data that integrates with shell scripts, Perl, Python, Ruby, Tcl, etc. The following picture illustrates the different components and the high-level data flow.
This document is primarily intended for Solace administrators and operations teams responsible for setting up and monitoring infrastructure.
These documents contain information related to the feature defined in this document:
This tutorial requires access to Solace PubSub+ event broker and requires that you know several connectivity properties about your event broker. Specifically you need to know the following:
Resource | Value | Description |
Host | String | This is the address clients use when connecting to the event broker to send and receive messages. (Format: |
Message VPN | String | The event broker Message VPN that this client should connect to. |
Client Username | String | The client username. (See Notes below) |
Client Password | String | The client password. (See Notes below) |
There are several ways you can get access to Solace messaging and find these required properties.
A working Nagios installation requires the following packages/tools:
# yum install -y httpd php gcc glibc glibc-common gd gd-devel make net-snmp wget
# yum install rrdtool perl-rrdtool perl-GD
# yum install perl-CGI\* perl-Time-HiRes
The steps for enabling the Extra Packages for Enterprise Linux (EPEL) repository are release specific; check the instructions for your release.
# wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
# rpm -ivh epel-release-7-5.noarch.rpm
rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
yum -y install nagios nagios-plugins-all nagios-plugins-nrpe nrpe
Setting up an HTTP server (such as Apache) is beyond the scope of this document. Please refer to the documentation for your HTTP server. The Nagios setup should have dropped the config file required to set up URL aliases. The file is typically /nagios.cfg (e.g. /etc/httpd/conf.d/nagios.cfg). If Checking Webadmin below doesn't work, check the httpd config and logs to see whether this file was processed.
Firewall setup and changes to an existing firewall configuration are also beyond the scope of this document. For example, if you are using iptables (the default firewall on CentOS), you may need to explicitly allow incoming connections on the required ports (refer to the Additional Topics section).
Create a password for the nagiosadmin user for web access.
# htpasswd -c /etc/nagios/passwd nagiosadmin
# systemctl restart httpd
# systemctl restart nagios
Point the web browser to http://your-server-ip[:port]/nagios
Log in with the username nagiosadmin and the password set in the previous step (htpasswd command).
Sample page is shown below
NagiosGraph installation requires the following packages/tools
# yum install rrdtool perl-rrdtool perl-GD
# cd /opt/pkg # or path of your choice
# wget http://downloads.sourceforge.net/project/nagiosgraph/nagiosgraph/1.5.2/nagiosgraph-1.5.2.tar.gz
# tar xvzf nagiosgraph-1.5.2.tar.gz
# cd nagiosgraph-1.5.2
# ./install.pl --check-prereq
...
# ./install.pl --layout standalone --prefix /usr/local/nagiosgraph --nagios-perfdata-file=/var/log/nagios/perfdata.log --var-dir=/var/nagios/
...
Continue with this configuration? [y] y
...
Run the Solace-provided setup script, setup_nagiossolace, which modifies the config files appropriately. If necessary, edit the file paths at the top of the script first. Running setup_nagiossolace can be deferred until the Solace plugins are installed (see the following section). If the Solace-specific tools are not in place or the config files are not set up, the corresponding steps are skipped.
# ./setup_nagiossolace
Checking required files ...
...
Setting up config files ...
...
Restarting services ...
nagios :
nagios (pid 4155) is running...
httpd :
...
# chkconfig httpd on && chkconfig nagios on
# systemctl restart httpd
# systemctl restart nagios
The above script adds another host named "localhost-gr" with graphs enabled, which can be used to test the Nagiosgraph install. A sample page is shown below.
Clicking the graph symbol to the right of a service name brings up graphs for various time ranges. A sample graph for one of the services (HTTP) over a day is shown below:
The Solace Nagios plugin requires the following tools/packages:
# yum install perl-CPAN perl-XML-LibXML perl-Nagios-Plugin
# perl -MCPAN -e shell
##
cpan> install Nagios::Plugin
cpan> install Nagios::Plugin::DieNicely
cpan> install LWP::UserAgent
cpan> install Nagios::Config
cpan> exit
Create a sample event broker config. This file is used by the setup script (below) to create commands for this config. This step can be repeated for other brokers/VPNs using the mk_nagiossolacecfg script.
# cat cfg/solace-vmr2.cfg
# change the your_XXX in this file before running setup_nagiossolace script
HOSTNAME=solace-vmr2
HOSTIP=192.168.56.102
HOSTPORT=8080
USERNAME=admin
PASSWORD=*****
VPNNAME=vpn1
VPN_BRIDGENAME=bridge_vpn1_to_2
Run the setup_nagiossolace script. This will set up both Nagiosgraph and the Solace plugin. If Nagiosgraph is already set up, the related steps will be skipped.
# ./mk_nagiossolacecfg cfg/solace-vmr2.cfg
Generating Nagios config for Solace Event Broker ...
Using cfgfile cfg/solace-vmr2.cfg
Using template file cfg/solace-templaterouter-no_bi_bridge.cfg
Generating config ...
/etc/nagios/objects/solace-solace-vmr2.cfg :
Setting up config files ...
/etc/nagios/nagios.cfg :
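To illustrate what a generator of this kind has to do, the following Python sketch reads KEY=VALUE settings such as HOSTNAME and HOSTIP and fills a host definition template. It is not the Solace mk_nagiossolacecfg script: the functions `parse_kv` and `render_host` and the simplified template are assumptions made for this example only.

```python
from string import Template

SAMPLE_CFG = """\
# change the your_XXX in this file before running setup_nagiossolace script
HOSTNAME=solace-vmr2
HOSTIP=192.168.56.102
HOSTPORT=8080
VPNNAME=vpn1
"""

# Simplified stand-in for a Nagios host template (assumption, not the shipped file).
HOST_TEMPLATE = Template(
    "define host{\n"
    "    use        generic-host\n"
    "    host_name  $HOSTNAME\n"
    "    address    $HOSTIP\n"
    "}\n"
)


def parse_kv(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def render_host(settings):
    """Fill the host template; raises KeyError if a required key is missing."""
    return HOST_TEMPLATE.substitute(settings)


settings = parse_kv(SAMPLE_CFG)
print(render_host(settings))
```

Raising on a missing key, rather than emitting a half-filled definition, keeps a broken object file from reaching the Nagios config check.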
If the sample event broker info is successfully configured, a host entry matching the event broker hostname from Config file will be added to the web page.
A sample entry is shown below:
Clicking on the graph icon to the right of the service name brings up the graphs with options. A sample graph is shown below.
NSCA (Nagios Service Check Acceptor) is a daemon that accepts check results on behalf of Nagios and writes them to the external command file. Nagios periodically reads the external command file and processes the entries. This "passive check" is a way to push external status and alerts into Nagios, instead of Nagios "actively" pulling the status of configured hosts and services. Passive checks allow finer-grained intervals and are useful for near-real-time status updates and alerting.
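A passive result reaches Nagios as one line in the external command file, in the form `[timestamp] COMMAND;field;field;...`. The Python sketch below builds such lines for the two standard commands PROCESS_SERVICE_CHECK_RESULT and PROCESS_HOST_CHECK_RESULT. The host name `vmr3`, the service name `hardware` and the fixed timestamp are sample values, and writing the line to the command pipe is deliberately left out.

```python
import time


def service_check_result(host, service, return_code, output, timestamp=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT line for the external command file."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}"


def host_check_result(host, return_code, output, timestamp=None):
    """Build a PROCESS_HOST_CHECK_RESULT line for the external command file."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_HOST_CHECK_RESULT;{host};{return_code};{output}"


# A fixed timestamp keeps the example reproducible.
print(service_check_result("vmr3", "hardware", 0, "- [Status=UP]", timestamp=1447083676))
```

The return code uses the same 0/1/2/3 scale as an active plugin, so passive and active results can share thresholds and notification rules.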
Step 1. Download NSCA plugin from the Nagios Exchange. The current version as of this writing is 2.9.1 (http://prdownloads.sourceforge.net/sourceforge/nagios/nsca-2.9.1.tar.gz)
Step 2. Untar and follow install instructions in the package.
$ cd /opt/pkg (or directory of your choice)
$ wget http://prdownloads.sourceforge.net/sourceforge/nagios/nsca-2.9.1.tar.gz (check the nsca site for latest package name and replace)
$ tar xvzf nsca-2.9.1.tar.gz (adjust version accordingly)
$ cd /opt/pkg/nsca-2.9.1/ (adjust version accordingly)
$ ./configure
*** Configuration summary for nsca 2.9.1 01-27-2012 ***:
General Options:
-------------------------
NSCA port: 5667
NSCA user: nagios
NSCA group: nagios
$ make all
Step 1. Run the ./setup_nsca script with the source and target directories. The default target directory is /usr/local/nsca.
This will generate the required config file and set up nsca as a service.
NOTE: Check the config file /usr/local/nsca/cfg/nsca.cfg and make changes if required, especially to the host IP.
# ./setup_nsca -s /opt/pkg/nsca-2.9.1/
srcdir /opt/pkg/nsca-2.9.1/ ok
target dir /usr/local/nsca ok
Checking ...
...
Installing ...
...
Creating service ...
Installation complete!
Step 2. Start service
Now that the component is installed, start it up.
# systemctl restart nsca
Use the send_alert or send_ngalert.pl wrapper to send an alert to the remote host as a test. Run send_alert -h or send_ngalert.pl -h for usage information.
# ./send_ngalert.pl -n vmr3 -c OK -i Status=UP
Reading server config : /usr/local/nsca/cfg/nsca_server.cfg
starting send_nsca client ...
sending host command: vmr3:0: - [Status=UP]
Nov 9 15:40:30 centos7-vm nsca[9888]: Handling the connection...
1 data packet(s) sent to host successfully.
/var/log/messages Logs
Nov 9 15:40:30 centos7-vm nsca[9888]: Time difference in packet: 0 seconds for host vmr3
Nov 9 15:40:30 centos7-vm nsca[9888]: HOST CHECK -> Host Name: 'vmr3', Return Code: '0', Output: ' - [Status=UP]'
Nov 9 15:40:30 centos7-vm nsca[9888]: Attempting to write to nagios command pipe
Nov 9 15:40:30 centos7-vm nsca[9888]: End of connection...
Nov 9 15:40:30 centos7-vm nagios: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;vmr3;0; - [Status=UP]
Nov 9 15:40:33 centos7-vm nagios: PASSIVE HOST CHECK: vmr3;0;- [Status=UP]
Nov 9 15:40:33 centos7-vm nagios: HOST ALERT: vmr3;UP;HARD;1;- [Status=UP]
# ./send_ngalert.pl -n vmr3 -s hardware -c OK -i Status=UP
Reading server config : /usr/local/nsca/cfg/nsca_server.cfg
starting send_nsca client ...
sending service command: vmr3:hardware:0: - [Status=UP]
Nov 9 15:41:16 centos7-vm nsca[9888]: Handling the connection...
1 data packet(s) sent to host successfully.
/var/log/messages Logs
Nov 9 15:41:16 centos7-vm nsca[9888]: Time difference in packet: 0 seconds for host vmr3
Nov 9 15:41:16 centos7-vm nsca[9888]: SERVICE CHECK -> Host Name: 'vmr3', Service Description: 'hardware', Return Code: '0', Output: ' - [Status=UP]'
Nov 9 15:41:16 centos7-vm nsca[9888]: Attempting to write to nagios command pipe
Nov 9 15:41:16 centos7-vm nsca[9888]: End of connection...
Nov 9 15:41:16 centos7-vm nagios: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;vmr3;hardware;0; - [Status=UP]
Nov 9 15:41:23 centos7-vm nagios: PASSIVE SERVICE CHECK: vmr3;hardware;0;- [Status=UP]
Nov 9 15:41:23 centos7-vm nagios: SERVICE ALERT: vmr3;hardware;OK;SOFT;2;- [Status=UP]
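The wrapper scripts themselves are not listed in this document. As background, the standard send_nsca client reads tab-delimited, newline-terminated records on standard input: host, return code and output for a host check, with the service description inserted after the host for a service check. The Python sketch below only builds such a record; the function name `nsca_payload` is hypothetical, and piping the record to send_nsca is left out.

```python
def nsca_payload(host, return_code, output, service=None):
    """Return one tab-delimited, newline-terminated record for send_nsca stdin."""
    fields = [host]
    if service is not None:
        fields.append(service)  # a service description makes it a service check
    fields += [str(return_code), output]
    return "\t".join(fields) + "\n"


# Host check and service check records matching the two tests above.
print(repr(nsca_payload("vmr3", 0, "- [Status=UP]")))
print(repr(nsca_payload("vmr3", 0, "- [Status=UP]", service="hardware")))
```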
NRPE (Nagios Remote Plugin Executor) executes plugins on remote Linux/Unix machines, which allows monitoring of remote resources such as disk usage and CPU load. Unlike NSCA, which is a passive check, this is an active check in which the Nagios server periodically pulls the information from the remote hosts. NRPE must be configured on both the server and the client, though the client-side changes are relatively light.
In the following example, we continue to use the CentOS 7 server as the Nagios server and configure an Ubuntu server as the client.
NRPE requires that the client and server communicate over a specified port; the default NRPE port is 5666. Ensure that network connectivity exists between the server and client hosts and that the ports are open both ways.
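A quick way to confirm that the port is reachable before debugging NRPE itself is a plain TCP connection test. The Python sketch below is a generic check that only uses the standard library; the function name `port_open` and the default timeout are assumptions, and a successful connection shows only that something is listening, not that NRPE will accept the request.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_open("192.168.1.108", 5666)` run on the Nagios server would test the sample client address used later in this document.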
In addition to the Nagios server packages, the following need to be installed. These were already covered in previous sections and are listed here for completeness and for any standalone installs.
# yum install -y nrpe nagios-plugins-all openssl
On the remote client hosts, we would need to install the Nagios plugins and NRPE daemon component.
# apt-get install openssl nagios-nrpe-server nagios-plugins nagios-plugins-basic nagios-plugins-standard
Step 1. Add Nagios server IP to allowed_hosts in nrpe.cfg
# grep ^allowed_hosts /etc/nagios/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.56.201,192.168.1.223
Step 2. Restart service
# systemctl restart nagios-nrpe-server
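A common cause of rejected NRPE requests is a server address missing from allowed_hosts. The Python sketch below checks an address against an `allowed_hosts=` line. It is an illustration only: the function name `is_allowed` is hypothetical, and it handles literal IP addresses and CIDR networks but skips host names.

```python
import ipaddress


def is_allowed(server_ip, allowed_hosts_line):
    """Check whether server_ip matches an entry of an 'allowed_hosts=' line.

    Only literal IPs and CIDR networks are handled; host names are skipped.
    """
    _, _, value = allowed_hosts_line.partition("=")
    ip = ipaddress.ip_address(server_ip)
    for entry in (item.strip() for item in value.split(",")):
        if not entry:
            continue
        try:
            if ip in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # not an IP or network, for example a host name
    return False


line = "allowed_hosts=127.0.0.1,192.168.56.201,192.168.1.223"
print(is_allowed("192.168.56.201", line))
```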
Step 1. Add a config file per client under the Nagios objects directory. A sample config file for the Ubuntu server is given in the "Setup Scripts and Config Files" section.
Step 2. Update nagios.cfg (in /etc/nagios by default) and add the client config file.
For example:
root@centos7-vm conf.d# tail /etc/nagios/nagios.cfg
cfg_file=/etc/nagios/objects/nrpe-test1.cfg
Step 3. Now verify the config and fix any errors
nagios -v /etc/nagios/nagios.cfg
Step 4. Restart services
# systemctl restart nagios
# systemctl restart nrpe
# systemctl restart httpd
After successful NRPE configuration, the newly added host(s) appear on the Nagios dashboard. In the screenshot below, the added NRPE client (ubuntu-svr) is shown with ping and other NRPE service statuses.
Both NSCA and NRPE enable remote host monitoring with Nagios. This section looks at the advantages and differences.
Cacti is a performance monitoring tool based on a LAMP stack (Linux/Apache/MySQL/PHP) and RRD (Round Robin Database). It can collect, manage and display graphs of collected data.
Installation of Cacti and its dependencies is a topic of its own and beyond the scope of this document. Instructions and materials are readily available. Here are some links:
This section walks through the steps to add monitoring of a sample Solace artifact to Cacti using the web admin. Similar to Nagios, Cacti uses external scripts to gather the monitoring information. In this example, queue depth is used for illustration; it can be replaced with any other metric by using the appropriate script.
Select "Script/Command" as the Input Type and type in the name of the external script that collects the metrics from the Solace event broker. Note that <path_cacti> is a predefined variable that can be used. The input arguments for the script need to be both declared in the "Input String" and defined in the "Input Fields" section below. Similarly, the output expected from the script should be defined in the "Output Fields" section. This is used mostly as a placeholder; Cacti doesn't do any validation on the Output Field.
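A Cacti data input script reports its results on standard output. A script with a single output field prints just the value, while a script with several output fields prints space-separated `name:value` pairs whose names match the Output Fields. The Python sketch below shows only that output formatting; the field names `queue_depth` and `spool_usage_mb` are hypothetical, and the values are hard-coded where a real script would query the event broker.

```python
def cacti_output(values):
    """Format metrics as space-separated 'name:value' pairs."""
    return " ".join(f"{name}:{value}" for name, value in values.items())


# Hypothetical metrics for one queue; a real script would query the broker here.
print(cacti_output({"queue_depth": 42, "spool_usage_mb": 3}))
```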
Using the Data Input method defined above, define a data template for a specific queue. Use New Data Source Item to declare the information for the specific queue (event broker name, VPN name, queue name and credentials).
Now create a Graph Template and associate the Data Template defined earlier with the graph template (using Graph Template Items).
Now add each event broker as a device so that the graphs for this event broker can all be grouped under it. This also allows host-specific services, such as a ping test, to be run.
Use the "Data Source List" link above the Device screen and, using the "Add" link, add the endpoint stats here. Also shown here is another data source for ping latency for the event broker. These steps ensure that Cacti generates the corresponding RRD files for these resources.
Using the "Graph List" link above the Device screen, add a new graph by clicking the "New" link. Link the Data Source for the specific queue created earlier. This step ensures that graphs are generated from the RRD file using the rrdgraph tool.
Once you have a basic Cacti setup up and running, you should be able to access the install and do further configuration through the web admin page at http://
After configuration, you would be able to see basic system metrics for the localhost that would look similar to this
Once the Solace event broker is added as a device, you can view the event broker from the Devices link, with basic health tests such as ping status.
Once VPN artifact checks are added, you should be able to view them under the Graphs tab. Here is a sample of queue statistics showing queue depth for a single queue on a VPN.
As with any other Cacti graph, aggregation over a period is available by clicking on the graph. Here is an example of aggregation of queue stats over various time windows, from hour to year.
Create custom maps for any outputs that need them. For example, to turn the iface counter values into a gauge, add the following to /usr/local/nagiosgraph/etc/nagiosgraph/map:
#######################
# Service type: iface
# output:IFACE OK - [Interface:1/6/1,Enabled:yes,Link:yes]
# perfdata: rx-pkts=365832598;; rx-bytes=108312525627;; tx-pkts=390410814;; tx-bytes=363571175913;;
# capture groups: $1=rx-pkts, $2=rx-bytes, $3=tx-pkts, $4=tx-bytes
/perfdata:rx-pkts=(\d+);; rx-bytes=(\d+);; tx-pkts=(\d+);; tx-bytes=(\d+);;/
and push @s, [ 'if_stats',
['tx-pkts', COUNTER, int $3 ],
['rx-pkts', COUNTER, int $1 ],
['tx-bytes', COUNTER, int $4 ],
['rx-bytes', COUNTER, int $2 ] ];
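The capture groups of a map rule are numbered by the position of their parentheses in the pattern, so for the pattern above $1 to $4 hold rx-pkts, rx-bytes, tx-pkts and tx-bytes in that order. The Python sketch below uses the same pattern with named groups to make that order explicit, and adds a simplified per-second rate calculation of the kind an RRD COUNTER data source applies between two samples. The function names are hypothetical and counter wrap-around is ignored.

```python
import re

PERFDATA = (
    "rx-pkts=365832598;; rx-bytes=108312525627;; "
    "tx-pkts=390410814;; tx-bytes=363571175913;;"
)

# Same pattern as the map rule, with named groups to make the order explicit.
PATTERN = re.compile(
    r"rx-pkts=(?P<rx_pkts>\d+);; rx-bytes=(?P<rx_bytes>\d+);; "
    r"tx-pkts=(?P<tx_pkts>\d+);; tx-bytes=(?P<tx_bytes>\d+);;"
)


def parse_iface(perfdata):
    """Return the four interface counters as integers, or None if no match."""
    match = PATTERN.search(perfdata)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def counter_rate(previous, current, seconds):
    """Per-second rate between two counter samples (ignores counter wrap)."""
    return (current - previous) / seconds


counters = parse_iface(PERFDATA)
print(counters)
```

Pairing each data source name with the wrong group number does not raise an error; it silently graphs one counter under another counter's name, so the order is worth checking against a sample perfdata line.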
Use cfg/samplerouter.cfg as template to create additional event broker, VPN or VPN bridge configuration. Run mk_nagiossolacecfg to generate required solace plugin Config files.
Alternatively, the files can be edited or added in /etc/nagios/objects dir manually. The corresponding entries need to be added into /etc/nagios/nagios.cfg file.
Note that the services need to be restarted for the changes to be effective.
Change debug level in Nagios config to 2048
# grep ^debug_level /etc/nagios/nagios.cfg
debug_level=2048
Now check the debug log. All executed commands and their output are logged there.
# tail /var/log/nagios/nagios.debug
[1446670816.003569] [2048.1] [pid=25423] **** BEGIN MACRO PROCESSING ***********
[1446670816.003572] [2048.1] [pid=25423] Processing: '/usr/bin/perl -w /usr/local/nagios/plugins/queue.pl --router $ARG1$ --user $ARG2$ --pass $ARG3$ --vpn $ARG4$ --queue $ARG5$ --warning $ARG6$ --critical $ARG7$ --checkAvail --quiet'
[1446670816.003584] [2048.1] [pid=25423] Done. Final output: '/usr/bin/perl -w /usr/local/nagios/plugins/queue.pl --router 192.168.56.103:8080 --user admin --pass admin --vpn vpn2 --queue q5 --warning 100 --critical 1000 --checkAvail --quiet'
[1446670816.003595] [2048.1] [pid=25423] **** END MACRO PROCESSING *************
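The macro processing shown in the log replaces each $ARGn$ in the command line with the n-th `!`-separated argument of the service's check_command. The Python sketch below reproduces that substitution for illustration. The command name `check_solace_queue` is hypothetical, escaped `\!` characters are not handled, and expanding a missing argument to an empty string is an assumption of this sketch.

```python
import re


def expand_args(command_line, check_command):
    """Replace $ARGn$ macros using the '!'-separated arguments of check_command."""
    _name, *args = check_command.split("!")

    def substitute(match):
        index = int(match.group(1)) - 1
        return args[index] if index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, command_line)


template = "queue.pl --router $ARG1$ --vpn $ARG4$ --queue $ARG5$"
check = "check_solace_queue!192.168.56.103:8080!admin!admin!vpn2!q5!100!1000"
print(expand_args(template, check))
```

Comparing the expanded line with the "Final output" entry in the debug log is a quick way to spot arguments that were passed in the wrong position.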
When you can't access HTTP from outside, or when telnet to the server on port 80 fails, check the iptables rules:
# iptables -I INPUT 5 -i eth0 -p tcp --dport 80 -m state --state NEW,ESTABLISHED -j ACCEPT
# iptables -I INPUT 5 -i eth0 -p tcp --dport 443 -m state --state NEW,ESTABLISHED -j ACCEPT
# iptables-save | tee /etc/sysconfig/iptables
# service iptables save
# service iptables restart
Check Config file for any errors.
# /usr/sbin/nagios -v /etc/nagios/nagios.cfg
When access to the port works from within the box (for localhost) but not from outside, check iptables.
Check the HTTP error log (error.log).
[Fri May 02 02:21:12 2014] [error] [client 192.168.150.1] (13)Permission denied: exec of '/usr/local/nagios/sbin/show.cgi' failed, referer: http://192.168.150.137/nagios/cgi-bin//status.cgi?host=all
[Fri May 02 02:21:12 2014] [error] [client 192.168.150.1] Premature end of script headers: show.cgi, referer: http://192.168.150.137/nagios/cgi-bin//status.cgi?host=all
# chmod go+rx /usr/local/nagios/sbin/
# chmod go+rx /usr/local/nagios/sbin/show.cgi
# setenforce 0
# service httpd restart
# service nagios restart
Check that the CGI scripts are runnable from command line.
Check for missing required modules and Perl libraries.
# /usr/local/nagios/plugins/disk.pl -r solace-router:port -u user -p password -v
DISK OK - Disk 1 [State: up, Enabled: true], Disk 2 [State: up, Enabled: true] RAID [in fully redundant state]
If you see the error "Error: Could not stat() command file '/usr/local/nagios/var/rw/nagios.cmd'!":
# chcon -R -t httpd_sys_script_rw_t /usr/local/nagios/var/rw
(change path based on the error message)
This section lists the sample config files and setup scripts used in this document. Please check the respective product documentation for the most recent config file syntax.
Default install paths and files
This section lists minimal files required for Solace Nagios setup. Please contact Solace Professional Services to get a tarball package with the files and required plugin files.
Each client host to be monitored with NRPE should have a config file on the server. Here is a sample config file that should be placed in the Nagios objects directory (/etc/nagios/objects by default) on the Nagios server.
This config file does the following checks
cat /etc/nagios/objects/nrpe-test1.cfg
define host{
use generic-host
host_name ubuntu-svr
alias ubuntu
address 192.168.1.108
max_check_attempts 10
}
define service{
use generic-service
host_name ubuntu-svr
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use generic-service
host_name ubuntu-svr
service_description Current Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name ubuntu-svr
service_description Total Processes
check_command check_nrpe!check_total_procs
}
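Object files like this one are plain `define <type>{ ... }` blocks with one directive per line. The Python sketch below parses such blocks and lists the remote commands requested through check_nrpe, which is handy when reviewing a client config. It is a simplified reader written for this example, not the Nagios parser: the function names are hypothetical and nested or templated constructs are not handled.

```python
import re

SAMPLE = """\
define host{
    use        generic-host
    host_name  ubuntu-svr
    address    192.168.1.108
}
define service{
    use                  generic-service
    host_name            ubuntu-svr
    service_description  Current Load
    check_command        check_nrpe!check_load
}
"""

BLOCK = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return a list of (object_type, directives) tuples."""
    objects = []
    for obj_type, body in BLOCK.findall(text):
        directives = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


def nrpe_commands(objects):
    """List the remote command names referenced through check_nrpe."""
    names = []
    for obj_type, directives in objects:
        command = directives.get("check_command", "")
        if obj_type == "service" and command.startswith("check_nrpe!"):
            names.append(command.split("!", 2)[1])
    return names


objs = parse_objects(SAMPLE)
print(nrpe_commands(objs))
```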
# cat /etc/nagios/nrpe.cfg
log_facility=daemon
pid_file=/var/run/nagios/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,192.168.56.201,192.168.1.223
dont_blame_nrpe=0
allow_bash_command_substitution=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
include=/etc/nagios/nrpe_local.cfg
include_dir=/etc/nagios/nrpe.d/
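Every command that the Nagios server requests through check_nrpe must exist as a `command[name]=...` entry on the client. The Python sketch below extracts those entries from nrpe.cfg text and reports requested names that are not defined. The function names are hypothetical, and files pulled in through include and include_dir are not followed.

```python
import re

NRPE_CFG = """\
server_port=5666
allowed_hosts=127.0.0.1,192.168.56.201,192.168.1.223
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
"""

COMMAND = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<line>.+)$")


def parse_commands(text):
    """Map each command[name] entry to its command line."""
    commands = {}
    for raw in text.splitlines():
        match = COMMAND.match(raw.strip())
        if match:
            commands[match.group("name")] = match.group("line")
    return commands


def missing_commands(requested, commands):
    """Return the requested command names that the client does not define."""
    return [name for name in requested if name not in commands]


cmds = parse_commands(NRPE_CFG)
print(missing_commands(["check_load", "check_disk"], cmds))
```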