Answers to Frequently Asked Questions about nagiosgraph

nagiosgraph is an add-on to Nagios. nagiosgraph does two things: (1) collect performance data from Nagios plugins into RRD files, and (2) generate graphs and web-based reports of the performance data. nagiosgraph is written in perl. nagiosgraph is almost entirely self-contained; it requires only RRDs - the perl interface to rrdtool. Graphs are generated and managed via CGI scripts, with a small amount of JavaScript and CSS.
nagiosgraph uses a parametric approach to configuration rather than a template approach.
nagiosgraph was first released in 2004.
nagiosgraph is free. The source code is distributed under the terms of the Artistic License.
The official nagiosgraph site is http://nagiosgraph.sourceforge.net/
The README and INSTALL files contain all of the nagiosgraph documentation. The configuration files (etc/*.conf) contain descriptions of syntax and examples.
There are a few ways to install nagiosgraph.
When installing from source (not from a deb or rpm package), the standalone layout is recommended as it is easier to update than the overlay layout.
The install script and packages were introduced in 1.4.4.
First identify whether the problem is a data collection problem or a data display problem. Data collection involves nagios, rrdtool, the nagiosgraph map file, and the nagiosgraph insert.pl script. Data display involves the web server, rrdtool, and the nagiosgraph CGI scripts.
Scan the Frequently Asked Questions on this page and the Help Forum to see if you are facing a problem already encountered by someone else. If that does not yield a solution, follow the instructions in the Troubleshooting section of the README document. If that does not yield a solution, please post a description of your problem to the nagiosgraph Help Forum.
The nagiosgraph Help Forum is located at:
https://sourceforge.net/projects/nagiosgraph/forums/forum/394748
The log file for data collection is specified by the logfile directive in the nagiosgraph.conf file. The log file must be writable by the nagios user. If no log file is specified, or if the file cannot be written, log entries go to the nagios log file.
The log file for data display is specified by the cgilogfile directive in the nagiosgraph.conf file. The CGI log file must be writable by the web server user. If no log file is specified, or if the file cannot be written, log entries go to the web server log file.
The debug directives in the nagiosgraph.conf file control what information is logged. There are 5 log levels, from debug to critical. There are also mechanisms to specify different log levels for each host and/or service. This makes it easy to find out what is happening for a specific host and/or service, even if your installation has thousands of hosts or services.
If GD is installed, rrdtool errors will be displayed directly in the CGI output. If GD is not installed, look in the nagiosgraph CGI log or the web server error log.
Hosts and services are defined in Nagios as host_name and service_desc, respectively. A database is a single RRD file. Each database contains one or more data sources. Databases and data sources are defined in the map file by the rules that extract data from Nagios output and performance data. A single graph can display one or more data sources from one or more databases.
The configuration files control the behavior of data collection and data display. The syntax for each file is spelled out in the sample .conf files. nagiosgraph.conf is the only required configuration file.
nagiosgraph.conf
datasetdb.conf
groupdb.conf
hostdb.conf
servdb.conf
rrdopts.conf
labels.conf
access.conf
First see whether insert.pl is being invoked properly. To do this, increase the logging level in Nagios. In the nagios.cfg file, set debug_level=256 and set debug_verbosity and debug_file. Then look at the Nagios log file for clues.
If you see messages such as 'Can't locate object method "croak" via package "..."', the embedded PERL interpreter (ePN) is not able to execute insert.pl. To fix this, use non-embedded PERL by invoking insert.pl directly in the Nagios configuration, for example:
/usr/bin/perl /usr/local/nagiosgraph/libexec/insert.pl
Nagios must be restarted after any change to the Nagios configuration files (nagios.cfg, commands.cfg). Changes to the map file or nagiosgraph.conf do not require a restart of Nagios. Changes to nagiosgraph.conf might require a restart of the web server, for example if the web server runs the CGI scripts under mod_perl or mod_cgi and caches them.
Ensure that you have configured Nagios to process performance data as detailed in the Configuring Data Processing section of the README file. Be sure that process_performance_data=1 for Nagios (typically in the nagios.cfg file) and that process_perf_data=1 for any service you want to record (typically in a service template).
Permissions
Does the nagios user have write access to the directory in which the performance data log file is located? When nagiosgraph parses performance data, it creates a temporary file next to the performance data log file, so it needs write access to the containing directory.
Does the nagios user have write access to the RRD directory? Look for messages about 'cannot create rrd directory' or 'cannot create directory' in the nagiosgraph log file. Log in as the nagios user and ensure that you can create a file in the RRD directory.
Is SELinux interfering? Use setenforce 0 to temporarily disable SELinux. If that is the cause of the problem, use setenforce 1 to re-enable SELinux, then see the SELinux documentation to configure policies that do not interfere with Nagios, the web server, or nagiosgraph.
Performance Data
Are the perfdata recognized? Look for messages about 'output/perfdata not recognized' in the nagiosgraph log file. If no map rule matches the plugin output and/or perfdata, no RRD file will be created/updated.
Are the data source names valid? Monitor the nagiosgraph log file for messages about 'ds-name is not valid'. Ensure that each map rule uses valid data source names.
Does the plugin not return output or perfdata? In nagiosgraph 1.4.3 and earlier, insert.pl would silently abort if a plugin did not return output or performance data. See this thread for the symptoms and the fix:
https://sourceforge.net/projects/nagiosgraph/forums/forum/394748/topic/3764827
With logging set to INFO (debug_insert=4 in nagiosgraph.conf), nagiosgraph 1.4.4 and later makes log entries about the number of lines of perfdata available from nagios, followed by information about how many of those lines were recognized and processed by nagiosgraph.
Ensure that permissions are set correctly. The nagios user must have write access to the RRD directory and to the directory in which the performance data log file resides. The web server user must have read access to the RRD directory and its contents.
For example, with the RRD directory at /var/nagios/rrd and the performance data log file at /var/nagios/perfdata.log, this will not work:
[user@host]% ls -la /var/nagios/
total 620
drwxr-xr-x 3 root root 4096 Mar 21 12:39 .
drwxr-xr-x 28 root root 4096 Dec 14 16:22 ..
-rw-r--r-- 2 nagios nagioscmd 610996 Mar 23 11:31 perfdata.log
drwxr-xr-x 2 nagios nagioscmd 4096 Mar 22 16:22 rrd
but this will work:
[user@host]% chown nagios:nagioscmd /var/nagios
[user@host]% ls -la /var/nagios/
total 620
drwxr-xr-x 3 nagios nagioscmd 4096 Mar 21 12:39 .
drwxr-xr-x 28 root root 4096 Dec 14 16:22 ..
-rw-r--r-- 2 nagios nagioscmd 611089 Mar 23 11:33 perfdata.log
drwxr-xr-x 2 nagios nagioscmd 4096 Mar 22 16:22 rrd
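The write-access requirement shown above can also be checked with a short script run as the nagios user. This is only a minimal sketch and is not part of nagiosgraph: the function name check_writable is hypothetical, and the paths are the ones from the example above, so adjust them to match nagiosgraph.conf.

```python
import os

def check_writable(paths):
    """Return a dict mapping each path to True if the current user can write to it."""
    results = {}
    for path in paths:
        if os.path.isdir(path):
            # Creating files in a directory needs write and search (execute) permission.
            results[path] = os.access(path, os.W_OK | os.X_OK)
        else:
            # Regular files need write permission; a missing path yields False.
            results[path] = os.access(path, os.W_OK)
    return results

if __name__ == "__main__":
    # Example paths from the listing above (assumed locations, not defaults).
    for path, ok in check_writable(["/var/nagios", "/var/nagios/rrd"]).items():
        print(f"{path}: {'writable' if ok else 'NOT writable'}")
```

Run it while logged in as the nagios user, since the result depends on the user that executes the script.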
Services may emit output, performance data, or both output and performance data. Make sure there is a rule in the map file that matches the service output and/or performance data. If the service does not emit performance data, you will have to create a rule to parse the service output. See the section Adding Service Types in the README file for details.
nagiosgraph 1.4.4 and later include a map rule that captures perfdata from any standards-compliant plugin. It should also work as the last rule in the map file for earlier releases.
http://nagiosgraph.svn.sourceforge.net/viewvc/nagiosgraph/trunk/nagiosgraph/etc/map
In nagiosgraph 1.4.3 and earlier, if no perfdata and no output were emitted, processing of data would stop. See this thread for details and the fix (included in 1.4.4):
https://sourceforge.net/projects/nagiosgraph/forums/forum/394748/topic/3967175
First ensure that the data are being collected. Look in the RRD directory (specified in nagiosgraph.conf) and see if there are RRD files for the Windows hosts. If there are no RRD files, you probably need to add one or more rules to the map file (also specified in nagiosgraph.conf). Here are some examples:
http://nerhood.wordpress.com/2004/09/22/nagiosgraph-with-windows-support/
http://rambling-techie.blogspot.com/2009/02/nagiosgraph-windows-clients.html
http://www.claudiokuenzler.com/blog/120/nagiosgraph-map-windows-nsclient-memory-cpu-disk
If you are using nagiosgraph 1.4.2 or earlier, please upgrade. The 1.4.3 release of nagiosgraph contains a bugfix for the use of backslashes and colons in service and database names (this often shows up when using Windows disk and directory names directly, such as c:).
A bug was introduced in Nagios 3.3.1. If a plugin does not emit performance data, nothing is emitted to the performance data file, even if the plugin does emit output. As a result, any plugin that does not emit performance data will be ignored, even if there is a map rule to parse its output.
See this thread:
http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg36835.html
One workaround is to create a plugin wrapper that captures the output from a plugin and formats the output as performance data in the standard format.
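The core of such a wrapper might look like the following sketch. It is an illustration under stated assumptions, not code from nagiosgraph or Nagios: the function name add_perfdata is hypothetical, the regular expression and labels are supplied by the caller, and the sample output in the usage note is made up for illustration. A real wrapper would also have to run the plugin and pass its exit code through, which is omitted here.

```python
import re

def add_perfdata(output, pattern, labels):
    """Append perfdata built from regex groups if the plugin output has none."""
    if "|" in output:
        # The plugin already emits perfdata; pass the output through untouched.
        return output
    match = re.search(pattern, output)
    if match is None:
        # Nothing recognizable to convert; leave the output as-is.
        return output
    # Pair each label with the corresponding captured value: label=value.
    perfdata = " ".join(
        f"{label}={value}" for label, value in zip(labels, match.groups())
    )
    return f"{output} | {perfdata}"
```

For example, add_perfdata("PING OK - Packet loss = 0%, RTA = 1.23 ms", r"PING.*?(\d+)%.+?([.\d]+)\sms", ["losspct", "rta"]) appends "losspct=0 rta=1.23" after a "|" separator.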
See the section Graphs in Nagios Mouseovers in the README file. Explicit support for popups on mouseovers was introduced in version 1.4.1.
See the section Graphs in Nagios Frames in the README file.
Simply click and drag to zoom in on a section of data in a graph. To revert back to the original zoom level, right-click anywhere on the graph. Zooming was introduced in version 1.4.3.
There are two mechanisms for controlling access: the Nagios configuration files or a standalone nagiosgraph access control file. See the section Configuring Access Controls in the README file. Access control was introduced in version 1.4.2.
nagiosgraph 1.4.x or earlier can display only one vertical (Y) axis.
The Nagios embedded PERL (ePN) does not understand all PERL idioms. If you see errors such as this:
ePN failed to compile /usr/local/nagiosgraph/libexec/insert.pl: "Missing right curly or square bracket at (eval 1) line 45, at end of line: syntax error at (eval 1) line 52, at EOF" at /usr/local/nagios/bin/p1.pl line 159.
you must invoke insert.pl explicitly with non-embedded PERL, for example:
/usr/bin/perl /usr/local/nagiosgraph/libexec/insert.pl
See this thread:
https://sourceforge.net/projects/nagiosgraph/forums/forum/394748/topic/3616208
Is SELinux enabled? If the web server error log contains errors such as this:
Permission denied: exec of '/usr/lib/nagiosgraph/cgi-bin/show.cgi' failed
then try temporarily disabling SELinux with setenforce 0. If that is the problem, re-enable SELinux with setenforce 1, then see the SELinux documentation to create a policy that does not interfere with the web server.
Check the data sampling rate. The stepsize (specified in nagiosgraph), heartbeat (specified in nagiosgraph) and sampling interval (specified in Nagios) must be coordinated.
For example, if the stepsize is 300 (5 minutes - the default) and the heartbeat is 600 (10 minutes - the default), but data are sampled every 20 minutes, then every other data point in the RRD will be undefined (a value of NaN in the RRD file), resulting in fragmented graphs.
Gaps can also happen when the sampling interval is equal to the heartbeat but sampling is delayed. For example, with a stepsize of 300 (5 minutes), a heartbeat of 600 (10 minutes), and a sampling interval of 10 minutes (specified in Nagios), any delay due to Nagios processing will result in NaN values in the RRD file and gaps in the graphs.
A good rule of thumb is to use a heartbeat that is twice the sampling interval, and a stepsize that is the same as the sampling interval.
Note that the stepsize and heartbeat are set when an RRD file is created. If you change the stepsize and/or heartbeat, you must either delete the corresponding RRD file(s) so that nagiosgraph can create new ones with the new stepsize/heartbeat, or manually modify the stepsize and/or heartbeat in the RRD file(s) by doing a dump/edit/restore.
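The rule of thumb above can be expressed as a small check. This is a sketch only: check_timing is a hypothetical helper, not part of nagiosgraph, and it encodes just the guidance given in this answer. All values are in seconds.

```python
def check_timing(sampling_interval, stepsize=300, heartbeat=600):
    """Return warnings about how stepsize, heartbeat, and sampling interval fit together."""
    warnings = []
    if sampling_interval > heartbeat:
        warnings.append("sampling interval exceeds heartbeat: expect NaN values and gaps")
    elif sampling_interval == heartbeat:
        warnings.append("sampling interval equals heartbeat: any delay will cause gaps")
    # Rule of thumb: heartbeat is twice the sampling interval.
    if heartbeat != 2 * sampling_interval:
        warnings.append(f"rule of thumb suggests heartbeat={2 * sampling_interval}")
    # Rule of thumb: stepsize is the same as the sampling interval.
    if stepsize != sampling_interval:
        warnings.append(f"rule of thumb suggests stepsize={sampling_interval}")
    return warnings
```

With the defaults, a 5-minute sampling interval (300) produces no warnings, while the 20-minute case described above (1200) is flagged.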
rrdtool can record values as AVERAGE, MIN, MAX, or LAST.
By default, RRD files created by nagiosgraph record average values. Use maximums, minimums, or lasts in nagiosgraph.conf to specify the services for which data should be recorded as MAX, MIN, or LAST, respectively.
Note that if the RRD file for a service has already been created using AVERAGE (the default), you must delete the RRD file after changing the service to MAX, MIN, or LAST so that the RRD file can be re-created.
If you want to record maximum and/or minimum values in addition to average values, use withmaximums and/or withminimums in nagiosgraph.conf.
rrdtool characterizes data as one of GAUGE, COUNTER, DERIVE, or ABSOLUTE.
The data source type is specified by the rules in the map file. Most data are saved as GAUGE or DERIVE. To specify a different type, create a rule that matches the service check output/perfdata then use the desired type in the update array.
See the rrdtool documentation for details about each type:
http://www.mrtg.org/rrdtool/doc/rrdcreate.en.html
In nagiosgraph.conf, modify the resolution and step for all hosts/services/databases, or the resolutions and steps to specify values for a single host/service/database. The resolution determines the number of points that will be saved. The step determines how many values are consolidated.
This will only affect new RRD files; you must manually dump/edit/restore to resize any existing RRD files.
The default settings are:
resolution=600 700 775 797
step=1 6 24 288
To record twice as much data, use:
resolution=1200 1400 1550 1594
step=1 6 24 288
The stepsize, in seconds, defines the nominal amount of time between data points. The default value is 300 (5 minutes), which matches the Nagios sampling interval. The heartbeat, in seconds, defines the amount of time between updates before a data point should be considered unknown. The default is 600 (10 minutes) and is typically set to twice the stepsize. The resolution defines how many data points should be kept. The step defines how data points are consolidated. The xfiles factor defines how unknown data points are considered when consolidating data.
These values are used only when an RRD file is created. To change the stepsize, heartbeat, or resolution of an existing RRD, one must dump the RRD file to XML, modify the data, then restore the RRD file. Alternatively, simply delete the existing RRD file and create a new one with the new settings.
The heartbeat and stepsize must be coordinated with the values in Nagios that specify how often data will be collected, recorded, and processed (the check_interval and configuration for processing of the perfdata). If these values are not coordinated, RRD files will contain gaps in data and graphs will appear spotty.
As of nagiosgraph 1.4.3, the stepsize, heartbeat, and resolution can be specified per-host, per-service, and/or per-database. For example, data sampled from a wind sensor every 10 seconds could have a stepsize of 10 seconds while data sampled from pinging a host every 10 minutes could have a stepsize of 600 seconds.
As of nagiosgraph 1.4.4, the xff (xfiles factor) and step can be specified per-host, per-service, and/or per-database.
A typo in 1.4.3 and 1.4.4 prevents the specification of stepsizes, heartbeats, and resolutions. It has been fixed in 1.4.5. The problem and solution are described in this thread:
https://sourceforge.net/projects/nagiosgraph/forums/forum/394748/topic/4428502
For nagiosgraph up to 1.4, the following definition is used to create an RRD file:
DS:DSNAME:DST:HEARTBEAT:U:U
RRA:CF:XFF:STEP1:NROWS1
RRA:CF:XFF:STEP2:NROWS2
RRA:CF:XFF:STEP3:NROWS3
RRA:CF:XFF:STEP4:NROWS4
where:
DSNAME is the data source name. A data source name must be 1 to 19 characters long and may contain only the characters [a-zA-Z0-9_]. The data source name is specified in the map rule.
DST is the data source type, one of GAUGE, COUNTER, DERIVE, or ABSOLUTE. The data source type is specified in the map rule.
HEARTBEAT is the heartbeat specified in the nagiosgraph configuration.
CF is the consolidation function, one of AVERAGE, MIN, MAX, or LAST. AVERAGE is the default. MIN, MAX, or LAST is used when minimums, maximums, or lasts is specified. MIN and MAX are used in separate RRD files when withminimums or withmaximums is specified.
XFF is the xfiles factor defined as xff in the nagiosgraph configuration. The default value is 0.5.
STEP are step values from the nagiosgraph configuration.
NROWS are resolution values from the nagiosgraph configuration.
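The data source name constraint listed above is easy to test before adding a name to a map rule. The following is a minimal sketch; is_valid_ds_name is a hypothetical helper and is not how nagiosgraph itself performs the check.

```python
import re

# 1 to 19 characters, each one of [a-zA-Z0-9_], as stated above.
DS_NAME = re.compile(r"[a-zA-Z0-9_]{1,19}")

def is_valid_ds_name(name):
    """Return True if name satisfies the data source name constraint."""
    return DS_NAME.fullmatch(name) is not None
```

Names such as losspct and rta pass, while a name containing a colon or a hyphen, or one longer than 19 characters, does not.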
For example, the default configuration and the map rule:
/output:PING.*?(\d+)%.+?([.\d]+)\sms/
and push @s, [ 'pingloss',
[ 'losspct', GAUGE, $1 ]]
and push @s, [ 'pingrta',
[ 'rta', GAUGE, $2/1000 ]];
yields the following two files: hostname/PING___pingloss.rrd
DS:losspct:GAUGE:600:U:U
RRA:AVERAGE:0.5:1:600
RRA:AVERAGE:0.5:6:700
RRA:AVERAGE:0.5:24:775
RRA:AVERAGE:0.5:288:797
and hostname/PING___pingrta.rrd
DS:rta:GAUGE:600:U:U
RRA:AVERAGE:0.5:1:600
RRA:AVERAGE:0.5:6:700
RRA:AVERAGE:0.5:24:775
RRA:AVERAGE:0.5:288:797
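To see how the configuration values map onto the DS and RRA lines above, the template can be filled in programmatically. This sketch only formats strings in the layout shown in this answer; rrd_definition is a hypothetical function, not the code nagiosgraph uses, and its defaults are the default configuration values quoted in this document.

```python
def rrd_definition(ds_name, ds_type="GAUGE", heartbeat=600, cf="AVERAGE", xff=0.5,
                   steps=(1, 6, 24, 288), resolutions=(600, 700, 775, 797)):
    """Return the DS line and RRA lines for a single data source."""
    lines = [f"DS:{ds_name}:{ds_type}:{heartbeat}:U:U"]
    # Each step value is paired with the resolution (row count) at the same position.
    for step, rows in zip(steps, resolutions):
        lines.append(f"RRA:{cf}:{xff}:{step}:{rows}")
    return lines
```

Calling rrd_definition("losspct") returns the five lines listed above for hostname/PING___pingloss.rrd.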
The default nagiosgraph configuration uses these parameters:
heartbeat=600
stepsize=300
resolution=600 700 775 797
step=1 6 24 288
xff=0.5
which results in a single data source RRD file with size 24124 bytes and RRA definition:
DS:datasourcename:GAUGE:600:U:U    seconds   hours     days  years
RRA:AVERAGE:0.5:1:600               180000      50     2.08
RRA:AVERAGE:0.5:6:700              1260000     350    14.58
RRA:AVERAGE:0.5:24:775             5580000    1550    64.58
RRA:AVERAGE:0.5:288:797           68860800   19128   797.00   2.18
To record twice the historical and four times the daily data using the same consolidation factors, use these parameters:
heartbeat=600
stepsize=300
resolution=2400 1400 1550 1594
step=1 6 24 288
xff=0.5
which results in a single data source RRD file with size 56700 bytes and RRA definition:
DS:datasourcename:GAUGE:600:U:U    seconds   hours     days  years
RRA:AVERAGE:0.5:1:2400              720000     200     8.33
RRA:AVERAGE:0.5:6:1400             2520000     700    29.17
RRA:AVERAGE:0.5:24:1550           11160000    3100   129.17
RRA:AVERAGE:0.5:288:1594         137721600   38256  1594.00   4.37
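The seconds, hours, and days figures above follow from multiplying the stepsize by each step and its resolution. The sketch below reproduces that arithmetic so other settings can be evaluated before creating RRD files; rra_retention is a hypothetical helper, not part of nagiosgraph or rrdtool.

```python
def rra_retention(stepsize, steps, resolutions):
    """Return (seconds, hours, days) of history covered by each RRA."""
    rows = []
    for step, nrows in zip(steps, resolutions):
        # Each row covers stepsize * step seconds, and the RRA keeps nrows rows.
        seconds = stepsize * step * nrows
        rows.append((seconds, seconds / 3600, seconds / 86400))
    return rows

if __name__ == "__main__":
    for seconds, hours, days in rra_retention(300, (1, 6, 24, 288), (600, 700, 775, 797)):
        print(f"{seconds:>10} {hours:>8.0f} {days:>9.2f}")
```

With the default settings the first archive covers 180000 seconds (50 hours) and the last covers 68860800 seconds (797 days).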
Use rrdtool to manipulate the contents of RRD files. For example, to export to XML use the dump option:
rrdtool dump filename.rrd > filename.xml
To import from XML use the restore option:
rrdtool restore filename.xml filename.rrd
The default map rule creates an RRD file for each performance data metric. Each RRD file contains the metric data and may contain additional data sources for warning, critical, minimum, and maximum values.
For example, a plugin outputs the following:
OK - all is well |size=10 cost=30;100;500 level=2%;80;90;0;100
From this output, nagiosgraph creates 3 RRD files, one each for size, cost, and level. The size RRD file contains a single data source named 'data'. The cost RRD file contains 3 data sources named 'data', 'warn', and 'crit'. The level RRD file contains 5 data sources named 'data', 'warn', 'crit', 'min', and 'max'.
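The mapping from perfdata fields to data source names described above can be sketched as follows. This is an illustration only and is not the default map rule itself: data_sources is a hypothetical function, it does not handle quoted labels that contain spaces, and skipping empty fields is an assumption made for this sketch.

```python
def data_sources(perfdata):
    """Map each perfdata label to the data source names implied by its fields."""
    # Perfdata fields appear in the order value;warn;crit;min;max.
    names = ["data", "warn", "crit", "min", "max"]
    result = {}
    for item in perfdata.split():
        label, _, fields = item.partition("=")
        values = fields.split(";")
        # Assumption: only fields that are present and non-empty get a data source.
        result[label] = [name for name, value in zip(names, values) if value != ""]
    return result
```

Applied to the perfdata portion of the example output, "size=10 cost=30;100;500 level=2%;80;90;0;100", it yields one data source for size, three for cost, and five for level.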
The graphing portion of nagiosgraph is agnostic with respect to RRD contents. The CGI scripts will display data whether they come from multiple data sources in a single RRD file, or individual data sources each in its own RRD file.