Posts about icinga

icinga nagios smart ssd

Monitoring S.M.A.R.T. attributes in Nagios/Icinga

One of my customers is using various (Samsung) SSD’s in their servers, and the first of these have started reaching their end-of-life. SSD’s have a somewhat different failure scenario then spinning metal disks, so monitoring their life-expectancy can be interesting.

Besides just logging and graphing the SMART attributes, it is also handy to have some alerting on when certain thresholds are crossed. To do this, I’ve written a simple nagios/icinga script which will alert on interesting SMART attributes, and will also calculate the percentage of total guaranteed writes on the SSD’s. Since the guaranteed TBW value will differ between various SSD vendors and product-ranges, this value needs to be specified on the command-line by the user.

/images/2016-03-11-153110_1581x46_scrot.png

I’ve integrated this check-script into my normal monitoring-scripts, but it can off-course also be used as a stand-alone tool. If has options to specify the device to smartctl, so disks behind raid-controllers can also be monitored.

The script can be found in my sysadmin repository on github: check_ssd_attribs

icinga monitor.sh monitoring nagios nrpe

A simple NRPE alternative, based on bash, cron and NSCA

Monitoring remote hosts with Nagios can be done with various methods, ranging from snmp, ssh, nrpe, of a custom solution. To monitor some ‘black-box’ appliances with a very minimal OS-environment it wasn’t possible to install/run the NRPE agent. Since I seem to be using more and more passive nagios checks with the nagios service check acceptor (NSCA), it seemed like a good idea to try and use that.

I copied most of the checks to the system and setup the NSCA configuration (/etc/send_nsca.cfg), then I created a simple bash script which is scheduled to run from cron and loops through a list of service-checks to execute.

The check-results are then fed into send_nsca to finally arrive at the monitoring system. This way you only need to allow incoming traffic on 1 port to the nagios monitoring host and have no connections going into the device being monitored.

Update: The code has been updated and moved to my github account. You can find it at: https://github.com/sigio/sysadmin in the files monitor.sh and monitor.rc