nagios, mdadm and snmp

I found this script while looking for a simple way to monitor mdadm arrays. The script is fine, but it has a subtle bug: it will never report an error, because the --detail parameter is missing from the call to mdadm. I modified the script a bit, like so:

#!/bin/sh
# (c) 2008 Jasper Spaans 

worst=0
msg=""

for dev in /dev/md?* ; do
  # -t (--test) makes mdadm encode the array state in its exit status
  mdadm --misc -t --detail "$dev" >/dev/null
  status=$?
  if [ "$status" = 0 ]; then
    msg="${msg} ${dev}: ok"
  elif [ "$status" = 1 ] ; then
    # keep a previously recorded "unusable" state
    if [ "$worst" != 2 ] ; then
      worst=1
    fi
    msg="${msg} ${dev}: degraded"
  elif [ "$status" = 2 ] ; then
    worst=2
    msg="${msg} ${dev}: degraded - unusable"
  fi
done

echo "mdadm:$msg"
exit $worst

which I saved as /usr/local/bin/check-mdadm.sh.
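
The exit statuses that mdadm returns for --detail with -t (0 normal, 1 at least one failed device, 2 array unusable, 4 error getting information) can also be mapped in one place. The following is only a sketch: the function names are made up for illustration, and treating status 4 or any other unexpected value as severity 2 is an assumption, since the script above simply ignores those cases.

```sh
#!/bin/sh
# Hypothetical helpers, not part of the original script.

# status_label STATUS: readable label for an `mdadm --misc -t --detail` exit status
status_label() {
  case "$1" in
    0) echo "ok" ;;
    1) echo "degraded" ;;
    2) echo "degraded - unusable" ;;
    4) echo "error querying device" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# status_severity STATUS: severity to use for the script's own exit code.
# Assumption: anything other than 0 or 1 is treated as the worst case (2).
status_severity() {
  case "$1" in
    0|1) echo "$1" ;;
    *)   echo 2 ;;
  esac
}

# max_severity A B: the larger of two severities
max_severity() {
  if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}
```

Inside the loop, something like worst=$(max_severity "$worst" "$(status_severity "$status")") would then replace the nested if, at the cost of a few extra subshells.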

Add in a bit of snmpd.conf config (and set up sudo accordingly, of course):

...
exec   mdadm /usr/bin/sudo /usr/local/bin/check-mdadm.sh

and a small script on the nagios side (/usr/local/bin/nagios-check-mdadm):

#!/bin/sh

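# $1 is the host to query; replace YOUR-PUBLIC with your SNMP community
SNMP=`snmpwalk -v1 -c YOUR-PUBLIC "$1" extOutput |grep mdadm`
TMP1=`echo "$SNMP" |grep degraded`
TMP2=`echo "$SNMP" |sed -e 's/^.*mdadm: //'`

# a script has to end with exit; return only works inside a function
if [ "$TMP1" = "" ]; then
  echo "OK: $TMP2"
  exit 0
else
  echo "ERROR: $TMP2"
  exit 2
fi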
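
One thing the script above does not cover is the case where snmpwalk returns nothing at all (host down, wrong community), which ends up reported as OK. A possible way to handle that, sketched as a function so it can be tried without an SNMP agent, is shown here. The function name is hypothetical, and the sample line in the comment is only what typical net-snmp output is assumed to look like.

```sh
#!/bin/sh
# Hypothetical helper, not part of the original script.
# classify_mdadm LINE: print a Nagios-style status line and return the
# matching plugin exit code (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).
# LINE is assumed to look roughly like:
#   UCD-SNMP-MIB::extOutput.1 = STRING: mdadm: /dev/md0: ok
classify_mdadm() {
  line=$1
  case "$line" in
    *mdadm:*) ;;   # looks like output from check-mdadm.sh
    *) echo "UNKNOWN: no mdadm output received"; return 3 ;;
  esac
  detail=${line#*mdadm: }   # drop everything up to and including "mdadm: "
  case "$detail" in
    *degraded*) echo "ERROR: $detail"; return 2 ;;
    *)          echo "OK: $detail";    return 0 ;;
  esac
}
```

In the check script this could be used as classify_mdadm "$SNMP"; exit $? in place of the if block.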

Add a bit of nagios config:

define command {
       command_name check_mdadm
       command_line /usr/local/bin/nagios-check-mdadm $HOSTADDRESS$
}

define service {
       use      defaults
       name     check_mdadm
       service_description   MDADM
       check_command check_mdadm
}

And voilà: nagios notifications when a disk falls out of an array.


One Response to nagios, mdadm and snmp

Serge van Ginderachter says:

November 16, 2009 at 9:11 pm

I adapted a similar script (it may actually have been based on the same one) for use under Zabbix. However, I discovered that some default mdadm installs run a full check of the array every week, which triggers a false positive.

if [ "$status" = 0 ] && \
[ "$(cat /sys/block/${md}/md/degraded)" = 1 ] && \
echo "$mdadmoutput" | grep -q -e 'State.*resyncing' -e 'State.*recovering'

then your device isn’t really degraded, but just doing a full check.

For the full script, see http://support.ginsys.be/wsvn/scripts/zabbix/mdcheck.sh
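
The resync test from the condition above could also be wrapped in a small function, which makes it easy to try against saved mdadm --detail output. This is only a sketch: the function name and its two arguments are made up for illustration and are not taken from the linked script.

```sh
#!/bin/sh
# Hypothetical helper, not taken from the linked script.
# is_resync_only DEGRADED_FLAG MDADM_DETAIL_OUTPUT
# Returns 0 when the degraded flag is 1 but the "State :" line of the
# given `mdadm --detail` output shows a resync or recovery in progress.
is_resync_only() {
  degraded=$1
  detail=$2
  [ "$degraded" = 1 ] || return 1
  printf '%s\n' "$detail" | grep -q -e 'State.*resyncing' -e 'State.*recovering'
}

# Possible use, with $md and $mdadmoutput set as in the snippet above:
#   if is_resync_only "$(cat "/sys/block/${md}/md/degraded")" "$mdadmoutput"; then
#     echo "${md}: check or rebuild in progress"
#   fi
```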
