I found this script while looking for a simple way to monitor mdadm arrays. The script is fine, but it has a subtle bug: it never reports an error, because the --detail parameter is missing from the call to mdadm. I modified the script a bit, like so:
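#!/bin/sh
# (c) 2008 Jasper Spaans
# Report the health of every md array; exit 0 = ok, 1 = degraded, 2 = unusable.
worst=0
msg=""
for dev in /dev/md?* ; do
    # With --detail, -t (--test) makes mdadm's exit status reflect the array
    # state: 0 = ok, 1 = degraded, 2 = unusable, 4 = could not get information.
    mdadm --misc -t --detail "$dev" >/dev/null
    status=$?
    if [ "$status" -eq 0 ]; then
        msg="${msg} ${dev}: ok"
    elif [ "$status" -eq 1 ]; then
        # never lower an earlier "unusable" result
        if [ "$worst" -ne 2 ]; then
            worst=1
        fi
        msg="${msg} ${dev}: degraded"
    elif [ "$status" -eq 2 ]; then
        worst=2
        msg="${msg} ${dev}: degraded - unusable"
    fi
done
echo "mdadm:$msg"
exit $worst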
which I saved as /usr/local/bin/check-mdadm.sh.
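To try the aggregation logic without real arrays, it can be pulled out into a function that reads device/exit-code pairs instead of calling mdadm. This is only a sketch and is not part of the original script. The function name `aggregate_mdadm_status` and its "DEVICE EXITCODE" input format are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: check-mdadm.sh's aggregation logic as a testable function.
# Reads "DEVICE EXITCODE" lines on stdin (EXITCODE being what
# `mdadm --misc -t --detail DEVICE` returned), prints the summary line and
# returns the worst status seen (0 = ok, 1 = degraded, 2 = unusable).
# Other exit codes (e.g. 4) are ignored, as in the script above.
aggregate_mdadm_status() {
    worst=0
    msg=""
    while read -r dev status; do
        case "$status" in
            0)
                msg="${msg} ${dev}: ok"
                ;;
            1)
                # never lower an earlier "unusable" result
                if [ "$worst" -ne 2 ]; then
                    worst=1
                fi
                msg="${msg} ${dev}: degraded"
                ;;
            2)
                worst=2
                msg="${msg} ${dev}: degraded - unusable"
                ;;
        esac
    done
    echo "mdadm:$msg"
    return "$worst"
}
```

For example, `printf '/dev/md0 2\n/dev/md1 1\n' | aggregate_mdadm_status` prints `mdadm: /dev/md0: degraded - unusable /dev/md1: degraded` and returns 2. A later degraded array does not mask the earlier unusable one.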
Add a bit of snmpd.conf config. Also set up sudo so that the user snmpd runs as can execute the script as root without a password:
... exec mdadm /usr/bin/sudo /usr/local/bin/check-mdadm.sh
Then add a small script on the Nagios side (/usr/local/bin/nagios-check-mdadm):
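#!/bin/sh
# Fetch the output of the "mdadm" exec entry from the host's snmpd.
SNMP=`snmpwalk -v1 -c YOUR-PUBLIC "$1" extOutput | grep mdadm`
if [ -z "$SNMP" ]; then
    # snmpd unreachable or no mdadm entry: report UNKNOWN, not OK
    echo "UNKNOWN: no mdadm output from $1"
    exit 3
fi
TMP1=`echo "$SNMP" | grep degraded`
TMP2=`echo "$SNMP" | sed -e 's/^.*mdadm: //'`
if [ "$TMP1" = "" ]; then
    echo "OK: $TMP2"
    exit 0
else
    echo "ERROR: $TMP2"
    exit 2
fi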
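The OK/ERROR decision can also be sketched as a function that takes the matched snmpwalk output as an argument, so that it can be tried against canned lines. The name `classify_mdadm_output` and the sample line in the note below are hypothetical. Real snmpwalk output may look slightly different depending on the MIBs loaded and the output options used.

```shell
#!/bin/sh
# Hypothetical helper: the plugin's decision logic, separated from snmpwalk.
# $1: the snmpwalk line(s) that matched "mdadm".
# Prints a Nagios status line and returns a Nagios plugin exit code
# (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).
classify_mdadm_output() {
    snmp_out=$1
    if [ -z "$snmp_out" ]; then
        echo "UNKNOWN: no mdadm output via SNMP"
        return 3
    fi
    # Drop everything up to and including "mdadm: " (OID and type prefix).
    detail=$(printf '%s\n' "$snmp_out" | sed -e 's/^.*mdadm: //')
    case "$snmp_out" in
        *degraded*) echo "ERROR: $detail"; return 2 ;;
        *)          echo "OK: $detail";    return 0 ;;
    esac
}
```

Given the made-up line `UCD-SNMP-MIB::extOutput.1 = STRING: mdadm: /dev/md0: ok`, it prints `OK: /dev/md0: ok` and returns 0. Inside the plugin it would be called as `classify_mdadm_output "$(snmpwalk -v1 -c YOUR-PUBLIC "$1" extOutput | grep mdadm)"; exit $?`.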
Add a bit of Nagios config:
define command {
    command_name  check_mdadm
    command_line  /usr/local/bin/nagios-check-mdadm $HOSTADDRESS$
}

define service {
    use            defaults
    name           check_mdadm
    description    MDADM
    check_command  check_mdadm
}
And voilà: Nagios notifications when disks fall out of the array.
I adapted a similar script for use under Zabbix (it may actually have been based on the same one). I discovered, however, that some default mdadm installations run a full check of the array every week. That check triggers a false positive, because the array is reported as degraded while the check runs.
Suppose the following condition holds, where status, md and mdadmoutput are set earlier in the script:

[ "$status" = 0 ] && \
    [ "$(cat /sys/block/${md}/md/degraded)" = 1 ] && \
    echo "$mdadmoutput" | grep -q -e 'State.*resyncing' -e 'State.*recovering'

In that case your device isn’t really degraded. It is just doing a full check.
For the full script, see http://support.ginsys.be/wsvn/scripts/zabbix/mdcheck.sh
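The condition can also be wrapped in a function to make it easier to reuse and test. This sketch assumes the caller gathers the three inputs. `is_full_check` is a hypothetical name, and the "State : ..." strings used to exercise it are made-up samples, not captured mdadm output.

```shell
#!/bin/sh
# Hypothetical helper for the false-positive test described above.
# $1: exit status of `mdadm --misc -t --detail /dev/mdX`
# $2: contents of /sys/block/mdX/md/degraded
# $3: text printed by `mdadm --detail /dev/mdX`
# Returns 0 when the array only looks degraded because a resync or
# recovery (e.g. a scheduled full check) is running, non-zero otherwise.
is_full_check() {
    [ "$1" = 0 ] &&
    [ "$2" = 1 ] &&
    printf '%s\n' "$3" | grep -q -e 'State.*resyncing' -e 'State.*recovering'
}
```

A caller might use `if is_full_check "$status" "$(cat /sys/block/${md}/md/degraded)" "$mdadmoutput"; then` and report the array as OK in that branch.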