
Saturday, March 24, 2018

Drobo - monitoring it



Now that the Drobo unit is in its final place and connected to the server, I have set up monitoring for it.

The server, besides handling the backup operations, also monitors that everything is OK, Drobo included.







To do so I'm using a utility called drobom, together with Jenkins.
drobom is a Python script that can connect to the Drobo unit and report back some information.
What I'm mainly interested in is how much space is used on the Drobo and the status of the hard disks in it.

Drobo handles many things automatically, including the state of the hard disks.
If one starts to fail, Drobo can determine how bad it is and raise an alarm.
drobom can report this alarm.

Jenkins is used to run a script that calls drobom. If an error condition is detected, an email is sent out reporting the problem.
Let's see that in a little more detail.

Drobom

Once drobom is installed, it needs to be run with sudo.
The main commands are:

  • sudo drobom status
  • sudo drobom info


The first command returns a simple message saying how full the unit is.
For example:

$ sudo /usr/sbin/drobom status

/dev/sdb /media/drobo Drobo 23% full - ([], 0)
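
For scripting, the percentage is the only part of that line that matters. Here is a minimal sketch of extracting it, fed with the sample line above instead of a live drobom call. The function name used_percent is only illustrative, and the sketch assumes the percentage is always the fourth whitespace-separated field, as in this output.

```shell
# Extract the integer percentage from a "drobom status" line.
# Assumption: the 4th whitespace-separated field looks like "23%".
used_percent() {
    printf '%s\n' "$1" | awk '{ sub(/%$/, "", $4); print $4 }'
}

sample='/dev/sdb /media/drobo Drobo 23% full - ([], 0)'
used_percent "$sample"
```

For the sample line this prints 23, which can then be compared against a threshold with a numeric test such as -gt.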

To see the status of the disks, we need to run this command:


$ sudo /usr/sbin/drobom info 
---------------------------------------------------------
Info about Drobo     Name: Drobo       Devices: /dev/sdb
---------------------------------------------------------
query config result: 
(5, 16, 70368744177664)
max lun size is:  70368744177664

query capacity result:
(2508020686848, 790782369792, 3298803056640, 0)
Physical space... used:  790782369792  free:  2508020686848  Total:  3298803056640

query protocol version result: 
(0, 11)

query settings result:
(1521493890, 8, 'Drobo')
Drobo time is Mon Mar 19 16:11:30 2018

query slotinfo result:  number of slots: 5
[(0, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (1, 500107862016, 0, 'green', 'WDC WD5000AADS-0SATA', 'WDC WD50'), (2, 1000204886016, 0, 'green', 'ST1000DM003-1CH1SATA', 'ST1000DM'), (3, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (4, 0, 0, 'gray', '', '')]
query firmware result:
(11, 48, 19122, 0, 0, 'Sep 29 2016,21:42:00', 'ArmMarvell', '3.5.3', ['NO_AUTO_REBOOT', 'NO_FAT32_FORMAT', 'USED_CAPACITY_FROM_HOST', 'DISKPACKSTATUS', 'ENCRYPT_NOHEADER', 'CMD_STATUS_QUERIABLE', 'VARIABLE_LUN_SIZE_1_16', 'PARTITION_LUN_GPT_MBR', 'FAT32_FORMAT_VOLNAME', 'SUPPORTS_NEW_LUNINFO2', 'feature x0800', 'feature x2000 ', 'SUPPORTS_OPTIONS2', 'SUPPORTS_SHUTDOWN', 'SUPPORTS_SINGLE_LUN_FORMAT', 'leftovers (106d0000)'])
drobo says firmware revision:  11 . 48 ( 19122 ) was built:  Sep 29 2016,21:42:00

query status result:
([], 0)

query options result:
None
query luninfo result:
(0, 70368744177664, 790782369792, 'GPT', ['NTFS'])


---------------------------------------------------------

We don't actually need all of this; what matters is the status of each hard disk, i.e. this part of the output:

query slotinfo result:  number of slots: 5
[(0, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (1, 500107862016, 0, 'green', 'WDC WD5000AADS-0SATA', 'WDC WD50'), (2, 1000204886016, 0, 'green', 'ST1000DM003-1CH1SATA', 'ST1000DM'), (3, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (4, 0, 0, 'gray', '', '')]
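
That single long line is hard to read, so here is a small sketch that splits it into one line per slot. It assumes the tuple layout shown above (slot number, size in bytes, a number, then the color in single quotes) and that the color is a single lowercase word. The function name slot_table is made up for this example, and the sample line is hard-coded instead of coming from a live drobom call.

```shell
# Print one "slot N: color, SIZE GB" line per slot found in a slotinfo line.
# Assumption: each tuple starts with (slot, size_in_bytes, number, 'color'.
slot_table() {
    printf '%s\n' "$1" \
        | grep -o "([0-9]*, [0-9]*, [0-9]*, '[a-z]*'" \
        | tr -d "(,'" \
        | awk '{ printf "slot %s: %s, %d GB\n", $1, $4, int($2 / 1e9) }'
}

slotinfo="[(0, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (1, 500107862016, 0, 'green', 'WDC WD5000AADS-0SATA', 'WDC WD50'), (2, 1000204886016, 0, 'green', 'ST1000DM003-1CH1SATA', 'ST1000DM'), (3, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (4, 0, 0, 'gray', '', '')]"

slot_table "$slotinfo"
```

With the sample above, the first line printed is "slot 0: green, 2000 GB" and the last one is "slot 4: gray, 0 GB" (the empty fifth bay).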

Jenkins

Based on this information I wrote a very quick and dirty script that is executed in Jenkins, i.e. the Build section of the Jenkins job has an "Execute shell" area with the script below (I have hidden some info, like my email addresses :) ):

#!/bin/bash
# Used space in percent: the 4th field of "drobom status" is e.g. "23%"; drop the "%".
droboUsedSpace=$(sudo /usr/sbin/drobom status | awk '{print $4}' | tr -d '%')
# The slotinfo line of "drobom info", i.e. the one starting with "[(0,".
droboInfo=$(sudo /usr/sbin/drobom info | grep "\[(0,")

if [ "$droboUsedSpace" -gt 80 ]; then
     echo "WARNING !! - Drobo unit is close to the limit !! Unit is $droboUsedSpace % full !" | mail -s "WARNING! Drobo status" -r email email
fi

# droboInfo contains one tuple per slot, e.g. (0, 2000398934016, 0, 'green', ...).
# slotStatus N prints the color reported for slot N (slots 0 to 3 hold the inserted disks).
slotStatus() {
     echo "$droboInfo" | grep -o "($1, [0-9]*, [0-9]*, '[a-z]*'" | cut -d"'" -f2
}
hd1status=$(slotStatus 0)
hd2status=$(slotStatus 1)
hd3status=$(slotStatus 2)
hd4status=$(slotStatus 3)

# Anything other than green (including a status that could not be parsed) triggers a warning.
if [ "$hd1status" != "green" ]; then
     echo "WARNING !! - Hard disk 1 in Drobo unit needs attention ! $hd1status" | mail -s "WARNING! Drobo status" -r email email
fi

if [ "$hd2status" != "green" ]; then
     echo "WARNING !! - Hard disk 2 in Drobo unit needs attention ! $hd2status" | mail -s "WARNING! Drobo status" -r email email
fi

if [ "$hd3status" != "green" ]; then
     echo "WARNING !! - Hard disk 3 in Drobo unit needs attention ! $hd3status" | mail -s "WARNING! Drobo status" -r email email
fi

if [ "$hd4status" != "green" ]; then
     echo "WARNING !! - Hard disk 4 in Drobo unit needs attention ! $hd4status" | mail -s "WARNING! Drobo status" -r email email
fi

echo -e "Drobo HD status \n HD1 : $hd1status \n HD2 : $hd2status \n HD3 : $hd3status \n HD4 : $hd4status \n Space used $droboUsedSpace %"


Basically, I extract the Drobo information and analyze it to decide whether to send a warning email.
The first lines capture the result of the drobom queries in variables; the rest extracts the parameters that I want to check.
If droboUsedSpace is greater than 80 (80%), a message is sent via email.
In that case some hard disks need to be replaced with bigger ones (currently I have two 2 TB hard drives, one 1 TB and one 500 GB).

The second part of the check extracts the status of each disk.
If all is OK, the disk is marked as 'green'.

Any other color indicates a problem, so the script simply checks whether green is associated with a specific disk, based on its position, 0 to 3.
If not, an email is sent indicating which hard disk has problems and therefore needs to be checked manually and possibly replaced.

Jenkins runs this script every day at around 1 am (the H in the schedule lets Jenkins pick the minute), so that in the morning I am notified by email about any problems.

This is the schedule :

# Execute the job every day at 1 am
H 1 * * *

It is important to remember that every script executed by Jenkins runs with the permissions of the user "jenkins".
Be sure to give the jenkins user sudo permission without requiring a password.
Many commands need to be executed as root (sudo) because they need to access hardware or to read/write across the entire system.
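
One way to grant that permission narrowly is a sudoers rule limited to the drobom binary, rather than blanket passwordless sudo. The sketch below only prints such a rule. It assumes the account is named jenkins and that drobom lives in /usr/sbin/drobom, as in the examples above; the helper name sudoers_rule and the drop-in file name /etc/sudoers.d/drobom are arbitrary choices for illustration, and the drop-in approach only works on systems whose sudoers file includes /etc/sudoers.d.

```shell
# Print a sudoers rule that lets user $1 run command $2 as root without a password.
sudoers_rule() {
    printf '%s ALL=(root) NOPASSWD: %s\n' "$1" "$2"
}

sudoers_rule jenkins /usr/sbin/drobom

# To install it (as root), write the rule to a drop-in file and validate it:
#   sudoers_rule jenkins /usr/sbin/drobom > /etc/sudoers.d/drobom
#   chmod 0440 /etc/sudoers.d/drobom
#   visudo -cf /etc/sudoers.d/drobom
```

Running visudo -cf on the file before relying on it is worth the extra step, since a syntax error in a sudoers file can lock sudo out entirely.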

Enhancements


For fun, it is possible to involve other services in notifying about problems.
For example, instead of the email, the server could post a tweet or a message on a service like Slack.
Maybe a contraption based on some Arduino-like hardware or a Raspberry Pi could be built, driving lamps or LEDs that indicate problems and/or status, perhaps replicating the indicators built into the Drobo unit.
Since my Drobo unit is in the server area, it is not physically visible.
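
As an illustration of the Slack idea, here is a rough sketch that could stand in for the mail command. It relies on Slack incoming webhooks, which accept a JSON body with a "text" field. I have not wired this into my own setup: the function names are made up, and SLACK_WEBHOOK_URL is a placeholder for a webhook URL you would have to create yourself.

```shell
# Build the JSON body for a Slack incoming webhook: {"text":"..."}.
# Escapes backslashes and double quotes; assumes a single-line message.
slack_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"text":"%s"}\n' "$escaped"
}

# Post a message to Slack. SLACK_WEBHOOK_URL is a placeholder that must be set beforehand.
notify_slack() {
    curl -sS -X POST -H 'Content-Type: application/json' \
         --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL"
}

# Show the payload that would be sent (nothing is posted here).
slack_payload 'WARNING !! - Hard disk 2 in Drobo unit needs attention'
```

In the monitoring script, a call like notify_slack "some message" would then replace the echo piped into mail.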

