Saturday, March 24, 2018

Drobo - monitoring it

Now that the Drobo unit is in its final place, connected to the server, I set up some monitoring for it.

The server, besides handling the backup operations, also monitors that everything is OK, Drobo included.

To do so I'm using a utility called drobom, together with Jenkins.
drobom is a Python script that can connect to the Drobo unit and report back some information.
Mainly, what I'm interested in is how much space I have left on the Drobo and the status of the hard disks in it.

Drobo handles many things automatically, including the state of the hard disks.
If one starts to fail, Drobo can determine how bad it is and issue an alarm.
drobom can report this alarm.

Jenkins is used to run a script that calls drobom. If an error condition is detected, an email is sent out to notify me of the problem.
Let's see that in a little more detail.


Once drobom is installed, it must be called with sudo.
The main commands are :

  • sudo drobom status
  • sudo drobom info

The first command returns a simple message saying how full the unit is.
For example :

$ sudo /usr/sbin/drobom status

/dev/sdb /media/drobo Drobo 23% full - ([], 0)
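
The ([], 0) at the end of that line looks like the same tuple that drobom info prints under "query status result". Assuming the list inside it is where the alarms show up (an assumption, I did not check the drobom sources), a minimal sketch to pull it out could look like this; status_alarms is a made-up helper name:

```shell
# Print the content of the list in a "drobom status" line read from stdin,
# i.e. whatever sits between "([" and "]"; empty output means an empty list.
status_alarms() {
    sed -n 's/.*(\[\(.*\)], *[0-9]*).*/\1/p'
}

# Status line copied from the example above.
sample="/dev/sdb /media/drobo Drobo 23% full - ([], 0)"
alarms=$(echo "$sample" | status_alarms)
if [ -z "$alarms" ]; then
    echo "no alarms"
else
    echo "alarms: $alarms"
fi
```

With the sample line the list is empty, so the sketch prints "no alarms".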

To see the status of the disks, we need to run this command :

$ sudo /usr/sbin/drobom info 
Info about Drobo     Name: Drobo       Devices: /dev/sdb
query config result: 
(5, 16, 70368744177664)
max lun size is:  70368744177664

query capacity result:
(2508020686848, 790782369792, 3298803056640, 0)
Physical space... used:  790782369792  free:  2508020686848  Total:  3298803056640

query protocol version result: 
(0, 11)

query settings result:
(1521493890, 8, 'Drobo')
Drobo time is Mon Mar 19 16:11:30 2018

query slotinfo result:  number of slots: 5
[(0, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (1, 500107862016, 0, 'green', 'WDC WD5000AADS-0SATA', 'WDC WD50'), (2, 1000204886016, 0, 'green', 'ST1000DM003-1CH1SATA', 'ST1000DM'), (3, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (4, 0, 0, 'gray', '', '')]
query firmware result:
drobo says firmware revision:  11 . 48 ( 19122 ) was built:  Sep 29 2016,21:42:00

query status result:
([], 0)

query options result:
query luninfo result:
(0, 70368744177664, 790782369792, 'GPT', ['NTFS'])


We actually don't need all this stuff; what is important to know is the status of each hard disk, i.e. this part of the output :

query slotinfo result:  number of slots: 5
[(0, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (1, 500107862016, 0, 'green', 'WDC WD5000AADS-0SATA', 'WDC WD50'), (2, 1000204886016, 0, 'green', 'ST1000DM003-1CH1SATA', 'ST1000DM'), (3, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (4, 0, 0, 'gray', '', '')]


Based on this information I wrote a very quick and dirty script that is executed by Jenkins: in the Build section of the job there is an "Execute shell" area containing the script below (I have hidden some info, like my emails :)  ) :

#!/bin/bash
# Used space in percent: 4th field of the status line, without the trailing "%".
droboUsedSpace=$(sudo /usr/sbin/drobom status | awk '{print $4}' | tr -d '%')
# The slotinfo list, i.e. the line of the info output that starts with "[(0,".
droboInfo=$(sudo /usr/sbin/drobom info | grep "\[(0,")

if [ "$droboUsedSpace" -gt 80 ]; then
     echo "WARNING !! - Drobo unit is close to the limit !! Unit full for the $droboUsedSpace % !" | mail -s "WARNING! Drobo status" -r email email
fi

# droboInfo contains a string with the info of the inserted hard disks and their status.
# Splitting it on the single quote, every slot contributes six fields,
# so the colors of slots 0 to 3 are fields 2, 8, 14 and 20.
hd1status=$(echo "$droboInfo" | awk -F"'" '{print $2}')
hd2status=$(echo "$droboInfo" | awk -F"'" '{print $8}')
hd3status=$(echo "$droboInfo" | awk -F"'" '{print $14}')
hd4status=$(echo "$droboInfo" | awk -F"'" '{print $20}')

if [ "$hd1status" != "green" ]; then
     echo "WARNING !! - Hard disk 1 in Drobo unit needs attention ! $hd1status" | mail -s "WARNING! Drobo status" -r email email
fi

if [ "$hd2status" != "green" ]; then
     echo "WARNING !! - Hard disk 2 in Drobo unit needs attention ! $hd2status" | mail -s "WARNING! Drobo status" -r email email
fi

if [ "$hd3status" != "green" ]; then
     echo "WARNING !! - Hard disk 3 in Drobo unit needs attention ! $hd3status" | mail -s "WARNING! Drobo status" -r email email
fi

if [ "$hd4status" != "green" ]; then
     echo "WARNING !! - Hard disk 4 in Drobo unit needs attention ! $hd4status" | mail -s "WARNING! Drobo status" -r email email
fi

echo -e "Drobo HD status \n HD1 : $hd1status \n HD2 : $hd2status \n HD3 : $hd3status \n HD4 : $hd4status \n Space used $droboUsedSpace %"

Basically I extract the Drobo information and analyze it in order to decide whether to send an email with a warning.
The first lines capture the results of the drobom queries in variables; the rest extracts the parameters that I want to check.
If droboUsedSpace is greater than 80 (80%), a message is sent via email.
In that case some hard disks need to be replaced with bigger ones (currently I have two 2 TB hard drives, one 1 TB and one 500 GB).
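
As a cross-check, the same percentage can be worked out from the "Physical space..." line of the drobom info output shown earlier. A small sketch, where used_percent is just a helper name I made up for the example:

```shell
# Integer percentage of used space, given the used and total byte counts.
used_percent() {
    used=$1
    total=$2
    echo $(( used * 100 / total ))
}

# Byte counts copied from the "Physical space..." line of the info output.
used_percent 790782369792 3298803056640
```

With those numbers it prints 23, matching the "23% full" reported by drobom status.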

The second part of the check extracts the status of each disk.
If all is OK the disk is marked as 'green'.

Any other color indicates a problem, so the script simply checks whether green is associated with a specific disk, based on its position, 0 to 3.
If not, an email is sent indicating which hard disk has problems and thus needs to be manually checked and possibly replaced.
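
To make the per-disk lookup concrete, here is a small sketch that picks the color by slot number instead of by position in the line. slot_color is a hypothetical helper, and it assumes the tuple layout seen in the slotinfo output above (slot number first, color as the first quoted element):

```shell
# Print the status color of slot $1 from a slotinfo line read from stdin.
slot_color() {
    grep -o "($1, [0-9]*, [0-9]*, '[a-z]*'" | cut -d "'" -f 2
}

# Shortened copy of the slotinfo line shown above (slots 0, 1 and 4 only).
slotinfo="[(0, 2000398934016, 0, 'green', 'Hitachi HUA72302SATA', 'Hitachi'), (1, 500107862016, 0, 'green', 'WDC WD5000AADS-0SATA', 'WDC WD50'), (4, 0, 0, 'gray', '', '')]"

for slot in 0 1 4; do
    echo "slot $slot: $(echo "$slotinfo" | slot_color "$slot")"
done
```

On the sample line this reports green for slots 0 and 1 and gray for the empty slot 4.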

Jenkins runs this script every day at 1 am, so that in the morning I am notified via email about possible problems.

This is the schedule :

# Execute the job every day at 1 am
H 1 * * *

It is important to remember that every script executed in Jenkins runs with the permissions of the user "jenkins".
Be sure to give the user jenkins sudo permission without the need to enter a password.
Many commands need to be executed as root (sudo) because they need to access hardware or to read/write across the entire system.
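
If you prefer not to give jenkins passwordless sudo for everything, a rule limited to drobom should be enough for this job. A sketch, assuming the drobom path used above and a sudo setup that reads /etc/sudoers.d (the default on many distributions); the file name jenkins-drobom is just an example:

```shell
# Sudoers rule: user jenkins may run drobom as root without a password.
rule='jenkins ALL=(root) NOPASSWD: /usr/sbin/drobom'
echo "$rule"

# To install it, run the following as root (the file name is an example):
#   echo "$rule" > /etc/sudoers.d/jenkins-drobom
#   chmod 0440 /etc/sudoers.d/jenkins-drobom
#   visudo -cf /etc/sudoers.d/jenkins-drobom   # syntax check only
```

The install commands are left commented out on purpose: a broken sudoers file can lock you out of sudo, so check it with visudo before relying on it.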


For fun, it is possible to see if other services can be involved to notify about problems.
For example, instead of the email, a tweet could be created by the server, or a message on services like Slack.
Maybe a contraption based on some Arduino-like hardware or a Raspberry Pi can be built, driving some kind of lamp/LED indication about problems and/or status, maybe replicating what the Drobo unit has built in.
Since my Drobo unit is in the server area, it is physically not visible.
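
For the Slack idea, a rough sketch using an incoming webhook, which accepts a small JSON body with a "text" field. I have not wired this into the Jenkins job; SLACK_WEBHOOK_URL is a placeholder for the URL Slack generates, and slack_payload / notify_slack are made-up helper names:

```shell
# Build the JSON body for a Slack incoming webhook.
# Only backslashes and double quotes are escaped, enough for one-line messages.
slack_payload() {
    escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"text": "%s"}' "$escaped"
}

# Post a message; SLACK_WEBHOOK_URL must be set to the real webhook URL.
notify_slack() {
    curl -s -X POST -H 'Content-type: application/json' \
         --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL"
}

# Show the payload only, without sending anything.
slack_payload 'Hard disk 2 in Drobo unit needs attention'
```

In the monitoring script, a call like notify_slack "..." could then sit next to (or replace) the mail command.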

Friday, March 23, 2018

Repairing a zapper

In spring, summer and fall, there is an appliance that works almost 24 hours a day, at least in my garage.
A zapper.

This one, a PestiTech, worked all through last summer and fall.
Really, in Arkansas having a zapper is not optional: you need one to limit the "things" that can fly and bite you !
The garage is the main entry for these pests, so a good zapper is really the first protection.
Now it was time to put it back in service and ... of course, dead !

Before buying a new one, I bought a new set of fluorescent lamps; typically that is the "component" that fails first.
In any case these lamps have a limited life, typically about a year.

Changing the lamps is quite trivial but expensive ... consider that you can buy a zapper like that for around 30/35$. The "original lamps" cost around 30$ !!!  Absurd !!
Anyway, I found some equivalent lamps for just 19.95$ (sigh) and when they finally arrived ... nothing.  The new lamps were as dead as the old ones, meaning that the electronics driving them were dead !

Before trashing it, I decided to try to repair it, but before continuing, let me say one very important thing loud and clear :


In other words, if you are not an electronic engineer, don't even think about opening the unit.
There is the potential to be killed.  If you are not an expert and want to play with it anyway, well, it is your right to be included in the Darwin Awards.
Good luck.

That said ... I opened the unit and started with some visual exploration.
First of all, after unplugging the unit, discharging the high voltage grid and opening the case, I disconnected the high voltage transformer.
The transformer powers the grid with about 2000 Volts.

This way, while I repair the electronic ballast, the probable cause of the problem, no power reaches the high voltage transformer.
I don't want to be zapped :)
Note that a ballast can be dangerous as well, so always proceed with caution when handling a ballast circuit.

A quick visual inspection didn't show any evident sign of problems, like exploded capacitors, fried resistors, etc.  But of course that doesn't mean nothing is broken.
I powered up my multimeter and started by checking the fuse (it was OK) and the semiconductors.
Lots of diodes first. All were OK, but always remember that unless you remove the component from the PCB, the reading can be wrong, since other components are involved in the measurement.

Then I proceeded to measure the transistors with the multimeter as well.
This ballast uses two ST13003 high voltage fast switching power transistors, designed precisely for ballast use.
The measurements seemed OK, but transistors are usually among the first components to fail in a high voltage circuit; this is why I was still suspicious about them.
If one of these transistors is broken, one of the coils does not work and thus there is no way to have the lamps lit.

This is NOT the schematic of my particular unit, but it can give an idea about what I'm dealing with (schematic found on the ResearchGate website) :

Since no component showed problems on the multimeter, I also ran some tests with the circuit powered.
After installing the two new bulbs in the sockets, using a non-contact tester, I verified that some voltage was present on one side of the bulbs when the circuit was powered up.

Since I did not find an obviously damaged component, I decided to give it a try and ordered a few transistors from DigiKey.  They are not very expensive; the main cost, as usual, is the shipping.

After the replacement of the transistors, the circuit behaved differently but was still not working, meaning that the damage was more extensive than I originally thought.
Mainly out of curiosity I measured the old transistors removed from the circuit with a transistor tester, and in fact both transistors were damaged.
One had lost a junction and the other had completely degraded characteristics, a sign that other components are damaged as well.
Two destroyed transistors suggest that the circuit was probably very poorly designed.

A "new" normal transistor. Note the hFE of 77

One of the two damaged transistors. It is seen as a resistor !

The other damaged transistor. Note the hFE of 19 !!!

So at this point, continuing to diagnose the circuit would have become quite long and more expensive.

I saw three solutions :

  1. trash everything and buy a new zapper
  2. replace the ballast completely
  3. continue to debug the circuit

I opted for the second choice.
On Amazon I looked for a cheap ballast (around 15$ including shipping) capable of driving two 10W fluorescent lamps.
When it arrived I cut all the wires from the old ballast and connected them to the new one.

Wires connected to the old ballast

New ballast wiring

As soon as it was powered, the lamps came alive.
A few modifications to the wiring and the addition of an in-line fuse holder completed the repair of the zapper.

Maybe, if I find some time, I will try to better diagnose the old ballast circuit.
I'll try to remove the capacitors and the diodes and check them "out of the circuit", to see which one is out of specification and keeps the circuit from working.
But for now, the old ballast goes in the "junk box" :)


Was it worth repairing the zapper ?
Economically NOT AT ALL.
Between the new lamps, the components, the new ballast and the time spent tracking down the problem and repairing the unit, it cost TWICE if not more the price of a new one !
But morally I know I did the right thing and .. I had fun doing it !
Well, everybody has their own way to have fun :)

Saturday, March 17, 2018

Why sometimes ...

... simple things do not work, or just drive you crazy, making you lose hours and hours if not days ?

Those who work in the magical world of software development, especially embedded or firmware, sometimes find themselves in weird situations, where something incredibly simple and trivial simply does not work.
Why ?

Well, many of us attribute the fault to Murphy :), or fate, or the BluBugEater.
The truth is ... nobody knows; every time is a challenge.

So, let's see an example.
Here is a simple task I wanted to do : write a simple sketch for Energia, using an MSP430 to drive an Adafruit LED ring.

I grabbed the first Launchpad board I had around, with an MSP430G2452, and connected it to the Adafruit NeoPixel ring.
Then I wrote a simple sketch to turn on some LEDs.


Then I started to cut code down to simple I/O manipulation, and I still had weird behavior.
So I swapped the Launchpad for another one with the same chip: same result.
Then I loaded some example sketches to see if the board was OK ... it was.

I started to scratch my head.
So, the usual things: update everything that can be updated, check library versions, install older versions. Nothing.

Again, it is important to remember that my goal was to do a quick test !
Not to spend weeks on it !!!!
So I decided to see if somebody else had done something similar.
After a brief Google search I found exactly what I wanted : LED Hat with MSP430

So I followed the instructions, connected everything and ... nothing !!!

Time to put my hands on some instrumentation.
I fired up my oscilloscope and discovered that the signal generated by the MSP430 was totally out of timing.
Instead of a period of 1.2 µs, I had something in the order of milliseconds !
No wonder the ring was dead.

So back again, replacing the Launchpad with a new one. The ones I was using had the crystal soldered on, and I thought that maybe something was wrong with that (it should not be).
But as you can guess .. no luck.

OK, at this point I decided to change the chip.
I had an MSP430G2231 around, but it was too small for the sketch.
So after more digging in my "rufo box" (rufo is an Italian dialect word for junk :) ) I found an MSP430G2553, installed it on one of the boards, recompiled the code and ... voila' !
It worked immediately.

In the end I spent almost two days, because of a stupid damaged chip, on a test that should have taken 5 minutes.