Layer7 API Management

Monitoring/Alerting of SW RAID on API GW Hardware Appliance

    Posted Jan 30, 2019 06:39 AM

    Hi,

    One of our customers has 8 physical hardware appliances (Oracle X4-2).

    We had a failed disk in the RAID1 array on one of the servers (not a hardware failure, just mdadm taking the disk out of the array because it was corrupt).

    They wish to know whether there is a way of monitoring and/or alerting on the health of the RAID array in future, to avoid this situation.

    The only reason this was noticed is that I saw it on the console while doing some physical work in the data centre (I then confirmed it as below):


    > cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sda1[0] sdb1[1]
    1048512 blocks super 1.0 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md1 : active raid1 sdb2[1](F) sda2[0]
    291787584 blocks super 1.1 [2/1] [U_]
    bitmap: 2/3 pages [8KB], 65536KB chunk

    and:

    > mdadm --detail /dev/md1
    /dev/md1:
    Version : 1.1
    Creation Time : Fri Dec 19 19:22:52 2014
    Raid Level : raid1
    Array Size : 291787584 (278.27 GiB 298.79 GB)
    Used Dev Size : 291787584 (278.27 GiB 298.79 GB)
    Raid Devices : 2
    Total Devices : 2
    Persistence : Superblock is persistent

    Intent Bitmap : Internal

    Update Time : Mon Jan 28 14:46:31 2019
    State : clean, degraded
    Active Devices : 1
    Working Devices : 1
    Failed Devices : 1
    Spare Devices : 0

    Name : localhost.localdomain:1
    UUID : 9b7d309a:24840105:66650b93:cbb35cc9
    Events : 43398733

    Number   Major   Minor   RaidDevice   State
       0       8       2         0        active sync   /dev/sda2
       2       0       0         2        removed

       1       8      18         -        faulty   /dev/sdb2
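
    The degraded state shown above can also be picked up from a script rather than by eye. Below is a minimal sketch that reads `mdadm --detail` output on stdin and extracts the State line; the function names `array_state` and `is_degraded` are my own, not part of mdadm. (mdadm also has a `--test` option for `--detail` that reflects the array status in its exit code, which may be simpler where it is available.)

```shell
#!/bin/sh
# Hypothetical helpers: parse "mdadm --detail" output supplied on stdin.

# Print the value of the "State :" line, e.g. "clean, degraded".
array_state() {
    awk -F' : ' '/^ *State :/ { sub(/ +$/, "", $2); print $2; exit }'
}

# Succeed (exit 0) only if the reported state contains "degraded".
is_degraded() {
    case "$(array_state)" in
        *degraded*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (needs root on most systems):
#   mdadm --detail /dev/md1 | is_degraded && echo "md1 is degraded"
```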

     

    They have SNMP enabled on the servers, but as far as I can see none of the installed MIBs expose any counters for RAID health.
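
    One possible way to surface RAID health over SNMP, assuming the agent on the appliance is Net-SNMP and that its snmpd.conf may be edited (neither of which I have confirmed), is to have the agent run a small script through its `extend` directive. A sketch of such a script follows; the function name, the script path and the `mdDegraded` label are all hypothetical. It reads /proc/mdstat, which does not need root.

```shell
#!/bin/sh
# Count arrays whose member-status field (e.g. [U_]) shows a missing member.
# $1: mdstat-format file, defaulting to the live /proc/mdstat.
degraded_count() {
    # grep -c prints 0 when nothing matches but exits non-zero, hence "|| true"
    grep -c '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}" || true
}

# Assumed snmpd.conf entry (Net-SNMP), with this file saved as a script
# that ends by calling degraded_count:
#   extend mdDegraded /usr/local/bin/md-degraded.sh
# The output would then appear under NET-SNMP-EXTEND-MIB.
```

    A value of 0 would mean every array has all its members; for the output above it would print 1.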

     

    I have asked CA Support, but they can offer no solutions.

     

    Looking at the documentation, mdadm itself could be used to monitor it:


    e.g. mdadm --monitor --daemonise --mail=root@localhost --delay=1800 /dev/md0

    The servers would then need to be configured to relay mail to a local email server.

    Alternatively, --program could be used to call wget against an API on the GW for alerting, an SNMP trap, etc.
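
    As a rough sketch of the --program idea: mdadm calls the program with the event name, the md device and, where relevant, the component device as arguments. The handler below filters for failure-type events and POSTs a one-line message with wget. The URL is a placeholder for whatever service would be published on the gateway, and the function names and the `md-alert` log tag are mine; treat it as a starting point rather than a tested solution.

```shell
#!/bin/sh
# Hypothetical handler for: mdadm --monitor --program=/path/to/this-script ...
# mdadm passes: $1 = event name, $2 = md device, $3 = component device (optional).

# Placeholder endpoint; an assumption, not a real gateway API.
ALERT_URL="${ALERT_URL:-https://gateway.example.com/raid-alert}"

# Succeed only for events that indicate a problem worth alerting on.
is_alert_event() {
    case "$1" in
        Fail|FailSpare|DegradedArray|DeviceDisappeared) return 0 ;;
        *) return 1 ;;
    esac
}

# Build a one-line message from the mdadm arguments.
format_alert() {
    printf 'host=%s event=%s array=%s component=%s\n' \
        "$(uname -n)" "$1" "$2" "${3:-none}"
}

# Only act when invoked with arguments, i.e. when called by mdadm.
if [ "$#" -ge 2 ] && is_alert_event "$1"; then
    msg=$(format_alert "$@")
    logger -t md-alert "$msg"    # always leave a local syslog trace
    wget -q -O /dev/null --post-data="$msg" "$ALERT_URL" ||
        logger -t md-alert "failed to POST alert to $ALERT_URL"
fi
```

    Logging locally before the POST means a failure is still recorded if the gateway endpoint is unreachable.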

     

    Perhaps it's just the rarity of HW appliances out there that explains why CA Support can offer no help with this whatsoever.

    I'd be interested to hear what solutions, if any, others have used before I go further down the mdadm monitoring route described above.

    stu