ESXi

  • 1.  Raid rebuild error

    Posted Mar 11, 2011 04:50 PM

    Got an issue with one of our ESXi hosts. It is an HP server with the latest CIM agents.

    We are getting random rebuild messages but no mention of failed disks. We had the server offline last week and reseated all the drives.

    We could run an offline diag of the array, but before we do this I wondered if there is a way of looking in the VC or ESXi logs to see whether a drive shows as failed for a second or two and then goes green again.



  • 2.  RE: Raid rebuild error

    Posted Mar 11, 2011 08:27 PM

    I would go through the ESXi logs for errors, SCSI resets, or disconnects. Consider enabling SNMP and monitoring the drives. Install HP SIM to monitor the hardware.
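
    A rough sketch of that log search in Python, in case it helps. The keyword pattern and the sample lines are assumptions for illustration only, not real ESXi log output; adjust the pattern to whatever your own logs contain.

```python
import re

# Assumed keywords: SCSI resets/aborts and disconnects. Tune for your logs.
PATTERN = re.compile(r"scsi.*(reset|abort)|disconnect", re.IGNORECASE)

def find_storage_events(lines):
    """Return (line_number, text) for every line that matches PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]

if __name__ == "__main__":
    # Made-up sample lines, only to show the call.
    sample = [
        "vmkernel: SCSI bus reset issued",
        "vmkernel: heartbeat ok",
        "device disconnect detected",
    ]
    for number, text in find_storage_events(sample):
        print(number, text)
```

    To run it against a copied log, pass an open file instead of the sample list, for example `find_storage_events(open(path, errors="replace"))`, where `path` points at the log file you pulled off the host.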



  • 3.  RE: Raid rebuild error

    Posted Mar 11, 2011 08:47 PM

    Looked through the logs and it seems we got some SCSI resets for about 15 seconds.

    I am not sure if this is a disk or RAID card issue. Nothing in the logs reports any CIM details. Can't install SIM as it is ESXi (as far as I am aware).

    I guess we will need to run an offline diag to get to the bottom of it.



  • 4.  RE: Raid rebuild error

    Posted Mar 11, 2011 09:21 PM

    HP SIM will work with ESXi. SIM communicates with the CIM modules using the WBEM configuration pages.



  • 5.  RE: Raid rebuild error

    Posted Mar 12, 2011 03:44 PM

    Can you upload the log file (/var/log/messages*)?



  • 6.  RE: Raid rebuild error

    Posted Mar 25, 2011 10:15 PM

    Hi,

    Sorry for the late response. I looked through the logs and they only seem to show SCSI resets. I have built a Windows OS on a LUN and booted into it so I could run the HP ADU.

    I have attached the file. I have had a quick look but nothing is jumping out at me. Still not sure if it is a faulty disk or a problem with the actual RAID controller.

    Anyone out there familiar with HP ADU log files?

    Thanks in advance



  • 7.  RE: Raid rebuild error

    Posted Apr 05, 2011 09:35 PM

    Sorted I think....

    If you look at the file, each disk has three sections. The first is since factory, the second is since reset, and the other, which I think just shows possible values for errors, can be ignored.

    I found that disk 5 had rebuilt 56 times since factory. The others were sub 10, which is probably from when I initially set up the array and tested rebuilds etc.
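
    For anyone doing the same comparison, here is a minimal sketch of the check I did by eye. It assumes you have already copied the "since factory" rebuild count for each disk out of the ADU report by hand (it does not parse the report). Only the 56 for disk 5 is a real number from my report; the other counts are placeholders standing in for "sub 10", and the factor of 3 is an arbitrary threshold.

```python
from statistics import median

def flag_outliers(rebuild_counts, factor=3):
    """Return disks whose rebuild count exceeds factor times the median count."""
    typical = median(rebuild_counts.values())
    # Guard against a median of 0 so a single rebuild is not flagged.
    threshold = factor * max(typical, 1)
    return [
        disk
        for disk, count in sorted(rebuild_counts.items())
        if count > threshold
    ]

if __name__ == "__main__":
    # Disk 5 = 56 is from the report; the rest are placeholder values.
    counts = {1: 6, 2: 7, 3: 5, 4: 8, 5: 56, 6: 7}
    print(flag_outliers(counts))
```

    Comparing against the median rather than the mean keeps one bad disk from dragging the baseline up and hiding itself.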