VMware vSphere


HP Proliant Servers Important Update

  • 1.  HP Proliant Servers Important Update

    Posted Feb 16, 2012 06:43 AM

    PLEASE CHECK THE BELOW AND ACTION

    http://bizsupport1.austin.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=4118472&prodTypeId=12169&objectID=c02964542

    SUPPORT COMMUNICATION - CUSTOMER ADVISORY

    Document ID: c02964542

    Version: 5

    Advisory: (Revision) HP ProLiant and HP StorageWorks Systems: HP NC375i, NC375T, NC522m, NC522SFP, NC523SFP, CN1000Q Network Adapters - FIRMWARE UPGRADE REQUIRED to Avoid the Loss and Automatic Recovery of Ethernet Connectivity or Adapter Unresponsiveness

    NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

    Release Date: 2012-02-10

    Last Updated: 2012-02-10

    IMPORTANT : The network adapter firmware and driver upgrades provided in the Resolution are required to prevent the loss and recovery of Ethernet connectivity, or adapter unresponsiveness requiring a reboot to recover, from occurring. HP recommends performing these upgrades at the customer's earliest possible convenience. Neglecting to perform the recommended action and not performing the recommended resolution could result in the potential for subsequent errors to occur.

    The HP network adapters listed in the Scope section (below) may encounter either of the following:

    • The adapter may temporarily lose Ethernet connectivity, and then automatically recover.

    OR

    • The adapter may stop responding, requiring a server reboot to recover the operation of the adapter.

    Note: There is a low probability of this occurring when operating under a normal network worklo



  • 2.  RE: HP Proliant Servers Important Update

    Posted Feb 27, 2012 11:49 AM

    After upgrading to the latest driver and firmware versions, we still see "network redundancy lost" errors and all the other warnings described in the HP document.

    We use the following versions:

    ethtool -i vmnic4
    driver: nx_nic
    version: 4.0.602
    firmware-version: 4.0.579

    Is anyone else still having problems after upgrading the firmware and drivers to the latest versions?
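
    For anyone comparing versions across several hosts, the `ethtool -i` output quoted above can be turned into a small dictionary. This is only a minimal sketch: the function name `parse_ethtool_info` is made up for illustration, and the sample text is copied from this post rather than read from a live host.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i <nic>` output into a dict of field -> value."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (such as a PCI bus address) are kept intact.
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "key: value" pairs
            info[key.strip()] = value.strip()
    return info


# Sample output copied from the post above.
sample = """driver: nx_nic
version: 4.0.602
firmware-version: 4.0.579
"""

info = parse_ethtool_info(sample)
print(info["driver"], info["version"], info["firmware-version"])
```

    Collecting these dictionaries from each host makes it easy to spot one that is still on an older driver or firmware.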



  • 3.  RE: HP Proliant Servers Important Update

    Posted Feb 27, 2012 12:59 PM

    We have been running exactly this driver/firmware combination with NC522SFP adapters on vSphere 4.1 for about two months (on 6x ProLiant DL380G6).

    Only once we had a "firmware hang" of the NIC in one host resulting in a complete loss of network connectivity. Luckily, no more problems so far.

    I say luckily, because when I opened a case with HP regarding this issue, an HP engineer told me that many customers still have problems with this latest firmware, and that HP and QLogic are currently working on fixing this in the next firmware version. He couldn't tell me when we can expect that next version.

    I recommend that you open support cases with both VMware and HP regarding your issue.

    - Andreas



  • 4.  RE: HP Proliant Servers Important Update

    Posted Feb 28, 2012 04:32 AM

    Hello,

    HP advised that this issue will not be resolved by the firmware update or the driver. The issue is in the chipset itself, and the NIC cards and motherboard need to be replaced with a revised hardware version.

    HP are replacing my NICs and board soon; I will update you on whether this solves my issue.

    Regards,



  • 5.  RE: HP Proliant Servers Important Update

    Posted Feb 28, 2012 07:23 AM

    Now, that's interesting ...

    Do you have any information on how to identify the faulty chipsets/motherboards? By a specific range of serial numbers or ...?

    What servers and what NICs are you using?

    Thanks

    Andreas



  • 6.  RE: HP Proliant Servers Important Update

    Posted Feb 28, 2012 07:28 AM

    I see in the advisory that these NICs are affected:

    Advisory: (Revision) HP ProLiant and HP StorageWorks Systems: HP NC375i, NC375T, NC522m, NC522SFP, NC523SFP, CN1000Q Network Adapters

    As for me, I'm facing the issues on the NC375i and NC375T, which HP are replacing.

    I'm not really sure how to identify the chipset fault, but I'm sure that HP are aware of it now.



  • 7.  RE: HP Proliant Servers Important Update

    Posted Apr 09, 2012 03:43 PM

    I'm not sure how HP will replace your NC375i, as these are integrated on the main board or, in the case of the DL58x G7 series, on the main system riser.

    On the 3 series they are on the main board.

    Are you saying they are replacing the NC375i, and therefore the riser, with a new riser that has a later revision of the NC375i integrated on it?



  • 8.  RE: HP Proliant Servers Important Update

    Posted Apr 12, 2012 11:11 AM

    Hi there,

    We are also running HP DL580 G7s, which use the NC375i (quad-port card).

    For the past 12 months, since we started using HP hardware, we have intermittently experienced an issue where the host stops responding. The vmkernel logs in the /var/log directory show the following error message:

    vmkernel: 11:07:44:22.369 cpu6:4355)<3>nx_nic[vmnic0]: Firmware hang detected.

    After several discussions with HP, they asked me to upgrade to the latest driver/firmware, so I'm now running the versions below:

    NIC Driver version: 4.0.602

    Firmware version: 4.0.579

    Even though we are now running the latest versions, we continue to see this intermittent outage.

    Earlier this week I decided enough was enough and had an extensive conversation with HP.  I wanted them to send me two dual cards which would allow me to bypass the need for the quad card entirely.  Unfortunately they would not honor this request and asked for more logs to be sent from the server which had recently failed.

    Finally a positive result! HP responded that they had analyzed the logs and had seen some issues with the NC375i on the SPI board. This is the reason we are seeing the intermittent network issues caused by the 'firmware hang'. As a resolution, HP are now sending us new SPI boards, which apparently have a new version of the NC375i integrated into them and have been rigorously tested. HP have also said that the new board has been shown to fix the firmware hang issues.

    I'm currently waiting to take delivery of the SPI boards, at which point I plan to install one in one of our ESX hosts and check for stability before proceeding with the installation on my other hosts.

    I'll keep you posted as to whether finally my issue is resolved!



  • 9.  RE: HP Proliant Servers Important Update

    Posted Apr 16, 2012 10:28 AM

    This is exactly where I ended up with HP. Apparently, this will resolve the problem.

    My SPI boards were replaced 3 weeks ago, with no issues reported yet.



  • 10.  RE: HP Proliant Servers Important Update

    Posted Apr 16, 2012 12:51 PM

    Hi Mouhamad,

    So hopefully this should resolve my issue as well, seeing as you haven't experienced any issues in the last 3 weeks.

    All the best



  • 11.  RE: HP Proliant Servers Important Update

    Posted May 04, 2012 09:58 AM

    Hi Mouhamad

    Just curious as to whether you have experienced any issues since replacing the SPI cards.

    I'm just awaiting delivery of my new cards from HP.

    thanks



  • 12.  RE: HP Proliant Servers Important Update

    Posted May 04, 2012 10:01 AM

    Hello there,

    No issues at all. The only things you need to worry about are your iLO configuration (because it will reset) and the BIOS configuration, which will go back to default.

    Good luck!



  • 13.  RE: HP Proliant Servers Important Update

    Posted May 04, 2012 10:04 AM

    Hi Mouhamad

    Thanks for the quick response, I'm glad to hear it has resolved your problem.

    Can I just confirm exactly what drivers you are running?

    We are on the latest which is

    NIC Driver version: 4.0.602

    Firmware version: 4.0.579

    Thanks



  • 14.  RE: HP Proliant Servers Important Update

    Posted May 04, 2012 08:04 PM

    I have also been fighting this issue for about 12 months now although our issue was a little different.

    I have only ever used the onboard port for the Service Console and vMotion activity.

    When heavily utilised, the onboard NC375i ports definitely drop or appear to suffer link loss, but then so do the other QLogic NICs.

    More of an issue for us has been the NC523SFP cards which carry NFS and guest traffic, also the NC375T's which carry traffic for some unique segment requirements.

    I have had a call logged with both VMware and HP, initially about an issue with the NC523SFP regarding link loss and an apparent card firmware hang.

    This call has been on going for about a month now.

    VMware really could not identify any issue with ESX, but HP did eventually concede there are some odd issues with the QLogic cards, both the NC523 and the NC375T. In both cases these issues had supposedly been addressed by the latest firmware. As I pointed out, the previous 2 releases were also said to resolve these issues and had clearly failed to do so.

    After much to-and-fro with support, we replaced the cards with a later hardware release of the NC523. Unfortunately this did not solve anything.

    Again some more sending of logs and hours on the phone with HP 1st and 2nd level support guys.

    In the end HP could not resolve this issue or make any reasonable suggestion as to how we could resolve it.

    HP have now agreed to loan me 2x NC552SFP which will replace the NC523SFP and 2x NC365T which will replace the NC375T.

    I'm happy to say I have now had 4 error-free days: no link loss, no card hangs, nothing.

    In the 12 months these DL585G7s have been running, this is the first time I've not seen link loss or hang issues logged in a 4-day period.

    Oh, and the other DL585G7s, which still have the original NIC configuration, have experienced issues during this 4-day period.

    Of course it's only been 4 days, but so far so good.

    Oh, and even though I've not altered the NC375i (the integrated QLogic NICs), they have been stable.

    For instance, I have vMotioned some 200GB of running systems (memory footprint etc.) over these NICs with no failed vMotions. Again, this is a first for these servers.

    I must admit I'm not sure why the stability of the onboard NICs has improved, but the change is incredibly obvious.

    I'm now going back to HP to discuss our purchase or swap out options to replace both the NC523SFP's and the NC375T's.

    I want to see this QLogic rubbish gone..



  • 15.  RE: HP Proliant Servers Important Update

    Posted May 25, 2012 05:19 PM

    An update for those interested..

    Approx. 1 month operating with the new NIC's..

    NO issues.

    I've actually ordered some new servers this month..

    I've not ordered any NC523 or NC375t's in the bundle..

    NC552 and NC365T have replaced them..

    I have a case open with HP regarding the on board NIC's



  • 16.  RE: HP Proliant Servers Important Update

    Posted May 28, 2012 08:58 AM

    Hi Markzz

    I've been running the new SPI board for a couple of weeks now without any issues so hopefully this has resolved the problem.

    Will update you again if I do experience any further issues.



  • 17.  RE: HP Proliant Servers Important Update

    Posted May 28, 2012 12:36 PM

    Thanks for the update Gooose.

    Regarding the onboard NC375i NIC's

    I've sent HP quite a lot of logs. Initially they were very keen to chase the issue, but they have gone quiet in the past week.

    Maybe it's my turn to chase them up.

    What routine did you follow to have them supply you with the new riser?



  • 18.  RE: HP Proliant Servers Important Update

    Posted May 28, 2012 02:03 PM

    Hi Markzz,

    Initially I sent logs to them from the failing hosts, and also made sure I was running the latest firmware and NIC drivers.

    I eventually got an email from their support saying that they were aware of an issue, and basically it went from there.

    My only advice to you is to keep pestering them every day and tell them that you want the new SPI boards without having to submit any further logs.  I also raised it as a complaint to get the case escalated.

    Let me know how you get on.



  • 19.  RE: HP Proliant Servers Important Update

    Posted May 30, 2012 01:46 PM

    Hi All,

    Thought I'd just say I've experienced exactly the same issue with our HP servers. This is also affecting our physical Windows servers. I have also had the battle with HP regarding replacements. However they have agreed to send replacement SPI cards.

    My VMs are connected to vSwitches which span both the onboard NICs and additional PCIe NIC cards, so it's not been a big issue for us. However, we are getting frequent alerts regarding connectivity to the onboard NICs.

    They sent a replacement out for one of our hosts. I replaced it nearly two months ago and so far so good....

    Good luck to everyone with this issue.

    Mike.



  • 20.  RE: HP Proliant Servers Important Update

    Posted Jun 08, 2012 12:01 AM

    Hi Mike and Gooose

    Are there any identifying details on the new SPI riser?

    As far as I've been able to determine the SPI riser has a spare part number of SP#591199-001 and a hardware version of V.A03

    I'm unsure, as it has a bunch of numbers on it.

    Could you both have a look at the replacement SPI risers for identifying numbers?

    I'm trying to determine the new SPI version and part numbers.

    I'm attaching a photo of the current failing SPI

    Thank you for your assistance



  • 21.  RE: HP Proliant Servers Important Update

    Posted Jun 08, 2012 08:54 AM

    Hi Markzz,

    Unfortunately there is nothing that I can see to identify any difference with the SPI boards, as I also had this concern when I received the new ones.

    HP did say that all the SPI boards now have the required fix/firmware on them, so hopefully the parts they send you should be good.

    I'm glad to say that I installed the first replacement part on the 15th May and all seems well with the environment.

    Have you managed to get HP to ship you replacement parts?



  • 22.  RE: HP Proliant Servers Important Update

    Posted Jun 08, 2012 12:49 PM

    Hi Gooose

    Although I thought HP were on board with the idea that these SPI boards were the issue, I don't seem to be able to get a firm word from them.

    It's all very unclear whether they accept there is an issue or not.

    One point I don't think they appreciate is that I have 3 G7s which all suffer the same HA isolation issue during heavy load.

    Currently I'm looking through logs trying to find something to prove the issue or at least give me a lead as to where to look.

    I've looked at the Cisco switches, port etc, nothing there. The Cisco switches just show the traffic drops to nothing during these HA isolation events.



  • 23.  RE: HP Proliant Servers Important Update

    Posted Jun 08, 2012 03:07 PM

    Hi Markzz,

    The way I became aware of the problem was by connecting to the ESX host using WinSCP3 and then looking at the vmkernel logs located in the following location:

    /var/log

    The log would show the following error:

    Firmware hang detected. Severity code=0 Peg Number- Error Code=0 Return address=0

    Once you have opened the log file just do a search for 'Firmware Hang'
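
    If you have logs from several hosts to go through, the manual search described above can be scripted. The following is a minimal sketch, not an HP or VMware tool: the function name `count_firmware_hangs` is made up, the first two sample lines are copied from this thread, and the third is an invented unrelated line added only for illustration.

```python
import re
from collections import Counter

# Case-insensitive match, because the log text and the search term quoted
# in this thread use different capitalisation.
HANG = re.compile(r"firmware hang detected", re.IGNORECASE)
# The NIC name appears as nx_nic[vmnicN] in the vmkernel line quoted earlier.
NIC = re.compile(r"nx_nic\[(vmnic\d+)\]")


def count_firmware_hangs(lines):
    """Count 'Firmware hang detected' lines per vmnic."""
    counts = Counter()
    for line in lines:
        if HANG.search(line):
            match = NIC.search(line)
            counts[match.group(1) if match else "unknown"] += 1
    return counts


sample = [
    "vmkernel: 11:07:44:22.369 cpu6:4355)<3>nx_nic[vmnic0]: Firmware hang detected.",
    "Firmware hang detected. Severity code=0 Peg Number- Error Code=0 Return address=0",
    "an unrelated log line (invented for this example)",
]

hangs = count_firmware_hangs(sample)
print(dict(hangs))

# Hypothetical usage on a copied log file; the exact file name under
# /var/log is an assumption and may differ between ESX versions:
# with open("/var/log/vmkernel") as log:
#     print(count_firmware_hangs(log))
```

    A per-NIC count like this may also be a convenient summary to attach to a support case instead of the raw logs.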

    I eventually managed to get 10 new SPI boards from HP while only sending them the logs from two of my hosts. I basically said I did not have the time to keep sending logs when it was obvious it was the same issue I was constantly experiencing.

    Keep me informed as to how you get on.

    cheers



  • 24.  RE: HP Proliant Servers Important Update

    Posted Jun 12, 2012 07:45 PM

    Just a bump, as we have had the same disconnects. With the new firmware we are seeing fewer disconnects, but I am not sure it's fixed; I think the recovery is just faster. Now I get two NICs going down and no disconnected hosts, but the host will alarm about being isolated (because both NICs were down on the Management vSwitch at the same time?).

    And then the Host Isolation (but no Host Disconnect).


    And note the recovery times: it's 1-4 seconds.

    These 580s use a single NIC card for the quad ports, so it may make sense to add another card as a 2nd NIC rather than using 2 ports on the quad card.



  • 25.  RE: HP Proliant Servers Important Update

    Posted Jun 15, 2012 05:52 PM

    The saga continues.

    Although this case has been ongoing now for a number of months, HP have in the past week been hot on the case.

    They have been very supportive and are getting all the right people involved.

    Unfortunately this has not culminated in a firm resolution yet, but they really are putting in the hard yards trying to make sense of the issues.

    In our case the onboard NC375i ports are not reporting a firmware hang, but under heavy load they fail to maintain connectivity to the network and therefore to each other and the VC.

    This puts the hosts into either an HA partitioned state or, worse, HA isolation.

    The issue only affects the G7 hosts, not the G6 hosts.

    One solution which we are likely going to implement in the short term is to add another PCIe nic (NC365t) to each server and disable the onboard NC375i ports.

    But I'm intent on not laying down and taking it. We purchased a resource from HP and I'm very keen to see this function as it is advertised to do.

    Oh, and QLogic have released a new nx_nic firmware and driver package for all platforms:

    driver: nx_nic
    version: 5.0.619
    firmware-version: 4.0.588
    bus-info: 0000:04:00.0

    This is available from the HP or VMware sites..

    http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02964542

    This package is intended for NC375i, NC375T, NC522SFP nic's



  • 26.  RE: HP Proliant Servers Important Update

    Posted Jul 12, 2012 01:36 PM

    Markzz: What I do not understand about the new driver 5.0.619 is why it reports the firmware of the card as 4.0.588 when you can clearly see at boot, or by booting with the HP Firmware DVD, that the FW on the NIC is at 4.0.585.

    I'm trying to figure out where I could find FW .588 so that I can install it on those NICs (I have 2 x NC375T per host), as I would like the reported FW and driver versions to match.



  • 27.  RE: HP Proliant Servers Important Update

    Posted Jul 12, 2012 01:53 PM

    The driver is able to load a newer firmware into the device. So, probably the NIC is flashed with 4.0.585, but the new driver comes with firmware version 4.0.588 and loads it into the NIC.

    You should see related boot messages in VMKernel.log.

    - Andreas
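
    To make the comparison concrete: the two firmware versions mentioned in this exchange can be compared numerically. This is only an illustrative sketch with a made-up helper `version_tuple`; it assumes the versions are plain dot-separated integers, as in the values quoted in the thread.

```python
def version_tuple(version):
    """Turn a dotted version string such as '4.0.588' into (4, 0, 588)."""
    return tuple(int(part) for part in version.split("."))


flashed = "4.0.585"  # what the HP Firmware DVD shows on the NIC
loaded = "4.0.588"   # what the 5.0.619 driver reports via ethtool

# Tuples compare element by element, so this is a numeric comparison,
# not a string comparison.
driver_loaded_newer = version_tuple(loaded) > version_tuple(flashed)
print(driver_loaded_newer)
```

    A mismatch in this direction is what you would expect if the driver loads its own bundled firmware at boot, as described above.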



  • 28.  RE: HP Proliant Servers Important Update

    Posted Jul 12, 2012 02:26 PM

    Do you mean that on each boot the driver loads the firmware onto the cards, which is why I do not see it when using the HP Firmware Update DVD?



  • 29.  RE: HP Proliant Servers Important Update

    Posted Jul 12, 2012 03:45 PM

    Yes, exactly.



  • 30.  RE: HP Proliant Servers Important Update

    Posted Jul 16, 2012 01:25 PM

    Lorio

    In an SSH session (e.g. PuTTY), type

    ethtool -i vmnic0

    where vmnic0 is the vmnic you want information on. This will display the driver and firmware versions currently loaded.