Brocade Fibre Channel Networking Community


Out of memory errors after 7.0.2a upgrade

  • 1.  Out of memory errors after 7.0.2a upgrade

    Posted 03-25-2013 01:44 AM

    Hello

    On several, but not all, of our HP blade switches (Brocade 5480), the switch reboots every other day after an upgrade from 6.4.1 to 7.0.2b because it thinks it has run out of memory:

    2013/03/23-11:37:37, , 721, FFDC | CHASSIS, CRITICAL, SW5480, haReboot is automatically triggered for warm recovery from OOM

    HP support's only suggestion is to upgrade FOS to a newer version, but 7.0.2a is supposed to be stable and is a target path release.

    Does anyone have some troubleshooting tips to find out why the switches run out of memory?


    #BrocadeFibreChannelNetworkingCommunity


  • 2.  Re: Out of memory errors after 7.0.2a upgrade

    Posted 03-25-2013 05:10 AM

    Hi,

    How often does the switch experience this behavior? With the 'memshow' command you can check the current memory status and review how fast free memory decreases.

    SWITCH:admin> memshow

                total      used      free   shared    buffers    cached

    Mem:    520437760  394846208  125591552         0  33353728  125112320

    Swap:            0          0          0
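
    To track how fast free memory drops, the memshow output can be captured periodically and parsed. Here is a minimal Python sketch, assuming the output keeps the column layout shown above; the function names parse_memshow, free_percent and drain_rate are made up for illustration and are not part of FOS.

```python
def parse_memshow(output: str) -> dict:
    """Parse the 'Mem:' row of memshow output into a dict of byte counts."""
    columns = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # the forum paste has blank lines between rows
        if fields[0] == "total":
            columns = fields  # header: total used free shared buffers cached
        elif fields[0] == "Mem:" and columns:
            return dict(zip(columns, map(int, fields[1:])))
    raise ValueError("no 'Mem:' row found")


def free_percent(mem: dict) -> float:
    """Free memory as a percentage of total (buffers/cache not counted as free)."""
    return 100.0 * mem["free"] / mem["total"]


def drain_rate(free_then: int, free_now: int, hours: float) -> float:
    """Average bytes of free memory lost per hour between two samples."""
    return (free_then - free_now) / hours


# Sample taken verbatim from the output quoted in this post.
SAMPLE = """\
            total      used      free   shared    buffers    cached

Mem:    520437760  394846208  125591552         0  33353728  125112320

Swap:            0          0          0
"""

mem = parse_memshow(SAMPLE)
print(f"free: {mem['free'] / 2**20:.1f} MiB ({free_percent(mem):.1f}% of total)")
```

    Taking two samples a few hours apart and passing their "free" values to drain_rate gives a rough idea of how long the switch has before it hits the OOM condition.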

    In the errdump log you should be able to see which process caused the OOM condition and forced the reboot. You can also examine the contents of the supportsave file and check the core files created when an OOM condition occurs.

    On the other hand, a full reboot may help.

    Kind regards,

    Felipon




  • 3.  Re: Out of memory errors after 7.0.2a upgrade

    Posted 03-25-2013 05:44 AM

    Nothing in the errdump points to a process:

    2013/03/23-15:01:00, , 910, FFDC | CHASSIS, CRITICAL, SW5480, haReboot is automatically triggered for warm recovery from OOM.

    2013/03/23-15:01:01, , 911, CHASSIS, INFO, SW5480, First failure data capture (FFDC) event occurred.

    2013/03/23-15:04:36, , 912, FFDC | CHASSIS, CRITICAL, SW5480, Rebooting the system for recovery - auto-reboot is enabled.

    2013/03/23-15:04:36, , 913, CHASSIS, INFO, SW5480, First failure data capture (FFDC) event occurred.

    2013/03/23-15:05:43, , 914, CHASSIS, INFO, SW5480, Processor rebooted - reboot.

    2013/03/23-15:05:51, , 915, CHASSIS, INFO, SW5480, SW/0 Ether/0 IPv4 DHCP 10.103.15.23/24 DHCP On.

    2013/03/23-15:05:51, , 916, CHASSIS, INFO, SW5480, CP/0 IPv4 DHCP 10.103.15.254 DHCP On.

    The switch restarts every 3rd or 4th day.

    memshow shows only 19 MB free, and the switch has been running for only 2 days.

    Other similar switches have at least 40-60 MB free.

    I will look into the supportsave and see if I find something.

    2013/03/23-15:06:33, , 917, CHASSIS, INFO, SW5480, Initializing ports...

    2013/03/23-15:06:33, , 918, CHASSIS, INFO, SW5480, Port initialization completed.
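
    For anyone who wants to pull the OOM events out of a saved errdump and see how far apart they are, here is a rough Python sketch. It assumes the comma-separated layout of the lines pasted in this thread (timestamp, an empty second field, sequence number, flags, severity, switch name, message); the helper names parse_errdump and oom_events are hypothetical.

```python
from datetime import datetime


def parse_errdump(text: str) -> list:
    """Turn errdump lines into dicts; lines that do not match are skipped."""
    events = []
    for line in text.splitlines():
        # maxsplit=6 keeps any commas inside the message text intact
        parts = [p.strip() for p in line.split(",", 6)]
        if len(parts) != 7:
            continue
        try:
            when = datetime.strptime(parts[0], "%Y/%m/%d-%H:%M:%S")
            seq = int(parts[2])
        except ValueError:
            continue
        events.append({
            "time": when,
            "seq": seq,
            "flags": parts[3],
            "severity": parts[4],
            "switch": parts[5],
            "message": parts[6],
        })
    return events


def oom_events(events: list) -> list:
    """Keep only events whose message mentions OOM."""
    return [e for e in events if "OOM" in e["message"]]


# Lines copied from the log excerpts posted in this thread.
LOG = """\
2013/03/23-11:37:37, , 721, FFDC | CHASSIS, CRITICAL, SW5480, haReboot is automatically triggered for warm recovery from OOM
2013/03/23-15:01:00, , 910, FFDC | CHASSIS, CRITICAL, SW5480, haReboot is automatically triggered for warm recovery from OOM.
2013/03/23-15:01:01, , 911, CHASSIS, INFO, SW5480, First failure data capture (FFDC) event occurred.
"""

ooms = oom_events(parse_errdump(LOG))
for earlier, later in zip(ooms, ooms[1:]):
    print(f"seq {earlier['seq']} -> seq {later['seq']}: {later['time'] - earlier['time']}")
```

    This only measures the gap between OOM events; it cannot name the process at fault, since the log lines themselves do not contain it.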




  • 4.  Re: Out of memory errors after 7.0.2a upgrade

    Posted 03-25-2013 09:00 AM

    Hi,

    In the supportsave, in RAS_POST, you'll find the output of the errdumpall command, which may provide some additional info.




  • 5.  Re: Out of memory errors after 7.0.2a upgrade

    Posted 03-25-2013 10:46 AM

    From my days supporting Brocade switches, this can potentially be a long and involved process.

    I would recommend calling Brocade Support as this may require support access to the switch.

    FYI: Out of memory *might* be indicating out of storage space (Compact Flash)...




  • 6.  Re: Out of memory errors after 7.0.2a upgrade

    Posted 03-26-2013 01:22 AM

    The previous reply is right. To check the free space on the flash storage, open xxx.SSHOW_NET.tar.gz inside the supportsave and look at the output of the df command.

    Running supportsave -R deletes the core files and frees up some space. Also, if you have root credentials, you can execute cleanup to delete the unused files on the switch.
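
    If you have the df output saved as text, a few lines of Python can flag nearly full filesystems. This is only a sketch: it assumes the usual df column order (Filesystem, blocks, Used, Available, Use%, Mounted on), the 90% threshold is arbitrary, and the sample rows below are invented placeholders, not output from a real switch.

```python
def full_filesystems(df_output: str, threshold: int = 90) -> list:
    """Return (mount point, use %) for filesystems at or above the threshold."""
    hits = []
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        try:
            use = int(fields[4].rstrip("%"))  # "95%" -> 95
        except ValueError:
            continue  # header row ("Use%") or anything else non-numeric
        if use >= threshold:
            hits.append((fields[5], use))
    return hits


# Invented sample data for illustration only.
SAMPLE_DF = """\
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/example1     100000  95000      5000  95% /
/dev/example2     100000  40000     60000  40% /mnt
"""

for mount, use in full_filesystems(SAMPLE_DF):
    print(f"{mount} is {use}% full")
```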

    rgds




  • 7.  Re: Out of memory errors after 7.0.2a upgrade

    Posted 03-27-2013 01:22 AM

    It turned out to be the snmp daemon that took up all the memory.

    I ran the top command, hit shift+f and then selected n to sort the processes with the highest memory usage on top. On all the switches that rebooted, the snmp daemon was using over 100 MB; now the daemon uses 10-13 MB.

    It would be good if the switch logged in the errdump which process had the highest memory usage at the time of the OOM reboot. Maybe something for the next release?
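
    The same ranking can be done offline on a saved copy of the top screen. A rough Python sketch follows, assuming the common procps column layout (a header row starting with PID and containing RES and COMMAND, with RES in KiB unless it carries an m or g suffix). The sample rows, including the process names and numbers, are invented for illustration and are not taken from the switches.

```python
UNITS = {"k": 1, "m": 1024, "g": 1024 ** 2}  # multipliers to KiB


def res_to_kib(value: str) -> int:
    """Convert a RES field such as '9000' or '102m' to KiB."""
    suffix = value[-1].lower()
    if suffix in UNITS:
        return int(float(value[:-1]) * UNITS[suffix])
    return int(value)


def rank_by_res(top_output: str) -> list:
    """Return (command, resident KiB) pairs, largest resident size first."""
    header = None
    rows = []
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            header = fields  # column names of the process table
            continue
        if header is None or len(fields) < len(header):
            continue  # summary lines above the table, or truncated rows
        res = res_to_kib(fields[header.index("RES")])
        command = " ".join(fields[header.index("COMMAND"):])
        rows.append((command, res))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented sample data for illustration only.
SAMPLE_TOP = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1002 root      20   0 20000 9000 3000 S  0.0  1.8   0:10.00 exampled
 1001 root      20   0  150m 102m 4000 S  1.0 20.5   1:00.00 snmpd
"""

for command, kib in rank_by_res(SAMPLE_TOP):
    print(f"{command}: {kib / 1024:.1f} MiB resident")
```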




  • 8.  Re: Out of memory errors after 7.0.2a upgrade

    Posted 03-27-2013 01:33 AM

    Hahaha, if it were that easy it wouldn't be fun!

    On the other hand, as I remember it, with the previous code release, FOS 6.x, a switch did not perform an haReboot when this happened. It performed a full panic, and in that situation the logs did report the daemon that raised the OOM condition. I suppose Brocade considered it better this way.

    If the issue reoccurs and you see the SNMP daemon eating up all the memory again, you should check the SNMP applications accessing this switch, since intensive polling could make the daemon consume a high percentage of CPU and memory.

    Rgds

