VMware vSphere


Re: High latency during backup on the virtual machine

  • 1.  Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 03:09 AM

    I have a database server installed as a virtual machine on a host. Every time a backup runs, the latency increases to a few hundred milliseconds, and the event log in vSphere shows the warning "Device naa.xxxx (iSCSI storage) performance has deteriorated. I/O latency increased from average value ....". I have checked the KB and changed some configurations in vSphere, but I still can't improve the latency of the iSCSI storage. Has anyone else experienced this issue? It has been bothering me for a long time. Could someone help? Thanks a lot!



  • 2.  RE: Re: High latency during backup on the virtual machine

    Broadcom Employee
    Posted Feb 14, 2013 04:53 AM

    Hi,

        You will certainly see "performance has deteriorated" messages if the underlying storage is overloaded. Please refer to KB: http://kb.vmware.com/kb/2007236



  • 3.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 05:34 AM

    Yes, I have read this article before. But is there any way to improve the latency, or is the only solution to replace the iSCSI storage? Thanks



  • 4.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 05:49 AM

    How many IOPS are you pushing during this period and what sort of array do you have serving the VM?



  • 5.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 06:10 AM

    Hi Dave, please see the attached picture. This is just one of the hosts, which has only the database server installed. The other host has vCenter, the application server and the web server installed, but its disk latency is normal, with just a few warnings a day. The second device shown in the picture is the iSCSI storage. I'm using a Netgear ReadyNAS 3100 as the iSCSI storage serving the VM. Thanks



  • 6.  RE: Re: High latency during backup on the virtual machine

    Broadcom Employee
    Posted Feb 14, 2013 06:00 AM

    Hi,

        Appreciate your update. I won't suggest that you replace the storage without knowing the root cause.

    1. What was the DAVG value reported against the LUNs that were part of the backup job? It should not stay above 10 on a constant basis.

    High device latency can be the result of an overloaded or improperly architected storage device that is having difficulty keeping up with all the storage I/O that is occurring. Keep in mind that multiple hosts can access the same storage devices concurrently, so the combined I/O between all the hosts may be too much for it to handle. Some bottlenecks can easily be fixed by changing settings or by balancing storage I/O across datastores. Other bottlenecks can be more difficult to resolve and may require adding storage capacity or upgrading components.

    2. Are there any LUNs that did not show high latency even though they were part of the backup job?

    3. Is the storage array supported as per the VMware HCL? http://www.vmware.com/resources/compatibility/search.php?deviceCategory=san

    4. I hope the LUN is not part of any replication (internal/external) from the storage perspective?
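
    The "not above 10 on a constant basis" guidance in point 1 can be checked against sampled values. Below is a minimal Python sketch, assuming DAVG/cmd samples in milliseconds have already been collected by some means. The function name, the three-sample window and the sample values are hypothetical; only the limit of 10 comes from this post.

```python
from typing import List


def sustained_high_latency(samples_ms: List[float],
                           threshold_ms: float = 10.0,
                           min_consecutive: int = 3) -> bool:
    """Return True if latency stays above threshold_ms for at least
    min_consecutive samples in a row (a spike alone does not count)."""
    run = 0
    for value in samples_ms:
        if value > threshold_ms:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0
    return False


# Hypothetical DAVG/cmd samples in ms: an idle period and a backup window.
idle = [1.0, 1.2, 0.9, 14.0, 1.1]
backup = [1.0, 55.0, 62.0, 60.0, 58.0, 2.0]

print(sustained_high_latency(idle))    # one isolated spike
print(sustained_high_latency(backup))  # several samples in a row above the limit
```

    Separating an isolated spike from a sustained run is the point of the "constant basis" wording: a single high sample is usually not a concern, while a whole backup window above the limit is.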



  • 7.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 06:26 AM

    Hi,

    1. The Davg is about 60 during the backup job which is abnormal, as I found that Davg should be under 25.

    2. We have another LUN on a different iSCSI storage array (HP instead of Netgear) in our system, and it doesn't have this problem. That's why I wonder whether the problem is the ReadyNAS iSCSI storage.

    3. The storage array is supported as per the VMware HCL.

    4. Sorry I'm still new to VMware. What does internal/external replication mean?

    Thanks a lot!



  • 8.  RE: Re: High latency during backup on the virtual machine

    Broadcom Employee
    Posted Feb 14, 2013 06:42 AM

    Hello ,

             Thanks for your update.

    1. The Davg is about 60 during the backup job which is abnormal, as I found that Davg should be under 25.

        As per the screenshot I can clearly see that the DAVG value is high. As per the VMware recommendation it should not go above 10!

    What is the value prior to the backup activity?

    2. We have the other LUN using another iSCSI storage, HP instead of Netgear in our system, and that doesn't have this problem. That's why I wonder if it is the problem of ReadyNAS iSCSI storage.

    Most likely. Do you have Storage vMotion configured? If so, can you please migrate this VM and check the latency?

    3. The storage array is supported as per VMware HCL

    Yes, you are correct: http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=san&productid=14582&deviceCategory=san&partner=65&arrayTypes=1&isSVA=1&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

    Are you following the right path policy for the respective ESX/ESXi version?

    4. Sorry I'm still new to VMware. What does internal/external replication mean?

    Internal replications are clone/snapshot activity within the same storage array.

    External replications are between storage arrays.



  • 9.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 07:19 AM

    Hi Sreec,

    1. Before the back-up activity, the Davg/cmd is just about 1.

    2. I have Storage vMotion configured between the two hosts. I tried it before, but the result was more or less the same. The latency is still high and the warnings still show.

    3. I'm using ESXi 5.0. I have changed the path policy from MRU to RR, as suggested in some posts, in order to improve the latency, but it doesn't help much. In fact, my system even went down because of the latency. After I changed some configurations (switching from the Broadcom iSCSI adapter to the software iSCSI adapter, changing iops from 1000 to 3 in the Path Selection Policy device config, etc.), there are now only warnings.
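
    To illustrate what lowering the round-robin iops value from 1000 to 3 changes, here is a toy Python model. It is a sketch under stated assumptions, not ESXi's actual implementation: the path names and the 600-I/O burst are hypothetical, and the model only captures the idea that the active path is switched after a fixed number of I/Os.

```python
from collections import Counter
from typing import List


def round_robin_paths(num_ios: int, paths: List[str], iops_limit: int) -> List[str]:
    """Return the path used for each I/O when the active path is switched
    after iops_limit I/Os (a simplified model of a round-robin policy)."""
    if iops_limit < 1 or not paths:
        raise ValueError("need at least one path and iops_limit >= 1")
    return [paths[(i // iops_limit) % len(paths)] for i in range(num_ios)]


paths = ["path-A", "path-B"]  # hypothetical path names

# A short burst of 600 I/Os, smaller than the default switch interval of 1000.
for limit in (1000, 3):
    usage = Counter(round_robin_paths(600, paths, limit))
    print(limit, dict(usage))
# 1000 {'path-A': 600}
# 3 {'path-A': 300, 'path-B': 300}
```

    In this model a short burst never leaves the first path when the limit is 1000, while a limit of 3 spreads it evenly. Spreading I/O across paths does not help if the disks behind the array are the bottleneck, which would be consistent with the small improvement reported here.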

    Thanks



  • 10.  RE: Re: High latency during backup on the virtual machine

    Broadcom Employee
    Posted Feb 14, 2013 07:29 AM

    Hi,

        Much appreciated.

    2. I have Storage vMotion configured between the two hosts. I tried it before but more or less the same. The latency is still high and warnings still show.

    Did you migrate the VM from the Netgear datastore to the HP datastore? Is the issue still the same?



  • 11.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 07:36 AM

    Hi,

    I would like to migrate the VM from the Netgear datastore to the HP datastore, but the HP datastore does not have enough free space for the DB Server, and I don't have extra disks for the datastore.



  • 12.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 06:30 AM

    Is there anything that I can change on the Database Server to improve the latency during the backup job? Normally the backup job takes just 1.5 minutes and the backup size is just 0.8 GB, so could I extend the duration of the backup?
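
    For a sense of scale, the figures in this post work out as follows. This is a small Python sketch using the backup size and duration given here; the 2 MB/s cap is a hypothetical throttle value chosen only to show how the duration would stretch, and 1 GB is taken as 1000 MB.

```python
def average_throughput_mb_s(size_gb: float, duration_s: float) -> float:
    """Average throughput in MB/s, using 1 GB = 1000 MB."""
    return size_gb * 1000.0 / duration_s


def duration_at_cap_s(size_gb: float, cap_mb_s: float) -> float:
    """Seconds needed to move size_gb if throughput is capped at cap_mb_s."""
    return size_gb * 1000.0 / cap_mb_s


size_gb = 0.8          # backup size from the post
duration_s = 1.5 * 60  # backup duration from the post, in seconds

print(round(average_throughput_mb_s(size_gb, duration_s), 1))  # about 8.9 MB/s

cap_mb_s = 2.0  # hypothetical throttle
print(round(duration_at_cap_s(size_gb, cap_mb_s) / 60, 1))     # about 6.7 minutes
```

    An average of roughly 9 MB/s is a modest load, so if it pushes latency to hundreds of milliseconds the storage path itself deserves a closer look, not only the backup schedule.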



  • 13.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 06:47 AM

    What version and edition of SQL are you using?



  • 14.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 07:03 AM

    Microsoft SQL Server 2008 R2
    Microsoft SQL Server Management Studio version: 10.50.1600.1



  • 15.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 07:35 AM

    If you're using backup compression, you could potentially use Resource Governor to throttle the backup process. Or, if you're using a separate virtual disk for backups, you could limit IOPS to that disk (the downside would be slow restores unless you remembered to change the limit first).

    How many disks make up the problem LUN, and are there other loads on it?



  • 16.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 07:47 AM

    There are four disks in the iSCSI storage. All of them are Seagate ST1000NM0011, SATA, 1 TB. The whole iSCSI storage is used only for the MIS, which consists of vCenter, the Application Server, the Database Server and the Web Server.



  • 17.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 08:04 AM

    I tried copying a backup from E:/ to C:/Desktop on the Database Server. During the copy, the Davg/cmd rose to about 300 and about 30 latency warnings appeared in the vSphere event log. After the copy finished, Resource Monitor showed C:/pagefile.sys being written at 20 MB/sec, which uses a lot of disk I/O. I wonder if this is related to the backup. Thanks a lot!



  • 18.  RE: Re: High latency during backup on the virtual machine

    Broadcom Employee
    Posted Feb 14, 2013 08:41 AM

    Hi,

         When the DAVG spikes, are there any errors logged against those LUNs from the storage perspective?



  • 19.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 09:11 AM

    Hi,

    When the Davg spikes, there are only warnings saying that the latency of the iSCSI storage has increased. What I see in the datastore's performance chart is that the read latency increases to about 120 ms, and sometimes it jumps to 500 ms.



  • 20.  RE: Re: High latency during backup on the virtual machine

    Broadcom Employee
    Posted Feb 14, 2013 09:29 AM

    Hi,

        I was looking for errors against those LUNs from the storage array. Does your storage have any command-line utility that would help us filter out the errors/events logged against that LUN during the backup window?

    If you still have slots available in your HP array, I would suggest that you add one new disk, carve out a LUN and present it, and then Storage vMotion the VM to the newly created HP datastore. If you can't do that and would still like to know what is going wrong, we may have to run some tests, such as sending a few I/Os to the LUN and capturing the ingress/egress traffic at the switch/storage, which should show where the delay is happening.



  • 21.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 14, 2013 09:58 AM

    Hi,

    There are some errors, "kernel: bonding: bond0: Error: cannot release eth0." and "kernel: bonding: bond0: Error: cannot release eth1.", in the log files of the storage array, but I have no idea what they mean. The storage array (ReadyNAS 3100) has no command-line utility, and there is no SSH access to it. I am not able to add another disk to the HP array, for various reasons. How can we run tests like sending a few I/Os to the LUN? Using IOmeter? I'm not familiar with it, so could you give me some hints? Thanks a lot!



  • 22.  RE: Re: High latency during backup on the virtual machine

    Broadcom Employee
    Posted Feb 14, 2013 10:07 AM

    Hi,

        You can refer to http://kb.vmware.com/kb/1006821

    "There are some errors "kernel: bonding: bond0: Error: cannot release eth0." & "kernel: bonding: bond0: Error: cannot release eth1.""

    Since this is an iSCSI array, I believe those are Ethernet controllers. Is there a way you can reset/restart the controllers?



  • 23.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 15, 2013 01:43 AM

    Hi,

    Thanks. Yes, there are dual NICs on this iSCSI storage array (ReadyNAS 3100, http://www.readynas.com/?page_id=3714), but I think there is no way to reset/restart the controllers. The only option I can think of is to restart the array, but I tried that before and nothing changed. Here are the errors in the log files starting from Jan 15. However, this issue started on Jan 28 and continues until now. The MIS was down because of the latency, but now it is running with only latency-increase warnings.

    Line 1518: Jan 15 04:00:38 SAN01 RAIDiator: Reallocated sector count has increased in the last day.\n\nDisk  4:\n  Previous count: 11\n  Current count: 12\n\nGrowing SMART errors indicate a disk that may fail soon.  If the errors continue to increase, you should be prepared to replace the disk.
    Line 1573: Jan 21 04:00:23 SAN01 RAIDiator: Reallocated sector count has increased in the last day.\n\nDisk  1:\n  Previous count: 2\n  Current count: 3\n\nGrowing SMART errors indicate a disk that may fail soon.  If the errors continue to increase, you should be prepared to replace the disk.
    Line 3226: Jan 28 16:26:48 SAN01 kernel: rx_data() returned an error.
    Line 3905: Feb  3 13:48:41 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
    Line 3907: Feb  3 13:48:43 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
    Line 3909: Feb  3 13:48:43 SAN01 kernel: bonding: bond0: Error: cannot release eth0.
    Line 3911: Feb  3 13:48:45 SAN01 kernel: bonding: bond0: Error: cannot release eth0.
    Line 5034: Feb  6 10:53:21 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
    Line 5037: Feb  6 10:53:23 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
    Line 5040: Feb  6 10:53:23 SAN01 kernel: bonding: bond0: Error: cannot release eth0.
    Line 5043: Feb  6 10:53:25 SAN01 kernel: bonding: bond0: Error: cannot release eth0.
    Line 5183: Feb  6 16:14:44 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
    Line 5186: Feb  6 16:14:46 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
    Line 5189: Feb  6 16:14:46 SAN01 kernel: bonding: bond0: Error: cannot release eth0.
    Line 5192: Feb  6 16:14:48 SAN01 kernel: bonding: bond0: Error: cannot release eth0.
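
    The bonding errors in the excerpt above come in bursts on specific days, which is easier to see once they are counted. Below is a minimal Python sketch, assuming the "Line N: Mon DD HH:MM:SS host kernel: ..." layout of this excerpt; only a few of the lines are copied in, and the function name is hypothetical.

```python
import re
from collections import Counter

# A few lines copied from the log excerpt above.
LOG = """\
Line 3226: Jan 28 16:26:48 SAN01 kernel: rx_data() returned an error.
Line 3905: Feb  3 13:48:41 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
Line 3907: Feb  3 13:48:43 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
Line 3909: Feb  3 13:48:43 SAN01 kernel: bonding: bond0: Error: cannot release eth0.
Line 3911: Feb  3 13:48:45 SAN01 kernel: bonding: bond0: Error: cannot release eth0.
Line 5034: Feb  6 10:53:21 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
Line 5037: Feb  6 10:53:23 SAN01 kernel: bonding: bond0: Error: cannot release eth1.
"""

PATTERN = re.compile(
    r"^Line \d+: (?P<month>\w{3})\s+(?P<day>\d+) (?P<time>[\d:]+) \S+ "
    r"kernel: bonding: bond0: Error: cannot release (?P<nic>eth\d)\.$"
)


def bonding_errors_per_day(text: str) -> Counter:
    """Count 'cannot release ethN' bonding errors per (month, day)."""
    counts: Counter = Counter()
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            counts[(match["month"], int(match["day"]))] += 1
    return counts


print(bonding_errors_per_day(LOG))
# Counter({('Feb', 3): 4, ('Feb', 6): 2})
```

    Comparing these timestamps with the times of the latency warnings in vSphere would show whether the two line up.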


  • 24.  RE: Re: High latency during backup on the virtual machine

    Broadcom Employee
    Posted Feb 15, 2013 02:43 AM

    Hi,

       Hope your day is going great. Thanks for sharing http://www.readynas.com/?page_id=3714

    1. Are we running with lower-RPM drives? Can you compare the RPM of the HP and Netgear disks?

    2. "Error: cannot release eth1" / "Error: cannot release eth0": I did some research but I'm not able to confirm what this error is. Is there any problem with those NICs/cables?



  • 25.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 15, 2013 03:15 AM

    Hi,

    Thanks for your immediate response.


    1. For the Netgear iSCSI storage, the 4 x 1 TB Seagate disks inside all run at 7200 RPM, SATA 6 Gb/s (model no.: ST1000NM0011).

       For the HP iSCSI storage, the 8 x 450 GB Seagate disks inside all run at 15000 RPM, SAS 3 Gb/s (model no.: ST3450856SS).

    2. I am not sure whether there is any problem with the NICs. I have swapped the two cables connected to the iSCSI storage to test whether one of them is faulty, but nothing changed. I found in the log files that when the hosts lost their connection to the iSCSI storage, the following entries were logged, but I am not sure what they mean. I have sent the log file to Netgear and am still waiting for their reply. I have emailed them before, but they did not seem to help much.

    *Attached are pictures of the status of the two Ethernet ports.

    Jan 28 16:30:06 SAN01 kernel: e1000e: eth1 NIC Link is Down
    Jan 28 16:30:06 SAN01 ifplugd(eth1)[2711]: Link beat lost.
    Jan 28 16:30:07 SAN01 ifplugd(eth1)[2711]: Executing '/etc/ifplugd/ifplugd.action eth1 down'.
    Jan 28 16:30:07 SAN01 avahi-daemon[3045]: Withdrawing address record for 10.121.253.70 on eth1.
    Jan 28 16:30:07 SAN01 avahi-daemon[3045]: Leaving mDNS multicast group on interface eth1.IPv4 with address 10.121.253.70.
    Jan 28 16:30:07 SAN01 avahi-daemon[3045]: Interface eth1.IPv4 no longer relevant for mDNS.
    Jan 28 16:30:07 SAN01 ifplugd(eth1)[2711]: client: No udhcpc found running; none killed.
    Jan 28 16:30:07 SAN01 ifplugd(eth1)[2711]: client: udhcpc: no process killed
    Jan 28 16:30:07 SAN01 ifplugd(eth1)[2711]: client: INFO: User requested network portals update.
    Jan 28 16:30:07 SAN01 ifplugd(eth1)[2711]: Program executed successfully.
    Jan 28 16:41:30 SAN01 kernel: e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    Jan 28 16:41:31 SAN01 ifplugd(eth1)[2711]: Link beat detected.
    Jan 28 16:41:32 SAN01 ifplugd(eth1)[2711]: Executing '/etc/ifplugd/ifplugd.action eth1 up'.
    Jan 28 16:41:32 SAN01 ifplugd(eth1)[2711]: client: 13465: old priority 0, new priority 0
    Jan 28 16:41:32 SAN01 avahi-daemon[3045]: Joining mDNS multicast group on interface eth1.IPv4 with address 10.121.253.70.
    Jan 28 16:41:32 SAN01 avahi-daemon[3045]: New relevant interface eth1.IPv4 for mDNS.
    Jan 28 16:41:32 SAN01 avahi-daemon[3045]: Registering new address record for 10.121.253.70 on eth1.IPv4.
    Jan 28 16:41:32 SAN01 avahi-daemon[3045]: Withdrawing address record for 10.121.253.70 on eth1.
    Jan 28 16:41:32 SAN01 avahi-daemon[3045]: Leaving mDNS multicast group on interface eth1.IPv4 with address 10.121.253.70.
    Jan 28 16:41:32 SAN01 avahi-daemon[3045]: Interface eth1.IPv4 no longer relevant for mDNS.
    Jan 28 16:41:32 SAN01 avahi-daemon[3045]: Joining mDNS multicast group on interface eth1.IPv4 with address 10.121.253.70.
    Jan 28 16:41:32 SAN01 avahi-daemon[3045]: New relevant interface eth1.IPv4 for mDNS.
    Jan 28 16:41:32 SAN01 avahi-daemon[3045]: Registering new address record for 10.121.253.70 on eth1.IPv4.
    Jan 28 16:41:34 SAN01 ifplugd(eth1)[2711]: client: metric: Unknown host
    Jan 28 16:41:34 SAN01 ifplugd(eth1)[2711]: client: INFO: User requested network portals update.
    Jan 28 16:41:34 SAN01 ifplugd(eth1)[2711]: client: INFO: Adding network portal 10.121.253.70 for iqn.2012-11.SAN01:san.
    Jan 28 16:41:34 SAN01 upnpd(eth1)[13431]: received signal 15, exiting
    Jan 28 16:41:34 SAN01 upnpd(eth1)[13495]: Listening on 10.121.253.70:50000
    Jan 28 16:41:34 SAN01 ifplugd(eth1)[2711]: Program executed successfully.
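
    The length of the link outage in this log can be read off the "Link is Down" and "Link is Up" entries. Here is a minimal Python sketch using those two lines as pasted above; the helper name is hypothetical, and the year is an assumption because syslog timestamps omit it.

```python
from datetime import datetime

down_line = "Jan 28 16:30:06 SAN01 kernel: e1000e: eth1 NIC Link is Down"
up_line = ("Jan 28 16:41:30 SAN01 kernel: e1000e: eth1 NIC Link is Up "
           "1000 Mbps Full Duplex, Flow Control: RX/TX")


def syslog_time(line: str, year: int = 2013) -> datetime:
    """Parse the 'Mon DD HH:MM:SS' prefix; syslog omits the year, so it is
    passed in (2013 matches the dates of this thread)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


outage = syslog_time(up_line) - syslog_time(down_line)
print(outage.total_seconds())  # 684.0, i.e. 11 minutes 24 seconds
```

    An eth1 link that stays down for more than eleven minutes is far longer than a brief renegotiation, so the switch port and NIC on that side are worth checking.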


  • 26.  RE: Re: High latency during backup on the virtual machine

    Broadcom Employee
    Posted Feb 15, 2013 03:54 AM

    Hi,

        The screenshot looks informative.

    1) RX Dropped 74965 Ethernet2
    2) RX Dropped 55332 Ethernet1

    Why are received packets getting dropped?

    3) Jan 28 16:30:06 SAN01 kernel: e1000e: eth1 NIC Link is Down
       Jan 28 16:30:06 SAN01 ifplugd(eth1)[2711]: Link beat lost.
    Since there was no manual activity (such as removing a cable), why is the NIC link going down?

       For Netgear iSCSI Storage, 4x1TB Seagate disks inside are all running with 7200RPM, SATA 6Gb/s.(Model no:ST1000NM0011)

       For HP iSCSI Storage, 8x450GB Seagate disks inside are all running with 15000RPM, SAS 3Gb/s (Model no:ST3450856SS)

    Is there a chance that you can use higher-RPM disks in the Netgear, or try SAS disks?
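
    As a rough illustration of why spindle count and RPM matter here, the common rule of thumb for random IOPS per disk is 1000 / (average seek time + average rotational latency), both in milliseconds. Below is a minimal Python sketch; the disk counts and RPM values come from this thread, while the seek times are assumed typical values and not datasheet figures. RAID overhead and controller cache are ignored.

```python
def rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def disk_iops(rpm: int, avg_seek_ms: float) -> float:
    """Rough random IOPS of one disk: 1000 / (seek + rotational latency)."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


# Seek times are assumed typical values, not taken from the drive datasheets.
netgear = 4 * disk_iops(rpm=7200, avg_seek_ms=8.5)   # 4 x 7200 RPM SATA
hp = 8 * disk_iops(rpm=15000, avg_seek_ms=3.5)       # 8 x 15000 RPM SAS

print(round(netgear), round(hp))
```

    Under these assumptions the HP array has several times the raw random-I/O capacity of the Netgear, before any difference in controllers or cache is taken into account.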



  • 27.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 15, 2013 04:28 AM

    Hi,

    1. I have no idea why RX packets keep getting dropped.

    2. I wonder whether the NIC link going down is the cause or the result of this issue, because the hosts lost their connection to the iSCSI storage after the latency-increase warnings had appeared many times. For now, though, the connection is no longer being lost. It's not easy for me to put higher-RPM disks in the Netgear, because the iSCSI storage is at a remote site. I only have 4 x 500 GB, 7.2k RPM SAS MDL disks, and I don't know whether they are much better than the 7.2k RPM SATA ones.



  • 28.  RE: Re: High latency during backup on the virtual machine

    Posted Feb 15, 2013 07:15 AM

    Oops, I just found that SAS disks are not compatible with the ReadyNAS 3100. http://www.readynas.com/?page_id=82

    I tried to find other disks that are compatible with the ReadyNAS 3100, but their performance is more or less the same as that of the disks currently installed in the Netgear.