IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

  • 1.  IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted May 30, 2012 09:42 AM

    Hi All

    I wonder if anyone can help.

    We have 2 IBM Storwize V7000's - 1 at the Protected Site and 1 at the Recovery Site.

    We have a test Volume that is GlobalMirror 'replicated' to the DR side - showing 'Consistent Synchronized'.

    Within SRM 5.0 - Array Managers I can see that the Volume is found and replicated as expected. Not sure if our replicated Volume needs to be in a Consistency Group though?

    When we run a Recovery Plan it fails at Step 4 and we get this error:

    4. Create Writeable Storage Snapshot

    Error - Failed to create snapshots of replica devices.

    SRA command 'testFailoverStart' failed. sraError.38295F72-F7D0-4A0F-B8D4-FF9821AB2675.1.desc sraError.38295F72-F7D0-4A0F-B8D4-FF9821AB2675.1.fixHint

    We are using the IBM V7000 SRA version 2.x (downloaded from VMware). Our IBM V7000's are Firmware 6.2.0.5.

    The SRA is configured with the tick box 'Pre-Configured Env.' ticked. All other settings greyed out (as shown):

    I have set the Advanced Settings option: storage.Commandtimeout to 1800 in 'Sites' as suggested by IBM.

    Has anyone out there got their IBM V7000's working with SRM 5.0? If so are we missing a piece of the jigsaw somewhere? Does the DR side (on the V7000) need a FlashCopy Mapping as well?

    Thanks in advance.



  • 2.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted May 30, 2012 11:20 AM

    Hi,

    If you run the recovery plan, does the failover process work completely?

    In your case you are testing a recovery plan, otherwise you wouldn't have the step "4. Create Writeable Storage Snapshot".

    I don't know your IBM system, but the snapshot function (in your case it seems to be FlashCopy) needs to be working mainly on your DR site, because the storage system at the DR site creates the snapshot, not the system at the protected site. Of course it should also work on your protected site for testing the failback later after a failover (or in a bi-directional setup), but as a first step only your DR site matters.

    Regards,

    Mario



  • 3.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted May 30, 2012 12:11 PM

    Hi

    Thanks for the input - yes, we are just doing a Test, not a Recovery. The test fails at Step 4. We do have a FlashCopy Mapping of the replicated Volume at the DR side.

    Is there anyone out there with specific IBM V7000 & SRM 5.0 experience? Thanks



  • 4.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted May 30, 2012 03:10 PM

    Silly question: do you have enough free space for the flashcopy? I know that the free space must be 20% (at least) of the volume you want to snapshot...
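
    A minimal sketch of that free-space check, assuming the 20% figure above is only this poster's rule of thumb and not a documented IBM requirement. The function name and parameters are made up for illustration.

```python
def has_flashcopy_headroom(volume_gb, pool_free_gb, min_fraction=0.20):
    """Return True if the pool's free space is at least min_fraction of the volume size.

    min_fraction defaults to the 20% rule of thumb quoted in the post (an assumption).
    """
    return pool_free_gb >= volume_gb * min_fraction


# A 500 GB volume needs at least 100 GB free under the 20% rule of thumb.
print(has_flashcopy_headroom(volume_gb=500, pool_free_gb=120))
print(has_flashcopy_headroom(volume_gb=500, pool_free_gb=80))
```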



  • 5.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted May 31, 2012 10:05 AM

    Yep - enough free space. We are now in the process of testing a new SRA code release from IBM....



  • 6.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Nov 26, 2012 02:48 PM

    Hi,

    Is your SRM working with the V7000 now without any issues? Which SRA and SRM versions are you running?

    Thanks,

    Visak



  • 7.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Mar 21, 2013 08:55 PM

    Hi - I know this article is a little dated, but I am curious whether you ever found a solution to your issues. We are having very similar issues with exactly the same setup: the IBM Storwize V7000, VMware SRM version 5, and ESXi 5 Update 1.

    I know we've had to adjust some of the storageProvider settings, under Advanced Settings for both sites (protected and recovery) but that hasn't solved all of our issues. We still have failed tests, with random and similar errors.

    If you can, will you update this discussion as to what you did to resolve your issues? Also, all of our LUNs are in consistency groups, which are in various Protection Groups, depending on the test I am performing.

    Thank you!



  • 8.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 03, 2013 05:46 PM

    Hi, Having the same problem here:


    Site Recovery Manager 5.02

    IBM Storwize v7000 6.4.1.3
    SRA IBMSVCSRA_v2.1.0.121224



    "Error: Failed to create snapshots of replica devices. Failed to create snapshot of replica consistency group ..."



  • 9.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 04, 2013 02:12 PM

    When we received the error that you have about failing to create snapshot of replica devices, we had to make sure that the flash copies at the DR site were mapped to the DR ESXi host cluster, but NOT the change volumes at the DR site. Once we did that the failure to create the snapshots stopped.

    I forget if there was anything else we did, on the SAN.

    In SRM, I went into the advanced settings for both the protected and the recovery site (right-click the site and select Advanced Settings) and opened the storageProvider section. Under storageProvider.hostRescanRepeatCnt [number of repeated host rescans during test and recover] I changed the value from 1 to 2. That still did not give me the result I needed, so I changed it from 2 to 3. It gives the hosts time to rescan and mount any "new" snapshot FlashCopies that are slower to prepare.

    Do NOT change storageProvider.resignatureFailureRetryCount [number of times to retry resignaturing a VMFS volume (after a failure)] to anything but 1. Any other value fails miserably: the test fails instantly, and it actually causes your FlashCopies from production to DR to stop!
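
    The two recommendations above can be written down as a small sanity check. This is only a sketch: the dictionary is a hand-typed copy of the values, not something read from SRM, and the function name is hypothetical.

```python
from typing import Dict, List


def review_storage_provider_settings(settings: Dict[str, int]) -> List[str]:
    """Compare hand-entered SRM advanced settings with the advice in this post."""
    notes = []
    rescans = settings.get("storageProvider.hostRescanRepeatCnt")
    if rescans is not None and rescans < 2:
        notes.append("hostRescanRepeatCnt is below 2: slow FlashCopy snapshots "
                     "may not be visible after a single rescan")
    retries = settings.get("storageProvider.resignatureFailureRetryCount")
    if retries is not None and retries != 1:
        notes.append("resignatureFailureRetryCount is not 1: the post reports "
                     "instant test failures with any other value")
    return notes


# Values as described in the post; apply the same values at both sites.
site_settings = {
    "storageProvider.hostRescanRepeatCnt": 3,
    "storageProvider.resignatureFailureRetryCount": 1,
}
for site in ("protected", "recovery"):
    print(site, review_storage_provider_settings(site_settings) or "no notes")
```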

    That's just the update on what we are going through right now. I have an open PMR with IBM for the remaining issues with the SRA not working well with the SAN. I will keep you posted.

    To note, we are at the following code levels as of yesterday:

    vCenter 5.1

    SRM 5.1

    SRA v2.1.0.120916 (pulled down from VMware's approved SRAs under the My Downloads page)



  • 10.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 08, 2013 01:24 PM

    Hello, I'm using the 6.4.1.3 microcode. Which microcode are you using?

    Thanks in advance



  • 11.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 08, 2013 02:25 PM

    Same error here. I have not changed anything in the advanced settings.

    On IBM Storwize V7000, 6.4

    SRM 5.1.1

    vSphere 5.1

    Same thing: failing at the snapshot.

    Both FlashCopies have been enabled and mapped. Any update on resolving the error would be good.

    Current status: we are now able to bypass the above error, but we are facing "failed to recover datastore vmfs volume" issues. I have increased the HBA per-host scan time from 1800 to 3000. The repeat count remains at the default of 3.



  • 12.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 09, 2013 09:52 AM

    Here goes my update on this.

    I created a new volume (a small 3GB one, with a minimal Linux installed for testing) and a consistency group with change volumes, then downgraded the SRA from IBMSVCSRA_v2.1.0.121224 to SRA 2.1.0.121108. After that, I was able to test SRM with success.

              Step                                          Status    Started                      Completed
              1. Synchronize Storage                        Skipped
              1.1. Protection Group teste                   Skipped
              2. Restore hosts from standby                 Success   2013-04-09 09:41:28 (UTC 0)  2013-04-09 09:41:28 (UTC 0)
              3. Suspend Non-critical VMs at Recovery Site  Inactive
              4. Create Writeable Storage Snapshot          Success   2013-04-09 09:41:28 (UTC 0)  2013-04-09 09:43:06 (UTC 0)
              4.1. Protection Group teste                   Success   2013-04-09 09:41:28 (UTC 0)  2013-04-09 09:43:06 (UTC 0)

    But the issue still remains on larger volumes (e.g. 500GB), which still fail with change volumes:

         "Error - Failed to create snapshots of replica devices. SRA command 'testFailoverStart' failed. Invalid Array ID. Refer to IBM SAN Volume Controller troubleshooting"

    The strange thing is that it does create the snapshot and the consistency group fcmap.

    When I try to run the cleanup process in SRM, it fails with:

         "Error - Failed to delete snapshots of replica devices. SRA command 'testFailoverStop' failed. Invalid Array ID. Refer to IBM SAN Volume Controller troubleshooting"

    I have to force the cleanup, but then I get the following warning message:

         "Warning - Failed to delete snapshots of replica devices. SRA command 'testFailoverStop' failed. Invalid Array ID. Refer to IBM SAN Volume Controller troubleshooting"



  • 13.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 12, 2013 03:23 PM

    Same problems here:

    IBM v7000 Version 6.4.1.2 (build 75.0.1211301000)

    vCenter 5.1

    SRM 5.1.0-941848

    I'm running a "test" of a simple recovery plan.

    I'm getting the same exact error with IBMSVCSRA_v2.1.0.121224 and IBMSVCSRA_v2.1.0.121108.

    "Error - failed to create snapshots of replica devices

    blah blah

    Volume cannot be created"

    I am not set up for preconfigured, and I have set the "Test MDisk Group ID" on both sides. I have not set "Protect Source Vols" or "Protect Target Vols".

    I have set the Advanced Settings option storage.Commandtimeout to 1800 and storageProvider.hostRescanRepeatCnt to 3.

    Has anyone actually got this working yet?



  • 14.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 12, 2013 03:35 PM

    Hi, it seems to be an IBM SRA problem. I've opened a case with IBM and I'm still waiting for an answer.



  • 15.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 12, 2013 03:37 PM

    Can you let us know if you hear anything, Vlad?

    Thanks.



  • 16.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 12, 2013 03:44 PM

    Sure.



  • 17.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 17, 2013 06:44 PM

    Going to follow this thread. I'm very interested in the SRA for SVC / V7000. There are a lot of unsolved issues with this SRA. They should make a better SRA soon!!



  • 18.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 09, 2013 11:30 AM

    Hi,

    Are you using a preconfigured env. or the non-preconfigured env.?

    Thanks in advance.



  • 19.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 23, 2013 10:57 PM

    Hi there,

    I've done a lot of testing on all kinds of SRA levels, V7000 levels and SRM versions, with lots of calls to IBM and VMware, and I tested several SRAs from the development team and test lab in Mainz.

    Here's my working setup.

    First of all, use the NON-preconfigured environment. Don't use the preconfigured environment. Use the latest SRA version you can download at the VMware site. At the time of this reply, this is version .....24.zip

    Leave all the checkboxes empty and don't fill in the Src. Mdisk Group ID and Target Mdisk Group ID.

    Set the Test MDisk Group ID according to your V7000 setup at both sites. You might need to add the ID field in the V7000 GUI to be able to read the ID.

    You can use SpaceEfficient Mode or not.

    Within SRM adjust the advanced settings at both sites.

    Set the repeat count settings to 2 and the timeout to 600.

    Don't forget to change all the settings at both sites.

    Hope this helps.

    This setting works for me.
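
    As a rough sketch, the setup described above can be captured as a per-site checklist. All field names below are hypothetical labels for the SRA dialog options and SRM advanced settings mentioned in this post; they are not an SRA or SRM configuration format, and the mapping of "repeat count" and "timeout" to specific SRM keys is left open on purpose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteSetup:
    """Hand-entered description of one site's setup (hypothetical field names)."""
    site: str
    preconfigured_env: bool
    src_mdisk_group_id: Optional[str]
    target_mdisk_group_id: Optional[str]
    test_mdisk_group_id: Optional[int]
    repeat_count: int
    timeout: int


def deviations(cfg: SiteSetup) -> List[str]:
    """List where a site differs from the setup reported as working in this post."""
    issues = []
    if cfg.preconfigured_env:
        issues.append(f"{cfg.site}: use the NON-preconfigured environment")
    if cfg.src_mdisk_group_id is not None or cfg.target_mdisk_group_id is not None:
        issues.append(f"{cfg.site}: leave Src./Target MDisk Group ID empty")
    if cfg.test_mdisk_group_id is None:
        issues.append(f"{cfg.site}: set the Test MDisk Group ID")
    if cfg.repeat_count != 2:
        issues.append(f"{cfg.site}: repeat count should be 2")
    if cfg.timeout != 600:
        issues.append(f"{cfg.site}: timeout should be 600")
    return issues


# Example values only; the pool ID 0 is an assumption for illustration.
sites = [
    SiteSetup("protected", False, None, None, 0, 2, 600),
    SiteSetup("recovery", True, None, None, None, 2, 1800),
]
for cfg in sites:
    for issue in deviations(cfg):
        print(issue)
```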



  • 20.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Apr 24, 2013 08:41 AM

    Thanks vfovermars,

    That's pretty much what I'm using. I still don't see how to find the Test MDisk Group ID, but "0" seems to work for me (we only have one big pool on each SAN).

    The problem now seems to be that the SAN is not creating a Flash Copy. It just sits at 0%, and the SRA waits and waits and waits.

    I've started another thread - for up to date versions. You can see it here:

    http://communities.vmware.com/message/2231160



  • 21.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Dec 22, 2020 11:42 AM

    Hello, according to the IBM installation instructions,

    "SRA only supports non-preconfigured settings for IBM SVC Stretched Cluster and IBM HyperSwap."

    I have a setup in which Global Mirror replication is used. Are you saying that if I select the non-preconfigured option, it would work?

    I'm using an IBM Storwize V7000 with SRM 6.5.



  • 22.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted May 16, 2013 09:14 AM

    Good news everyone

    IBM sent me an iFix (build 75.3.1303080005; note that the version available from the IBM site is 75.3.1303080000) for the 6.4.1.4 firmware level of the Storwize V7000.

    And I must say it works like a charm, now I can test with success all replicated volumes.

    SRM 5.02 (5.0.2.4655)

    VCENTER build 913577

    vSphere Hosts: 5.0 build 914586

    V7000 Firmware level with IFIX PATCH: 6.4.1.4 Build: 75.3.1303080005

    You must ask IBM support for this fix, it is not available on the IBM download site.



  • 23.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Jun 05, 2013 01:37 PM

    I am really surprised to learn someone applied this iFix!! Did your IBM storage rep tell you what happens after you've applied an iFix?

    We thought hard about it, but with the inability to move forward with standard GA's, we declined the iFix back in March...

    I'm so curious to hear more about your experience!



  • 24.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Jun 05, 2013 01:49 PM

    Hi, they said that the code on 6.4.1.3 and 6.4.1.4 is broken with change volumes.

    This iFix is from April.

    We are using SAN replication.



  • 25.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Jun 05, 2013 02:17 PM

    And you guys applied it without issue? That's awesome. We are very skeptical since we were told if we applied the iFix, we could no longer apply normal GA's.. Did you have any problems or down time when you applied the fix?



  • 26.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Jun 05, 2013 02:22 PM

    Hi,

    Yes without any issues and without any downtime. This is a fix for the 6.4.1.4 the only thing that changes is the build version.



  • 27.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Jun 05, 2013 02:24 PM

    Also, I want to know how it's running after the 31st day of installing the iFix! Our experience has been that if it doesn't break after 30 days of running, it's solid :smileywink:



  • 28.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Jun 05, 2013 01:34 PM

    Hello.

    With your V7000, are you using Global Mirror Change Volumes? Are you doing SAN replication to your DR site rather than using the VMware replication feature?

    If so, with the current V7000 & SRA code, there is a known bug & IBM is actively working on it. We've had a case opened with them for almost a year for this (and a few other) issues.

    Apparently there is a "misfire" between when the SRA calls on the V7000 to allow the ESXi hosts to mount the FlashCopied/Change Volume - volumes... all behavior afterwards is somewhat the same, and always slightly different.

    In IBM's latest GA, they were supposed to include the code that is available as an iFix for now. However, they did not, which means there were obviously still some issues. If you call IBM to request this iFix, just be aware that once you go the iFix route, you can no longer apply GAs in the standard way; you will always have a flag on your account and it will be treated differently.

    Rumor has it that the new fix will be released in the coming GA... that being said, everyone who has V7000's regardless of Global mirror change volumes will have this patch "just in case"..

    After it's released, we are going to wait a few weeks and let the more daring out there apply the latest code. Once we are OK with the results we will apply it at DR but not HQ. After a few weeks of it running at DR we will then place it on our HQ site... and by the time this is all said and done, we may have outgrown our SAN altogether!!

    Best of luck and I hope this helps!!



  • 29.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Oct 02, 2013 02:55 PM

    Great News!! (pre-apologies for the length)

    I was finally able to get ours working 100%! I'll post how & what configuration changes I made, as well as code levels.. hopefully I don't miss anything.

    vCenter: 5.1 (5.0 worked too)

    ESXi: combination between 5.0 & 5.1..

    IBM SRA: 2.2.0 (previous version worked too)

    IBM SVC (management nodes): HQ- 1.4.0.1-5c / DR- 1.4.1.1-4 (yes two different code levels)

    IBM V7000 (block level): HQ- 6.4.1.2 / DR- 7.1.0.3 (also different code levels)

    Replication Technologies & Partnerships (in our environment):

    Global Mirror (to DR) some pure Global Mirrored some Global Mirror with Change Volumes.

    We use the SAN & SQL Clusters to replicate to DR, we use SRM without vSphere replication.

    Important Items (not supported by SRM for microcode 5-6.4 on the V7000): Consistency Groups are not supported. This is not the same as remote copy relationships.. so don't remove the relationship!

    Within the SRM client, I did not change any of the default properties in the Advanced Settings (easier to troubleshoot without messing with a bunch of stuff).
    On the V7000 - - first I tested SRM on a volume that was purely a Global Mirror. One thing you will need to do, is at the DR site, create a second volume in the same pool as the volume (protected group) you will be testing on. In my case I made it a thin provision volume (must be exact size). Then I created a FlashCopy mapping from the volume (still at the DR site) aka primary volume, using the Clone Preset when creating the mapping. I've been told you can use the Backup Preset, but I haven't tested that yet. Keep all the defaults in the Advanced area in the GUI (check with your storage admin if you aren't already).

    Then it prompted for a consistency group. Going back to a publication released in March 2013, consistency groups were not supported in 6.x so I did not add the FC mapping to a CG. Also, make sure that the FlashCopy Mapping Volume is mapped to your ESXi Hosts.. offline & unmapped is no good :smileyhappy:

    Once the volume and flashCopy mappings were created I went to the SRA application. This is where it gets a little interesting.

    Something worth mentioning: because I created a FlashCopy mapping (at DR only) on the volume I was going to be testing on, using the PreConfigured check box fails almost instantly when running a test. Instead, you need to select Standard (or thin provisioned) under volume type. Then, for the Test MDisk Group ID, you actually need to enter the ID of the pool that the volume(s) you'll be testing live in. My Pool ID was 0 for this test. It's a bit misleading & the developers should change this.

    If you look at the SRA for the IBM DS8000, the wording is correct & they instruct you to use P0 or P1 and so on... not for the V7K...

    If you don't know your pool ID, then you will need to SSH into the cluster IP for your V7000 as 'root' or 'superuser'.. and run the lsmdiskgrp command. (double check that online - I could be wrong)..

    Next to each pool name in your environment, you'll see a number -  that's your Pool ID aka MDisk Group ID... after you plug that in, just click OK. Don't bother with the rest because this is just for DR testing... make sure you do the same on BOTH sites.
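
    A small sketch of reading the Pool ID out of that command's output, assuming it was run as `lsmdiskgrp -delim :` so that the columns are colon-separated and the first line is a header with `id` and `name`. The sample text below is made up for illustration (real output has more columns), so check it against your own cluster.

```python
import csv
import io

# Hypothetical, shortened sample of `lsmdiskgrp -delim :` output.
SAMPLE_OUTPUT = """id:name:status:mdisk_count:vdisk_count:capacity
0:Pool_DR:online:4:32:20.00TB
1:Pool_Test:online:2:5:5.00TB
"""


def pool_ids(cli_output: str) -> dict:
    """Map each pool name to its numeric ID (the value for Test MDisk Group ID)."""
    reader = csv.DictReader(io.StringIO(cli_output), delimiter=":")
    return {row["name"]: int(row["id"]) for row in reader}


for name, pool_id in pool_ids(SAMPLE_OUTPUT).items():
    print(f"{name}: MDisk Group ID {pool_id}")
```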

    Go back to vCenter & rescan your SRAs at both sites. Also make sure that if you did create a new volume or flashCopy that you rescan for Devices in the Array Manager tab.. this way it's seen by the DR side...

    Also, unless you have a network created just for SRM testing, it's recommended to make a port group that doesn't actually have any NICs attached on a distributed switch. When you run your test, make sure you edit it first, and attach each network to the SRM Test Network.. (the one without Network connectivity).

    IF you have Change Volumes in your environment or require them the test WILL STILL WORK. However, there is a HUGE catch! In our environment almost all of our volumes have Change Volumes mapped to them (lots of busy applications)...

    After the first successful test, I changed the Global Mirrored volume to GM plus Change Volumes at both sites. I re-ran the test & it was a success.

    Then I went to another volume with Change Volumes that was pre-existing, created the Clone (thin provisioned volume), created the FlashCopy mapping, re-ran the test and it FAILED...

    So bottom line is this: if SRM testing was an afterthought, then you must remove all change volumes & consistency groups from your V7000 (SVC). This means stopping replication while you remove them - just to be safe.

    Once the change volume at the DR side has been removed, create the new Thin Provisioned Volume (same pool & size as the one it'll be mapped to), then create the Change Volume.

    Sorry, I know this was long but I hope it helps. I've been working with IBM for almost 9 months to get much of this resolved because initially there were code release issues that didn't support GM's with CV's and a slew of other things... but finally it's working and finally (!!!!) we can test.



  • 30.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Oct 08, 2013 01:29 AM

    Also, it's worth noting that any volumes located on the auxiliary site (DR) that were created when running code 6.4 will not work with IBM's SRA 2.2. You'll have to re-create these volumes, which means:

    -stopping remote copy

    -break the global mirror (or metro)

    -delete the change volumes (if any) at both sites

    -remove any other flashCopy mappings at the DR site

    -recreate the primary remote copy relationship

    -add it back to a CG group

    -recreate the change volumes

    -resume replication...

    Once the remote copy is about 2%, you can run an SRM test. Do not create the Flash Copy mapping of the primary (protected) volume you are testing. The SRA will do this for you (vdisk0). Just make sure you set the SRA to use ThinProvision, and set the Mdisk group ID to the appropriate number (aka Pool ID) - ssh into the storage cluster, then run lsmdiskgrp...

    Rescan the SRAs, rescan for devices and you're all set. I've run 3 successful tests today & it's awesome! Oh, for SRM tests, in the Advanced options for each site, set it so you aren't waiting for VMware Tools: recovery.powerOnTimeout = 0 for testing. It just makes it go faster than the default of 300.
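
    The preconditions from this post can be sketched as a simple pre-test check. Everything here is illustrative: the function and parameter names are hypothetical, the inputs are typed in by hand rather than queried from the V7000 or SRM, and the 2% threshold is the approximate figure given above.

```python
from typing import List, Optional


def srm_test_blockers(copy_progress_pct: float,
                      manual_flashcopy_mapping: bool,
                      sra_thin_provisioned: bool,
                      mdisk_group_id: Optional[int]) -> List[str]:
    """Return reasons not to start an SRM test yet, per the steps in this post."""
    blockers = []
    if copy_progress_pct < 2:
        blockers.append("wait until the remote copy has reached about 2%")
    if manual_flashcopy_mapping:
        blockers.append("remove the manual FlashCopy mapping; the SRA creates its own")
    if not sra_thin_provisioned:
        blockers.append("set the SRA volume type to thin provisioned")
    if mdisk_group_id is None:
        blockers.append("set the MDisk Group ID (Pool ID from lsmdiskgrp)")
    return blockers


# Example: replication has just been resumed and the SRA still needs a pool ID.
for reason in srm_test_blockers(1.0, False, True, None):
    print(reason)
```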



  • 31.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Nov 12, 2014 04:18 PM

    Hi guys, I need help!!

    My scenario is:

    Protected Site: v7000 1.4.1.0-40 / 7.1.0.1 build 79

    Recovery Site: v7000 1.4.0.1-5c / 6.4.1.2 build 75

    I have Metro Mirror replicas. The test works, but the recovery plan doesn't. I get this error and I don't know why vCenter cannot resignature the datastore. Any ideas?

    Thanks in advance!!

    Mónica



  • 32.  RE: IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

    Posted Nov 20, 2014 10:25 AM

    Hi.

    I already have my SRM environment working.

    I updated my SRA to 2.3 and my replicas are Metro Mirror (just 30 km between datacenters).

    Best Regards,

    Mónica