VMware vSphere

Cannot start the VM due to one of the snapshot is deleted

  • 1.  Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 22, 2018 02:57 PM

    Dear All,

    Last week, I found that one of my VMs was not working properly because the host's datastore was full. I wanted to delete one of my snapshots to free up disk space. There are 4 snapshots in total (001, 002, 003, 004). I powered down the VM and deleted the 003 snapshot, which freed up over 100 GB.

    However, when I tried to power on the VM, it would not start and showed the error "The system cannot find the file specified VMware ESX cannot find the virtual disk ".....FMC-0003.vmdk.". Trying to consolidate the disks produces the same error.

    I didn't back up the 0003.vmdk file, and I am now worried that the VM will never start again, as it holds some important information.

    Is there any method or suggestion for this situation? Thanks all.



  • 2.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 22, 2018 03:11 PM

    So the name-000003.vmdk had a size of 100 GB?
    This is a significant loss, and you will very likely see corruption once we have reassembled the incomplete chain.
    OK, download WinSCP and enable SSH access.
    Log in via WinSCP and navigate to the directory of the VM.
    Download the following files:
    - the vmx file
    - name.vmdk
    - name-000001.vmdk
    - name-000002.vmdk
    - name-000004.vmdk
    Zip those 5 files and attach the archive to your next reply.



  • 3.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 23, 2018 01:03 AM

    Hi Continuum,

    Thanks for your reply.

    I'm sorry to say that the remaining vmdk files belong to my client, and I don't think they will let me share them and post them here.

    Is there any way to reassemble a VM in a situation like mine?



  • 4.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 23, 2018 06:58 PM

    How was the snapshot deletion performed?? Was it from the Snapshot Manager tool, or did someone just go into the datastore and delete the snapshot file??



  • 5.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 24, 2018 03:49 AM

    I just went into the datastore manager and deleted it there.

    Thanks for your help.



  • 6.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 24, 2018 04:45 PM

    Wrong way to remove a snapshot, as you've found out... You need to do it through the Snapshot Manager to do it cleanly. With it deleted, unless you have a backup of the VM, your options are very limited. You MIGHT be able to alter the vmx file to get it to use one of the other snapshots present, or revert back to before the snapshots. BUT, any data that was included in the ones you bypass will be lost. So, any changes to the VM (any/all changes) that were included in the other snapshots will be lost.

    There's a reason why you do things the right way and don't just jump in and delete things from the datastore willy-nilly.



  • 7.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 24, 2018 05:27 PM

    Thanks for the explanation, golddiggie.

    To check my understanding, do you mean:

    1)     In my case, I had snapshots 001, 002, 003, 004 and deleted only the 003 snapshot. That broke the snapshot chain, so I can't start the VM anymore, and the only way forward is to revert to one of the existing snapshots. Which snapshot should I revert to, 002 or 004?

    2)     I remember checking the vmx file, and it should still be pointing to the 004 snapshot. Should I change the configuration so that it points to the 002 snapshot?

    3)     I have searched the forum, and some people said that snapshots sometimes can't be deleted through the Snapshot Manager either. What is the official procedure for deleting a snapshot when the disk is full? This really confuses me.

    Thanks.



  • 8.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 24, 2018 05:34 PM

    The file names are not necessarily in the order of the snapshots.

    To find out what's possible, post the output of ls -lisa in the VM's folder (from the host's command line), and attach the VM's .vmsd and .vmx files to a reply post.

    André



  • 9.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 25, 2018 02:53 PM

    Andre,

    Thanks for your help.

    Attached is the output for your analysis. I hope it helps with the investigation. Thanks.

    Kurt Lei



  • 10.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 25, 2018 05:55 PM

    By deleting snapshot 3, you already lost ~100GB, and the latest snapshot is also large at ~130GB.

    Reverting to snapshot 2 will bring you back to the VM's state as of March 1st, which I would think is likely worthless for you.

    What we can try is to modify the snapshot chain, so that snapshot 4 points to snapshot 2 as its parent. This won't bring back the deleted 100GB, but it may allow you to access the file system to extract/backup important files.

    If you want to try this, then please compress/zip the following metadata files, and attach the .zip archive to a reply post.

    • FMC-000001.vmdk
    • FMC-000002.vmdk
    • FMC-000004.vmdk
    • FMC.vmdk

    Please check whether vmware-19.log contains entries for snapshot 3 (FMC-000003.vmdk). If it does, include it in the .zip archive too.

    André



  • 11.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 27, 2018 12:40 AM

    Dear Andre,

    Thanks for your help.

    As the VM materials belong to my client, it is not appropriate to post their contents publicly. What can I check, and what should my next steps be? Sorry for the inconvenience caused.



  • 12.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 27, 2018 12:52 PM

    Just to make sure you didn't misunderstand what's needed:

    The .vmdk files are small text files, which contain details about the virtual disks' geometry (number of heads, cylinders, sectors, ...), and their names. They do not contain any user data.

    If this is still an issue, let me know, and I will provide you with details on how to edit these files.

    André
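
For readers who want to check what such a descriptor file holds before sharing it, here is a minimal Python sketch. It assumes the usual text layout of a VMDK descriptor (key=value header lines followed by extent lines). The sample content, including the CID values and the extent size, is made up for illustration and is not taken from this VM.

```python
import re

# Hypothetical descriptor content; CID values and extent size are made up.
SAMPLE_DESCRIPTOR = '''# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=a1b2c3d4
parentCID=0f0e0d0c
createType="vmfsSparse"
parentFileNameHint="FMC-000001.vmdk"

# Extent description
RW 524288000 VMFSSPARSE "FMC-000002-delta.vmdk"
'''


def parse_descriptor(text):
    """Return (fields, extents): the key=value settings and the extent lines."""
    fields, extents = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = re.match(r'^([A-Za-z0-9_.]+)\s*=\s*"?([^"]*)"?$', line)
        if match:
            fields[match.group(1)] = match.group(2)
        elif line.split()[0] in ("RW", "RDONLY", "NOACCESS"):
            extents.append(line)  # extent lines only name the data file
    return fields, extents


fields, extents = parse_descriptor(SAMPLE_DESCRIPTOR)
print(fields["CID"], fields["parentCID"], fields.get("parentFileNameHint"))
print(extents)
```

The output shows only identifiers, a parent file name, and the name of the data file, which is the point being made above: the descriptor refers to the data but does not carry it.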



  • 13.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 29, 2018 03:24 PM

    Dear Andre,

    Thanks for your advice.

    My server team engineer has already looked at this case. He tried the following:

    1)     Pressed "Delete All" in the Snapshot Manager (failed)

    2)     Created a file named xxxx-003.vmdk and tried to start the VM (failed)

    3)     Consolidated the VM (failed)

    Finally, he created another VM and pointed it to the base vmdk (the file name without any number, the smallest size). The VM can be started. But due to some issue, it can't start at this moment. So far, I'm not sure what the contents look like after starting the VM. All I can confirm is that the base vmdk seems to hold the earliest state, judging by the file's modified date.

    Kurt Lei



  • 14.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 29, 2018 03:43 PM

    The VM can be started. But due to some issue, it can't start at this moment. ...

    It looks like things have become even worse.

    • Options 1 and 3 won't work with a broken snapshot chain.
    • Option 2 can be used (if done correctly), which will allow you to use the virtual disk again (this is basically similar to chaining snapshot 4 with snapshot 2). However, this won't bring back the deleted data from the original snapshot 3.
    • Starting from the base disk is the worst option of all, because this will not only break another snapshot chain, but also modify data in the base .vmdk, which could lead to even more issues. I hope this was done AFTER creating a backup, and that the VM has been powered off immediately after becoming aware of the old state!?

    I'd really ask you to provide the files I've asked for. If you take a look at them, you'll see that they don't contain any of your customer's user data.

    André
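
To illustrate why a broken chain blocks "Delete All" and consolidation, here is a minimal Python sketch that walks a chain from the newest snapshot back to the base disk. It assumes each descriptor has already been read into a dict with CID, parentCID and parentFileNameHint. All CID values below are invented, and the assumption that FMC-000004.vmdk names FMC-000003.vmdk as its parent is only a guess based on the error message, since file names do not have to follow snapshot order.

```python
NO_PARENT = "ffffffff"  # parentCID value that marks a base disk


def walk_chain(descriptors, leaf):
    """Follow parentFileNameHint links from leaf towards the base disk.

    Returns (chain, problem); problem is None when a base disk is reached.
    """
    chain, name = [], leaf
    while True:
        if name in chain:
            return chain, f"loop at {name}"
        if name not in descriptors:
            return chain, f"missing descriptor: {name}"
        chain.append(name)
        disk = descriptors[name]
        if disk["parentCID"].lower() == NO_PARENT:
            return chain, None  # reached the base disk
        parent = disk.get("parentFileNameHint")
        if parent is None:
            return chain, f"{name}: no parentFileNameHint"
        parent_disk = descriptors.get(parent)
        if parent_disk is not None and parent_disk["CID"] != disk["parentCID"]:
            return chain, f"CID mismatch between {name} and {parent}"
        name = parent


# Hypothetical values for illustration only.
descriptors = {
    "FMC.vmdk": {"CID": "11111111", "parentCID": "ffffffff"},
    "FMC-000001.vmdk": {"CID": "22222222", "parentCID": "11111111",
                        "parentFileNameHint": "FMC.vmdk"},
    "FMC-000002.vmdk": {"CID": "33333333", "parentCID": "22222222",
                        "parentFileNameHint": "FMC-000001.vmdk"},
    "FMC-000004.vmdk": {"CID": "55555555", "parentCID": "44444444",
                        "parentFileNameHint": "FMC-000003.vmdk"},
}

chain, problem = walk_chain(descriptors, "FMC-000004.vmdk")
print(chain, problem)
```

With these sample values the walk stops right after FMC-000004.vmdk and reports the missing FMC-000003.vmdk, while starting from FMC-000002.vmdk reaches the base disk without a problem.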



  • 15.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 31, 2018 12:22 AM

    Andre,

    Thanks for your help. I will try to ask my client to provide the vmdk files, because I don't have them on hand. the .004 file is over 100G too. Can I post it in this post ?

    At the moment, the new VM has been created and already points to the base vmdk file (we did take a backup before doing this). However, we haven't started the new VM. I expect that even if I can start it, the content will be very old and will need a lot of further work.

    You are right, I remember that the modified date of the base vmdk has already changed to the latest date.

    Kurt



  • 16.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 31, 2018 07:25 AM

    the .004 file is over 100G too. Can I post it in this post ?

    That's the size of the data file, which is not what I'm asking for.

    Each virtual disk consists of two files, a descriptor file, and the data file (flat, delta, or sesparse). What I need are only the descriptor .vmdk files, i.e. the small text files (a few hundred bytes only). The Datastore Browser unfortunately shows both files as one (with the name of the descriptor file, and the size of the data file). In order to download the descriptor files, you may enable SSH on a host, and use e.g. WinSCP.

    Here's a list of the files (from the ls -lisa output that you've posted earlier) which I'd like you to provide in a .zip archive:

    239089284      0 -rw-------    1 root root           310 Jan 19  2018 FMC-000001.vmdk
    352335492      0 -rw-------    1 root root           317 Feb  9 14:09 FMC-000002.vmdk
    432027268      0 -rw-------    1 root root           349 Jul 17 01:46 FMC-000004.vmdk
    12596868      0 -rw-------    1 root root           499 Jan 17  2018 FMC.vmdk
    16791172      8 -rw-r--r--    1 root root          1194 Jul 17 01:28 FMC.vmsd
    339752580      8 -rwx------    1 root root          3308 Jul 20 02:19 FMC.vmx
    461387396 1024 -rw-------    1 root     root 1043893 Jul 20 02:19 vmware-19.log

    I've highlighted their sizes (in Bytes).

    André
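
As a rough aid for telling descriptor files apart from data files in such a listing, here is a small Python sketch that filters ls -lisa output by extension and size. The 4096-byte cut-off is an arbitrary assumption, and the extra line for a -delta.vmdk data file is hypothetical (its numbers are made up) and only shows what gets filtered out.

```python
# Listing lines copied from the post above.
LS_OUTPUT = """\
239089284      0 -rw-------    1 root root           310 Jan 19  2018 FMC-000001.vmdk
352335492      0 -rw-------    1 root root           317 Feb  9 14:09 FMC-000002.vmdk
432027268      0 -rw-------    1 root root           349 Jul 17 01:46 FMC-000004.vmdk
12596868      0 -rw-------    1 root root           499 Jan 17  2018 FMC.vmdk
461387396 1024 -rw-------    1 root     root 1043893 Jul 20 02:19 vmware-19.log
"""
# Hypothetical data-file line (made-up numbers), not part of the original listing.
LS_OUTPUT += (
    "432027300 136314880 -rw-------    1 root root "
    "139586437120 Jul 17 01:46 FMC-000004-delta.vmdk\n"
)


def descriptor_files(ls_output, max_bytes=4096):
    """Return {name: size} for .vmdk files small enough to be text descriptors."""
    found = {}
    for line in ls_output.splitlines():
        parts = line.split()
        if len(parts) < 11:
            continue  # not a file entry (e.g. the "total" line)
        name, size = parts[-1], parts[6]  # 7th column of ls -lisa is the size
        if name.endswith(".vmdk") and size.isdigit() and int(size) <= max_bytes:
            found[name] = int(size)
    return found


for name, size in descriptor_files(LS_OUTPUT).items():
    print(f"{size:>6} bytes  {name}")
```

Only the four small .vmdk files are listed; the log file and the large data file are skipped.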



  • 17.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Aug 01, 2018 01:05 PM

    Andre,

    I have already attached the zip for you. Thanks for your help.

    Kurt Lei



  • 18.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Aug 01, 2018 04:12 PM

    I've modified the "FMC-000004.vmdk" file, so that it points to "FMC-000002.vmdk". This fixes the snapshot chain, but cannot bring back the ~100GB of deleted data.

    Please upload the "FMC-000004.vmdk" file to the datastore, and then - before powering on the VM - create another snapshot to be able to revert to the current state in case things are not as expected.

    Due to the lost snapshot, the VM's file system will most likely show some degree of corruption, so what you should do is back up the important data that's accessible before you start a file system check/repair within the VM.


    André
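
The exact changes made to the attached file are not shown in the thread. As a sketch of what re-pointing a snapshot descriptor usually involves, the Python function below rewrites the parentFileNameHint and parentCID lines of a descriptor's text, on the assumption that parentCID has to match the CID line of the new parent. The function name and all CID values are hypothetical, and this should only ever be tried on copies of the files.

```python
import re


def reparent(child_text, new_parent_name, new_parent_cid):
    """Return descriptor text whose parent link points at a different disk."""
    text, hint_count = re.subn(
        r'(?m)^parentFileNameHint\s*=.*$',
        lambda m: f'parentFileNameHint="{new_parent_name}"',
        child_text,
    )
    text, cid_count = re.subn(
        r'(?m)^parentCID\s*=.*$',
        lambda m: f'parentCID={new_parent_cid}',
        text,
    )
    if hint_count != 1 or cid_count != 1:
        raise ValueError("expected exactly one parentFileNameHint and one parentCID line")
    return text


# Hypothetical descriptor for snapshot 4; CID values are made up.
child = (
    "CID=55555555\n"
    "parentCID=44444444\n"
    'createType="vmfsSparse"\n'
    'parentFileNameHint="FMC-000003.vmdk"\n'
)
print(reparent(child, "FMC-000002.vmdk", "33333333"))
```

The child's own CID line is left untouched; only the two lines that describe its parent change.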



  • 19.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Aug 03, 2018 12:08 PM

    Andre,

    Thanks again for your effort. I uploaded the 004.vmdk descriptor file and overwrote the existing one. However, the VM still can't be opened, and it shows "Can't open the disk....002.vmdk or one of the disks it depends on". I think the snapshot chain can't be recovered.

    In the end, because the user needed me to restore the application ASAP, I chose to create a new virtual machine using the existing base FMC.vmdk. The application finally starts. Some of the data is of course outdated, but the inconsistency is acceptable and I can recover the rest myself.

    Kurt Lei



  • 20.  RE: Cannot start the VM due to one of the snapshot is deleted

    Posted Jul 24, 2018 06:48 PM

    Not using the snapshot manager FIRST was not a minor mistake. Even if you used the tool, and it didn't actually delete it (happens sometimes) at least the VM wouldn't have been looking to use it. Even IF things went very wrong, and they did, you would still have been in a decent spot. Now you've had the VM unusable for a longer period and have to fight to get it back up and running. Hopefully, it's not completely boned and you will be able to get it back online.

    I've had some issues in the past with snapshots not removing cleanly by backup software. But, those were easier to take care of.

    IF you have a second datastore on your host (or in the environment) you could try a storage migration of the VM to another datastore. That, at times, can clean up the bad hooks into the missing snapshot. Most often, though, only if it was actually removed decently from the VM's vmx file.

    If you have SnS, you should be reaching out to VMware support over this. If it's not an important VM, then just remove it and make a replacement.

    Also, allowing your host's storage to get FULL is not wise. If you have vCenter (or pretty sure even if you don't) you SHOULD have set up datastore alerts to warn you when it hit 75% and then ~95% consumption (yellow and then red alerts). I usually have these send off an email so that I get the message about things.

    BTW, this type of thing is why I even have more than one datastore setup on stand-alone host servers. Even if you only have a single RAID volume, set it up with two (or more) datastores so that you CAN perform storage vMotion tasks. That little bit of planning can save you a LOT of pain later. Of course, even in my home lab (with a single host right now), I'm using shared storage (NAS providing over iSCSI connection). I have a hard drive inside the host, but I use the NAS for most of the VMs. Or the larger ones so that the local drive (inside the host) won't get filled up. I also make sure that I'm not over provisioning that internal drive so that it would fill up even IF the VM(s) living there went to 100% vmdk consumption.

    IMO, treating even a home lab, or test/dev lab at work, like it's in production will save you a LOT of pain down the road. Plus (again, IMO) it's a good practice to be in treating them all like production assets.
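
The 75% and ~95% thresholds mentioned above can be sketched as a simple classification. This is not how vCenter alarms are configured; it is only a minimal Python illustration of the yellow/red logic, with a hypothetical function name and made-up capacity figures.

```python
def datastore_alert(used_bytes, capacity_bytes, warn=0.75, critical=0.95):
    """Classify datastore usage as green, yellow or red."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    ratio = used_bytes / capacity_bytes
    if ratio >= critical:
        return "red"
    if ratio >= warn:
        return "yellow"
    return "green"


TIB = 1024 ** 4  # example capacity of 1 TiB (made up)
for used_fraction in (0.50, 0.80, 0.97):
    print(used_fraction, datastore_alert(int(used_fraction * TIB), TIB))
```

A check like this could be run on a schedule and wired to an email, which is the kind of early warning that would have flagged the full datastore before any snapshot had to be removed in a hurry.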