VMware vSphere

  • 1.  My Microsoft cluster service keeps failing

    Posted Jun 03, 2010 08:44 PM

    I have two virtual machines on two separate ESX 4 hosts. Each VM is running Server 2008 Enterprise with Failover Clustering installed. SQL 2008 Enterprise is installed on the cluster. MSDTC and SAP are also installed on the cluster. The cluster has been validated and I used a VMware document to configure the VMs correctly. For some reason the cluster service will fail and will not fail over. You can't even restart the service. The node has to be rebooted.

    Event ID 1574

    The failover cluster database could not be unloaded. If restarting the cluster service does not fix the problem, please restart the machine.

    Before that I get these events leading up to it. They are not in order but I left the timestamps on them. Any ideas would be greatly appreciated. Thank you.

    Log Name: System
    Source: Ntfs
    Date: 6/2/2010 9:50:54 AM
    Event ID: 137
    Task Category: (2)
    Level: Error
    Keywords: Classic
    User: N/A
    Computer: DB1
    Description:
    The default transaction resource manager on volume J: encountered a non-retryable error and could not start. The data contains the error code.

    ---

    Log Name: System
    Source: volmgr
    Date: 6/2/2010 9:51:03 AM
    Event ID: 57
    Task Category: (2)
    Level: Warning
    Keywords: Classic
    User: N/A
    Computer: DB1
    Description:
    The system failed to flush data to the transaction log. Corruption may occur.

    ---

    Log Name: Application
    Source: Application Error
    Date: 6/2/2010 9:50:49 AM
    Event ID: 1000
    Task Category: (100)
    Level: Error
    Keywords: Classic
    User: N/A
    Computer: DB1
    Description:
    Faulting application clussvc.exe, version 6.0.6002.18005, time stamp 0x49e025d2, faulting module ntdll.dll, version 6.0.6002.18005, time stamp 0x49e0421d, exception code 0xc0000006, fault offset 0x000000000003347e, process id 0x7a4, application start time 0x01cb01a48c7cb13c.

    ---

    Log Name: Application
    Source: Application Error
    Date: 6/2/2010 9:50:49 AM
    Event ID: 1005
    Task Category: (100)
    Level: Error
    Keywords: Classic
    User: N/A
    Computer: DB1
    Description:
    Windows cannot access the file C:\Windows\Cluster\clussvc.exe for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program Microsoft Failover Cluster Service because of this error.

    Program: Microsoft Failover Cluster Service
    File: C:\Windows\Cluster\clussvc.exe

    The error value is listed in the Additional Data section.

    User Action
    1. Open the file again. This situation might be a temporary problem that corrects itself when the program runs again.
    2. If the file still cannot be accessed and
       - It is on the network, your network administrator should verify that there is not a problem with the network and that the server can be contacted.
       - It is on a removable disk, for example, a floppy disk or CD-ROM, verify that the disk is fully inserted into the computer.
    3. Check and repair the file system by running CHKDSK. To run CHKDSK, click Start, click Run, type CMD, and then click OK. At the command prompt, type CHKDSK /F, and then press ENTER.
    4. If the problem persists, restore the file from a backup copy.
    5. Determine whether other files on the same disk can be opened. If not, the disk might be damaged. If it is a hard disk, contact your administrator or computer hardware vendor for further assistance.

    Additional Data
    Error value: 80000011
    Disk type: 3

    ---

    Log Name: System
    Source: Microsoft-Windows-Kernel-General
    Date: 6/2/2010 9:50:49 AM
    Event ID: 6
    Task Category: None
    Level: Error
    Keywords:
    User: SYSTEM
    Computer: DB1
    Description:
    An I/O operation initiated by the Registry failed unrecoverably. The Registry could not flush hive (file): '\??\C:\Windows\Cluster\CLUSDB'.

    ---

    Log Name: System
    Source: Service Control Manager
    Date: 6/2/2010 9:50:53 AM
    Event ID: 7031
    Task Category: None
    Level: Error
    Keywords: Classic
    User: N/A
    Computer: DB1
    Description:
    The Cluster Service service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 60000 milliseconds: Restart the service.
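
Since the events above were pasted out of order, it can help to sort them by timestamp before reading them. The following is a minimal sketch that does so with the six timestamps, sources, and event IDs quoted in this post; the names `EVENTS`, `parse_timestamp`, and `in_order` are illustrative only and not part of any Windows or VMware tool.

```python
from datetime import datetime

# (timestamp, source, event ID) copied from the events quoted in the post.
EVENTS = [
    ("6/2/2010 9:50:54 AM", "Ntfs", 137),
    ("6/2/2010 9:51:03 AM", "volmgr", 57),
    ("6/2/2010 9:50:49 AM", "Application Error", 1000),
    ("6/2/2010 9:50:49 AM", "Application Error", 1005),
    ("6/2/2010 9:50:49 AM", "Microsoft-Windows-Kernel-General", 6),
    ("6/2/2010 9:50:53 AM", "Service Control Manager", 7031),
]


def parse_timestamp(text):
    """Parse the US-style 12-hour timestamp used in the pasted events."""
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")


def in_order(events):
    """Return the events sorted by time; ties keep their pasted order."""
    return sorted(events, key=lambda event: parse_timestamp(event[0]))


if __name__ == "__main__":
    for stamp, source, event_id in in_order(EVENTS):
        print(f"{stamp}  {source:<34} {event_id}")
```

Sorted this way, the three events stamped 9:50:49 (the clussvc.exe fault, the file access error, and the CLUSDB flush failure) come first, followed by the service termination at 9:50:53 and then the Ntfs and volmgr events. The timestamps only have one-second resolution, so the true order within 9:50:49 cannot be recovered from them.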



  • 2.  RE: My Microsoft cluster service keeps failing

    Posted Jun 30, 2010 03:34 PM

    I have a very similar setup: vSphere 4, MS cluster with SQL Standard.

    Getting one of the errors that you have:

    Windows cannot access the file for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program SQL Server Windows NT - 64 Bit because of this error.

    Program: SQL Server Windows NT - 64 Bit

    File:

    The error value is listed in the Additional Data section.

    User Action

    1. Open the file again. This situation might be a temporary problem that corrects itself when the program runs again.

    2. If the file still cannot be accessed and

    - It is on the network, your network administrator should verify that there is not a problem with the network and that the server can be contacted.

    - It is on a removable disk, for example, a floppy disk or CD-ROM, verify that the disk is fully inserted into the computer.

    3. Check and repair the file system by running CHKDSK. To run CHKDSK, click Start, click Run, type CMD, and then click OK. At the command prompt, type CHKDSK /F, and then press ENTER.

    4. If the problem persists, restore the file from a backup copy.

    5. Determine whether other files on the same disk can be opened. If not, the disk might be damaged. If it is a hard disk, contact your administrator or computer hardware vendor for further assistance.

    Additional Data

    Error value: 80000011

    Disk type: 0

    The 80000011 error is mentioned here and indicates something is contending for the disk.



  • 3.  RE: My Microsoft cluster service keeps failing

    Posted Aug 31, 2010 12:36 PM

    We are having the exact same issues as the topic starter (gmensching).

    Did you ever find an answer?

    We have a Microsoft cluster with 2 nodes, each running on a separate ESX 4.1 host.

    Both nodes in our cluster use 2 shared cluster disks, which are LUNs connected by raw device mappings (RDMs).

    The two cluster nodes are running Windows 2008 SP2 x86.

    They are hosted by two different ESX hosts.

    The LSI SAS controller is being used.

    The RDMs are in physical compatibility mode, and SCSI Bus Sharing is set to Physical.

    I found out that the status code 0x80000011 maps to STATUS_DEVICE_BUSY and implies that the device is currently busy.

    So this is definitely a disk problem. The question is: what causes the device to be busy? A configuration error?

    And is there any way to know which device is causing the problem (the local C: drive or the shared cluster drives)?
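
To make the status-code reading above concrete, here is a minimal sketch that splits a 32-bit NTSTATUS value into its bit fields, assuming (as this post does) that the "Error value" in event 1005 is an NTSTATUS code. The function name `decode_ntstatus` and the small lookup table are illustrative. The name for 0xC0000006, the exception code in the clussvc.exe fault event, comes from the NTSTATUS reference rather than from this thread.

```python
# Names for the two status values that appear in this thread.
KNOWN_STATUS = {
    0x80000011: "STATUS_DEVICE_BUSY",
    0xC0000006: "STATUS_IN_PAGE_ERROR",
}

# The top two bits of an NTSTATUS value encode its severity.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITY[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),      # bit 29
        "facility": (value >> 16) & 0xFFF,          # bits 27-16
        "code": value & 0xFFFF,                     # bits 15-0
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


if __name__ == "__main__":
    # The event log prints the value as bare hex, without a 0x prefix.
    print(decode_ntstatus(int("80000011", 16)))
    print(decode_ntstatus(0xC0000006))
```

Both values point at storage I/O rather than at the cluster software itself: the first is a warning that the device is busy, the second an error reading a page in from disk. Neither says which disk was involved.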



  • 4.  RE: My Microsoft cluster service keeps failing

    Posted Aug 31, 2010 06:25 PM

    On the advice of VMware, I moved the C drive of the virtual machines off a datastore on the FC SAN and onto the VM host's local datastore, and I have not had the problem since. All of the shared disks work just fine. I still haven't figured out the problem, though. I'm taking a look at the fiber switch and the disks.



  • 5.  RE: My Microsoft cluster service keeps failing

    Posted Sep 01, 2010 08:35 AM

    We have no round-robin policy enabled in VMware (as seen in the attached screenshot).

    It's a fixed path.

    We are using a Hitachi AMS1000 SAN.

    On our SAN we have an Active/Passive configuration. The active/passive state did not change during the cluster problems.

    We are not able to move the C-drive VMDKs to a local disk on the ESX host, because our ESX hosts have no local disks.

    Our ESX hosts are blade servers which boot from SAN.



  • 6.  RE: My Microsoft cluster service keeps failing

    Posted Sep 13, 2010 03:38 PM

    I think I have found my problem. When I created my configuration I missed a step: I had my shared disks using the same SCSI controller as the other disks. After moving the shared disks to a new controller and setting bus sharing on the original controller to None, my problems seem to have gone away. The step I missed from the VMware document is below.

    7. Select a new virtual device node (for example, select SCSI (1:0)), and click Next.

    NOTE This must be a new SCSI controller. You cannot use SCSI 0.



  • 7.  RE: My Microsoft cluster service keeps failing

    Posted Sep 13, 2010 03:47 PM

    Hi,

    Thanks for the follow-up!

    I just figured out the same solution and I am configuring it as we speak.

    I had also used only 1 SCSI controller for all disks (the OS .vmdk file AND the shared mapped raw LUNs).

    Once it has been tested, I'll let you know if it also fixed my problem.

    I now have 2 SCSI controllers:

    SCSI controller 0 is used for the OS .vmdk, and SCSI Bus Sharing is set to None.

    SCSI controller 1 is used for all mapped raw LUNs, and SCSI Bus Sharing is set to Physical.
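
The two-controller layout described above can be written down as a small check. This is only a sketch under stated assumptions: the `Disk` class and `check_layout` function are hypothetical and do not read a real VM configuration, and the rule that shared disks need Physical bus sharing reflects the setup in this thread, where the cluster nodes sit on different ESX hosts.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    label: str
    controller: int  # SCSI controller number, e.g. 1 for device node SCSI (1:0)
    shared: bool     # True for disks shared between the cluster nodes


def check_layout(bus_sharing, disks):
    """Return a list of problems found in a clustered VM's SCSI layout.

    bus_sharing maps a controller number to its bus sharing mode,
    written here as the lowercase strings "none" or "physical".
    """
    problems = []
    for disk in disks:
        mode = bus_sharing.get(disk.controller)
        if disk.shared and disk.controller == 0:
            problems.append(f"{disk.label}: shared disk is on SCSI controller 0")
        if disk.shared and mode != "physical":
            problems.append(f"{disk.label}: controller bus sharing is not physical")
        if not disk.shared and mode != "none":
            problems.append(f"{disk.label}: non-shared disk on a bus-shared controller")
    return problems


if __name__ == "__main__":
    # The original, failing layout: everything on controller 0.
    print(check_layout({0: "physical"},
                       [Disk("OS", 0, False), Disk("RDM1", 0, True)]))
    # The corrected layout from this post.
    print(check_layout({0: "none", 1: "physical"},
                       [Disk("OS", 0, False), Disk("RDM1", 1, True)]))
```

The first call reports two problems (the OS disk sits on a bus-shared controller, and the shared disk sits on controller 0); the second returns an empty list.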



  • 8.  RE: My Microsoft cluster service keeps failing

    Posted Jan 05, 2011 07:36 AM

    Hi,

    Thanks for sharing your valuable info.

    I am currently looking for information or a document on configuring MSCS on ESX 4.0/4.1.

    Please help me with that. If you have any document or info, please share it.

    My email is: villykaras@gmail.com

    Thanks and regards,

    Valerian Crasto.



  • 9.  RE: My Microsoft cluster service keeps failing

    Posted Jan 05, 2011 07:38 AM

    Hi,

    It looks like the question is marked as answered. Do you mind sharing how you resolved it? I'm keen to know. Thanks.



  • 10.  RE: My Microsoft cluster service keeps failing

    Posted Aug 31, 2010 03:24 PM

    MSCS does not support SAN multipathing. Are you using PowerPath/VE or a round-robin policy for the Quorum/Witness LUNs?

    If yes, create a claim rule to enable NMP for the Quorum/Witness LUNs and configure NMP to use the MRU or Fixed policy.

    http://v-reality.info
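
The path policy advice above can be sketched as a simple filter. The sketch assumes you have already collected each LUN's path selection policy into a dictionary by some other means; the LUN identifiers are made up, `luns_needing_change` is a hypothetical helper, and the `VMW_PSP_*` strings are the plugin names ESX uses for the MRU, Fixed, and round-robin policies.

```python
# Policies the post recommends for clustered LUNs (MRU or Fixed).
ALLOWED_POLICIES = {"VMW_PSP_MRU", "VMW_PSP_FIXED"}


def luns_needing_change(policies, clustered_luns):
    """Return the clustered LUNs whose path policy is not MRU or Fixed.

    policies maps a LUN identifier to its path selection policy name.
    A clustered LUN missing from policies is also reported, because
    its policy is unknown.
    """
    return sorted(
        lun for lun in clustered_luns
        if policies.get(lun) not in ALLOWED_POLICIES
    )


if __name__ == "__main__":
    # Hypothetical inventory; these identifiers are placeholders.
    policies = {
        "lun-quorum": "VMW_PSP_RR",
        "lun-data": "VMW_PSP_FIXED",
        "lun-os": "VMW_PSP_RR",
    }
    print(luns_needing_change(policies, ["lun-quorum", "lun-data"]))
```

Only the LUNs passed in as clustered are checked, so a round-robin policy on an unrelated datastore LUN is not flagged.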