VMware vSphere

  • 1.  Help with Windows 10 Guest Crash with Multiple GPU Passthrough

    Posted Jun 17, 2017 07:31 PM

    I am using ESXi 6.5 and having trouble using PCI passthrough of four GPUs to a single Windows 10 guest VM.  They seem to pass through OK initially, before Windows has installed the driver for them.  I can see the four GPUs in Device Manager with the little yellow indicators for the drivers not being installed.  Once the drivers are installed and Windows starts recognizing them, it crashes.  Subsequent startups will crash right after displaying the initial Windows splash screen and before getting to the login screen.  Removing all GPUs except for one will work, and the crashing stops.  I have tried each GPU individually and they all work with no crashing.  I can pass one GPU each to two individual VMs and run them at the same time with no problems.  The crashing begins as soon as I try to pass a second GPU to one virtual machine.  Bypassing ESXi entirely and using Windows 10 as the main OS I am able to use all four GPUs, so I think perhaps I have an incorrect setting.

    My hardware is:

    Supermicro X10SRL-F motherboard, Intel Xeon 2620 V4 CPU, 32 GB RAM, 4x Radeon RX580 GPUs.

    I have enabled Above 4G Decoding and configured my motherboard BIOS settings according to this article from Nvidia (it's for Nvidia GRID rather than Radeon, but I figured it would apply to my setup as well).

    MMIOHBase = 2T

    MMIO High Size = 256G

    All PCIe oprom = EFI

    Onboard LAN oprom type = EFI

    Incorrect BIOS settings on a server when used with a hypervisor can cause MMIO address issues that result in GRID GPUs f…

    I have also added the following lines to my Windows 10 VM configuration:

    firmware="efi"

    pciPassthru.use64bitMMIO="TRUE"

    according to the KB article I found here:

    VMware vSphere VMDirectPath I/O: Requirements for Platforms and Devices (2142307) | VMware KB
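
    For reference, that KB and related VMware passthrough documentation also describe a third entry for sizing the 64-bit MMIO window, pciPassthru.64bitMMIOSizeGB. A sketch of the full set of .vmx lines, assuming it needs to cover all four 8 GB cards (the "64" below is my own rounding up to a power of two, not a value from the KB):

```
firmware = "efi"
pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "64"
```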

    Does anyone have an idea of what might be happening here?



  • 2.  RE: Help with Windows 10 Guest Crash with Multiple GPU Passthrough

    Posted Jun 18, 2017 10:36 PM

    It seems that since v6, VMware broke something with GPU passthrough and hasn't said anything about it.

    Re: ESXi 6.5 host freezing/crashing when shutting down VM with GPU passthrough

    The steps you need to take are to boot into safe mode and install the drivers there. The problem is that once they are installed, as soon as you reboot your guest VM, your ESXi host is going to hard freeze and reboot

    Here are the steps a user used to get an R9 Fury working (applies to v6.*):
    ESXi 6.0 and AMD R9 Fury Working Procedure | [H]ard|Forum

    1. Install ESXi 6.0

    2. Directly attach monitor to GPU (using DisplayPort in my case)

    3. Boot up ESXi

    4. Add R9 Fury GPU adapter to DirectPath I/O Configuration screen in vSphere Client. Note that you will be forced to add both the video and audio adapters.

    5. Reboot ESXi

    6. Add new Win10 64-bit VM.
      1. (Optional) Dedicate 32 GB RAM to the VM (doesn’t seem to matter if RAM is dedicated or not)

      2. (Optional) EFI boot mode (can be BIOS as well; doesn’t seem to matter)
    7. Install Windows 10

    8. Install VMWare Tools

    9. (Optional) Do all Windows critical updates (can also be done later, but rebooting the VM is a pain since the VM doesn’t seem to release the GPU on reboot, causing it to freeze before the login screen)

    10. Shut down VM

    11. (Optional) Take VM snapshot (to make troubleshooting easier if required)

    12. Boot Win10 VM up

    13. Either
      1. Disable Windows driver search (right-click This PC, Properties, Advanced System Settings, Hardware tab, Device Installation Settings, select “No, never install”, and uncheck “Automatically get device app info…”)

        OR


      2. Install Display Driver Uninstaller v15.3.1, run it, and allow it to reboot into safe mode. Once in safe mode, do an AMD driver uninstall by clicking “Clean and Restart”. DDU disables automatic updates of Windows device drivers, effectively doing the same thing as the first option.
    14. Shut down Win10 VM

    15. Add R9 Fury video adapter as a PCI Device to VM in VM Settings

    16. Add R9 Fury audio adapter as a PCI Device to VM in VM Settings

    17. Start up Win10 VM

    18. Once Win10 boots up, device manager should still show a yellow exclamation mark on a generic Microsoft Basic Display Adapter (in other words, should not say AMD Radeon quite yet)

    19. Wait ~2 minutes or so to make sure Windows isn’t attempting to update drivers in background

    20. Install the AMD drivers. You can install all of AMD’s bloatware as well; no need to exclude anything. My Sapphire GPU has a blue LED that turns on when it is active; that turns on here, and the DP-connected monitor turns on.
      Driver versions Crimson 15.12, 16.3.1, and 16.3.2 were all tested with this procedure; all seem to work.

    21. Once driver installation completes, shut down Win10 VM

    22. Reboot the ESXi physical host (this will free up the GPU, which has been held since driver installation, as the blue light indicates)

    23. Start up Win10 VM using vSphere
      1. Open VM Console window in vSphere, click to give focus.

      2. This vSphere console window is the “secondary” monitor, and your directly-connected monitor is set up as the “primary monitor”. Move mouse around in vSphere console and see if you can get the mouse over to the primary monitor. Cursor should weirdly disappear and then appear on primary monitor when crossing over from right half of console window to left hand side of console window

      3. Click to enter login/password

      4. Log in and have fun!
    24. Make a mental note that you can’t reboot just the VM. If you have to reboot the VM, you have to reboot the entire ESXi physical machine.

    25. If you want, re-enable automatic update of Windows device drivers (it got disabled in step 13 above)
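
    As a quick sanity check while following the steps above, the host's view of the GPU can be inspected from the ESXi shell (a sketch; both commands are standard on ESXi, though output formats vary by build):

```shell
# List PCI devices the host sees; the AMD GPU and its HDMI audio
# function should both appear (typically as .0 and .1 of the same slot)
lspci | grep -i "AMD\|ATI"

# Fuller per-device detail, including segment/bus/slot addresses
esxcli hardware pci list
```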


  • 3.  RE: Help with Windows 10 Guest Crash with Multiple GPU Passthrough

    Posted Jun 20, 2017 03:34 AM

    Thank you for the suggestion!  I just got done following it to the letter.  Fresh Win 10 install.  Installed all critical updates and guest additions before doing any PCI passthrough.  Turned off the Win 10 device updates and shut down the VM.  Passed through 4 GPUs, one of them connected to a display.  Started up the VM and everything is fine, although the little spinning dots at startup are spinning much faster than normal.  Not sure what that means, but I'm starting to associate fast-spinning dots with an imminent crash.  When I only have one GPU connected, the dots spin at normal speed; with more than one GPU they spin really fast.

    So I'm back at the desktop and I see the four GPUs in device manager with the yellow indicators for not having drivers.  Waited a couple minutes to make sure Windows isn't installing its own driver.  Then I downloaded and installed the latest Radeon driver for the RX580.  During installation I watched the device manager.  The screen flickers each time it installs a driver for a graphics card.  After the first flicker I could see in the device manager that one card was configured and three to go.  The second flicker never came back.  Just hard crash. 

    Let me just mention that the host is fine throughout all of this.  The PCI passthrough seems to be working and the host is stable.  The VM just continues to crash at the second GPU.  Is it possible that ESXi is mapping the second GPU incorrectly to the VM somehow?  Would there be some way of checking this?
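
    One way to check what ESXi actually handed to the VM is the VM's log file, which records the passthrough device setup at power-on. A sketch (the datastore and VM directory names below are placeholders for your environment):

```shell
# From the ESXi shell: show the passthrough-related lines from the
# most recent power-on. Substitute your own datastore/VM paths.
grep -i "pcipassthru" "/vmfs/volumes/datastore1/Win10VM/vmware.log"
```

    Comparing the addresses logged for the first and second GPU might show whether the second device's MMIO region is colliding with something.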



  • 4.  RE: Help with Windows 10 Guest Crash with Multiple GPU Passthrough

    Posted Oct 01, 2017 03:07 PM

    I have been doing AMD GPU passthrough for a couple of years and have run into many of the issues (probably more than the average user).  I read your post and am curious whether manually setting pciHole.start and pciHole.end would solve your VM reboot issue.

    Even though the newer versions of ESXi have the dynamic setting and it is supposed to handle this for you, I found that if I removed the pciHole.dynStart = "3072" entry and set the hole manually, I can reboot at will.  This lets me upgrade my AMD drivers and install them normally.

    Currently my settings are:

    pciHole.start = "1200"

    pciHole.end = "4040"
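
    Put together, the manual-hole .vmx fragment would look like this (the values are in MB, as I understand the option; any pciHole.dynStart line should be deleted rather than left alongside):

```
pciHole.start = "1200"
pciHole.end = "4040"
```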

    My setup:

    HPE ProLiant DL380p Gen8

    2 x Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz

    256GB RAM

    Windows 10 Pro (all updates)

    Sapphire Radeon NITRO+ Rx 480 8GB GDDR5 256 bit PCI Express 3.0 x16 Graphics Card

    I'd be interested to know if you are able to fix the reboots...

    Steve



  • 5.  RE: Help with Windows 10 Guest Crash with Multiple GPU Passthrough

    Posted Oct 01, 2017 08:12 PM

    Good to know, I may have to try this again then. I also had crashing when passing through some QLogic BCM5709C dual-port NICs; perhaps it may fix that as well!

    Will try to test out as soon as possible and report back. I would still love to use this single rig for everything.



  • 6.  RE: Help with Windows 10 Guest Crash with Multiple GPU Passthrough

    Posted Jan 05, 2018 02:57 AM

    Hi,

    Thank you! The pciHole settings fixed my BSOD issue on an ESXi host with GPU passthrough to a Windows 7 VM.



  • 7.  RE: Help with Windows 10 Guest Crash with Multiple GPU Passthrough

    Posted Oct 01, 2017 05:09 PM

    I had a similar issue trying to add a Quadro K620 card to a Win10 Pro VM. It took me a month or so to find anything. The card passed through OK, and the spinning dots did zoom faster than usual, but the yellow mark next to the GPU in Device Manager was annoying! I found an article about adding a config setting to the VM, hypervisor.cpuid.v0 = "FALSE". After I did that and booted the machine, it worked straight away: the drivers installed, and Premiere Pro and Autodesk use the Quadro card. Maybe try that, or try one card; maybe it doesn't like 4 GPUs on one VM, whatever you need 4 for.
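
    For anyone trying this, the setting mentioned above goes into the VM's .vmx file (or can be added as an advanced configuration parameter in the vSphere client):

```
hypervisor.cpuid.v0 = "FALSE"
```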