ESXi


SD Boot issue Solution in 7.x

einstein-a-go-go, Jul 16, 2021 02:48 PM

vivithemage, Jul 16, 2021 08:28 PM

  • 1.  SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Jun 10, 2021 01:39 PM

    Issue: The host becomes unresponsive with the error "Bootbank cannot be found at path '/bootbank'", and the boot device is in an APD state.

    This issue occurs when the boot device stops responding and enters an APD (All Paths Down) state. In some cases, the host becomes unresponsive and shows as disconnected in vCenter.

    As of 7.0 Update 1, the format of the ESX-OSData boot data partition has changed: instead of FAT, it uses a new format called VMFS-L. This new format allows much more, and faster, I/O to the partition, and the resulting level of read and write traffic is overwhelming and corrupting many less capable SD cards.

    We have seen a lot of customers reporting bootbank errors (on hosts booting from SD cards) and hosts going into an unresponsive state in ESXi version 7.

    Our VMware engineering team is gathering information for a fix, and a new vmkusb driver version is available for testing. The current workaround is to install version 2 of the vmkusb driver and monitor the host.

    The action plan for a future resolution is to replace the SD card(s) with a capable device/disk, per the best practices in the Installation guide.

    The version 7.0 Update 2 VMware ESXi Installation and Setup Guide, page 12, specifically says that the ESX-OSData partition "must be created on high-endurance storage devices".

    https://docs.vmware.com/en/VMware-vSphere/7.0/vsphere-esxi-702-installation-setup-guide.pdf

    You can also refer to the following KB:

    Reference: https://kb.vmware.com/s/article/83376?lang=en_US

    Resolution

    VMware engineering has a fix that will be included in the next release, 7.02 P03, which is planned for sometime in July 2021.



  • 2.  RE: SD Boot issue Solution in 7.x

    Posted Jun 10, 2021 02:53 PM

    What about systems with local SD boot but /scratch placed on a remote datastore? That would relieve some of the SD I/O, wouldn't it? Or is it not enough to make a difference? That's the default installation we use.



  • 3.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Jun 10, 2021 03:26 PM

    What's the make/model of the SD cards, or how old are they?

    What's your current environment, and what are your upgrade plans?



  • 4.  RE: SD Boot issue Solution in 7.x

    Posted Jun 10, 2021 04:15 PM

    Hi,

    Even if you move the .locker to a datastore (which should be best practice when using SD cards), we still get the issue with Update 2.



  • 5.  RE: SD Boot issue Solution in 7.x

    Posted Jun 10, 2021 03:12 PM

    Hi,

    Finally, some information (not official, for such a big issue).

    Meanwhile, I wrote up some workaround steps so that customers can get their servers back online without needing to reboot VMs.

    https://www.provirtualzone.com/vsphere-7-update-2-loses-connection-with-sd-cards-workaround/

    LP



  • 6.  RE: SD Boot issue Solution in 7.x

    Posted Jun 10, 2021 04:17 PM

    Any idea where to get version 2 of the vmkusb driver?



  • 7.  RE: SD Boot issue Solution in 7.x

    Posted Jun 14, 2021 02:26 PM

     wrote:

    Any ideas where to get version-2 of vmkusb driver?


    Only VMware support can provide you with that vmkusb driver.



  • 8.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 06:43 PM

    Only from GSS, and I have had zero luck getting them to give it to me, despite being part of a VERY large company with an ELA. Good luck!



  • 9.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 06:46 PM

    Thanks. I spoke with  offline, and he confirmed that GSS was only using it to isolate the issue and that it is not part of a workaround. The true fix is coming in a feature patch release in the near future.



  • 10.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Jun 15, 2021 06:52 PM

    Correct, GSS does NOT want the workaround to be used as a permanent fix and recommends installing P3 sometime in July.



  • 11.  RE: SD Boot issue Solution in 7.x

    Posted Jun 10, 2021 05:58 PM

    Honestly...there should've been more publicity on this. The Guide you reference also states on pg. 16 that SD cards *can* be used, so maybe that should be removed. You all (VMware, that is) promoted SD cards so heavily back in the day, and rightfully so. They're awesome! The install was small (still is); SD is fast (generally); with the exception of no disk redundancy, it's a great way to run ESXi. So OK....VMW changes things up a bit for their boot partitions...fine. Tech changes. We technologists get that. But a LOT of orgs run ESXi on SDs. As such, this change should've been made very public, advising orgs to work towards moving away from SDs and explaining the repercussions if they don't. Just letting orgs/customers who use SDs have their hosts go down is pretty crappy...unless of course you all didn't do due diligence QA'ing and didn't notice hosts crashing.



  • 12.  RE: SD Boot issue Solution in 7.x

    Posted Jun 11, 2021 09:22 AM

    In reply to post 11:


    It's also worth mentioning that Dell, while no longer recommending SD cards, never stated that they are no longer supported:
    "The Boot Optimized Storage Solution (BOSS) card is the preferred non-HDD or SSD device for VMware ESXi 7.0 installation. The Dell Internal Dual SD Module (IDSDM) install is no longer recommended due to write endurance issues with the SD flash media."

    Source: https://www.dell.com/support/manuals/de-de/vmware-esxi-7.x/vmware_esxi_7.0_gsg/getting-started-with-vmware-vsphere?guid=guid-c18ba369-c295-40ea-b289-f82b4cd5270a



  • 13.  RE: SD Boot issue Solution in 7.x

    Posted Jun 11, 2021 12:32 PM

    Thanks for sharing, "sysadmin84". I had always used HPE servers until last yr. For our refresh cycle last yr, we got a good deal on DELLs and I really like them. Since v7 was already out then, going by that Dell document, the DELL reps who built my server specs probably should have known this and added BOSS modules instead of the IDSDMs. But of course they didn't. Communication is a wonderful thing...and maybe there needs to be more of it on this issue among all the h/w vendors, along with recommendations to customers for boot h/w moving forward.

    Cheers!



  • 14.  RE: SD Boot issue Solution in 7.x

    Posted Jun 11, 2021 04:11 PM

    Dell and HPE still sell servers with SD cards for vSphere 7, regardless of the statement in that document.

    Just talk to a vendor: they will tell you nothing about this and will sell you servers with that configuration anyway. But besides that, the problem for most customers is not the new servers but the thousands of ESXi hosts installed today with SD cards.

    And changing that type of configuration from SD cards to local disks or whatever is not cheap. For hundreds of servers, this change will cost any company thousands.



  • 15.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 06:58 PM

    A recent client took delivery of Dell EMC R740s with Dell IDSM modules. The client refused to pay the bill and asked Dell to collect them all, unless Dell EMC upgraded them to BOSS for FREE!

    Oh, Dell came through with free upgrades!



  • 16.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 07:02 PM

    Yes, "no longer recommended": our clients loved that printout from the Dell engineers who visited a recent delivery of new Dell EMC R740s with IDSM modules!



  • 17.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 06:45 PM

      Agree with everything you said in your post about the historical push to simplify installs by using SD/USB and removing spinning rust failure points.



  • 18.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 06:56 PM

    I would like to know: for all the existing client installs of 6.5 and 6.7 on SD cards, using Dell IDSM and HPE mirrored technology, are UPGRADES to ESXi 7.0 now NOT SUPPORTED or BROKEN?

    Basically, is it new-server or NEW VIRGIN INSTALL time?

    Let me know!



  • 19.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 07:01 PM

    I bought new DELLs with IDSDM and installed v7. All good. I upgraded via vLCM to v7U1c and all was (is) still good. The problem seems to be specific to v7U2a. If you have yet to pay the bill, I wouldn't until DELL replaces the IDSDMs with modules geared towards the new I/O requirements of vSphere.



  • 20.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 07:03 PM

    Dell EMC sent out engineers to replace all the IDSM modules with BOSS for FREE at two sites, and it took the engineers a week!

    But I'm not allowed to mention the client's name!



  • 21.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Jun 15, 2021 07:16 PM

    You need to verify that your SD cards are "High Endurance".

    The fastest UHS-I microSD cards are the U3-rated Extreme PLUS line, which offer maximum read speeds of 100 MB/s and maximum write speeds of 90 MB/s, and are available in capacities of 32GB, 64GB, and 128GB. 

    UHS Speed Class

    The next speed class up is the UHS (Ultra-High Speed) Speed Class and it’s denoted with the “U” symbol. There are two ratings within the UHS Speed Class:

    • U1 (UHS Speed Class 1): minimum write speed of 10MB/s
    • U3 (UHS Speed Class 3): minimum write speed of 30MB/s

    The UHS Speed Class is more commonly used nowadays than the Speed Class, and many high-end cameras require at least a U3-rated memory card for many of their functions, such as recording high-resolution video. The UHS Speed Class mainly refers to the minimum sustained write performance for recording video and came about because 4K-capable video recording devices needed faster write speeds. As a rule of thumb, 4K-capable cameras usually require at least a U3-rated SD card.

    What makes U1 and U3 memory cards more advanced than those in the Speed Class is that they use one of two UHS bus interfaces:

    • UHS-I: theoretical maximum transfer speeds up to 104MB/s
    • UHS-II: theoretical maximum transfer speeds up to 312MB/s

    The U1 and U3 ratings are independent of the bus interface: cards carrying either rating may use the UHS-I or the UHS-II bus.

    These UHS bus interfaces indicate theoretical maximum read and write speeds, unlike the sustained write speeds of the speed classes. The UHS bus interfaces are denoted by a Roman numeral “I” or “II” symbol on the front of the card. The bus speed is the theoretical data transfer rate of the interface itself, whereas a U3-rated SD card guarantees its own minimum sustained write speed of 30MB/s. For example, a UHS-I U3-rated card guarantees a minimum sustained write speed of 30MB/s but can potentially read and write at up to 104MB/s when used in a device that supports the UHS-I bus interface.

    A UHS-II compatible card has a potential read and write speed of up to 312MB/s. The UHS bus interfaces are backwards compatible so you can use a UHS-II card in a device that supports UHS-I, but you won’t see the speed benefits of UHS-II as the card will default back to the lower specs of UHS-I. Both the card and bus interface must be fully compatible to experience the speed benefits.
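    The two ratings above answer different questions: the speed class sets a guaranteed floor for sustained writes, while the bus interface sets a theoretical ceiling that falls back to whatever the host supports. Below is a minimal Python sketch of that rule using only the figures quoted in this post; the function names are hypothetical, and treating the negotiated bus as simply the lower of the card's and host's generations is a simplification. Note that neither rating says anything about write endurance, which is what the ESX-OSData guidance is concerned with.

```python
# Minimum sustained write speeds (MB/s) for the UHS Speed Class ratings quoted above.
UHS_SPEED_CLASS_MIN_WRITE = {"U1": 10, "U3": 30}

# Theoretical maximum transfer rates (MB/s) for the UHS bus interfaces quoted above.
UHS_BUS_MAX = {"UHS-I": 104, "UHS-II": 312}

# Ordering used to pick the lower of two bus generations.
_BUS_ORDER = ["UHS-I", "UHS-II"]


def effective_bus(card_bus: str, host_bus: str) -> str:
    """Return the bus the card falls back to: the lower of card and host support."""
    return min(card_bus, host_bus, key=_BUS_ORDER.index)


def card_profile(speed_class: str, card_bus: str, host_bus: str) -> dict:
    """Summarise the guaranteed write floor and theoretical ceiling for a card in a host."""
    bus = effective_bus(card_bus, host_bus)
    return {
        "guaranteed_min_write_mb_s": UHS_SPEED_CLASS_MIN_WRITE[speed_class],
        "negotiated_bus": bus,
        "theoretical_max_mb_s": UHS_BUS_MAX[bus],
    }


if __name__ == "__main__":
    # The example from the text: a UHS-I U3 card in a UHS-I device.
    print(card_profile("U3", "UHS-I", "UHS-I"))
    # A UHS-II card in a UHS-I reader falls back to the UHS-I ceiling.
    print(UHS_BUS_MAX[effective_bus("UHS-II", "UHS-I")])
```

    Keeping the floor and the ceiling as separate values mirrors the point of the post: a card can have a high bus ceiling and still say nothing about how long it survives sustained small writes.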

     



  • 22.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 09:15 PM

    This is fantastic! Are you aware of any way to programmatically retrieve this information (or in fact any information at all: manufacturer, serial #, model #, etc.) from the SD or microSD media installed in a server without physically removing the card to look at it? The lsusb -v command only provides details about the USB hub/reader device, not the media inserted in it, and I'm unaware of any other commands that might return this type of information. It's not visible in iLO, and my quotes for BL460c Gen9 servers purchased in 2016 and SY660 Gen10 servers purchased in 2021 show the same vendor P/N, although they are most certainly NOT the same physical card. The last time I worked on one of my HPE Synergy SY660 compute modules I did take a picture of the card; thanks to your info, now I know what all the symbols and numbers mean ;-) Thank you!

    [Image: HPE_32GB_microSD.jpg]



  • 23.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 09:28 PM

    This is a good read on identifying SD/microSD cards:

     

    https://www.bunniestudios.com/blog/?p=2297



  • 24.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Jun 16, 2021 01:19 PM

    I am checking on commands or another way to validate this, but in the meantime check out this link:

    http://partnerweb.vmware.com/programs/server_docs/Approved%20Flash%20Devices.pdf

     



  • 25.  RE: SD Boot issue Solution in 7.x

    Posted Jul 08, 2021 02:07 PM

    So, are these SD cards/flash devices certified to run ESXi 7.02 and beyond? If so, we'll look into purchasing something off this list.



  • 26.  RE: SD Boot issue Solution in 7.x

    Posted Jul 11, 2021 05:12 PM

    It's very vague and a grey area; VMware just states "high endurance flash", whatever that means.

    And we've had that fail!

    Is it speed, or is it that the writes overwhelm the media?



  • 27.  RE: SD Boot issue Solution in 7.x

    Posted Jun 15, 2021 09:20 PM

    Thanks for the SD specifications; as a video photographer, I'm aware of them.

    BUT, on what date did this become a requirement?

    How is any client/architect/installer supposed to know? When a server is purchased from a vendor, e.g. DELL EMC or HPE, for use with ESXi 6.5, 6.7 or 7.0 and comes pre-installed on an IDSM, are we supposed to remove the server from the rack and open it up to expose the SD cards in use?

    And how would we know whether a Dell or HPE branded SD card with ESXi pre-installed meets these requirements?

    Does this also mean that an upgrade to ESXi 7.0, e.g. an in-place upgrade, is OFF THE TABLE for any system that does not meet these requirements? And when is VMware going to publish a KB (unless they already have) that I can distribute to ALL clients tomorrow, reminding them they will have to purchase:

    1. New servers with BOSS or SATADOM, M.2, NVMe

    2. Upgrades required for existing HCL based hardware, because the solution in place no longer meets the SD requirements, although those servers will still be on the HCL.

    68% of our clients are using SD cards, because that's what has been sold since 2004 for ESXi Embedded installations!



  • 28.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Jun 16, 2021 01:14 PM

    I am not sure of the exact date; I will see if I can dig up any docs/details. In the meantime, here is the "recommendation" from this KB (updated Feb 2021):

    What is the recommendation if you already have these older devices?

    • We recommend to install larger boot media. You should consider moving from USB/SDCard devices because high performance devices are required for predictable application behavior, some of them requiring larger and more reliable storage. Server OEM vendors to ensure the device meets the required endurance parameters provided in the guidance documentation.
    • https://kb.vmware.com/s/article/82515 


  • 29.  RE: SD Boot issue Solution in 7.x

    Posted Jun 20, 2021 12:02 PM

    Hello,

    Replacing all boot devices in all servers is definitely not an option for us, nor, I think, for many others. I need a quick solution, and I don't want to experiment with old drivers or workarounds in a production environment. I wonder if relocating scratch (KB 1033696) and ProductLocker (KB 2129825) to a shared "capable" disk/LUN would solve the problem for "normal" SD boot devices.

    __Leo



  • 30.  RE: SD Boot issue Solution in 7.x

    Posted Jun 24, 2021 03:09 PM

    In reply to post 29:


    Unfortunately, the workaround is all we have for now, and it is the only way to recover the server without doing a hard reboot and restarting all the VMs running on it. I also will not apply any old or beta drivers in my production environment.

    Regarding your question: first, that should already be set if you are using SD cards. Scratch and locker should not run on SD cards. Best practice is to run them on a datastore (local or SAN; iSCSI not recommended).

    Moving them will not fix your problem, but it will make this bug less likely to trigger.



  • 31.  RE: SD Boot issue Solution in 7.x

    Posted Jun 25, 2021 06:39 AM

    It is a shame that such important things are not published properly (though from VMware's side this is understandable: imagine the uproar IF they made an official announcement that SD cards are no longer supported).

    At the moment we have a test host running on HPE SD with redirected ProductLocker, scratch and syslog. It has been running fine for 7 days now. Hopefully it stays that way.

    But the mess does not end there: we found out that a bunch of our hosts use a disk controller that is no longer supported with vSphere 7: B140i and vSphere 7

    So we HAD hosts with "High Endurance Storage" but had to insert SD cards to be able to install vSphere 7 (clearly HPE's fault). And now VMware is telling us that with U2a, SD cards are no longer best practice? The hypervisor with the smallest possible footprint? What in God's name is so important that you have to read it a million times from local media instead of loading this c*** into the host's memory, when we have hundreds of GB of RAM? Get your things together.



  • 32.  RE: SD Boot issue Solution in 7.x

    Posted Jun 25, 2021 06:48 PM

    Based on VMware's info, is it safe to say the best course is to use a read-intensive SSD?  Or would mixed-use be the better bet?



  • 33.  RE: SD Boot issue Solution in 7.x

    Posted Jun 30, 2021 03:27 PM

    Even if this problem goes away, it seems dual SD modules are no longer recommended, so is it best to move to a BOSS card with two M.2 drives, or to add two SSDs to the RAID controller supporting my datastore?



  • 34.  RE: SD Boot issue Solution in 7.x

    Posted Jun 30, 2021 09:04 PM

    Our solution was a bit more radical but long lasting. We bought hard drives and reinstalled ESX on six hosts and blow-torched the SDs. They were nothing but misery. Dell took us down a dark alley.



  • 35.  RE: SD Boot issue Solution in 7.x

    Posted Jul 02, 2021 09:03 AM

    [Image: SD-Failure.PNG]

    Two servers have now failed, after 46 hours, using High Endurance microSD cards as per the specification!



  • 36.  RE: SD Boot issue Solution in 7.x

    Posted Jul 07, 2021 10:58 AM

    Our second host just failed as well after ~2 months (Dell R740 with IDSM (Dell SD cards)). I have now ordered a couple of BOSS cards, since I can't keep waiting for a patch.



  • 37.  RE: SD Boot issue Solution in 7.x

    Posted Jul 10, 2021 03:40 PM

    Dell won't let me order a BOSS card because it's on backorder. They blame it on a chip shortage, but it's probably because of this SD card bug. Going to open a support ticket with Dell and complain. 7.02 should have been pulled. All it does is brick systems, yet they still have it out there to download. Time to migrate off VMware??



  • 38.  RE: SD Boot issue Solution in 7.x

    Posted Jul 10, 2021 06:04 PM

    Got the same info from my seller: 3 months lead time on BOSS cards. Since our servers are diskless, we'll have to set up boot from SAN.

    I'm still wondering, though: will SD cards be OK again with the newest patch, or will it just slow down the problem? VMware will hopefully make this clear.



  • 39.  RE: SD Boot issue Solution in 7.x

    Posted Jul 10, 2021 06:26 PM

    Hello.
    I have been following this post and others because of the serious problems with using an SD card as the boot device, which are apparently more critical in version 7.
    I remember that with version 4, people started using USB keys (4GB or 8GB) as boot devices without problems. Maybe this could be an alternative until the problem is solved with a patch and/or the SD cards are replaced with internal mechanical disks.

     



  • 40.  RE: SD Boot issue Solution in 7.x

    Posted Jul 11, 2021 05:15 PM

    The issue is the wear level, or the quality, of the flash technology.

    This would apply to all things flash.
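    To put the wear argument into numbers, a common back-of-the-envelope estimate divides a device's total NAND write budget (capacity × rated program/erase cycles) by the NAND writes it receives per day (host writes × write amplification). The sketch below assumes ideal wear levelling across the whole device, and every input in the example is hypothetical, not a measured figure for any card or ESXi build discussed in this thread.

```python
def estimated_lifetime_days(
    capacity_gb: float,
    pe_cycles: float,
    host_writes_gb_per_day: float,
    write_amplification: float,
) -> float:
    """Rough flash lifetime in days, assuming ideal wear levelling across the device."""
    if min(capacity_gb, pe_cycles, host_writes_gb_per_day, write_amplification) <= 0:
        raise ValueError("all inputs must be positive")
    # Total data the NAND can absorb before reaching its rated program/erase cycles.
    nand_write_budget_gb = capacity_gb * pe_cycles
    # Data actually written to NAND per day, after the controller's write amplification.
    nand_writes_per_day_gb = host_writes_gb_per_day * write_amplification
    return nand_write_budget_gb / nand_writes_per_day_gb


if __name__ == "__main__":
    # Hypothetical inputs only: a 32 GB card rated for 3,000 P/E cycles,
    # receiving 10 GB/day of host writes with a write amplification of 4.
    baseline = estimated_lifetime_days(32, 3000, 10, 4)
    # Ten times the host write rate cuts the estimate by the same factor.
    heavy = estimated_lifetime_days(32, 3000, 100, 4)
    print(f"baseline: {baseline:.0f} days, heavy writes: {heavy:.0f} days")
    # prints "baseline: 2400 days, heavy writes: 240 days"
```

    The estimate scales inversely with daily writes, which is why a jump in write traffic shortens the life of any flash device, SD cards and SSDs alike; higher-endurance media simply start from a larger write budget.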



  • 41.  RE: SD Boot issue Solution in 7.x

    Posted Jul 11, 2021 05:21 PM

    We are currently comparing old and new ESXi versions' read and write cycles to SD/microSD: we have connected logic analysers to servers to debug and collect data in real time while ESXi is running.

    One thing is clear: if ESXi 7.0.2a is wearing out microSD/SD cards that are certified for 4K and 8K video transfer at 60fps, then ESXi *MUST* be doing some very heavy writing to the media. Considering we've always been SOLD that it's called "'" for Embedded and goes memory resident, it's doing some serious writing, and it would not be long before SATA M.2 SSDs and BOSS cards are also worn out!

    They have better wear, but the lifetime will be reduced significantly.

    [Images: IMG_9399.JPG, IMG_9400.JPG]

    And before anyone gets funky in the thread asking WHY!!!! Because we can!!!!! That's what we do best: Embedded Electronic Debugging!!!!



  • 42.  RE: SD Boot issue Solution in 7.x

    Posted Jul 11, 2021 05:49 PM

     wrote:

    One thing is if ESXi 7.0.2a is wearing out microSD/SD cards which are certified for 4K and 8K video transfer at 60fps, then ESXI *MUST* be doing some very heavy writing to the media, and considering we've always been SOLD, it's called "'" for Embedded, and goes memory resident, it's doing some serious writing, and would not be long before SATA M2. SSD, BOSS cards are also worn out!


    Agreed. If this is expected behavior with ESXi 7, why is VMware working on a patch? We have a small environment: we write our logs to our SAN, ESXi runs from RAM, and we use Dell-branded SD cards. How much I/O can there be to corrupt them? From what I've read, the majority comes from the clustering service (vCLS) VMs.



  • 43.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 01:17 PM

    This debug analysis looks really interesting, although I admit I have no idea what I'm looking at on your real-time graph. Can you explain a bit about the visualization, and have you been able to run old/new versions of ESXi to compare? I'm also interested in your statement that, regarding I/O to the boot device, you've read that the "majority is from the clustering service (vCLS) VMs". I have suspected vCLS played a role as well (in fact I suggested this very thing in one of my comments on  's blog article about this subject), but I haven't seen it written anywhere else. Can you point to your sources for that information? I'd like to read more.



  • 44.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 01:26 PM

    Patrick, the info about the cluster service VMs is unfortunately only anecdotal, I have seen multiple people point to this across the threads I've been following. Here's one comment from a Reddit thread: "In environments with HA the little heartbeat vms write a lot so it kills SD cards. Ask me how I know....."

    https://www.reddit.com/r/vmware/comments/nn1src/careful_when_upgrading_to_702_if_you_have_your/



  • 45.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 01:45 PM

    But why would the vCLS machines write to the SD cards? 8 or 32 GB cards do not provide a datastore because they do not have enough space. These annoying little things spread across all datastores, but I have never seen one running on SD, since that should be impossible.

    Our test host with 7.0 U2a on SD has now been running for 24 days without issues.



  • 46.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 05:33 PM

     

    Sure, I'll explain. We can inspect, in real time, the data being read from and written to the media (either microSD or SD card) in the server while ESXi is live and running.

    The logic analyser decodes the data in real time and shows it as HEX or ASCII, so we can compare and contrast what the OS is reading from and writing to the media, and then compare this with other versions of ESXi. At present we are comparing:

    ESXi-7.01 Build 17551050 HPE

    ESXi-7.0U2a Build 17867351 HPE

    We are using identical servers, and the same high endurance MLC media which was used when these servers failed after 46 hours!

    We've used fresh new media and reinstalled both of the above, and we are now testing and comparing logs. At present these hosts are standalone, i.e. not connected to vCenter Server and not running any VMs, but given the info in this thread about vCLS, we will examine that as well.
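    For readers wondering what a comparison like this boils down to, here is a minimal sketch that tallies block reads versus writes from a decoded trace. It assumes a hypothetical CSV export with `timestamp` and `command` columns holding SD command indices; real analyser exports will differ, and the function names are illustrative. The command numbers come from the SD physical-layer specification: CMD17/CMD18 are single/multiple block reads, and CMD24/CMD25 are single/multiple block writes.

```python
import csv
import io
from collections import Counter

# SD physical-layer command indices for block transfers.
READ_COMMANDS = {17, 18}   # CMD17 READ_SINGLE_BLOCK, CMD18 READ_MULTIPLE_BLOCK
WRITE_COMMANDS = {24, 25}  # CMD24 WRITE_BLOCK, CMD25 WRITE_MULTIPLE_BLOCK


def tally_commands(csv_text: str) -> Counter:
    """Count read, write and other commands in a hypothetical 'timestamp,command' export.

    This counts commands, not blocks: CMD18 and CMD25 each move several blocks.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        cmd = int(row["command"])
        if cmd in READ_COMMANDS:
            counts["read"] += 1
        elif cmd in WRITE_COMMANDS:
            counts["write"] += 1
        else:
            counts["other"] += 1
    return counts


def write_share(counts: Counter) -> float:
    """Fraction of block-transfer commands that are writes (0.0 if there were none)."""
    total = counts["read"] + counts["write"]
    return counts["write"] / total if total else 0.0


if __name__ == "__main__":
    # Toy rows for illustration only, not a real capture.
    toy_trace = "timestamp,command\n0.001,17\n0.002,24\n0.003,25\n0.004,13\n"
    counts = tally_commands(toy_trace)
    print(dict(counts), f"write share: {write_share(counts):.2f}")
```

    Running the same tally over equal-length captures from the two builds listed above would give a like-for-like comparison of write activity; counting transferred blocks instead of commands would need the block-count arguments from the trace as well.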



  • 47.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 05:45 PM

    @PatrickDLong 

    I've just had a look at the Reddit link and the blog. I think there are mixed issues here: the missing BOOTBANK issue seems different from corruption, or are they one and the same?

    This is not what we have seen. We have seen the ESXi OS damaged and SD cards failing, e.g. due to bad hardware sectors and wear, resulting in a corrupt OS that does not boot.

    My understanding, and I've demonstrated this many times, is that you should be able to remove the SD/USB device from an ESXi host and the host will continue running. BUT if the MEDIA is disappearing, as per that Reddit thread/web blog, any heavy writes cause the OS to crash!

    Why is ESXi now writing a lot? A change of function in the OS?

    And the latency warning could be because the media has disappeared, not necessarily because the media is too slow?

    And what does "must be created on high-endurance storage devices" actually mean?



  • 48.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 06:31 PM

    In reply to post 46:


    Hi Andrew,

    First, it's awesome that you are taking the time to do this testing and troubleshooting; great work.

    But I have some doubts that you will get the information needed, particularly because you are not running any VMs and the ESXi hosts are not connected to a vCenter.

    If you have a standalone ESXi host with vSphere 7.0.2 U2a running on an SD card without any VMs on it, it is very rare to get the issues. I have 2, and after almost 2 months there have been no issues to date.

    Like I said before, what triggers the issue faster is importing OVF/OVA files to vCenter and upgrading VMware Tools. Also, if you have vCD running on that vCenter and using ESXi resources, the issue gets triggered within 2-3 days max.

    With the tests I did with VMware Tools upgrades, 24h was enough to trigger the issue in a particular host where VMs were upgraded.

    Regarding vCLS, I don't have data to say whether it is the root cause or just another process that also triggers the issue. But yes, vCLS does read and write data on the partitions.

    Also, running my Veeam backups can trigger this. I tested this by disabling the backups for a couple of days, and no issues were found (or, in some cases, fewer than usual). After I enabled them again, the issues started again with the same frequency.

    At the moment I only have 4 environments with 7.02 U2a running on SD cards: 2x HPE (DL360 G9/G10) with 10 ESXi hosts each, 1x with 8x HPE (BL460c G10) running vCloud Director, and 1x with 6x HPE (BL460c G10) using vSAN.

    In the first one I get issues 1-2 times a week, so not very often. But no VMware Tools upgrades or OVF/OVA imports are allowed there. It is, however, a very active cluster with a lot of new VMs per week, removed VMs, a lot of snapshots, etc.

    In the second one (vCD) I get them 5-6 times per week, sometimes more.

    In the third environment (vSAN) the issues are very rare: maybe 2-3 times in a couple of weeks, sometimes none.

    Apart from the HPE server model, all have the same SD cards, etc.

    All the rest of my vSphere 7.0.2 U2a hosts run on local disks, so the issue does not arise there.

    So this is an example of different environments and how they trigger.

    I would like to have more time to troubleshoot and test this issue better, but I don't work for VMware.



  • 49.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 09:19 PM

    Thanks for additional info, we will include in our test plan.

    We need to walk before we can run, and collect baselines. These are the first baselines, collected in the first part of the test plan, and we have been capturing data from ESXi 4.x through ESXi 7.x. When we have collected enough data, we will move on with our test plan and connect the standalone hosts to vCenter Server. Slowly slowly catchy monkey.....

    This will reproduce the exact conditions under which our hosts failed after 46 hours on 7.0.2a.

    "Like I said before, what will trigger faster the issue is Importing OVF/OVA files to vCenter, upgrading VMware Tools, and also if you have a vCD running on that vCenter and using ESXi resources, you get the issue triggered in 2/3 days max."

    Again, are we looking at different issues here? We did not see any bootbank loss on either host. One host failed at 46 hours and the other at 58 hours; on host restart, the SD cards were corrupted and causing media errors, e.g. the VIBs were corrupted, and complete surface scans showed physical media errors.

    We still have those cards and are undergoing forensic testing.

    Again with our failure. two hosts connected to vCenter Server, with no VMs, failure after 46 and 58 hours.

    No OVF imported, NO VMware Tools nothing, this is the condition we are trying to reproduce.

    No VMs no backups either.

    Standard HPE installs with no changes to configuration or redirection of logs.

    We don't work for VMware either, BUT whatever the issue is, we will detect it by watching the reads and writes to the media with a logic analyzer. If we cannot reproduce the issue as we have seen it in our environment, we will look at:

    1. vCLS

    2. Import OVA

    3. VMware Tools

    But why should 1, 2 and 3 cause this? What is being written so excessively, and why?

    With the above, did you see log files being written a lot, or excessive swapping to the SD card?

    That is why we are trapping reads and writes at the logic level, outside of ESXi, on the physical hardware.

    We will incorporate your findings into our test plans. There is much data gathering to perform.



  • 50.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 10:04 PM

    Thanks for the update.

    When I say I don't work for VMware, I mean I don't have much time to spend troubleshooting and testing this. I have a team to manage (spending stupid amounts of time in meetings these days) and hundreds of ESXi hosts to look after. So there is not much time to go deeper or build test labs to test these issues properly.

    So it is good that you are doing this and will share the findings with the community.

    Most of the log entries (a lot, and I mean a lot) are the ESXi host trying to access the SD cards and getting r/w errors because no SD card was found. There are no physical errors on the SD cards; I don't have a single SD card that was corrupted.

    After the workaround and a reboot, the ESXi host is back in production. The funny thing is that in a cluster with 10 ESXi hosts, I don't get the issue on the same server twice (at least not within weeks).

    This week it is hosts A and B, next week C and D, and then maybe in the third or fourth week it happens again on the same host.

    Again, coredump, scratch and logs are all stored on a datastore, not on the SD cards. We do this for all our ESXi hosts, whether they use SD cards or local disks.



  • 51.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 10:11 PM

    "When I say, I don't work for VMware, I mean, I have no much time to spend troubleshooting and testing this. I have a team to manage(spending stupid times in meetings these days) and hundreds of ESXi hosts to manage. So no much time to go deeper or build some test labs to test these issues properly."

    We do that as well; that's why there are 100 hours in a day!

    I think your issue is possibly connected. We are seeing corrupted SDs, but looking at the evidence now, I think I see a connection!

    And HOW is VMware going to fix this? Umm, re-engineering, I think!

    Or they have to issue a real statement.



  • 52.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 10:31 PM

    So it does state this in the KB:

    There is no resolution for the SD card corruption as the hardware has failed.
    An update to alleviate the problem is being planned for a future release.

    Alternatively, once the new drive is installed, and ESXi has been reinstalled, you can immediately move the locker partition to a RAMdisk, per directions in High frequency of read operations on VMware Tools image may cause SD card corruption

    Source

    https://kb.vmware.com/s/article/83376

    So VMware Engineering does recognize this as a fault and a bug!

    But there is more going on here with ESXi than just VMware Tools etc.; it's the actual buggy ESXi OS.

    These recent changes were not regression tested, which VMware has confirmed; they did not know what the results would be on non-high-endurance flash and its wear levelling!

    Considering vendors have been selling SD/microSD solutions for many years, it would seem VMware is blaming the vendors because VMware told them of its plans to change the OS back in 2018, and the vendors are blaming VMware!

    And unfortunately, those of us at the coal face are getting the **bleep**e!



  • 53.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 11:07 PM

    The technical term is fecked!!!!

    We can now reproduce at will!!!!

    It's the ESXi 7.0.2a OS !!!!! (standalone!).




  • 54.  RE: SD Boot issue Solution in 7.x

    Posted Jul 13, 2021 05:20 PM

    VMware has added some new info to the KB article about this problem. They now recommend moving the locker partition as a resolution and specifically mention low- and high-endurance SD cards:
    https://kb.vmware.com/s/article/83376?lang=en_US



  • 55.  RE: SD Boot issue Solution in 7.x

    Posted Jul 13, 2021 11:07 PM

     

    I've seen that KB, but interestingly it has gone "Page Not Found" as of writing this at 00:07 UTC!

    Oh, and we are using high-endurance cards; I don't believe that is the issue!

    For a technical company, this is very vague:

    • You could use a better-performing replacement device that can handle the increased I/O

    What increased I/O, and how much? V30, V90, Class 3? And are the servers' SD card slots capable of reading and writing at V30/V90?



  • 56.  RE: SD Boot issue Solution in 7.x

    Posted Jul 14, 2021 07:53 AM

    Strange: it was updated about three times yesterday, and now they have deleted it. Well, the suggested resolution doesn't fix it anyway.



  • 57.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Jul 14, 2021 04:04 PM

    The KB link is now active and looks like it was updated as of 7-13-21

    https://kb.vmware.com/s/article/83376?lang=en_US



  • 58.  RE: SD Boot issue Solution in 7.x

    Posted Jul 14, 2021 09:54 PM

    Still a lot of gobble-de-**bleep**!

    I've seen better articles written by an 8-year-old playing Fortnite!

    "If SD card is lower tolerant devices, we can reduce heavy access to SD cards by following below steps....."

    And it makes out that it only occurs if you use VMware Tools; it does not explain HOW an ESXi 7.0.2a server fails after 26 minutes of uptime on high-endurance flash media, as per the KB!

    I'll never know. Maybe it's about time for VMware to include this in their Skyline Product Health Assessment tool, which they keep bragging is so good!

    They could give a Predicted Failure Alert: "Your ESXi host server is going to crap out at the next reboot!"



  • 59.  RE: SD Boot issue Solution in 7.x

    Posted Jul 15, 2021 05:26 PM

    And today is the 15th, and nothing... There are some rumors inside VMware that it will possibly ship with the release of U3, maybe on the 15th of August or September.



  • 60.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Jul 16, 2021 07:45 AM

    release dates are typically not shared, mainly as they change based on various aspects. In this case your source was/is wrong.



  • 61.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 01:42 PM

    I guess Dell finally got their hooks into how VMware does business. This should be a zero-day, tier-1 issue. 7.0 Update 1 and 2 should have been taken down from VMware's downloads when this problem first crept up. The maddening part is how VMware seems to act like this is a low-tier issue. I had promised my Management Team that the VMware 7.0 upgrade project would be completed by April 2021; it's now July and I'm only 1/3 of the way done!

    And my TAM's solution... "Can you roll back to 6.7?" WHAT? Sure, let's just waste a ton of man-hours getting back to where I am today because VMware is treating this like a third-level bug.

    Also, the embedded VMware solutions are horrible (at least Dell's IDSDMs and BOSS cards): they are not really redundant and are impossible to manage. When this issue killed one of my hosts, I tried to reinstall, and the reinstalls still failed even though the iDRAC said both cards were "Online", since SD cards don't really reformat themselves. I had to swap the SD cards around, and then the host booted up to a version of 7.0 that was months old!

    I have now recommended to my Management Team, and I advise everyone reading this to do the same, that we no longer use embedded ESXi solutions and instead buy servers with RAID cards and RAID0 (sorry, I of course meant RAID1) mirrored SSDs. The cost difference is about 1%.

    I really hope this update comes out soon and resolves this issue; it has really put me in a bad spot.



  • 62.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 02:20 PM

     wrote:

    ...

    I have now recommended to my Management Team, and I advise everyone reading this to do the same, to no longer used embedded ESXi solutions and buy servers with RAID cards with RAID0 mirrored SSD disks. The cost difference is about 1%.

    I really hope this update comes out soon and resolves this issue, it has really put me in a bad spot. 

     


    Yes, I know; I am in the same spot. Fortunately I only upgraded some of our environments, not all. If I had upgraded everything to 7, or to 7 U2a, I would be in a really bad spot. Thanks to some rollbacks and halting upgrades and patches to U2a, I only have around 40 servers with this issue at the moment. Otherwise I would have five times that.

    But the servers I had running vSphere 7 U1 had no issues for months. Only when I patched those to U2a did I start getting this issue. And yes, U2 was when VMware changed the bootbank and partitions for the ESXi OS. So anyone running vSphere 7 U1 should be OK. U1 also had some issues that were supposed to be fixed in U2, and they were, but U2 then triggered another, very, very serious one.

    And yes, new servers will now always have local disks.

    Since the 15th of July was not a real date (I had heard the same thing, that it could be the launch of U3), then when? There are many, many companies suffering from this issue, and systems engineers working hard so that systems keep running without impact on production. And really, rollback is not possible for most environments. Some upgrades, like ours, took months of planning before they were finished, and now we roll everything back again?



  • 63.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 02:45 PM

    This is a sad story. On one hand they show us running ESXi on Raspberry Pis, and on the other we have to use "high endurance" flash media to power the hypervisor.

    But in the end, U3 will solve this mess. If not, I have to explain to my management that we bought a bunch of useless SD cards because ESXi 7 does not support the RAID controller with the existing SSDs in our existing servers, because VMware decided to vMotion their QA to the customers.

    FYI: our SD card test blade has now been running for 29 days without issues (with a heavy workload).



  • 64.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 02:48 PM

    We would be interested to know which brand of media.



  • 65.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 02:53 PM

    It's an 8 GB HPE SD card in a BL460c Gen9 blade. The 8 GB card is not officially supported by HPE for ESXi 7, but the 32 GB one is (per the QuickSpecs).

    All we did was relocate scratch, productLocker and LogDump to the FC-SAN (which we had also done for many years before). Because we had seen many other strange issues during the upgrade process, this host was freshly installed.



  • 66.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 03:51 PM

    Sorry to be the bearer of bad news, but we've had those fail!!!

    And they are not high endurance! Also Class 1.

    Personally, I think speed, increased I/O and high endurance are a red herring and misleading.

    It looks like ESXi is overwriting and writing across a page boundary!!!

    In other words, self-destruct mode!!!

    BUG!!!!!

    But for some reason the finger is pointed at the flash being unsuitable!!!



  • 67.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 04:19 PM

    At least one of the following statements must be true regarding 7.0U2:
    -VMware didn't realize the severity of the impact of the change in I/O profile to USB-based boot devices (likely)
    -VMware didn't realize the volume of their install base using USB-based boot devices (unlikely)
    -VMware didn't anticipate the blowback that this issue would cause from both clients and hardware vendors and the unbridled anticipation of a forthcoming patch release. (likely)

    No serious person would have read VMware's reclassification of USB-based boot devices in 7.x as "Legacy", with other high-endurance methods now "Preferred", to actually mean that USB-based boot devices are now "at risk of catastrophic failure in your vSphere 7.0U2 environment".

    I have valued simplicity in my VMware hosts since the GSX days: the fewer hardware components the better. I worked for YEARS to get all spinning disks out of my hosts to eliminate the most common failure point (aside from the occasional failed DIMM), only to have VMware pull a complete 180 with vSAN, which of course requires plentiful local disks. Oh well, AFA vendors got my $ instead of my host compute vendor, and I've never regretted the decision. My entire vSphere environment runs on a large number of top-tier (read: the orange company) FC AFA storage arrays that are unbelievably easy to manage, and I've only replaced one single disk in an AFA over 6+ years. I REALLY don't want to get back into the business of installing controllers and local disks on my hosts unless it's absolutely necessary.

    indicated "just buy RAID cards and mirrored SD disks...cost difference is about 1%", which may be true for the up-front cost, but *certainly* not for the TCO of the entire environment. There would be three additional components (potential points of failure) in every host in my environment, all of which will require regular firmware upgrades and break/fix management. I have a 300-node production environment split across remote data centers. 300 RAID cards, plus 600 SSDs, plus the travel costs and labor hours associated with installing all of that hardware, not to mention the labor required to reload ESXi on all of those hosts with their shiny new "Preferred" boot devices, and the opportunity cost of all the other business projects that will sit idle while my staff and I accomplish all of this rigamarole: the total cost of an effort like this is ASTRONOMICAL.

    I'm very interested to see where you read that your 8 GB HPE microSD card in the BL460c "is not officially supported by HPE for ESXi 7", as I have many of the same blades. According to the ESXi 7.0 Hardware Requirements doc (https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.esxi.install.doc/GUID-DEB8086A-306B-4239-BF76-E354679202FC.html), 8 GB micro-SD boot devices should meet the requirement, albeit as a "Legacy" storage upgrade scenario requiring an additional high-endurance device (which you have already said you use by relocating scratch, productLocker and LogDump to FC-SAN, as do I), per Niels' blog post here: https://blogs.vmware.com/vsphere/2020/07/vsphere-7-system-storage-when-upgrading.html

    I'm patiently awaiting the U3 patch like everyone else, but I am HIGHLY skeptical that it will document the precise root causes of this issue and the exact methods the patch uses to mitigate them. Maybe VMware will surprise me with transparency.



  • 68.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 04:28 PM

    I think all of those are true facts!

    And they did no testing!

    We have had very few failures of USB or SD cards with ESXi since 2004, but then I cannot remember the last time I changed a spinning-rust disk in our SANs or NAS either; they seem to run for years now without failure!



  • 69.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 08:28 PM

    Is the update out yet to resolve this? 



  • 70.  RE: SD Boot issue Solution in 7.x

    Posted Jul 19, 2021 12:19 PM

    It seems it got delayed until the end of August (only rumors). But since there are no official statements, it is really difficult to tell an exact release date.



  • 71.  RE: SD Boot issue Solution in 7.x

    Posted Jul 19, 2021 01:52 PM

    My latest occurrence was on 14/07 with 2 hosts, and today was a nightmare, with 5 hosts hitting the issue.

    I never had these numbers in the same cluster: in a 12-host ESXi cluster, 5 had the issue today (or during the weekend). One of those was running vCenter, so I had double the trouble.

    Until I recovered the ESXi host where vCenter was running, everything was crazy and unstable.

    PS: If you leave an ESXi host with the issue for a long time (10-12 h), the VMs start to get affected, with 100% CPU usage and performance issues.



  • 72.  RE: SD Boot issue Solution in 7.x

    Posted Jul 19, 2021 03:48 PM

    I managed to get my hands on a BOSS card for one of our hosts and moved all the VMs to that host. That will hold me over. I feel bad for people with bigger environments where it's not an option to replace the boot device for dozens or hundreds of hosts.



  • 73.  RE: SD Boot issue Solution in 7.x

    Posted Jul 21, 2021 07:30 AM

    Status of our SD Card Testserver: 33 days uptime with no problems. It runs along with 19 other servers in a cluster.

    Quick question to the HPE owners: Have you updated the firmware with the latest SPP for Gen9 (2021.05.0) and installed ESXi with the U2a customized image? Maybe this prevents or slows down the issue?



  • 74.  RE: SD Boot issue Solution in 7.x

    Posted Jul 21, 2021 10:55 AM

    A patch release is due next month that resolves this and also supports Secure Boot.



  • 75.  RE: SD Boot issue Solution in 7.x

    Posted Jul 22, 2021 03:41 PM

    Do you care to share either your source or confidence level in your statement "due next month"?  I will point out that the OP (employee) statement of "recommending the install of P3 in July sometime." was clearly either incorrect from the outset or invalidated as the date approached, and Duncan clarified this in his response to complaints on this thread after nothing was released on July 15 as had been widely speculated here and elsewhere. 

    https://communities.vmware.com/t5/ESXi-Discussions/SD-Boot-issue-Solution-in-7-x/m-p/2857776/highlight/true#M277012

    "release dates are typically not shared, mainly as they change based on various aspects. In this case your source was/is wrong."

    It seems exceedingly clear to me that VMware is not going to make any official statement regarding release date for this patch, and speculation on release dates only serves to improperly set expectations, justified or not.



  • 76.  RE: SD Boot issue Solution in 7.x

    Posted Jul 22, 2021 03:57 PM

    The source is VMware via an SR. I obtained the patch before, but it didn't support Secure Boot. I opened a new case to get an ETA and was told ESXi patches for 6.7 and 7 will be released next month.

    They confirmed it several times: this SD card patch would be included and would also support Secure Boot.



  • 77.  RE: SD Boot issue Solution in 7.x

    Posted Jul 22, 2021 10:52 PM

    We too have hit this issue with HP G9 BL460s in a dev cluster on 7.02. We asked to be put on the pre-release of the patch from VMware, which is supposedly coming in U3 in mid-August. Sounds like VMware needs to validate this fix and release it ASAP. Sorry for those who have this issue in production!



  • 78.  RE: SD Boot issue Solution in 7.x

    Posted Jul 22, 2021 11:04 PM

    Apparently downgrading to 7.01 is an option to get around this issue. Does anybody know if you can do that with VUM? I've been doing VMware for a million years and never had to downgrade a host. I suppose we could just install a fresh copy of U1 on each.

    esxcfg-rescan -d vmhba32 just hangs and hangs

    https://www.provirtualzone.com/vsphere-7-update-2-loses-connection-with-sd-cards-workaround/

    I ran ls -al on the server this morning, and it is still hanging 8 hours later. The server hasn't disconnected, but it's basically useless other than for hosting VMs.



  • 79.  RE: SD Boot issue Solution in 7.x

    Posted Jul 23, 2021 04:40 PM

     

    Does recovery mode work for you? Shift+R at boot to roll back?



  • 80.  RE: SD Boot issue Solution in 7.x

    Posted Jul 27, 2021 01:48 AM

    Even the work around does not resolve the issue. 



  • 81.  RE: SD Boot issue Solution in 7.x

    Posted Jul 27, 2021 09:41 AM

     wrote:

    Even the work around does not resolve the issue. 


    The workaround is only temporary, mainly to recover ESXi hosts so that you are able to reboot them properly.



  • 82.  RE: SD Boot issue Solution in 7.x

    Posted Jul 27, 2021 10:50 AM
    • I thought the cache tools workaround was the fix?


  • 83.  RE: SD Boot issue Solution in 7.x

    Posted Jul 27, 2021 11:21 AM

    The fix is a later version of the VIB from VMware, which you can request via a support SR. The release is hopefully due next month with the rest of the VMware host and vCenter patches.



  • 84.  RE: SD Boot issue Solution in 7.x

    Posted Jul 27, 2021 11:32 AM

    Ah, so what was that workaround for? It's in their fix bulletin.

    I only use the free version, so no support contract.



  • 85.  RE: SD Boot issue Solution in 7.x

    Posted Jul 29, 2021 02:44 PM

     wrote:

    the fix is a later version of the vib from vmware which you can request via a support SR. the release is hopefully due next month with the rest of the vmware host and vcenter patches


    Unfortunately, upgrading or even downgrading the vmkusb VIB did not fix all systems; only a couple were fixed. Many customers have stated that this solution did not fix the issue and they still hit it on ESXi U2a hosts.



  • 86.  RE: SD Boot issue Solution in 7.x

    Posted Jul 29, 2021 02:42 PM

     wrote:
    • I thought the cache tools workaround was the fix?

    On some of my ESXi hosts it did fix the issue. On others it reduced how often I get the issue: instead of every 24-48 h, I get it once a week.

    So it is not a 100% silver bullet.



  • 87.  RE: SD Boot issue Solution in 7.x

    Posted Jul 27, 2021 09:43 AM

     wrote:

    Apparently downgrading to 7.01 is an option to get around this issue. Anybody know if you can do that with VUM? I've been doing vmware for a million years and never had to downgrade a host.  I suppose we could just install a fresh copy of U1 on each.

    esxcfg-rescan -d vmhba32 just hangs and hangs

    https://www.provirtualzone.com/vsphere-7-update-2-loses-connection-with-sd-cards-workaround/

    ls -al on server this morning.. still hanging 8 hours later.. server hasn't disconnected but it's basically useless other than hosting vms.



    If the host has the issue, you can't run ls, or even df -h or other Linux-style commands; they will hang. You first need to fix the issue with esxcfg-rescan -d vmhba32 and reboot. Then you can run other commands normally.
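    Because a hung ls or df -h is itself the symptom, a check that gives up after a timeout can tell you whether a host is affected without tying up your shell session. Below is a minimal sketch in Python, assuming a Python 3 interpreter is available wherever you run it; the helper name command_responds and the 10-second default are hypothetical choices, not a VMware tool.

```python
import subprocess
import time


def command_responds(cmd, timeout_s=10.0, poll_interval_s=0.2):
    """Return True if `cmd` exits within `timeout_s` seconds, else False.

    Hypothetical helper. It polls instead of calling wait(), so a command
    stuck on an unresponsive boot device cannot block the caller.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + timeout_s
    while proc.poll() is None:  # None means the process is still running
        if time.monotonic() >= deadline:
            # Ask the process to die, but do not wait for it: a command blocked
            # on a dead device may not exit even after being killed.
            proc.kill()
            return False
        time.sleep(poll_interval_s)
    return True  # exited in time (any exit code counts as "responded")
```

    For example, command_responds(["ls", "/bootbank"]) returning False would match the hanging listing described above. A timed-out command may linger in the background, since it can stay blocked on the device even after being killed.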



  • 88.  RE: SD Boot issue Solution in 7.x

    Posted Aug 12, 2021 05:33 PM

    We have sort of mitigated the issue by scripting reboots of cluster nodes. We also stopped Turbonomic from managing DRS in the cluster, since according to the logs it appeared to significantly increase I/O. esxcfg-rescan -d vmhba32 seems to work on hosts that are not fully disconnected from the cluster.

    Here is the communication from support. Note the promise of U3 by mid-August. The clock is ticking, VMware!

    "

     

    Thank you for your time over the course of this SR:21237061007 and thank you for choosing VMware Products!

     

    I will now proceed in placing this Support Request in an archived state. This state means the Support Request can be re-activated by replying to this mail or by calling VMware Customer Support at any stage within the next 21 days.

    To ensure clarity on the resolution of your issue and as a record for yourself below is a summary of what we worked on:

     

    Summary

    ESXi 7 host frequently disconnecting from vcenter

     

    Cause

    2021-07-07T23:07:41.135Z cpu12:2097520)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

    2021-07-07T23:07:41.135Z cpu12:2097520)ScsiDeviceIO: 4315: Cmd(0x45d95fcd2100) 0x28, cmdId.initiator=0x43079ee36ac0 CmdSN 0x1 from world 4817311 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Cancelled from path layer. Cmd count Active:1

    2021-07-07T23:07:41.135Z cpu12:2097520)Queued:2

    2021-07-07T23:07:41.136Z cpu27:4817311)VFAT: 5144: Failed to get object 36 type 2 uuid 5f525e1a-4f3300a9-443a-36db70100038 cnum 0 dindex fffffffecdate 0 ctime 0 MS 0 :Timeout

    2021-07-07T23:07:41.179Z cpu6:4817326)ALERT: Bootbank cannot be found at path '/bootbank'

    2021-07-07T23:07:41.770Z cpu22:2097521)ScsiPath: 8058: Cancelled Cmd(0x45b960955000) 0x0, cmdId.initiator=0x45393781bc58 CmdSN 0x0 from world 0 to path "vmhba32:C0:T0:L0". Cmd count Active:0 Queued:2.

    2021-07-07T23:07:41.770Z cpu12:4784715)VMW_SATP_LOCAL: satp_local_updatePath:856: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retrys..

     

    Resolution

    As you are running ESXi 7.0 update 2 from a Sd-Card so the Host getting non responsive due to /bootbank cannot be found message is a known issue and an action plan was shared with you regarding it.

     

    Fix for the issue will be released in ESXi 7.0 patch 3 which is due to be released in a couple of days latest by mid August in the meanwhile you can perform the following as workaround:

     

    1. Reboot the affected Host as then ESXi starts talking to sd-card again untill sd-card is overwhelmed again in future with I/O's sent by our kernel.

     

     2. If reboot of ESXi host is not an option and VMs are running. Rescan vmhba using command: esxcfg-rescan -d vmhba32"
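    The log lines in the support summary above give a recognizable signature: a path-state-in-doubt warning on the boot device, a failed path update, and the "Bootbank cannot be found" alert. A minimal Python sketch for searching a vmkernel log for those messages follows, assuming the boot device sits behind vmhba32 as in this thread (the adapter name may differ on your hosts) and that log lines keep the "timestamp cpuN:world)message" layout shown above. The function name find_sd_boot_symptoms is hypothetical.

```python
import re

# Assumed layout of a vmkernel log line, as in the excerpt above:
# "<ISO-8601 timestamp> cpu<N>:<world ID>)<message>"
LINE_RE = re.compile(r"^(\S+)\s+cpu\d+:\d+\)(.*)$")


def find_sd_boot_symptoms(lines, adapter="vmhba32"):
    """Return (timestamp, message) pairs for lines matching the SD boot-loss signature.

    Hypothetical helper; `adapter` defaults to vmhba32 as in this thread.
    """
    signatures = (
        "Bootbank cannot be found",                    # ALERT when /bootbank is lost
        f'"mpx.{adapter}:C0:T0:L0" state in doubt',     # NMP fast-probe warning
        f'Failed to update path "{adapter}:C0:T0:L0"',  # SATP path-update failure
    )
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a "timestamp cpuN:world)" line
        timestamp, message = match.groups()
        if any(sig in message for sig in signatures):
            hits.append((timestamp, message.strip()))
    return hits
```

    To run it against a live host you might pass it the lines of the host's vmkernel log, assumed here to be /var/log/vmkernel.log; with scratch and logs relocated to a datastore, as several posters describe, the log may live elsewhere.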



  • 89.  RE: SD Boot issue Solution in 7.x

    Posted Aug 24, 2021 07:37 PM

    Seems like a patch is imminent:

    Resolution
    This issue is resolved in VMware vSphere ESXi 7.0 U2c. To download go to the Customer Connect Patch Downloads page.

    https://kb.vmware.com/s/article/83376?lang=en_US



  • 90.  RE: SD Boot issue Solution in 7.x

    Posted Aug 24, 2021 08:26 PM

    It's available within the lifecycle manager in vCenter.



  • 91.  RE: SD Boot issue Solution in 7.x

    Posted Aug 24, 2021 09:56 PM

    Test environment patched; we will see how it goes before moving on to prod. I hope VMware is not going through a bad patch with updates/patches again like they did years ago: patch one thing and introduce another bug just as bad...



  • 92.  RE: SD Boot issue Solution in 7.x

    Posted Aug 31, 2021 04:32 PM

    Hi,

    Does anyone know if I can install this patch if I'm running the DellEMC customized 7.0U2 version? Usually I would wait until Dell releases their custom ISO/ZIP, but this issue is really annoying...

    thanks



  • 93.  RE: SD Boot issue Solution in 7.x

    Posted Aug 31, 2021 04:38 PM

    yes!



  • 94.  RE: SD Boot issue Solution in 7.x

    Posted Aug 31, 2021 04:41 PM

    Same here using customized Dell ISOs and I have successfully updated my hosts with the U2c patch using Lifecycle Manager.



  • 95.  RE: SD Boot issue Solution in 7.x

    Posted Aug 31, 2021 05:37 PM

    I am running the Dell custom ISO : DEL-ESXi-701_17551050-A05 . My concern about using Lifecycle Manager is the Dell addons won't be upgraded.  What version were you on prior to updating via LM ? 

    I am hoping Dell EMC releases a custom ISO for this, but VMware hasn't even released one yet. They released a 7 GB ISO for vCenter 7.0.2Uc, but the ESXi ISO available to download is still 7.0.2Ua.



  • 96.  RE: SD Boot issue Solution in 7.x

    Posted Aug 31, 2021 05:44 PM

    I was using build 2a.  I noticed also that the 2c wasn't available for ESXi...just vCenter.  However, with this issue we were having with the SD cards I had to apply the update asap.  Also, from my understanding, the 2c update is more of a patch than an upgrade so I went with the Lifecycle Manager method.  Update went smooth and I haven't had any issues since then.  I also have DRS and HA enabled with no issues.  Having DRS and HA enabled with the SD card issue was a nightmare.

    Additionally, I am using Skyline and it doesn't report any issues regarding this...



  • 97.  RE: SD Boot issue Solution in 7.x

    Posted Aug 31, 2021 10:28 PM

     wrote:

    I am running the Dell custom ISO : DEL-ESXi-701_17551050-A05 . My concern about using Lifecycle Manager is the Dell addons won't be upgraded.  What version were you on prior to updating via LM ? 

    I am hoping Dell EMC releases a custom ISO for this, but VMware hasn't even yet. They released a 7GB ISO for vCenter 7.0.2Uc but the ESXi available to download is still 7.0.2Ua


    There is no ISO from VMware, so no customized ISO will be available for now. If VMware has not provided one, third-party vendors will not either.

    We should see a new ISO version soon; until then we can only apply the patch. If you don't have access to the patch through vCenter Lifecycle Manager, import it, or apply it manually from the ESXi console.

    Check my blog post for more details:
    https://www.provirtualzone.com/vmware-finally-launched-esxi-7-0-update-2c/



  • 98.  RE: SD Boot issue Solution in 7.x

    Posted Aug 31, 2021 10:36 PM

    LCM has the Dell add-on images, the same ones that are built into the Dell ISO. You just create a baseline and patch.



  • 99.  RE: SD Boot issue Solution in 7.x

    Posted Sep 01, 2021 03:12 PM

    Luciano.

    Something does not make sense here. I ran these in my test environment. vCenter was 7.0.2 build 17958471; I ran the upgrade to .400 from the appliance update section, which patched successfully and brought vCenter to build 18356314.

    The build number for ESXi 7.0.2Uc is 18426014, which is higher than vCenter's. In all my years of working with these products, I can't remember ESXi ever having a higher build than vCenter. My understanding has always been that vCenter must be equal to or higher in build number than ESXi. I am not sure I want to patch production without some clarification here.

    I did the patch using the Host Security Patches and Critical Host Patches baselines in test; however, the build would be the same if I were to create an image for my production cluster, so vCenter will still be lower than ESXi.

    Is this OK?



  • 100.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Sep 01, 2021 03:17 PM

    Regarding your concern: the build number is not relevant. The ESXi offline bundle (the ZIP file) was simply built AFTER the vCenter ISO, hence it has a higher build number.

    You don't need to check build numbers (at least I don't know of any scenario where they are relevant). Checking the "public version naming" and supportability via https://interopmatrix.vmware.com is enough.

    Regards,
    Patrik



  • 101.  RE: SD Boot issue Solution in 7.x

    Posted Sep 01, 2021 03:32 PM

    It does not make any difference; what matters is the MAJOR version number, e.g. 7.

    If you go back far enough, some minor versions did cause issues with SSL!



  • 102.  RE: SD Boot issue Solution in 7.x

    Posted Aug 31, 2021 04:49 PM

    I went from 7.0.1 to 7 update2a using VMware-VMvisor-Installer-7.0.0.update02-17867351.x86_64-DellEMC_Customized-A04.iso

    Then I used Lifecycle Manager to patch to Update 2c.

    The only strange thing that happened to me, on two hosts, was that after upgrading to U2a, ESXi booted up with all vmknics missing (while still seeing the ordinary NICs). I reverted to 7.0.1 using Shift+R while booting, then ran the upgrade again, and it worked.

    And because I'm using Skyline, it triggered this: https://kb.vmware.com/s/article/83851

    Patching to u2c fixed it.

     



  • 103.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 05:16 PM

    So you are correct. I would not recommend replacing any current embedded ESXi solution, mainly because, at least with Dell, you can't! When you purchase a diskless server from Dell without a PERC card and drive cages, they do not support installing them afterwards; you are stuck.

    What makes matters even worse for me (and I have to assume other customers) is that I have some R730s that are diskless with only the IDSDM solution, and the R730 (which is still an ESXi-supported server) does not support the BOSS card. If this fix does not work, I have to replace servers I did not budget for in my upgrade project.

    However, all new servers that I purchase will no longer use a diskless config. I can easily have a non-technical person replace a bad swappable SSD RAID drive, but replacing a BOSS card or its attached SSD requires downtime and opening the hood of the server. No thanks!



  • 104.  RE: SD Boot issue Solution in 7.x

    Posted Jul 16, 2021 08:15 PM

    Fortunately, all my Dell servers were purchased with SSDs, so I don't have this issue with the Dells, only with HPE.



  • 105.  RE: SD Boot issue Solution in 7.x

    Posted Jul 14, 2021 10:46 AM

    I just upgraded two hosts that run Dell dual SD cards to v7. I noticed a few things:

    1. As soon as I added VMFS storage to one box, it automatically moved /scratch (.locker) to the HDD on its own.

    2. My other cluster server already had scratch going to a SAN disk and retained that setting.

    I also followed the KB to move VMware Tools to RAM. One host has been upgraded for a week without any issues at all; the other was done yesterday but now takes 45 minutes to boot, so I've pulled it out of the cluster until VMware can figure out why it gets stuck for so long after loading the SATP_ALUA policy at boot.

    I am a bit concerned that hosts will stop working, so I've paused the upgrades on the rest of the servers. If one has performed the mitigations listed above, is there still a chance the SD cards will stop working?



  • 106.  RE: SD Boot issue Solution in 7.x

    Posted Jul 14, 2021 01:28 PM

    If this version has a critical bug, why update to it?

    Also, I don't understand why VMware is still offering this version when it is faulty. I honestly can't understand that.



  • 107.  RE: SD Boot issue Solution in 7.x

    Posted Jul 14, 2021 01:41 PM

    Yes, I think the bug will still occur. We write our logs to a SAN and didn't upgrade any VMware Tools, and the problem still occurred. One of our hosts got hit immediately; with another one it took 2 months. If you haven't upgraded the VM hardware versions yet, I'd roll back: https://kb.vmware.com/s/article/1033604

    True. Considering how many hosts there are in the wild with SD cards, at the very least there should be a big red box on the download page advising not to upgrade when using SD cards. Better yet, there should have been a hotfix ages ago. I think the only reason this hasn't received more attention yet is that most admins prefer to take things slowly and are probably still on 6.7.



  • 108.  RE: SD Boot issue Solution in 7.x

    Posted Jul 14, 2021 01:51 PM

    Thank you. Since I only did one host, I may seriously consider rebuilding it back to 6.7 U3 and waiting for further fixes from VMware. I have an SR open for my long boot issue. I'll see what they say, but after reading all your posts, I have some concerns about 7.0.2.



  • 109.  RE: SD Boot issue Solution in 7.x

    Posted Jul 14, 2021 02:17 PM

    After long boot times we’ve seen flash fail!



  • 110.  RE: SD Boot issue Solution in 7.x

    Posted Jul 14, 2021 01:48 PM

    Hi Luciano

    I did not know I would have these issues, or I would not have upgraded. One host was a test server and it seems to have worked OK. I took my first prod server to 7.0.2 and it now takes 45 minutes to boot, stuck at "vmw_satp_alua loaded successfully".

    Then I happened to see the SD card issues. I ran the HCL compatibility check and even a Skyline check, and never once did either say there were potential issues with SD cards.

    I may just wait for 7.0.3. I don't think I want to do the other 8 hosts, because I really don't want to have to build them from scratch and then face potential server loss due to corrupt SD cards.

    It seems this version is one gigantic mess. I'm so glad I did not apply the upgrade to the whole cluster and chose to start with only one host.



  • 111.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 10:09 PM

    WELL WE HAVE A BROKEN 7.0.2A server already and it's only been 40 minutes!

    This was not connected to vCenter Server, no VMs, NO VMware Tools, NO OVF/OVA.

    Basically, I suspect that any additional I/O just causes more reads/writes to the "system" partition. VMFS-L seems to be the culprit, and I wonder whether VMware Engineering did any actual physical testing across media (e.g. HDD/SSD/flash/NVMe/SD/USB/microSD) before releasing it into the wild, or just introduced a new thing and assumed!

    We are using high-endurance 3D NAND flash, rated for 4K and 8K 60fps high-data-rate recording, but I suspect that is not the issue!

    We need to check the data dumps and compare across builds and servers.

     



  • 112.  RE: SD Boot issue Solution in 7.x

    Posted Jul 12, 2021 05:27 PM

    We have no idea what is causing the issue at present; we will add Cluster to our list of investigations.



  • 113.  RE: SD Boot issue Solution in 7.x

    Posted Jul 11, 2021 05:14 PM

    Again, VMware just states: use "high endurance flash". It's possible the days of SD card/USB for ESXi 7.0.2a installs are over!



  • 114.  RE: SD Boot issue Solution in 7.x

    Posted Jul 11, 2021 05:13 PM

    Dell have issued a statement that they no longer recommend installing on SD cards, so if they supplied a server with SD for use with 7.0.2a, get a refund and send it back!



  • 115.  RE: SD Boot issue Solution in 7.x

    Posted Jul 11, 2021 05:11 PM

    You should have got Dell to replace it with BOSS F.O.C. (free of charge); we did with many clients.

     

    Under UK LAW, Not fit for FECKING PURPOSE!!!!



  • 116.  RE: SD Boot issue Solution in 7.x

    Posted Aug 04, 2021 03:41 PM

    Any word on the imminent release of U2P03? My team is tired of playing Whack-A-Mole.

    Larry

     



  • 117.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Aug 04, 2021 04:19 PM

    Sorry, no specific date has been verified yet for the GA release; I would assume sometime in August 2021.



  • 118.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Aug 25, 2021 09:24 PM

    Thanks to one of our awesome VMware TAMs (that's probably why every account should have a TAM covering them), who provided me with this Skyline update, which proactively detects potential VMFS-L Locker partition corruption with low-endurance boot devices on ESXi (vSphere-VMFS-L-SDCard).

    https://twitter.com/VMwareSkyline/status/1430246999475900417

    Get a TAM and get Skyline rolling!



  • 119.  RE: SD Boot issue Solution in 7.x

    Posted Aug 25, 2021 10:09 PM

    And what does it do when it detects the corruption?

    Automatically fix it? Email VMware Support advising you never to reboot?!



  • 120.  RE: SD Boot issue Solution in 7.x

    Posted Aug 26, 2021 04:21 PM

    Has anyone confirmed that P03 fixes the SD IO saturation?



  • 121.  RE: SD Boot issue Solution in 7.x

    Posted Aug 26, 2021 04:51 PM

    It's still a recommendation to use "High Endurance Flash", even with this patch!

    It will be interesting to see if Dell/HPE retract their statements about SD cards!

    Only time will tell if it fixes it, whatever "it" was!!!! There seem to be many different scenarios in which this occurs.

    For me, it crapped out 13 minutes after a new install on high-endurance flash media, with no VMware Tools and no VMs! I will try the same scenario and see if I can get it to corrupt the install.



  • 122.  RE: SD Boot issue Solution in 7.x

    Posted Aug 26, 2021 05:27 PM

    All seems fine for the majority of customers; however, I do have a few where Skyline detects possible SD card issues.

    Only time will tell, but so far so good.



  • 123.  RE: SD Boot issue Solution in 7.x

    Posted Aug 26, 2021 05:38 PM

    And what does Skyline do? Or recommend?



  • 124.  RE: SD Boot issue Solution in 7.x

    Posted Aug 30, 2021 02:01 PM

    It just points you to the KB article.

    As it did for us, even though we only have 6.7 hosts. I guess that's because the article states that 6.7 is also affected (with no resolution), even though it also states:

    "Potential VMFS-L Locker partition corruption on SD cards in ESXi 7.0"

    "Starting in ESXi 7.0, the boot partition is formatted as VMFS-L instead of FAT"

    Does anyone read these articles before they publish them?

    Unfortunately, it seems that long gone are the days when we only had to wait for U1 to consider the new ESXi version stable, we obviously have to change our policy and consider it beta until at least U3.



  • 125.  RE: SD Boot issue Solution in 7.x

    Posted Aug 30, 2021 02:43 PM

    Skyline detects potential issues with the new VMFS-L on low-endurance SD cards.



  • 126.  RE: SD Boot issue Solution in 7.x

    Posted Aug 30, 2021 03:43 PM

    How does Skyline KNOW they are low-endurance SD cards???

    Does it have some sort of AI?

    And what does it do? Automatically fix it, or just point you to a useless KB!



  • 127.  RE: SD Boot issue Solution in 7.x

    Posted Aug 30, 2021 03:49 PM

    It doesn't know. It just sees an SD card and points you to the article, it does nothing.



  • 128.  RE: SD Boot issue Solution in 7.x

    Posted Aug 30, 2021 04:39 PM

    Thanks.

    Exactly. I have no idea why everyone thinks this is the next best thing since sliced bread!



  • 129.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 01:38 PM

    Well this is great... Now there's a new article saying that having only SD card (or USB) is unsupported.

    Now I am doubting even more that they actually fixed the issue (maybe they just lessened the impact). Obviously they changed the design too much without thinking about or testing the currently supported hardware, and it would be too much trouble for them to fix it now. So... dear customers, suddenly you're unsupported, tough luck! Oh wait, there's a simple workaround: just reinstall all your servers on new hardware... I am sure VMware will cover the cost...

    https://kb.vmware.com/s/article/85615?lang=en_US

    Great job!



  • 130.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 02:04 PM

     

    Well, you have to say that this article only mentions the problem of non-persistent storage for /scratch, which in SD card installations always led to this message. If you redirect the scratch location to persistent storage (like a LUN), the message will not pop up.

    The article also says "A system with only a SD-Card/USB boot device is operating in an unsupported state with the potential for premature corruption", which in most cases will not be the case.



  • 131.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 02:18 PM

    Hi,

    If it is, as you say, just about the scratch partition, like it always was, why the new article and the new warning "Please move installation to persistent storage"? *Installation*, not /scratch.

    They clearly say that the only supported configuration is having a local persistent storage device that is not an SD card/USB boot device.

    That is new. Maybe some people design their servers with extra disks lying around unused; we don't. If I have local disks, they are for vSAN.

    If I were buying local disks just for the /scratch partition, why would I even have the SD card? I would use those disks for booting...

    The /scratch partition is of course redirected to a remote datastore, but this is about all the other parts of the shiny new, much-improved ESX-OSData partition, which they found out doesn't support SD cards, even though it did until 7.0 U2.



  • 132.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 02:33 PM

    This was inevitable, based on the evidence we've seen between versions at the electrical logging data level!!!!!

    We never believed for one minute they attempted to fix the issue!

    Not once HPE and Dell issued "not supported/not recommended" statements for IDSM modules!

    Smoke and mirrors!

     



  • 133.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 02:35 PM

    Just to clear things up...

    We've seen corruption issues even with redirected scratch partitions!

    And we can trigger and reproduce the issue in 13 to 26 minutes!



  • 134.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 02:43 PM

    Exactly, it's not about the scratch. This is the biggest screw-up I have seen from VMware... and the sad part is they don't care; they try to minimize it and blame OEMs and customers for using supposedly low-endurance cards (now it's obvious no card has high enough endurance for their brilliant design).

    As I said, I haven't yet started upgrading to 7.0 because I don't consider it a stable version (I don't think anyone can argue about that in this thread)...what am I supposed to do now? Stay on 6.7 until my servers are old enough for replacement in 3-4 years?



  • 135.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 02:56 PM

    The biggest problem is that, until now, there has been no 100% clear statement. If you have to plan for the future, this course is unacceptable.

    The strangest thing is that my test host with an 8 GB HPE SD card has now been running on U2a for almost two months without a problem.

    For the other hosts, since the B140i controller is unsupported with ESXi 7, I am thinking of disabling the RAID functionality and using the Wellsburg SATA controller. Has anyone done that yet on HPE hardware? I was not aware that this is possible.



  • 136.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 03:44 PM

    This was a discussion (or argument) I was having with fellow vExperts, which also happen to be VMware Employees!!!!

    When this broke, imagine the disarray it caused our clients, who have been convinced by VMware and VMware partners to install ESXi (the hypervisor) in "Embedded Mode" ever since ESX "i" was invented, claiming "benefits" over Hyper-V: smaller footprint, no need for spinning disks for a mirrored installation, fewer security issues, less overhead...

    And we all bought into that, only now to be told...

    USB/SD cards, even though supplied by OEM vendors, NOT SUPPORTED!!!!

    Oh! Since 2004 we've been installing onto USB/SD cards; even the vendors have been doing it, supplying IDSM modules and SD cards pre-installed.

    It has been said that VMware advised OEMs of this change in 2018, and people poke fun at SD/microSD card installations, but we always thought it was not about USB/SD-type cards; we've seen high-endurance industrial SD cards FAIL that are used in far more serious, life-critical applications and systems than an ESXi server!!!!

    Something in the latest ESXi 7.x is foo-barred!!!

    It does leave an awkward situation for upgrades from 6.5 and 6.7 to 7.0, where there is no easy upgrade path other than a full install onto something you can get fitted in your servers; as I understand it, BOSS/SATADOM may not fit in some servers that take the Dell IDSM module.

    It is a **bleep**-up! 

    (not to be confused with a vSAN installation, which does have a requirement for BOSS/SATADOM because of the vSAN trace logs!)

    I do feel for everyone now.... abandoned by VMware !



  • 137.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 04:04 PM

    I really do not see what the issue is; times have changed and so have recommendations. It also baffles me why anyone would upgrade across major releases without doing a complete rebuild of the host. On the next host refresh, just spec in some disks rather than SD cards.

    the errors and log spew would have been detected after upgrading to make you stop the roll out.

    If you can afford to run VMware, I am sure a few disks cost next to nothing. ESXi rebuilds can be done in minutes.



  • 138.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 04:15 PM

    Yes, the recommendation changed abruptly without prior notice. When things like this are done, the responsible thing is to announce, for example, that this is the last major version to support this configuration and the next one will not. And VMware usually does this, but this was obviously not planned; it was the consequence of a major screw-up in planning and/or development.

    When are you available to come and rebuild all our hosts free of charge? Of course, bring a bunch of boot devices with you...

    Really, rebuild all hosts instead of upgrading? Is this a Microsoft forum? That used to be the benefit of using VMware: the same host going through 4-5 major releases without problems during its lifetime (that was when releases came out every year).



  • 139.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 05:17 PM

    Almost my entire environment consists of diskless server configurations from Dell (you know, the parent company of VMware). So my servers do not have PERCs or disk cages, and some are R730s that do not support BOSS cards. Dell also does not support installing PERCs into servers that came from the factory without them.

    So now my VMware upgrade, which should have cost almost nothing, is costing me over $100k because I need to replace the R730 hosts and purchase new BOSS cards.

    Also, BOSS cards have a 90-day turnaround right now.

    And by the way, I have never had to do a complete host rebuild to do an upgrade.



  • 140.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 05:37 PM

    I think we can agree there is a serious amount of ill-feeling in this thread, and if this is a representation of the VMware installations out there, VMware Admins and Organisations feel fecking let down by VMware!

    Right or Wrong, New or Old technology.

    It's not what I expected, and we've been using their products since their inception!

     



  • 141.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 06:23 PM

    I appreciate your thoughts, but one of your comments IMO highlights part of why so many VMware administrators are quite upset about this issue: WE were the QA.

    "the errors and log spew would have been detected after upgrading to make you stop the roll out." ...by VMware QA, and stopped the 7.0 U2a release to GA. There, I fixed it for you. :-)

    I honestly don't think most people would have a problem with a major configuration supportability change like this, given enough warning and procurement-cycle runway to spec replacement hosts with actual disks, but that type of change should only happen at a MAJOR release version. This one did not. The VMFS-L formatting change occurred at 7.0 GA, true enough, but no one running 7.0 U1 is having any issues with USB or SD card boot devices that I am aware of; something clearly changed in the I/O profile of the vmkusb driver released with 7.0 U2 that is causing issues with that class of devices. Thankfully, I've seen no issues so far on the diskless hosts I've patched with U2c.

    Your statement regarding a full and complete rebuild of a host is also a bit confusing. I happen to agree that reinstalling from scratch is the cleanest method to upgrade between major versions; we've probably both been in this game long enough to see quite a few support issues caused by artifacts left over from previous installations. But installing 7.0 from scratch would not have saved a diskless host from this issue eventually being triggered by patching to 7.0.2. If you mean "rebuild" in the sense of retrofitting existing hosts with additional hardware, that might make sense for some smaller implementations, but for larger environments in multiple fully remote data centers the expense (in new physical equipment, travel, man-hours, and opportunity cost) would be simply staggering. I should be able to upgrade my current hardware to the latest available version so long as I am compliant with VMware's HCL. If VMware wants to stop supporting installation on USB/SD card media, they need to have given PLENTY of notice that a change like that was coming in a future major release AND figure out a way to incorporate that information into the HCL when selecting servers from various manufacturers.

    After waiting an interminable length of time for the patch, I'm trying to move on. I've wasted enough time playing whack-a-mole with this issue, and with U2c I'm rather enjoying not waking up every morning to check which hosts can no longer see their boot device.



  • 142.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 01:00 PM

    I don't mind if the industry moves away from SD cards, but it could have been coordinated better. To be fair, I have now heard that VMware did communicate to vendors in 2018 that SD cards should no longer be used. IMO, two things that are very easy to implement should have happened: vendors should warn customers configuring a server with SD cards that this is not supported with ESXi 7, and the ESXi 7 download page should say the same. It should have been stated in big red letters not to install it onto SD cards (page 12 of the installation guide isn't good enough, IMO).

    I just configured a Dell server with an IDSM and ESXi 7 to see what happens, and Dell does not allow this configuration (anymore):

    [Screenshot: PowerEdge R740 Rack Server configurator, Dell USA, 2021-09-03]



  • 143.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 01:04 PM

    But that IDSM "not recommended" statement only appeared recently, when all this started...

    after Dell had shipped many R740 and R640 servers with IDSM modules to many of our clients!!!!

     

     



  • 144.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 01:18 PM

    I'm sorry, but that story sounds like a face-saving exercise for VMware, because:

    1. Vendors obviously haven't heard that, because they continued to sell and recommend SD cards, listed on VMware's own HCL! Even if they ignored warnings from VMware, wouldn't VMware then have said something about it publicly so that we, the customers, would know?

    2. VMware itself obviously hasn't heard that, because 2 years later, in 2020, when they released 7.0, they explicitly documented that SD cards are fully supported in 7.0 the same way as in 6.7. The only change was raising the minimum size to 32 GB, and only for new installs.

    So the next opportunity to do what you say they should have done is the next major release, and even then SD cards should be deprecated, not immediately unsupported (except maybe for new installs).



  • 145.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 02:35 PM

    We will have to wait with bated breath to see what the updated KB states.



  • 146.  RE: SD Boot issue Solution in 7.x

    Posted Sep 09, 2021 07:45 AM

    I talked to Dell and HPE recently about this issue, and they both referred me to the linked blog entry and told me that installing ESXi 7 on SD cards is still fine:

    https://core.vmware.com/resource/esxi-system-storage-changes#section1



  • 147.  RE: SD Boot issue Solution in 7.x

    Posted Sep 16, 2021 01:21 PM

    Has anyone else experienced U2c slowing the local disk down dramatically? Installations of the NSX-T or HA agents take a very long time, even on SSDs connected via the chipset SATA controller (Wellsburg or Lewisburg).



  • 148.  RE: SD Boot issue Solution in 7.x

    Posted Sep 17, 2021 07:32 AM

    Just yesterday I deployed NSX-T / vSAN with the same version and did not notice any extra slowness.



  • 149.  RE: SD Boot issue Solution in 7.x

    Posted Sep 17, 2021 07:38 AM

    It sounds like your NSX appliance might need a DB cleanup. Our T and V appliances always clog up after some time, and we clear down the DB before any upgrade.



  • 150.  RE: SD Boot issue Solution in 7.x

    Posted Sep 22, 2021 02:20 PM

    It seems related to ESXi's integrated AHCI driver. Today I reinstalled another server that has a RAID setup, and there everything installed at normal speed.



  • 151.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 04:06 PM

    Even for vSAN, another local device is not a requirement; trace logs can be redirected to remote storage or (partly) to a syslog server, or limited in size.

    Anyway, even the vSAN Ready Nodes (still on the HCL) have the option to boot from SD card (with no additional local disks other than vSAN disks). Imagine you buy a bunch of them today, trusting they are a sure thing to be checked and tested, and by the time they arrive, they are unsupported.

    At least now I know what I'll say to the VMware sales team trying to sell Tanzu to me...sorry, unsupported...



  • 152.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 08:51 PM

     wrote:

    Well this is great... Now there's a new article saying that having only SD card (or USB) is unsupported.

    Now I am doubting even more that they actually fixed the issue (maybe just lessened the impact), obviously they changed the design too much without thinking or testing the currently supported hardware and it would be too much trouble for them to fix it now. So...dear customers, suddenly you're unsupported, tough luck! Oh wait, there's a simple workaround, just reinstall all your servers on new hardware...I am sure VMware will cover the cost...

    https://kb.vmware.com/s/article/85615?lang=en_US

    Great job!


    No, please read carefully. It is not stating that; it says you should use persistent storage for .locker, coredumps, logs, etc., and that the system will not be supported if you don't. That is it; it is not about the USB/SD cards themselves.

    But this is not new. It was the best practice anyway: if you use a USB/SD card, you should always move these to persistent storage.

    The only problem I see here is when you use vSAN with USB/SD cards and the only storage you have is vSAN; then you have an unsupported system. We had the same situation before, but it was not explicitly unsupported, and it seems that is the future we will have.

    In the long run? I am pretty sure the path is to remove any possibility of using USB/SD cards; that is what VMware will do eventually. But that is different from what we have today and will have in the near future.



  • 153.  RE: SD Boot issue Solution in 7.x

    Posted Sep 02, 2021 09:42 PM

    Again, this is not about the scratch partition, that was always the case. Where do you see ".locker, coredump, logs" in the article?

    "Please move installation to persistent storage"

    "ESXi requires local persistent storage for operating system use, to store system state, configuration, logs, and live data"

    "A system with only a SD-Card/USB boot device is operating in an unsupported state with the potential for premature corruption"

    You really don't see anything new here? Even if all that was best practice, the sudden change to "unsupported" is the main issue. 

    I am sure you followed all of those best practices you mention, but I know from your website that you had a lot of PSODs because of this. Why, if this is nothing new? There are things on the boot disk that can't be redirected (so redirecting them couldn't have been a best practice) that previously could live on the SD cards, but now they can't.

    If they had said, when they released 7.0, that SD cards are deprecated and won't be supported in the next major release, that would be fine. No, they specifically stated that they are still supported and what the minimum sizes are for upgrades and for new installs. Otherwise, I would have a lot fewer SD cards in my servers by now...



  • 154.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 08:14 AM

    Now the link to https://kb.vmware.com/s/article/85615?lang=en_US shows "page not found". Interesting...



  • 155.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 08:22 AM

    I'm not surprised... Fortunately, a screenshot was taken, just in case someone calls us crazy for thinking VMware would ever do something like that...

    Let's just hope they learned something and the new version of the article will be more "customer friendly"



  • 156.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 08:53 AM

    Well, maybe they'll call it 7.5 instead of 7.0 U3: "as in earlier versions, non-persistent storage was considered supported; now, systems with only SD-Card/USB boot devices are considered unsupported".

    Or the upgrade will require a fresh install, like they did with the VCSA, and for ESXi you will then need "high endurance fast low latency high iops flash media".



  • 157.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 09:01 AM

    I'm waiting for this one to be pulled!!!

    Installing ESXi on a supported USB flash drive or SD flash card (2004784)

    https://kb.vmware.com/s/article/2004784

    which was updated in August 2021!

     

     



  • 158.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Sep 03, 2021 10:17 AM

    As far as I can tell, the KB article was published prematurely, and I think (from what I understand of what we are planning) that it wasn't completely accurate either.



  • 159.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 10:28 AM

    Thanks for the info Duncan, we eagerly wait for the new article..

    Not that the "main" article about the issue is completely clear either, saying that 6.7 is affected by the VMFS-L (!) corruption on SD cards... which also causes Skyline alerts on 6.7 hosts...

    https://kb.vmware.com/s/article/83376

     



  • 160.  RE: SD Boot issue Solution in 7.x

    Posted Sep 03, 2021 10:51 AM

    More shambles!!!!



  • 161.  RE: SD Boot issue Solution in 7.x



  • 162.  RE: SD Boot issue Solution in 7.x

    Posted Sep 22, 2021 03:27 PM

    I posted in another thread but for extra visibility, I will add it here as well

    Neither ESXi 7.0.2c nor 7.0.2d fixed the SD card bug for us.

    We run Dell FC640 blades; the dual SD card firmware is at 1.15 (all other firmware is up to date).

    We had been running 7.0.1d without a single issue. Within 24 hours of upgrading to 7.0.2c, one node had an SD card die.

    A day later, a second node had a card die.

    I applied 7.0.2d and it did nothing, so we've rolled back to 7.0.1 once again and have ordered BOSS cards and M.2 drives.

    I know many folks have had success with this, but I wanted to let people know that something is still causing cards to die. If anyone from VMware is reading this, I would be happy to provide logs to help diagnose this.

    TL;DR: Stay away from 7.0.2 if you value your free time and enjoy stable servers.



  • 163.  RE: SD Boot issue Solution in 7.x

    Posted Sep 22, 2021 04:57 PM

    No surprises there!

    Let's wait for NVMe, SATADOM, and M.2 to get corrupted!

    And then the fun will begin...

    People close to (or in) VMware already know this, hence the recent statements to MOVE AWAY FROM SD/USB flash drives!



  • 164.  RE: SD Boot issue Solution in 7.x

    Posted Oct 02, 2021 10:28 AM


  • 165.  RE: SD Boot issue Solution in 7.x

    Posted Oct 02, 2021 10:35 AM

    Also, the original link has been updated, and 7.0 U3 states that the SD/USB configuration is now deprecated!

    So maybe it was never fixed!!!!???



  • 166.  RE: SD Boot issue Solution in 7.x

    Broadcom Employee
    Posted Oct 04, 2021 05:48 PM

    For quite some time, especially since vSphere 7.0 GA, using SD cards hasn't been the preferred way going forward, even if it wasn't clearly stated anywhere. vSphere 7.0 Update 3 now makes the obvious official by deprecating boot from SD cards.

    While it's still supported for vSphere 7.x as of today, support for booting from SD cards might be removed in a future major release. (It's unclear whether the ability to boot from such devices will be removed as well.)

    I'm not sure where you got your data and evidence from, but from what I know and have seen, the issue introduced in 7.0 U2a was fixed in 7.0 U2c. Even when there are similar messages such as "state in doubt", that doesn't necessarily mean the underlying root cause is the same. The behavior can be seen in many, many different scenarios, not just with the USB SD card bug.

    Regards,
    Patrik



  • 167.  RE: SD Boot issue Solution in 7.x

    Posted Oct 05, 2021 08:20 AM

     wrote:

    Also, the original link has been updated, and 7.0 U3 states that the SD/USB configuration is now deprecated!

    So maybe it was never fixed!!!!???


    Yes, I'm pretty sure it was fixed. Across more than 100 ESXi servers, none has had any issue since the patch was applied. Before, it happened two or three times a week. So of course it's fixed.

    But again, the patch will not fix already-corrupted SD cards, the use of crap SD cards, or the absence of best practices.

    As for SD/USB cards no longer being supported, I don't read that in the statement:

    "VMware is moving away from the support of SD cards and USB drives as boot media. ESXi Boot configuration with only SD card or USB drive, without any persistent device, is deprecated with vSphere 7 Update 3. In future vSphere releases, it will be an unsupported configuration. Customers are advised to move away from SD cards or USB drives completely. If that is not currently a feasible situation, please ensure a minimum of 8GB SD cards or USB drive is present and an additional minimum of 32 GB locally attached high endurance device available for ESX-OSData Partition"

    I don't read there that they are unsupported now or will be in the future. What will not be supported in the future is having only SD/USB without any local storage for the ESX-OSData partition. That is a different statement from saying that SD/USB will not be supported at all in the future.
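    To make that distinction concrete, here is a minimal Python sketch that classifies a host's boot layout against the thresholds in the quoted guidance (an 8 GB SD/USB boot device plus a 32 GB locally attached high-endurance device for ESX-OSData). The `BootConfig` class, its fields, and the `classify` function are hypothetical illustrations, not a VMware tool or API, and the sketch only encodes the quoted statement, not VMware's full support policy.

```python
from dataclasses import dataclass
from typing import Optional

# Minimums taken from the quoted VMware statement (values in GB).
MIN_SD_USB_BOOT_GB = 8
MIN_OSDATA_GB = 32


@dataclass
class BootConfig:
    """Hypothetical description of a host's boot layout."""
    boot_media: str                          # e.g. "sd", "usb", "ssd", "nvme"
    boot_size_gb: int
    osdata_size_gb: Optional[int] = None     # separate local device for ESX-OSData, if any
    osdata_high_endurance: bool = False


def classify(cfg: BootConfig) -> str:
    """Classify a layout against the quoted SD/USB statement only."""
    if cfg.boot_media not in ("sd", "usb"):
        return "not covered"     # the statement only addresses SD/USB boot
    if cfg.osdata_size_gb is None:
        return "deprecated"      # SD/USB only: deprecated in 7.0 U3, unsupported later
    if (cfg.boot_size_gb >= MIN_SD_USB_BOOT_GB
            and cfg.osdata_size_gb >= MIN_OSDATA_GB
            and cfg.osdata_high_endurance):
        return "meets guidance"  # SD/USB boot plus a persistent ESX-OSData device
    return "below minimums"


if __name__ == "__main__":
    print(classify(BootConfig("sd", 16)))               # deprecated
    print(classify(BootConfig("sd", 16, 32, True)))     # meets guidance
    print(classify(BootConfig("usb", 4, 64, True)))     # below minimums
    print(classify(BootConfig("nvme", 128)))            # not covered
```

    The second branch is the point being argued: an SD/USB boot device on its own is what becomes unsupported, while SD/USB plus a persistent high-endurance ESX-OSData device still satisfies the quoted guidance.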



  • 168.  RE: SD Boot issue Solution in 7.x

    Posted Oct 05, 2021 08:33 AM

    That's the issue: define "crap SD cards"! Nobody knows what the vendors supplied!

    You would think an enterprise SD card from HPE would be better than a $1 Walmart SD card!



  • 169.  RE: SD Boot issue Solution in 7.x

    Posted Oct 05, 2021 08:43 AM

     wrote:

    That's the issue: define "crap SD cards"! Nobody knows what the vendors supplied!

    You would think an enterprise SD card from HPE would be better than a $1 Walmart SD card!


    We don't use a single SD card supplied by the server vendors. Not one!

    A couple of years ago many of them broke, like two or three per week, so we decided to replace all the SD cards with better ones. Since then, I think I've replaced one or two, nothing more.



  • 170.  RE: SD Boot issue Solution in 7.x

    Posted Oct 05, 2021 08:47 AM

    But one thing is clear: we will not buy any more servers with SD cards, that's for sure. As we replace servers, the new ones will have local disks or NVMe.

    Today a 128 GB M.2 drive is cheap, almost the same price as a good SD card used to be.



  • 171.  RE: SD Boot issue Solution in 7.x

    Posted Oct 06, 2021 04:45 PM

    Yes, that was clear as soon as these issues started. But what to do with all the current servers? Since I somehow doubt that the ones who caused this mess (VMware) will cover the cost of retrofitting servers with new boot devices, is a (non-shared) SAN LUN an option for the ESX-OSData partition?

    The new blog post and the KB articles mention only locally attached devices, but the "summary" table in the blog post also lists "Managed FCoE/iSCSI LUN" as an option, so which is true?

    https://blogs.vmware.com/vsphere/2021/09/esxi-7-boot-media-consideration-vmware-technical-guidance.html

    Another useful thing would be a supported way of replacing a boot disk without a reinstall. It's not only a question of buying new devices: AFAIK there's no supported way of replacing a boot disk, and reinstalling all servers would probably cost even more (in man-hours) than the hardware itself.



  • 172.  RE: SD Boot issue Solution in 7.x

    Posted Oct 07, 2021 08:47 AM

     wrote:

    Yes, that was clear as soon as these issues started. But what to do with all the current servers? Since I somehow doubt that the ones who caused this mess (VMware) will cover the cost of retrofitting servers with new boot devices, is a (non-shared) SAN LUN an option for the ESX-OSData partition?

    The new blog post and the KB articles mention only locally attached devices, but the "summary" table in the blog post also lists "Managed FCoE/iSCSI LUN" as an option, so which is true?

    https://blogs.vmware.com/vsphere/2021/09/esxi-7-boot-media-consideration-vmware-technical-guidance.html

    Another useful thing would be a supported way of replacing a boot disk without a reinstall. It's not only a question of buying new devices: AFAIK there's no supported way of replacing a boot disk, and reinstalling all servers would probably cost even more (in man-hours) than the hardware itself.


    If we are talking about the bug, yes, I agree it was a **bleep** show. But if you are talking (like others) about new versions needing a reinstall and not supporting SD/USB cards, I don't see where the scandal is.

    We have seen this time and time again with other products, and with ESXi regarding CPU support, etc. We used to have a lot of G5s, then G6s, then G7s; now none of those are supported and some can't run the new versions. With those servers you can't just change the CPU; you need completely new servers, and yes, no upgrade was possible, only new installations.

    So why all the fuss now because we need to change servers, add local disks or NVMe (almost the price of SD cards), and do a fresh install? It's the same thing, just with different hardware.

    Anyone who has seen my comments and blog posts about this stupid issue knows I was, and still am, very, very critical of the way VMware handled it; that was unacceptable. But criticizing these changes just because?

    Hardware changes, systems change, OSes change, hypervisors change; that is why this is a multibillion-dollar industry for everyone.



  • 173.  RE: SD Boot issue Solution in 7.x

    Posted Oct 07, 2021 09:18 AM

    Hi,

    I'm sorry, but there's a big difference between a new major release not supporting a server/CPU generation that is literally 10 years old and hasn't been sold for 7-8 years, and suddenly (in an update) deprecating something that was fully supported until "yesterday" and is still being sold today! Even today, go look at VMware's own HCL (!), at vSAN ReadyNodes for example, and you will find nodes with SD cards fully supported even for version 7.0 Update 3! That means people are still buying them if they aren't reading every blog and KB.

    And, like us with a 1-year-old server (so expected to be in production for at least 4 more years), "tomorrow" they will not be able to upgrade to the next version. Not because they didn't check the HCL and bought 10-year-old hardware, but because of a software fiasco. It's normal to do a fresh install when you retire a 5-year-old server; it's not normal to be forced to do a HW upgrade and reinstall on a 2-year-old server...

    If you think that's just the way it always was and should be, I respectfully disagree. Maybe it's expected for free software, but it was never normal for VMware.



  • 174.  RE: SD Boot issue Solution in 7.x

    Posted Oct 07, 2021 01:05 PM

    I have to agree. We are facing the same discussions with clients at present; it is very difficult for implementers, consultants, and VMware partners right now!

    The takeaway here is that a major change was implemented in an update, not in a major version change such as 8.0!



  • 175.  RE: SD Boot issue Solution in 7.x

    Posted Oct 08, 2021 07:58 AM

     wrote:

    I have to agree. We are facing the same discussions with clients at present; it is very difficult for implementers, consultants, and VMware partners right now!

    The takeaway here is that a major change was implemented in an update, not in a major version change such as 8.0!


    From 6.0 to 6.5 and then to 6.7 there were big changes, and, like I said, CPU support changes. Numbers don't mean anything, so there's no need for a v8 to have bigger changes.

    Again, like I said, these kinds of partition changes should not be made in an update. On that I agree 100%.



  • 176.  RE: SD Boot issue Solution in 7.x

    Posted Oct 08, 2021 07:54 AM

     wrote:

    Hi,

    I'm sorry, but there's a big difference between a new major release not supporting a server/CPU generation that is literally 10 years old and hasn't been sold for 7-8 years, and suddenly (in an update) deprecating something that was fully supported until "yesterday" and is still being sold today! Even today, go look at VMware's own HCL (!), at vSAN ReadyNodes for example, and you will find nodes with SD cards fully supported even for version 7.0 Update 3! That means people are still buying them if they aren't reading every blog and KB.

    And, like us with a 1-year-old server (so expected to be in production for at least 4 more years), "tomorrow" they will not be able to upgrade to the next version. Not because they didn't check the HCL and bought 10-year-old hardware, but because of a software fiasco. It's normal to do a fresh install when you retire a 5-year-old server; it's not normal to be forced to do a HW upgrade and reinstall on a 2-year-old server...

    If you think that's just the way it always was and should be, I respectfully disagree. Maybe it's expected for free software, but it was never normal for VMware.


    Again, SD/USB is and will continue to be supported, just with some changes.

    And yes, it is a major change that was released with vSphere 7 (what was wrong here is that they should have made the partition changes, and announced the SD/USB change, at the start of this release, not in an update).

    So as long as we are running vSphere 7, we still have the option to use SD/USB devices without any issues (as long as best practices are in place). If you are talking about a vSphere 7.5 or a v8, there are some years until they launch. And even once they launch, vSphere 7 End of Technical Guidance is in 2027, and EOL will be at least 7-8 years out.

    So again, I don't see the big issue here, especially compared to other changes in the past.

    That is my view, and looking at the past, I think it makes total sense. But I'm not trying to win the argument; it's just my view and opinion.