Setup:
Gen10 ProLiant DL series
ESXi 6.7U3
SPP version 2022.03
SUM 8.9.5
iSUT/SUT 2.9.1
iLO5 2.65
Problem:
(I've reached out to HPE about this but have not gotten a solid answer yet, so maybe VMware can assist.)
Short summary of the issue: ESXi 6.7 can't connect to the iLO on its own host (and vice versa), which is required to complete ESXi-specific firmware updates on an HPE ProLiant Gen10 server using HPE's update software.
Long version:
We have several HPE Gen10 servers running ESXi 6.7U3. We're trying to run firmware updates on those servers, and HPE has drastically changed its firmware update process (using the SPP) so that it now uses iSUT 2.9.1 (Integrated Smart Update Tools) and SUM 8.9.5 (Smart Update Manager). After fighting with it and learning the hard way that this new system is very different from before, I have yet to complete all the firmware updates available for our ESXi 6.7 hosts.
First, in the past (with 6.5 and 6.7 on Gen8 systems), we put the server in maintenance mode, attached the SPP (Service Pack for ProLiant) ISO as virtual media (ISO file via URL) to the server's iLO as a CD/DVD, and booted from it. The ISO loads its bundled SUM software into memory (a lightweight SUSE Linux install with a GUI that allows "Automatic" or "Interactive" updates). SUM inventories the firmware the server is currently running against the versions included in the package, builds a list of updates to choose from, and deploys whatever you select. After a few reboots of iLO and the server, the server firmware and iLO are updated.
Now, HPE offers SUM software that does this remotely using its own UI or via OneView. Here are our problems thus far:
1. SUM loaded when using the ISO virtual media (this means the ISO file we host on a web server is available and the iLO can connect to it and load it with no network or firewall issues). However, what's new this time is that the software asks for iLO credentials before it starts running an inventory. That part works, but it then fails on the second part of the inventory, which targets "Localhost" (itself), with the message "Cannot connect to localhost". So it (iLO) could connect to a remotely hosted ISO file, but cannot connect to itself? This is SUM running from memory on the same server - there aren't any firewalls or networking between iLO, the server's memory and the server firmware... right?
2. Frustratingly, HPE sent us instructions for installing and running SUM on Windows as a workaround (others have reported this issue as well). We did not have a Windows machine available since we're a Red Hat based program. Luckily, I have a Windows VM on my laptop that I was able to test-run it from, but we have several servers to update and a team to do it with, and this limits the task to me exclusively. That is not a viable solution, but it's good enough to test, at least.
3. After installing it and figuring out how to operate it, I was able to update firmware and iLOs on a few hosts. However, even though SUM showed each host meeting the new baseline and the installs as "done", several items had not completed and were still displayed in the Installation Queue in iLO, even after running inventory again on each host. It took a while to figure out that these were VMware-specific firmware items (Broadcom, QLogic, etc.) stuck as "Pending" in the installation queue. I couldn't figure out why they weren't being installed, since SUM includes iSUT (Integrated Smart Update Tools), which is supposed to run from SUM while deploying the firmware. SUM showed a warning that "iSUT was not running on the OS".
More research led me to download SUT (available separately from the SPP ISO as an installable VIB zip package) and install it on each ESXi host via ESXCLI, then set its operating mode and reboot to start the service. It's a gigantic pain if this has to be done on every ESXi server.
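For reference, the SUT install steps described above can be sketched roughly as follows. This is only a sketch under assumptions, not the exact commands run on these hosts: the bundle path is a hypothetical placeholder, the `sut -set mode=...` syntax is assumed from the iSUT documentation, the maintenance-mode step is an added precaution, and `DRY_RUN` and the function names are made up for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: install the SUT offline bundle on one ESXi host, set its
# operating mode, and reboot. All values below are placeholders.
SUT_BUNDLE="${SUT_BUNDLE:-/vmfs/volumes/datastore1/sut-bundle.zip}"  # hypothetical path
SUT_MODE="${SUT_MODE:-AutoDeploy}"   # AutoDeploy or OnDemand
DRY_RUN="${DRY_RUN:-1}"              # 1 = print only, 0 = really execute

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

install_sut() {
    # Assumption: the host's VMs have already been evacuated.
    run esxcli system maintenanceMode set --enable true &&
    # Install the VIB bundle (esxcli needs the full path to the zip).
    run esxcli software vib install -d "$SUT_BUNDLE" &&
    # Assumed iSUT syntax for choosing the operating mode.
    run sut -set mode="$SUT_MODE" &&
    # Reboot so the SUT service starts.
    run reboot
}
```

After sourcing the file, `install_sut` prints the four commands for review, and `DRY_RUN=0 install_sut` would actually run them on the host. Wrapping this in an SSH loop over all hosts would at least take some of the pain out of repeating it.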
4. After doing all that and getting SUT working on the ESXi host, it managed to finally start and install 3 of the ~12 items in the queue that were "Pending", but not all. When running commands in the ESXCLI to configure SUT (setting username/password for iLO, setting the operating mode to AutoDeploy or OnDemand, and checking the status of SUT to see those configs), each time, it shows the following error:
"Communication to iLO failed. If iLO is configured in any of the higher security modes, then use sut -set ilousername=<username> ilopassword=<password> to set the iLO credentials. If iLO is in CAC mode, then use sut -addcertificate <path_to_certificate_file> to set the certificate details"
The iLO credentials are good, but SUT still can't communicate with iLO from the ESXi CLI. I suspect this is the same problem as when SUM runs from the ISO and tries to reach localhost (the ISO runs a version of SUT called iSUT, which is "integrated" into the SPP/SUM package loaded into memory to perform updates). I also suspect it is why the VMware-specific firmware updates stuck as "Pending" in the iLO installation queue never finish.
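To make the failing step reproducible, here is a minimal sketch of the credential and status check. The `sut -set ilousername=... ilopassword=...` form is taken from the error message above; `sut -status` and the assumption that it prints the same error text are from memory of the iSUT documentation and not verified here. The function names and `DRY_RUN` are hypothetical, and by default nothing is executed.

```shell
#!/bin/sh
# Sketch only: set the iLO credentials in SUT, then look for the error above.
# User name and password are passed as arguments; DRY_RUN=1 only prints.
DRY_RUN="${DRY_RUN:-1}"

# Reads SUT output on stdin; succeeds if it contains the iLO error text.
ilo_comm_failed() {
    grep -q "Communication to iLO failed"
}

set_ilo_credentials() {
    # Show the command with the password masked so it never lands in logs.
    echo "+ sut -set ilousername=$1 ilopassword=********"
    if [ "$DRY_RUN" = "0" ]; then
        sut -set ilousername="$1" ilopassword="$2"
    fi
}

check_sut_status() {
    echo "+ sut -status"
    [ "$DRY_RUN" = "0" ] || return 0
    # Assumption: the status output repeats the error when iLO is unreachable.
    if sut -status 2>&1 | ilo_comm_failed; then
        echo "SUT still cannot reach iLO"
        return 1
    fi
    echo "SUT reached iLO"
}
```

On a host, `DRY_RUN=0` followed by `set_ilo_credentials <username> <password> && check_sut_status` would show whether the error persists after the credentials are set, which is the behavior described above.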
What is between ESXi and iLO that prevents communication between them? Please assist, anyone who has done this before or can help.