That may be what I need to do. In a recent event, a server in another cluster/workload domain raised the following alert:
The NIC in Slot 7 Port 1 network link is down.
That is a different slot. The interesting part is that the hardware in this cluster is slightly different because it is dedicated to SAP. One of the differences is the NIC: rather than Broadcom NICs, these servers have Intel NICs. Slot 7 also goes to a different TOR switch. So that seems to rule out this being purely a Broadcom NIC issue.
Original Message:
Sent: Jul 17, 2025 12:35 AM
From: Vladimir Popov
Subject: Several ESXi hosts randomly have NCI drop alert
I ran into a similar issue back when 10G NICs were first hitting the scene. It took a coordinated effort from IBM (servers), Cisco (switches), and VMware to get it resolved. Ultimately, they had to roll out targeted fixes across the blade servers, switches, and the ESXi build to fully address the problem.
In your case, I'd suggest asking Dell to assemble a cross-vendor team (Dell, Cisco, and Broadcom) to investigate. After all, these servers are certified vSAN nodes, and they deserve proper attention.
Original Message:
Sent: Jul 15, 2025 09:52 AM
From: pcgeek2009
Subject: Several ESXi hosts randomly have NCI drop alert
The NICs are: Broadcom BCM57508 2x100G QSFP PCIE
The transceivers are: FTLC9555REPM3-E5
I have spoken to Dell about it in the past. At one point there was an ESX firmware update that was causing another issue. I was hoping that would also fix this problem. They were trying to point more to the Cisco switch as a possible problem. However, the network team says the problem would not move around to different hosts or sites if the switches were the cause.
------------------------------
Rodney Barnhardt
vExpertPro
Original Message:
Sent: Jul 15, 2025 01:01 AM
From: Vladimir Popov
Subject: Several ESXi hosts randomly have NCI drop alert
Rodney, what exact NIC do you use?
Original Message:
Sent: Jul 14, 2025 09:25 AM
From: pcgeek2009
Subject: Several ESXi hosts randomly have NCI drop alert
It is multiple hosts, and it has happened at more than one location. The hosts are all configured identically. So, no matter which host generates the error, it is always the same slot number and port. The alert comes from the iDRAC itself but can also be seen in the vCenter logs. The NICs are identical, but I have not swapped them yet. A faulty NIC seemed unlikely since this happens across multiple hosts and at more than one location. These are in a well-cooled data center with hot and cold aisles. The nodes are as built and shipped from Dell.
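For anyone who wants to check the same pattern in their own exported alerts, here is a minimal Python sketch that tallies link-down alerts per host and per slot/port. It assumes the alert text matches the wording quoted in this thread, with an optional `hostname:` prefix; the hostnames in the sample are made up, and `tally_alerts` is just an illustrative helper, not part of any Dell or VMware tool.

```python
import re
from collections import Counter

# Pattern based on the alert wording quoted in this thread.
# The leading "hostname: " prefix is treated as optional.
ALERT_RE = re.compile(
    r"^(?:(?P<host>\S+):\s+)?The NIC in Slot (?P<slot>\d+) Port (?P<port>\d+) "
    r"network link is down\.$"
)

def tally_alerts(lines):
    """Count link-down alerts per (host, slot, port)."""
    counts = Counter()
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if m:
            host = m.group("host") or "unknown"
            counts[(host, int(m.group("slot")), int(m.group("port")))] += 1
    return counts

# Hypothetical sample lines; the hostnames are invented for illustration.
sample = [
    "esx01.example.com: The NIC in Slot 6 Port 1 network link is down.",
    "esx02.example.com: The NIC in Slot 6 Port 1 network link is down.",
    "esx01.example.com: The NIC in Slot 6 Port 1 network link is down.",
    "esx09.example.com: The NIC in Slot 7 Port 1 network link is down.",
]

counts = tally_alerts(sample)

# Roll the per-host counts up to slot/port to see whether one slot dominates.
by_slot = Counter()
for (host, slot, port), n in counts.items():
    by_slot[(slot, port)] += n

for (slot, port), n in sorted(by_slot.items()):
    print(f"Slot {slot} Port {port}: {n} alert(s)")
```

If one slot/port dominates across many hosts, that points away from a single bad card and toward something common to that slot or its uplink path.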
Original Message:
Sent: Jul 14, 2025 07:25 AM
From: Kent Wicker
Subject: Several ESXi hosts randomly have NCI drop alert
Please clarify this statement:
"It is always the same slot and port, no matter which host it is. It is only 1 host."
Is this happening to only 1 host or multiple hosts?
My hunch would be that the NIC is faulty or overheating. If it is just 1 host, I would lean towards a bad NIC. If multiple hosts, I would lean towards overheating causing it to reset or fail temporarily. I have experienced NICs just stopping, and when replaced they work fine. I have also experienced 10G copper NICs that drop to 1G, and the theory is that they are overheating.
You mentioned you swapped cables and SFPs but the issue stays in slot 6. Are the NICs identical? Can you swap them between slots 6 and 7? If so, and the issue moves to slot 7, you know you have a faulty NIC. If not, there may be an airflow issue: not necessarily a problem with the system, but perhaps not enough cooling for that NIC. Are there any other free slots you can move the slot 6 NIC to?
Did you buy it in its current configuration or add any NICs to it? The reason I ask is that Dell generally tests that a particular configuration will support the hardware purchased with it. There have been cases where I ordered Dell servers and the configuration changed (adding fans, not allowing as much memory, etc.), and these changes were all due to cooling. I was wondering if perhaps it was ordered one way and hardware was added later that requires additional cooling.
Original Message:
Sent: Jul 13, 2025 07:41 AM
From: pcgeek2009
Subject: Several ESXi hosts randomly have NCI drop alert
So, I have had this weird issue for a while and have done some research, but found no resolution. We migrated from Cisco UCS with SAN storage to Dell vSAN Ready Nodes about two years ago. All of this is currently running VCF 5.2.1. We have two sites with the same basic configuration, although this happens more at the production site than the DR site. Anyway, what is going on is that randomly there will be a NIC alert:
hostname.domain.com: The NIC in Slot 6 Port 1 network link is down.
It is always the same slot and port, no matter which host it is. It is only 1 host. These are Dell R750 vSAN Ready Nodes with 100G SFPs to Cisco switches. We have dual paths. The slot 7 NICs never show an alert. I have tried reseating the SFPs, switching them between slots 6 and 7, switching the cables, etc. A host may only alert once every couple of weeks. If you are looking at the Cisco switch when it happens, the light goes off and back on. Then, a host may do it a dozen times over the course of a day or so and suddenly stop. There does not seem to be a firmware update for the NICs. I have seen some posts on a Dell forum where others have seen similar behavior, but those are a few years old. So far, no issues have been caused. It is just a little frustrating. Is anyone else seeing anything like this?
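To make the "quiet for weeks, then a dozen flaps in a day" pattern easier to show to a vendor, here is a rough Python sketch that groups alert timestamps into bursts. The 24-hour gap threshold, the `group_bursts` helper, and the sample timestamps are all assumptions for illustration; real timestamps would come from the iDRAC or vCenter logs.

```python
from datetime import datetime, timedelta

def group_bursts(timestamps, gap=timedelta(hours=24)):
    """Group event times into bursts.

    A new burst starts whenever the time since the previous event
    exceeds `gap` (an assumed threshold, adjust as needed).
    """
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts

# Hypothetical link-down times for one host; not real data.
events = [
    datetime(2025, 7, 1, 3, 15),
    datetime(2025, 7, 14, 9, 0),
    datetime(2025, 7, 14, 9, 40),
    datetime(2025, 7, 14, 22, 5),
    datetime(2025, 7, 15, 1, 30),
]

bursts = group_bursts(events)
for b in bursts:
    print(f"{b[0]} to {b[-1]}: {len(b)} event(s)")
```

Lining the burst windows up against switch logs, firmware changes, or temperature readings for the same period may help narrow down what the flaps coincide with.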