Until recently I ran a Dell PowerEdge R720 for my lab. In an attempt to save space and electricity, I purchased a NucBox, as these mini PCs are widely regarded as great alternatives for homelabs.
The problem is that under relatively high load, the system reboots itself. No PSOD. I have no idea what's causing it, so I'm coming here for help diagnosing what the issue could be. Yes, I have looked at the usual logs (vmkernel, vobd, hostd, etc.), but the amount of data is overwhelming.
I am running the latest ESXi build from an NVMe drive. I have also tried running from an external USB drive. I have removed and swapped my RAM modules. I have run memory tests and CPU tests, but I still have no idea what is causing this.
CPU: AMD Ryzen 7 8845HS
RAM: 64GB DDR5
Storage: Internal 2TB Crucial T500 NVMe
System: GMKtec NucBox
I thought I had found the culprit: I had enabled NVMe memory tiering and was getting constant crashes when using a consumer-grade NVMe drive. I removed the NVMe drive and disabled memory tiering, and the host was stable for about 2 weeks. No changes have been made since, but I have now had another host reboot. So frustrating!
Any help is appreciated.