HomeLab Part 7: SuperMicro X9SRH-7TF & 10Gbit onboard NIC Problem
After running the new homelab for several months, I ran into a really annoying problem with the onboard 10Gbit NICs (Intel X540). An application was not syncing correctly across the directly attached NICs. When I checked the link status, ESX01 showed its NICs as up while ESX02 showed them as down. That was strange: with a direct connection, if one end is up, the other end should be up as well. I restarted ESX02 to see if that would fix the problem, but after the restart the NICs were still down.
I tried to shut down the NICs with the following command, but that didn't help either:
esxcli network nic down -n vmnicX
Because I found it odd that ESX01's NICs still showed as up, I took a look at the physical connections. I connected both ESX hosts to my 1Gbit switch and saw that ESX01 did not get a link but ESX02 did. So the fault was actually on ESX01, not ESX02. I then ran the esxcli command above on the "up" NICs of ESX01 as well and... PSOD (Purple Screen of Death)! ESX02 was in maintenance mode, so all VMs were running on ESX01, which meant they all went down with it. Not really funny. After I managed to get ESX01 back online, I saw that the 10Gbit NICs were missing. I checked the host's IPMI and saw that the MAC addresses of the two ports were shown as FF:FF:FF:FF:FF:FE and FF:FF:FF:FF:FF:F4 (or similar), which is clearly wrong.
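Why are those addresses obviously wrong? In a MAC address, the lowest bit of the first octet (the I/G bit) marks a group address, i.e. multicast or broadcast. A first octet of FF has that bit set, so neither value can be a NIC's own unicast address. The POSIX shell function below is a minimal sketch of that check, assuming the address is written as six colon-separated hex octets. The function name `is_valid_unicast_mac` and the third sample address are made up for illustration and are not part of any ESXi or IPMI tool.

```shell
#!/bin/sh
# Minimal sketch: could this MAC address plausibly be a NIC's own unicast address?
# Assumes six colon-separated hex octets, e.g. FF:FF:FF:FF:FF:FE.
# is_valid_unicast_mac is a hypothetical helper, not an ESXi/IPMI command.

is_valid_unicast_mac() {
    mac=$1                       # POSIX sh has no "local"; these are globals
    h='[0-9A-Fa-f][0-9A-Fa-f]'   # glob pattern matching one hex octet

    # Reject anything that is not exactly six two-digit hex octets.
    case $mac in
        $h:$h:$h:$h:$h:$h) ;;
        *) return 1 ;;
    esac

    # Bit 0 of the first octet is the I/G bit: 1 means a group
    # (multicast/broadcast) address, which a NIC never uses as its own.
    first=${mac%%:*}
    if [ $((0x$first & 1)) -ne 0 ]; then
        return 1
    fi

    # The all-zero address is not a real NIC address either.
    [ "$mac" != "00:00:00:00:00:00" ]
}

# Usage: the two addresses shown in IPMI, plus a made-up well-formed one.
for addr in FF:FF:FF:FF:FF:FE FF:FF:FF:FF:FF:F4 00:1B:21:12:34:56; do
    if is_valid_unicast_mac "$addr"; then
        echo "$addr looks like a valid unicast address"
    else
        echo "$addr is not a valid NIC address"
    fi
done
```

Both addresses reported by IPMI fail on the I/G bit alone, which fits the NIC's stored configuration having been garbled rather than the network being at fault.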
Rebooting the host changed nothing. I had a short Twitter conversation with Erik Bussink about the problem, and he recommended shutting down the host and removing the CMOS battery for about 5 minutes. I did exactly that, and the 10Gbit NICs were indeed back.
Thanks, Erik, for pointing me in the right direction!
Was your onboard x540-a2 overheating or something? What was the problem?
I assume it was overheating. I placed an additional chassis fan in the case and the problem went away. But the chip still runs really hot, since it is only passively cooled. Another option would be to mount a fan aimed directly at the passive heatsink of the onboard NIC itself.