Windows VM with EFI fails to start
I just wanted to share a situation I encountered recently with one of my customers. My customer built a new UAT environment with a few hosts and a Pure Storage array. They then deployed a Windows 10 VM configured with EFI firmware. The installation went smoothly, but after a reboot the EFI-enabled Windows VM came up with a BSOD and the following message:
The operating system couldn’t be loaded because the kernel is missing or contains errors.
File: \Windows\system32\ntoskrnl.exe
Error code: 0xc0000185
That was really strange, so the customer tried the same installation in their old environment, where there was no problem. They also migrated the VM to the new environment and it booted up normally, but after a reboot the migrated VM showed the same BSOD. So what was different between the two environments?
UAT OLD
- HPE servers
- NetApp storage
- vSphere 6.0 U3
UAT NEW
- Dell servers
- Pure Storage
- vSphere 6.5 U1g (includes the newest Spectre + microcode update)
vCenter Server was running 6.5 U1e, which includes the first Spectre update.
So after the customer called me and explained the situation, my first assumption was that the Spectre update and the new CPU features (IBRS, IBPB, STIBP) were the problem, because vCenter was running the first Spectre patch while the 6.5 hosts were running the second one. Unfortunately I couldn’t find anything to confirm that, neither externally nor internally.
What I did find was a case with the same behaviour we were seeing, which led me to KB article 2137402. It describes this situation with Windows VMs using EFI on an EMC XtremIO array, so maybe this was the same problem with Pure Storage. Just to be sure I dug deeper and found a more general KB article, 2146167. It looks like the root cause is a compatibility issue between the VMware EFI firmware and the storage array: the SCSI commands the EFI firmware sends can exceed the maximum I/O size of the storage device.
With this information it was easy to find the Pure Storage VMware Best Practices Guide. That guide recommends lowering the advanced host setting Disk.MaxIOSize from 32 MB (the default) to 4 MB when running EFI-enabled VMs.
After my customer had changed the setting, he could immediately power on the Windows 10 VM.
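If you want to apply the recommendation to more than a handful of hosts, here is a minimal pyVmomi sketch that queries and lowers Disk.MaxIOSize on every host in a vCenter inventory. The vCenter name and credentials are placeholders, and the script is a starting point under those assumptions rather than a finished tool:

```python
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.example.local"          # hypothetical vCenter FQDN
USERNAME = "administrator@vsphere.local"   # placeholder credentials
PASSWORD = "********"
TARGET_KB = 4096                           # Disk.MaxIOSize is set in KB; 4096 KB = 4 MB

# Connect to vCenter, skipping certificate validation (lab use only).
context = ssl._create_unverified_context()
si = SmartConnect(host=VCENTER, user=USERNAME, pwd=PASSWORD, sslContext=context)
try:
    content = si.RetrieveContent()
    # Collect every ESXi host in the inventory.
    host_view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in host_view.view:
        opt_mgr = host.configManager.advancedOption
        # Default is 32767 KB, i.e. roughly 32 MB.
        current = opt_mgr.QueryOptions("Disk.MaxIOSize")[0].value
        print(f"{host.name}: Disk.MaxIOSize = {current} KB")
        if int(current) != TARGET_KB:
            # Some pyVmomi/vCenter combinations expect an explicit long here;
            # if the plain int is rejected, wrap the value accordingly.
            change = vim.option.OptionValue(key="Disk.MaxIOSize", value=TARGET_KB)
            opt_mgr.UpdateOptions(changedValue=[change])
            print(f"{host.name}: lowered Disk.MaxIOSize to {TARGET_KB} KB")
finally:
    Disconnect(si)
```

Of course the same change can be made per host in the vSphere Client under Advanced System Settings or with esxcli; a script is only convenient when several hosts are involved. Note that the setting takes its value in KB, so 4096 corresponds to the recommended 4 MB.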
There are still some questions I can’t answer yet.
- Is this only happening with EFI-enabled Windows VMs?
- Is this only happening with Pure Storage and XtremIO, or are other storage vendors facing the same issue?
If somebody has had problems with EFI-enabled Linux VMs or other storage vendors, please leave a comment.
I know this post is rather ancient but… lifesaver! Had an old ESXi 5.5 host and one ESXi 5.5 host with a liiiittle newer build version. The VM worked fine on the older host but failed to boot on the “new” old host. Storage is a Fujitsu Eternus 100 DX3. The resolution was to set Disk.MaxIOSize to 4096.