I’ve got a number of servers that run coreboot + Xen. I like to run coreboot with a linux-as-a-bootloader (LAB) payload. That means that coreboot, after bringing up the machine, boots into a small linux kernel + busybox environment, entirely contained in rom. That environment can serve as an emergency fallback to resolve booting problems remotely – remember, coreboot has excellent serial support from power-on. Anyway; on a normal boot the machine kexecs into a Xen kernel.
And that works great. Recently, on deploying a new server, I discovered that the machine would just hang trying to kexec into Xen 3.2.1.
After a lot of digging and some fun setting up a test environment with qemu, I discovered that this broke somewhere between Xen 3.1.0 and 3.1.3. A message to the Xen-devel mailing list today yielded a quick response from a Xen developer suggesting I try the ‘no-real-mode’ parameter as an argument to the hypervisor.
The no-real-mode option stops Xen from doing some bios calls, and it also tells it to ignore the e820 bios table.
Here’s an lab.conf that works:
CMDLINE="no-real-mode com1=115200,8n1 cdb=com1" INITRD="" KERNEL="/xen-3.1.4.gz" MODULE1="/vmlinuz-2.6.18.8-xen root=/dev/md1 ro xencons=ttyS0 console=tty0 console=ttyS0,115200" MODULE2="/initrd.img-2.6.18.8-xen"
Note that you need to pass no-real-mode as the first xen command line argument, otherwise it won’t work.
And somehow that solves the problem for Xen 3.1.3 and 3.1.4, both under qemu + coreboot and qemu + bochs. Kexec’ing into Xen 3.1.2 triple-faults qemu under qemu + coreboot, but works fine under qemu + bochs. That smells like a coreboot bug. I’ll try on real hardware tomorrow to see if this bug is specific to our qemu port, or a more general coreboot problem. To be continued..