<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Off you go... into the purple yonder! &#187; Xen</title>
	<atom:link href="https://ward.vandewege.net/blog/category/xen/feed/" rel="self" type="application/rss+xml" />
	<link>https://ward.vandewege.net/blog</link>
	<description></description>
	<lastBuildDate>Sun, 12 May 2024 20:57:05 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>kexec&#8217;ing into a Xen kernel</title>
		<link>https://ward.vandewege.net/blog/2008/08/kexecing-into-a-xen-kernel/</link>
		<comments>https://ward.vandewege.net/blog/2008/08/kexecing-into-a-xen-kernel/#comments</comments>
		<pubDate>Fri, 08 Aug 2008 01:13:29 +0000</pubDate>
		<dc:creator><![CDATA[ward]]></dc:creator>
				<category><![CDATA[coreboot]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://ward.vandewege.net/blog/?p=217</guid>
		<description><![CDATA[I&#8217;ve got a number of servers that run coreboot + Xen. I like to run coreboot with a linux-as-a-bootloader (LAB) payload. That means that coreboot, after bringing up the machine, boots into a small linux kernel + busybox environment, entirely &#8230; <a href="https://ward.vandewege.net/blog/2008/08/kexecing-into-a-xen-kernel/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve got a number of servers that run <a href="http://coreboot.org">coreboot</a> + <a href="http://xen.org">Xen</a>. I like to run coreboot with a linux-as-a-bootloader (LAB) payload. That means that coreboot, after bringing up the machine, boots into a small linux kernel + busybox environment, entirely contained in rom. That environment can serve as an emergency fallback to resolve booting problems remotely &#8211; remember, coreboot has excellent serial support from power-on. Anyway; on a normal boot the machine kexecs into a Xen kernel.</p>
<p>And that works great. Recently, on deploying a new server, I discovered that the machine would just hang trying to kexec into Xen 3.2.1.</p>
<p>After a lot of digging and some fun setting up a test environment with <a href="http://bellard.org/qemu/">qemu</a>, I discovered that this broke somewhere between Xen 3.1.0 and 3.1.3. A message to the Xen-devel mailing list today yielded a <a href="http://lists.xensource.com/archives/html/xen-devel/2008-08/msg00300.html">quick response</a> from a Xen developer suggesting I try the &#8216;no-real-mode&#8217; parameter as an argument to the hypervisor.</p>
<p>The no-real-mode option stops Xen from making certain BIOS calls, and also tells it to ignore the BIOS e820 memory map.</p>
<p>Here&#8217;s an lab.conf that works:</p>
<pre>
CMDLINE="no-real-mode com1=115200,8n1 cdb=com1"
INITRD=""
KERNEL="/xen-3.1.4.gz"
MODULE1="/vmlinuz-2.6.18.8-xen root=/dev/md1 ro xencons=ttyS0 console=tty0 console=ttyS0,115200"
MODULE2="/initrd.img-2.6.18.8-xen"
</pre>
<p>Note that you need to pass no-real-mode as the <i>first</i> xen command line argument, otherwise it won&#8217;t work.</p>
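<p>For reference, the LAB environment&#8217;s kexec step boils down to something like this (a sketch, assuming a kexec-tools build with multiboot support &#8211; paths and arguments copied from the lab.conf above):</p>
<pre>
# Load the Xen hypervisor as a multiboot image, with the dom0 kernel
# and initrd as modules -- note that no-real-mode comes first
kexec -l /xen-3.1.4.gz \
  --command-line="no-real-mode com1=115200,8n1 cdb=com1" \
  --module="/vmlinuz-2.6.18.8-xen root=/dev/md1 ro xencons=ttyS0 console=tty0 console=ttyS0,115200" \
  --module="/initrd.img-2.6.18.8-xen"
# Then jump into it
kexec -e
</pre>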
<p>And somehow that solves the problem for Xen 3.1.3 and 3.1.4, both under qemu + coreboot and qemu + bochs. Kexec&#8217;ing into Xen 3.1.2 triple-faults the machine under qemu + coreboot, but works fine under qemu + bochs. That smells like a coreboot bug. I&#8217;ll try on real hardware tomorrow to see whether this bug is specific to our qemu port, or a more general coreboot problem. To be continued&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>https://ward.vandewege.net/blog/2008/08/kexecing-into-a-xen-kernel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>of how XFS saved the day</title>
		<link>https://ward.vandewege.net/blog/2008/03/of-how-xfs-saved-the-day/</link>
		<comments>https://ward.vandewege.net/blog/2008/03/of-how-xfs-saved-the-day/#comments</comments>
		<pubDate>Fri, 14 Mar 2008 23:19:59 +0000</pubDate>
		<dc:creator><![CDATA[ward]]></dc:creator>
				<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://ward.vandewege.net/blog/?p=183</guid>
		<description><![CDATA[Io, one of my mailservers, is a Xen instance. It&#8217;s moderately busy &#8211; a normal day sees about 12000 accepted messages and about 24000 rejected delivery attempts (spam). It&#8217;s an Exim/clamav/dspam setup, and a significant proportion of the 12000 accepted &#8230; <a href="https://ward.vandewege.net/blog/2008/03/of-how-xfs-saved-the-day/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Io, one of my mailservers, is a Xen instance. It&#8217;s moderately busy &#8211; a normal day sees about 12000 accepted messages and about 24000 rejected delivery attempts (spam). It&#8217;s an Exim/clamav/dspam setup, and a significant proportion of the 12000 accepted messages ends up quarantined as spam.</p>
<p>I recently moved this instance to a new Xen dom0 with significantly faster CPUs. On the old host, the domU ran off a disk image on a pair of fast 10K rpm Western Digital Raptors in software raid-1, connected to a crappy (in terms of performance) SII 3114 sata chipset. On the new host, the domU runs on a pair of Seagate Barracuda sata2 drives (also in raid-1), connected to an Nvidia MCP55 sata chipset. So the drives are slower, but the chipset is considerably better.</p>
<p>The mailserver began having intermittent performance issues soon after the move. The load on the system went high enough for Exim to queue new mail rather than deliver it immediately. In other words, mail got stuck on the mailserver until the load dropped sufficiently for Exim to start delivering it.</p>
<p>The problem turned out to be heavy disk IO &#8211; the disks were not keeping up with the rest of the system, leading to bad iowait and a sluggish system:</p>
<pre>
$ sar 1 0
01:04:35 PM     CPU      %user     %nice   %system    %iowait    %steal     %idle
01:04:35 PM     all      2.08      0.00      1.04     46.88      0.00     50.00
01:04:36 PM     all      0.00      0.00      0.00    100.00      0.00      0.00
01:04:36 PM     all      0.00      0.00      0.00     50.00      0.00     50.00
01:04:37 PM     all      0.00      0.00      0.00     15.69      0.00     84.31
01:04:38 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:38 PM     all      0.00      0.00      0.00     51.53      0.00     48.47
01:04:39 PM     all      0.79      0.00      1.59      8.73      0.00     88.89
01:04:40 PM     all      1.00      0.00      1.00     26.87      0.00     71.14
01:04:41 PM     all      0.00      0.00      0.00    100.00      0.00      0.00
01:04:42 PM     all      0.00      0.00      0.43     41.13      0.00     58.44
01:04:43 PM     all      3.49      0.00      0.00     52.33      0.00     44.19
01:04:44 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:45 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:46 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:47 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:48 PM     all      3.47      0.00      0.99     44.06      0.00     51.49
</pre>
<p>The host has 2 CPUs; every time the iowait hovered around 50% in the table above, one of the CPUs was 100% busy waiting for disk IO to complete. At 100%, both were fully tied up waiting for the disks.</p>
<p>The iowait was directly correlated with mail delivery &#8211; as mail came in, the iowait jumped up while it was processed. Strangely, it seemed to stay high for a few seconds even after the mail had been delivered.</p>
<p>Curiously enough, this iowait did not translate into a sluggish dom0, or even into significant iowait there.</p>
<p>I immediately suspected Xen to be the problem, but that turned out to be false. After having stopped Exim (which reduced iowait to zero), I ran hdparm on both the dom0 and the domU:</p>
<pre>
(dom0)
# hdparm -tT /dev/md2

/dev/md2:
 Timing cached reads:   3808 MB in  2.00 seconds = 1904.67 MB/sec
 Timing buffered disk reads:  194 MB in  3.02 seconds =  64.25 MB/sec
</pre>
<pre>
(domU)
# hdparm -tT /dev/sda1

/dev/sda1:
 Timing cached reads:   1850 MB in  2.00 seconds = 925.51 MB/sec
 Timing buffered disk reads:  186 MB in  3.02 seconds =  61.50 MB/sec
</pre>
<p>The domU&#8217;s cached reads were half of what they were in the dom0, but buffered disk read performance was pretty much on par. So that was all pretty good.</p>
<p>Since the actual disk performance looked fine &#8211; at least for reading &#8211; the problem was likely to be higher up: the filesystem, or some process doing something stupid.</p>
<p>Time to figure out which process, exactly, was causing the slowness. The kernel allows us to see just that:</p>
<pre>
echo 1 > /proc/sys/vm/block_dump
</pre>
<p>and then</p>
<pre>
dmesg | awk '/(READ|WRITE|dirtied)/ {activity[$1]++} END {for (x in activity) print x, activity[x]}' | sort -nr | head -n 10
</pre>
<p>Note: you may want to disable syslog while block_dump logging is enabled, so that syslog&#8217;s own disk writes do not distort the results.</p>
<p>That generated the following output, after a few seconds (I repeated the dmesg command quite a few times to make sure I was getting an accurate picture over time &#8211; this is a sample result):</p>
<pre>
wc(3500): 1
sshd(3483): 5
sshd(3481): 2
pdflush(25494): 2
mysqld(26063): 9
mesg(3495): 1
kjournald(1624): 582
find(3499): 28
exim4(3520): 2
exim4(3508): 2
</pre>
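<p>As a sanity check, the counting part of that pipeline can be exercised on a few canned block_dump-style lines (the sample input below is hypothetical, but follows the format dmesg prints):</p>
<pre>
# Hypothetical block_dump lines, in the format dmesg shows them
printf '%s\n' \
  'kjournald(1624): WRITE block 293672 on sda1' \
  'kjournald(1624): WRITE block 293680 on sda1' \
  'exim4(3520): READ block 1042 on sda1' \
  'mysqld(26063): dirtied inode 8812 (ib_logfile0) on sda1' |
awk '/(READ|WRITE|dirtied)/ {activity[$1]++} END {for (x in activity) print x, activity[x]}' |
sort -nr | head -n 10
</pre>
<p>Each process(pid) pair gets one line with its number of block IO operations &#8211; in this sample, kjournald(1624) comes out with a count of 2.</p>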
<p>I had expected the problem to be caused by mysql, which is heavily taxed by dspam. But no, it was quite clearly caused by kjournald, the kernel thread that handles ext3 journalling. I googled around a bit, looking for ext3 optimizations that I might have overlooked, but did not find anything concrete. The filesystem was already mounted with noatime, and since this is a Xen system, upgrading the kernel to something newer was not a realistic option either.</p>
<p>So I ran some tests on another dom0: I created 2 new domUs, one with ext3 as the root filesystem, and one with xfs. I then ran bonnie++ on both and compared the results. The XFS system seemed to have a much lower CPU load as reported by bonnie++, especially for its Sequential Create and Random Create tests.</p>
<p>So I converted Io to XFS, and the problem went away altogether. It&#8217;s as responsive as it used to be, and a good deal faster when a lot of mail is flowing through. </p>
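<p>A conversion like that is just a backup/reformat/restore cycle done from the dom0 with the domU shut down, roughly like this (a sketch &#8211; device and mount point names are hypothetical):</p>
<pre>
# /dev/vg0/io is the domU's root device (hypothetical name)
mount /dev/vg0/io /mnt/io
tar -C /mnt/io -cf /root/io-backup.tar .
umount /mnt/io
mkfs.xfs -f /dev/vg0/io
mount -o noatime /dev/vg0/io /mnt/io
tar -C /mnt/io -xf /root/io-backup.tar
umount /mnt/io
# ...and don't forget to change ext3 to xfs in the domU's /etc/fstab
</pre>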
<p>And that&#8217;s how Io, the mailserver (named after one of Jupiter&#8217;s moons), got significantly improved disk IO.</p>
]]></content:encoded>
			<wfw:commentRss>https://ward.vandewege.net/blog/2008/03/of-how-xfs-saved-the-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>xen 3.2 serial</title>
		<link>https://ward.vandewege.net/blog/2008/02/xen-32-serial/</link>
		<comments>https://ward.vandewege.net/blog/2008/02/xen-32-serial/#comments</comments>
		<pubDate>Mon, 18 Feb 2008 22:36:45 +0000</pubDate>
		<dc:creator><![CDATA[ward]]></dc:creator>
				<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://ward.vandewege.net/blog/2008/02/18/180/</guid>
		<description><![CDATA[Getting access to the serial port in a Xen 3.2 dom0 is somewhat complicated. This is the magic incantation for your grub menu.lst file to get console at 115200 bps on the first physical serial port, as well as on &#8230; <a href="https://ward.vandewege.net/blog/2008/02/xen-32-serial/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Getting access to the serial port in a Xen 3.2 dom0 is somewhat complicated. This is the magic incantation for your grub menu.lst file to get console at 115200 bps on the first physical serial port, as well as on the screen.</p>
<pre>
serial --unit=0 --speed=115200
terminal --timeout=5 serial console

title Xen 3
root    (hd0,0)
kernel    /boot/xen-3.gz com1=115200,8n1 console=com1,vga
module    /boot/vmlinuz-2.6.18.8-xen root=/dev/md0 ro xencons=ttyS0 console=tty0 console=ttyS0,115200n8
module    /boot/initrd.img-2.6.18.8-xen
boot
</pre>
<p>The &#8216;com1=115200,8n1 console=com1,vga&#8217; arguments on the kernel line make sure that xen writes its output to the serial port (with the right speed, stop bits, etc.) as well as to the vga device. The &#8216;console=tty0 console=ttyS0,115200n8&#8217; on the first module line tells the kernel to do the same. Xen controls the serial port hardware at this point (that&#8217;s the default in 3.2: XEN_DISABLE_CONSOLE is set in the dom0 kernel config!), and in order for the dom0 kernel and Xen to share that serial port, we have to tell the kernel to use the xencons virtual console driver &#8211; hence the &#8216;xencons=ttyS0&#8217; parameter.</p>
<p><a href="http://os-drive.com/en/node/33">Debugging Xen with Serial Console</a> has it almost right, but the xencons parameter is missing. Without xencons you&#8217;ll see serial output until the dom0 kernel takes over; that kernel won&#8217;t see any serial ports, which makes (m)getty very unhappy&#8230;</p>
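<p>Once xencons is in place the dom0 sees a virtual ttyS0 again, so getting a login prompt on the serial line is just the usual inittab entry (a sketch &#8211; runlevels and terminal type will vary per system):</p>
<pre>
# /etc/inittab: spawn a getty on the (virtual) serial console
T0:23:respawn:/sbin/getty -L ttyS0 115200 vt100
</pre>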
]]></content:encoded>
			<wfw:commentRss>https://ward.vandewege.net/blog/2008/02/xen-32-serial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Xen 3.2.0</title>
		<link>https://ward.vandewege.net/blog/2008/01/xen-320/</link>
		<comments>https://ward.vandewege.net/blog/2008/01/xen-320/#comments</comments>
		<pubDate>Sat, 26 Jan 2008 20:55:16 +0000</pubDate>
		<dc:creator><![CDATA[ward]]></dc:creator>
				<category><![CDATA[Free Software/Open Source]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://ward.vandewege.net/blog/2008/01/26/178/</guid>
		<description><![CDATA[So Xen 3.2.0 was released last week. Oddly enough there are no precompiled 64-bit binaries anymore. That does not make much sense to me &#8211; running Xen on 32 bit is just&#8230; painful. Problems with accessing ram beyond 4GB, issues &#8230; <a href="https://ward.vandewege.net/blog/2008/01/xen-320/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>So <a href="http://xen.org/download/">Xen 3.2.0</a> was released <a href="http://lists.xensource.com/archives/html/xen-announce/2008-01/msg00000.html">last week</a>.</p>
<p>Oddly enough there are no precompiled 64-bit binaries anymore. That does not make much sense to me &#8211; running Xen on 32 bit is just&#8230; painful. Problems with accessing ram beyond 4GB, issues with libc6, etc. On 64 bit all that stuff just works.</p>
<p>Sadly, the 3.2 release is broken. It simply does not build &#8211; make world dies with this mercurial error:</p>
<pre>
select-repository: Searching `.:..' for linux-2.6.18-xen.hg
select-repository: Ignoring `.'
not found!
select-repository: Unable to determine Xen repository parent.
</pre>
<p>Google returns some hits suggesting that this is caused by using a version of mercurial that is too old, but that is nonsense &#8211; I&#8217;m seeing this even with 0.9.5. It&#8217;s a bug in the Xen 3.2.X release.</p>
<p>The fix is easy; just check out the linux-2.6.18-xen mercurial repository yourself, in the parent of the directory where you unpacked Xen:</p>
<pre>
  apt-get install gawk libssl-dev libncurses5-dev pciutils-dev
  cd /usr/src/
  hg clone http://xenbits.xensource.com/linux-2.6.18-xen.hg
  cd xen-3.2.X
  make world
  make install
</pre>
<p>Of course you can use make -jX to compile things faster. Time to give 3.2 a spin now <img src="https://ward.vandewege.net/blog/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> </p>
]]></content:encoded>
			<wfw:commentRss>https://ward.vandewege.net/blog/2008/01/xen-320/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>a new record</title>
		<link>https://ward.vandewege.net/blog/2007/04/a-new-record/</link>
		<comments>https://ward.vandewege.net/blog/2007/04/a-new-record/#comments</comments>
		<pubDate>Thu, 19 Apr 2007 13:57:52 +0000</pubDate>
		<dc:creator><![CDATA[ward]]></dc:creator>
				<category><![CDATA[Free Software/Open Source]]></category>
		<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://ward.vandewege.net/blog/2007/04/19/125/</guid>
		<description><![CDATA[This morning, a user on one of our machines (inadvertently) created a mail loop with a bad procmail script: 09:42:05 up 120 days, 9:23, 20 users, load average: 3367.40, 3265.08, 2751.75 I had seen machines go up to about 200 &#8230; <a href="https://ward.vandewege.net/blog/2007/04/a-new-record/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This morning, a user on one of our machines (inadvertently) created a mail loop with a bad procmail script:</p>
<pre>
09:42:05 up 120 days,  9:23, 20 users,  load average: 3367.40, 3265.08, 2751.75
</pre>
<p>I had seen machines go up to about 200 before, but never this high. If you ever wonder about the stability of the 2.6 kernel &#8211; and this is a Xen setup! &#8211; here&#8217;s your answer. Even with the load this high, the machine was responsive enough on a couple of ssh sessions to solve the problem remotely.</p>
]]></content:encoded>
			<wfw:commentRss>https://ward.vandewege.net/blog/2007/04/a-new-record/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
