of how XFS saved the day

Io, one of my mailservers, is a Xen instance. It’s moderately busy – a normal day sees about 12000 accepted messages and about 24000 rejected delivery attempts (spam). It’s an Exim/clamav/dspam setup, and a significant proportion of the 12000 accepted messages ends up quarantined as spam.

I recently moved this instance to a new Xen dom0 with significantly faster CPUs. On the old host, the domU ran off a disk image on a pair of fast 10K rpm Western Digital Raptors in software raid-1, connected to a crappy (in terms of performance) SII 3114 sata chipset. On the new host, the domU runs on a pair of Seagate Barracuda sata2 drives (also in raid-1), connected to an Nvidia MCP55 sata chipset. So the drives are slower, but the chipset is considerably better.

The mailserver began having intermittent performance issues soon after the move. The load on the system went high enough for Exim to queue new mail rather than deliver it immediately. In other words, mail got stuck on the mailserver until the load dropped sufficiently for Exim to start delivering it.

The problem turned out to be heavy disk IO – the disks were not keeping up with the rest of the system, leading to bad iowait and a sluggish system:

$ sar 1 0
01:04:35 PM     CPU      %user     %nice   %system    %iowait    %steal     %idle
01:04:35 PM     all      2.08      0.00      1.04     46.88      0.00     50.00
01:04:36 PM     all      0.00      0.00      0.00    100.00      0.00      0.00
01:04:36 PM     all      0.00      0.00      0.00     50.00      0.00     50.00
01:04:37 PM     all      0.00      0.00      0.00     15.69      0.00     84.31
01:04:38 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:38 PM     all      0.00      0.00      0.00     51.53      0.00     48.47
01:04:39 PM     all      0.79      0.00      1.59      8.73      0.00     88.89
01:04:40 PM     all      1.00      0.00      1.00     26.87      0.00     71.14
01:04:41 PM     all      0.00      0.00      0.00    100.00      0.00      0.00
01:04:42 PM     all      0.00      0.00      0.43     41.13      0.00     58.44
01:04:43 PM     all      3.49      0.00      0.00     52.33      0.00     44.19
01:04:44 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:45 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:46 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:47 PM     all      0.00      0.00      0.00     49.75      0.00     50.25
01:04:48 PM     all      3.47      0.00      0.99     44.06      0.00     51.49

The host has 2 CPUs; every time the iowait approximated 50% in the above table, one of the CPUs was 100% busy waiting for disk IO to complete. At 100%, both were fully tied up waiting for the disks.

The iowait was directly correlated with mail delivery – as mail came in, the iowait jumped up as it was processed. Strangely it seemed to stay high for a few seconds even after the mail was delivered.

Curiously enough, this iowait did not translate in a sluggish dom0, or even in significant iowait there.

I immediately suspected Xen to be the problem, but that turned out to be false. After having stopped Exim (which reduced iowait to zero), I ran hdparm on both the dom0 and the domU:

# hdparm -tT /dev/md2

 Timing cached reads:   3808 MB in  2.00 seconds = 1904.67 MB/sec
 Timing buffered disk reads:  194 MB in  3.02 seconds =  64.25 MB/sec
# hdparm -tT /dev/sda1

 Timing cached reads:   1850 MB in  2.00 seconds = 925.51 MB/sec
 Timing buffered disk reads:  186 MB in  3.02 seconds =  61.50 MB/sec

The cached reads are half of what they are in the dom0, but the buffered disk read performance was pretty much on par. So that was all pretty good.

So since the actualy disk performance looked fine – at least for reading – that meant the problem was likely to be higher up. The filesystem, or some process doing something stupid.

Time to figure out which process, exactly, was causing the slowness. The kernel allows us to see just that:

echo 1 > /proc/sys/vm/block_dump

and then

dmesg | awk ‘/(READ|WRITE|dirtied)/ {activity[$1]++} END {for (x in activity) print x, activity[x]}’| sort -nr | head -n 10

Note: you may want to disable syslog while you enable the block_dump logging so it does distort the results.

That generated the following output, after a few seconds (I repeated the dmesg command quite a few times to make sure I was getting an accurate picture over time – this is a sample result):

wc(3500): 1
sshd(3483): 5
sshd(3481): 2
pdflush(25494): 2
mysqld(26063): 9
mesg(3495): 1
kjournald(1624): 582
find(3499): 28
exim4(3520): 2
exim4(3508): 2

I had expected the problem to be caused by mysql, which is heavily taxed by dspam. But no, it was quite clearly caused by kjournald. Kjournald is a kernel thread that deals with ext3 journalling. I googled around a bit, looking for ext3 optimizations that I might have overlooked but did not find anything concrete. The filesystem was already mounted with noatime. This being a Xen system upgrading the kernel is a bit of a problem, so I could not try that.

So I ran some tests on another dom0: I created 2 new domUs, one with ext3 as the root filesystem, and one with xfs. I then ran bonnie++ on both and compared the results. The XFS system seemed to have a much lower CPU load as reported by bonnie++, especially for its Sequential Create and Random Create tests.

So I converted Io to XFS, and the problem went away altogether. It’s as responsive as it used to be, and a good deal faster when a lot of mail is flowing through.

And that’s how Io, the mailserver (named after one of Jupiter’s moons), got significantly improved disk IO.

This entry was posted in Sysadmin, Xen. Bookmark the permalink.

Leave a Reply