So, hot off the back of my successful
upgrade from Sarge to Etch, I thought I'd try the new "Etch and a Half"
2.6.24 kernel on daedalus.
Let's just say that it's times like these that I'm glad I got the
serial-over-LAN thing working, and have remote power.
I spent many hours today rebooting, trying to get to the bottom of why it
wouldn't work.
At first, I thought it was hanging, but then I realised it was dropping to a
shell in the initramfs, and just not making that very obvious because the
serial-over-LAN console is a bit crappy.
Once I realised what was going on, I did some poking around.
It seemed like udev wasn't getting started properly, so when the initramfs
went to assemble the mirrors, that failed because the component devices
couldn't be found, and then it freaked out because it couldn't mount the root
filesystem.
If I ran the relevant bits of /scripts/init-premount/udev by hand,
the SCSI devices appeared, and I could manually assemble the mirror and
mount the root filesystem. That was handy, because I also discovered that
you can boot with debug on the kernel command line, and it logs the
initramfs run to /tmp/initramfs.debug. Mounting the root filesystem gave me
a convenient way of preserving the log, because there seemed to be some
characters in it that made inspecting it over the serial console difficult,
and there was no other way to get it off the machine from the initramfs
environment.
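
For anyone wanting to try the same thing, here's a rough sketch of the manual
recovery once the device nodes have shown up. The device names (/dev/md0,
/dev/sda1, /dev/sdb1), the /root mount point and the destination of the copied
log are assumptions for illustration, not the actual layout of daedalus. The
run and recover_root helpers are hypothetical names, and setting DRY_RUN=1
prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: device names and paths below are assumed, not taken from the post.
MD_DEV="${MD_DEV:-/dev/md0}"                     # assumed name of the mirror
COMPONENTS="${COMPONENTS:-/dev/sda1 /dev/sdb1}"  # assumed component devices
ROOT_MNT="${ROOT_MNT:-/root}"                    # assumed mount point

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_root() {
    # COMPONENTS is left unquoted on purpose so it splits into separate devices.
    run mdadm --assemble "$MD_DEV" $COMPONENTS &&
    run mount "$MD_DEV" "$ROOT_MNT" &&
    # Copy the debug log somewhere that survives the reboot.
    run cp /tmp/initramfs.debug "$ROOT_MNT/initramfs.debug"
}
```

Calling DRY_RUN=1 recover_root first is a cheap way to check the commands
before letting them loose on real disks. Depending on the initramfs, mount may
also need the filesystem type given explicitly with -t.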
As far as I can determine, udev is being told to start, but it certainly
isn't still running when the boot bombs out to a shell after failing to mount
the root filesystem. It's not immediately clear whether something later on
is stopping it again. I've put the log here in
case anyone's interested in looking at it. This was with the addition of "x"
to the options of the shebang line of
/usr/share/initramfs-tools/scripts/init-premount/udev so that I
could see what was happening when /scripts/init-premount/udev ran.
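
As a sketch of that shebang tweak: the helper below (add_trace_flag is a
made-up name) assumes the script's first line is exactly "#!/bin/sh -e" and
turns it into "#!/bin/sh -ex". If the first line is anything else, it leaves
the file alone. It relies on GNU sed's -i option.

```shell
#!/bin/sh
# Sketch: add "x" to the shebang flags so the shell traces each command.
# Assumes line 1 of the target script is exactly "#!/bin/sh -e".
add_trace_flag() {
    script="$1"
    # Only line 1 is touched; a shebang that already has -ex does not match.
    sed -i '1s|^#!/bin/sh -e$|#!/bin/sh -ex|' "$script"
}
```

It would be pointed at /usr/share/initramfs-tools/scripts/init-premount/udev,
and the initramfs then needs regenerating (update-initramfs -u, as root) before
the change shows up at boot.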
So I don't think the kernel itself is at fault; it looks like some sort of
weird udev/initramfs interaction.
Rather than further rebooting the tripe out of daedalus, I'll have
to see if I can reproduce the problem on a less important machine locally to
do further debugging. In the meantime, I'll have to stick with 2.6.18.