Diary of a geek

Friday, 21 December 2007

Waiting for a software RAID-1 to resync is like watching paint dry

I'm so annoyed with myself.

So today was the day I was going to upgrade the disks in daedalus.

I'd finally managed to convince Dell to sell me two new 140Gb disks to replace the two 70Gb disks that were fast approaching capacity. I'd organised for access to the datacentre from 10am. I arrived early.

I didn't manage to get into the datacentre until about 11am, because they were having problems remotely unlocking the door, and I had to wait until a repair guy turned up to fix some air conditioning problem.

I hadn't had a lot of time to plan how I was going to do the upgrade, but I figured it was going to be pretty straightforward. Each disk is partitioned into three partitions, all of which are mirrored using software RAID. The first partition is the root filesystem, which is just an ext3 filesystem, mirrored. The second partition is swap, also mirrored, and the third partition is an LVM physical volume. The first two partitions are only a couple of gigabytes.

So the basic plan was to remove the second disk (disk 1, aka /dev/sdb), put in the new bigger disk, load the partition table from the original disk 1, resync the three mirrors, and then boot off the new disk 1, and rinse and repeat with disk 0.
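For the record, that plan boils down to something like the following. This is a dry-run sketch only: it builds the command lines as strings and prints them, and runs nothing. The md0/md1/md2 numbering matches the arrays mentioned in this post, but the function name, the backup file name, and the use of sfdisk to save and reload the partition table are assumptions for illustration.

```python
def swap_disk_plan(disk="/dev/sdb", table_backup="sdb.sfdisk"):
    """Build (but do not run) the commands for swapping one half of the mirrors."""
    # array -> partition number on each disk (layout as described in the post)
    arrays = {"/dev/md0": 1, "/dev/md1": 2, "/dev/md2": 3}

    plan = [f"sfdisk -d {disk} > {table_backup}"]  # save the old partition table
    for md, part in arrays.items():
        # mark the member as failed, then remove it from the array
        plan.append(f"mdadm {md} --fail {disk}{part} --remove {disk}{part}")
    plan.append("# swap the physical disk here")
    plan.append(f"sfdisk {disk} < {table_backup}")  # load the saved table onto the new disk
    for md, part in arrays.items():
        plan.append(f"mdadm {md} --add {disk}{part}")  # adding a member starts the resync
    plan.append("cat /proc/mdstat")  # watch the paint dry
    return plan


for command in swap_disk_plan():
    print(command)
```

Calling it again with disk="/dev/sda" would give the second round of the rinse and repeat.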

This was all going swimmingly, when right at the end (in fact after the third partition finished syncing according to the kernel), /dev/sda decided it had a few unreadable sectors. This made the software mirroring have a dummy-spit, and decide it needed to restart the syncing from scratch. I'd seen this before with a bad disk.

So I wishfully hoped it was something to do with the new disk being bigger than the old disk, so tried to roll back to the original disk 1. As was pretty much expected, disk 0 exhibited the same problems at the same point. So the problem was disk 0, not the new disk 1.

So it seemed like I just managed to pick the wrong half of the mirror to work from.

I thought I'd try manually dd'ing the third partition from /dev/sda to /dev/sdb, ignoring the errors, but that didn't result in a usable half of a mirror either.

Interestingly, these bad sectors hadn't made themselves known until this point. The RAID array had been otherwise healthy, and SMART hadn't uttered a peep.

I think at this point, after spending about 3 hours fighting with it, I decided to move up a layer. I could successfully sync /dev/md0 (the root filesystem) and /dev/md1 (the swap device), and the only reason I could think of these bad sectors not giving me grief already, was because they were right at the end of the disk, and I hadn't quite used all the free physical extents in LVM yet. Lucky I hadn't decided to grow things any further.

So I decided that rather than trying to convince this existing /dev/md2 to sync with the new /dev/sdb3, I'd just build a new degraded RAID-1 on /dev/sdb3 and move all the physical extents from the degraded RAID-1 on /dev/sda3 to the new degraded RAID-1 on /dev/sdb3. I tried the logical volume with /tmp on it first, since it was fairly sacrificial. It worked fine. So I tried the largest logical volume, /srv, which was 30Gb. That went fine as well, so I moved the rest. All of them went without complaint. So then I removed the now unused /dev/md2 from the volume group, and finally managed to boot with just the new disk 1. Then I was able to put in the new disk 0, and just do a standard RAID-1 rebuild onto it. Hooray for LVM saving the day yet again.
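Roughly, the sequence looked like the sketch below. Again it only prints commands rather than running them. The volume group name (vg0), the new array name (/dev/md3) and the logical volume names are hypothetical placeholders, not the real names on daedalus.

```python
def pvmove_plan(vg="vg0", old_pv="/dev/md2", new_pv="/dev/md3",
                new_part="/dev/sdb3", lvs=("tmp", "srv")):
    """Build (but do not run) the commands for migrating extents to a new degraded mirror."""
    plan = [
        # "missing" leaves the second slot empty, so the array is created degraded
        f"mdadm --create {new_pv} --level=1 --raid-devices=2 {new_part} missing",
        f"pvcreate {new_pv}",
        f"vgextend {vg} {new_pv}",
    ]
    for lv in lvs:
        # move one logical volume at a time, most sacrificial first
        plan.append(f"pvmove -n {lv} {old_pv} {new_pv}")
    plan.append(f"pvmove {old_pv} {new_pv}")  # sweep up any extents still on the old PV
    plan.append(f"vgreduce {vg} {old_pv}")    # drop the old PV from the volume group
    return plan


for command in pvmove_plan():
    print(command)
```

Moving one logical volume at a time means a read error on the old disk only interrupts that one move, rather than the whole migration.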

And this is where I should have stopped.

At this point it was about 3pm. I'd been aiming to leave by 4pm to pick up a rental car that I had to get by 5pm. I'd wanted to also upgrade to Etch while I was physically in front of the box, in case there were any nasty kernel/udev issues that made the box unbootable. The catch was, I needed to grow /var, which was quite full.

The intention had been that once I'd got the two new disks in and all synced up, I'd create a fourth mirrored partition, and add this to LVM as a new physical volume, then I could just keep merrily growing things. It felt a little bit dirty having to make a new partition to get at the extra space, and I'd been reading about mdadm's "grow" mode, whilst I was sitting around waiting for things to sync.

So I thought to myself, I'd just delete the third partition, and recreate it using the remaining additional cylinders I now had at my disposal, then use this grow feature of mdadm to tell Linux the array was now bigger, then resize the physical volume, and go from there.

This is where things went a bit pear-shaped.

I deleted and recreated (larger) the third partition on each disk, and rebooted.

I was then messing around with mdadm trying to update /etc/mdadm/mdadm.conf, since the UUID of /dev/md2 had changed, when I discovered that /dev/md2 had completely disappeared.

It seems that recreating that partition made Linux completely fail to see /dev/sda3 and /dev/sdb3 as RAID-1 members, and so because of the fairly transparent nature of Linux's RAID-1, LVM had seen the underlying physical volumes, and just decided to run with /dev/sda3 as the physical volume, and everything had continued to work. I just had no redundancy any more.
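One plausible explanation, and this is an assumption rather than something verified on the box: if the arrays used the old version 0.90 metadata format, the md superblock sits in a 64 KiB-aligned block near the end of each member device. Move the end of the partition, and the place where the superblock is expected moves with it, so nothing is found there. The arithmetic below follows the 0.90 layout; the partition sizes are made-up numbers for illustration.

```python
SECTOR_BYTES = 512
MD_RESERVED_SECTORS = 64 * 1024 // SECTOR_BYTES  # 64 KiB reserved at the end, in sectors


def superblock_offset_090(device_sectors):
    """Sector where a version 0.90 md superblock is expected on a member device."""
    # round the device size down to a 64 KiB boundary, then step back 64 KiB
    return (device_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS


old_offset = superblock_offset_090(130_000_000)  # hypothetical original partition size
new_offset = superblock_offset_090(270_000_000)  # hypothetical enlarged partition size
print("superblock expected at sector", old_offset, "before and", new_offset, "after")
```

If that's what happened, it would also explain why restoring the old partition table made /dev/md2 reappear: the superblocks were still sitting where the old partition boundaries said they should be.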

So it was about 3:20pm at this stage, and I was starting to panic, thinking I'd never get this sorted out before I had to leave, and I'd end up returning to the US with a non-redundant mess on my hands. Then I remembered that I had the backups of the partition tables of the old disks, from before I removed them. So I thought I'd go back to that state, and just have the additional cylinders unallocated. I hadn't yet attempted to grow the RAID-1 (partly because it had vanished), so in my haste, I didn't see this being a problem.

The problem was, I'd booted into multi-user mode, and things had merrily been operating on just half the mirror. I restored the partitions, and rebooted, and lo and behold, /dev/md2 magically reappeared, already 100% consistent according to /proc/mdstat. I was somewhat surprised by that, but I booted back into multi-user mode again, only to discover that the /var filesystem was panicking and remounting itself into read-only mode. This was pretty weird, given that during boot, it was considered clean by e2fsck.

I was really getting bothered by this stage, so I rebooted into emergency mode (single-user mode in Debian starts way too much), manually started up LVM, and forcibly fscked everything. There was a bit of filesystem damage to /var, /home and /srv.

Thinking I'd now fixed the problem, I rebooted into multi-user mode again, but again /var, which was considered clean during the fscking/mounting phase of booting, panicked shortly after mounting, and remounted itself read-only.

So I rebooted again, broke the mirror, and forcibly fscked the filesystems again (only operating on what was on /dev/sda3). Then I rebooted without incident into multi-user mode and resynced the mirror.

At this point I ran away.

So I'm really annoyed with myself. This should have been a blog post about how crap Linux's software RAID is at error recovery, and how LVM saved the day yet again, but it was overshadowed by me overreaching in my haste, and managing to cause silent data corruption.

The morals of this story:

  • If things go off the rails, and you get to a reasonable milestone, and you're approaching the end of your maintenance window, quit while you're ahead.
  • Don't do stuff in multi-user mode, just because you can and you want to minimise the length and magnitude of an outage
  • Don't rush to try and do something (growing a RAID-1) without researching it properly
  • Linux's software RAID-1 sucks when the underlying devices are having issues (I already knew this)
  • Linux's software RAID-1 can look perfectly healthy when the underlying disk has issues, as long as the bad bits aren't getting any exercise (I think newer versions of mdadm and/or Linux address this by doing a full array check on a regular basis)
  • Linux's software RAID-1 can get out of sync with reality and report that it's in-sync with itself when it isn't really, which causes silent data corruption
  • e2fsck can report a filesystem is clean when it's really got issues
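On the monitoring side, here's a minimal sketch of spotting degraded arrays from the contents of /proc/mdstat. It assumes the usual "[2/2] [UU]" status format, where an underscore marks a missing member. The function name and the sample text are made up for illustration.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of arrays whose status line shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array the next lines belong to
            continue
        # status looks like "[2/2] [UU]"; "_" marks a missing member
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


SAMPLE = """\
Personalities : [raid1]
md2 : active raid1 sda3[0]
      68500000 blocks [2/1] [U_]
md0 : active raid1 sdb1[1] sda1[0]
      2000000 blocks [2/2] [UU]
unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

This only catches arrays that md itself knows are degraded. It wouldn't have caught a mirror that claims to be in sync when it isn't. For that, kernels with the md sysfs interface let you write "check" to /sys/block/mdX/md/sync_action and then read mismatch_cnt from the same directory.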

I think that is all. I'm just so pissed off that what could have been a perfect (enough) upgrade with zero data-loss, even considering I had a bum disk, wasn't.

[04:08] [tech] [permalink]

Friday, 07 December 2007

Australia bound

We're heading back for the annual trip home. This time it's not really a holiday, it's just three weeks of running around like a headless chicken, catching up with friends and family.

We're killing time at SFO at the moment waiting for our flight to board. There's much to be said for arriving at the airport ludicrously early. We had the shortest check in line I've ever seen for an international flight. Security was also ridiculously quick. It all adds up to a stress-free travelling experience.

I'm working from the Sydney office for a week, and then taking two weeks' vacation. Sarah's sorting out her E-3D renewal while we're in Sydney, as she was unable to travel when I renewed mine in Dublin earlier in the year. After that I'm taking all of January off as unpaid time off, so I can look after Sarah after her heart surgery in January and just generally decompress.

We got a good deal with Qantas. The package gives us three domestic flights within Australia, which works out perfectly to allow us to visit Canberra and Brisbane.

I'm looking forward to seeing my family, as I haven't seen them since we lost Joshua, and I'm sure it's been as hard for them as it has been for me being so far apart from each other. It'll be good to have a cry together.

[20:48] [life] [permalink]

Tuesday, 04 December 2007

Queensland dragged kicking and screaming into the 60's

I'm pleased my home state of Queensland (which bills itself as the Smart State) has finally decided to start fluoridating its water, a move the rest of the country made back in the 1960's and 1970's.

Hardly surprising I guess, when the No campaign for a referendum on daylight saving involved a fear of the curtains fading from that extra hour of sunlight...

I remember when I was little, my Mum used to give me fluoride tablets. Dentists to this day, who have known I grew up in Brisbane, have remarked how they can tell by my teeth that I had fluoride as a kid. Apparently one's teeth are noticeably different (for the better) if fluoride has been involved.

[22:51] [life] [permalink]

Saturday, 01 December 2007

MacOS 10.4 setup rant

We popped out and bought a MacBook Pro for Sarah today, as her PowerBook G4 is starting to really fall apart.

Despite Leopard being well and truly out now, it came with Tiger preinstalled, but included a Leopard upgrade DVD. No big deal.

So we fired up the laptop, and went through the final install steps for Tiger. I was trying to preempt any networking problems by adding the MAC addresses to my DHCP server and wireless access point before it went to try and connect to the network. I have a whitelist of allowed MAC addresses in my access point, and the DHCP server is only configured for static reservations on that interface.

The onboard Ethernet MAC address was easily determined. It's printed inside the battery cavity. Finding out the wireless Ethernet MAC address is another story altogether. Basically I couldn't, so I gave up, hoping maybe the access point would log the unauthorised MAC address, and I could then add it where it was needed.

Well, the access point didn't appear to log anything useful. The Mac did however pop up a dialog box saying something to the effect of the laptop needing to be added to the wireless access point's access control list. The one piece of useful information to display at this point would be, you guessed it, the very MAC address I need to be adding to the access control list!

I ended up temporarily disabling MAC address restrictions in the access point and then the DHCP server happily logged the MAC address that didn't have a lease, and we were able to proceed.

Maybe preinstalled Leopard does it better. We'll never know.

Update

Aww crap. Seems I didn't study the box closely enough. There's a sticker with all that pertinent information on it.

[15:44] [tech] [permalink]