After the debacle last time I tried to grow the size of my existing RAID1 when I put new disks in daedalus, I thought this time I'd do my homework.
I did some research and found out how I should have done it. I did a practice run on a USB key. I planned out exactly how I was going to do it:
    mdadm /dev/md2 --fail /dev/sdb3
    <delete /dev/sdb3, recreate at new full size>
    reboot
    mdadm /dev/md2 --add /dev/sdb3
    <wait for sync>
    mdadm /dev/md2 --fail /dev/sda3
    <delete /dev/sda3, recreate at new full size>
    reboot
    mdadm /dev/md2 --add /dev/sda3
    <wait for sync>
    mdadm --grow /dev/md2
    <wait for sync>
    pvresize /dev/md2
Everything went as planned, until I went to grow the RAID1 volume. It still thought the underlying device was the same size. There was nothing to grow.
So at this point, I decided to do something similar to what I did last time to get around the failing disk, and should have done last time anyway. I broke the mirror, created a new degraded RAID1 using the full size of the new partition on the half I pulled out of the mirror, and did a pvmove from the old non-full-sized degraded mirror to the new full-sized degraded mirror.
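For reference, the workaround described above can be sketched roughly as the script below. It is a reconstruction, not the exact commands that were run: the new array name /dev/md3 and the volume group name vg0 are assumptions, as is the choice of /dev/sdb3 as the half pulled out of the mirror. It defaults to a dry run that only prints each command.

```shell
#!/bin/sh
# Sketch of the break-the-mirror workaround. By default this is a dry run
# that only prints the commands; set DRY_RUN=0 to really run them (as root).
OLD_MD=${OLD_MD:-/dev/md2}   # existing, undersized mirror (named in the post)
NEW_MD=${NEW_MD:-/dev/md3}   # ASSUMPTION: a free md device name
MOVED=${MOVED:-/dev/sdb3}    # ASSUMPTION: the half pulled out of the mirror
KEPT=${KEPT:-/dev/sda3}      # the half left holding the data meanwhile
VG=${VG:-vg0}                # ASSUMPTION: volume group name
DRY_RUN=${DRY_RUN:-1}

# Print the command, and execute it unless this is a dry run.
# A failing command aborts the whole script.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then
        "$@" || exit 1
    fi
}

migrate() {
    # Break the mirror: pull one half out and wipe its old RAID superblock.
    run mdadm "$OLD_MD" --fail "$MOVED"
    run mdadm "$OLD_MD" --remove "$MOVED"
    run mdadm --zero-superblock "$MOVED"

    # Create a new degraded RAID1 using the full size of that partition.
    run mdadm --create "$NEW_MD" --level=1 --raid-devices=2 "$MOVED" missing

    # Add it to the volume group and move all extents off the old mirror.
    run pvcreate "$NEW_MD"
    run vgextend "$VG" "$NEW_MD"
    run pvmove "$OLD_MD" "$NEW_MD"

    # Retire the old array, then resync the new one onto the freed partition.
    run vgreduce "$VG" "$OLD_MD"
    run pvremove "$OLD_MD"
    run mdadm --stop "$OLD_MD"
    run mdadm --zero-superblock "$KEPT"
    run mdadm "$NEW_MD" --add "$KEPT"
}

migrate
```

Note that between the first step and the final resync the data sits on a single disk with no redundancy, so a current backup is worth having before starting.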
All of that went swimmingly until the pvmove was around 50% complete, when the kernel decided to oops spectacularly. I had to power cycle daedalus to get it back under control, and even in single-user mode, without me doing anything, the kernel started oopsing again. Dammit.
I had to boot into emergency mode (insert standard gripe about Debian's single-user mode being far too non-singular here), and then I could resume the pvmove without any further oopsing. After that completed, I was able to ditch the old non-full-sized degraded RAID1 device and resync the new one onto the old partition.

There was still some minor filesystem corruption, most likely because I had everything mounted at the time of the crash. Yes, I still haven't learned not to do this kind of thing in multi-user mode. It seems every time I try to minimise the size and duration of an outage, it bites me in the arse. In theory I should have been able to move open logical volumes between physical volumes, but the kernel oops seemed to be in the dm_mirror code, and daedalus is running a fairly old kernel. The annoying thing is that getting some additional disk space on board was the prerequisite for doing a general upgrade of all of the software on it. Argh.
Anyway, it's done. I hope not to have to go through this again. I just have to sit through a potentially nail-biting remote upgrade of Debian now, and I should be good for a couple more years hopefully.





