Diary of a geek


Saturday, 20 December 2008

The Bad Block HOWTO hurts my brain

Seriously.

It reads more like a worked example, without enough of the underlying theory. It doesn't help that I have a slightly more complicated situation, in that I have a filesystem on a logical volume that spans 6 physical volumes.

So maybe writing all of this down will help...

I know the disk with the problem: /dev/sdc

I know the LBA of the sector where the rot starts: 754238963

I know that the physical volume is on /dev/sdc1 and that the partition starts at sector 63 (via sfdisk -luS /dev/sdc)

Therefore, within /dev/sdc1, we're talking about sector number 754238900 (754238963 - 63)

I know the physical extent size is 4096 KB (via pvdisplay -c /dev/sdc1 | awk -F: '{ print $8 }'). In 512-byte LBA sectors this becomes 8192 (2 x 4096)

I know the physical extents start 192K into the partition, presumably (via pvs -o+pe_start /dev/sdc1)

This is where the Bad Block HOWTO starts to quicken the pace a bit on me...

I'm supposed to take the bad sector's number within the partition (754238900) and divide it by the size of the physical extent in LBA sectors (8192). So that'd be 92070, rounding down.
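
A sketch of the arithmetic so far, in shell. It assumes 512-byte sectors and a partition starting at sector 63, and it also subtracts the 192K pe_start offset (384 sectors) before dividing, on the assumption that extents are counted from there. The function name pe_of_sector is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a whole-disk LBA to a physical extent number.
# Assumes 512-byte sectors; every size below is expressed in sectors.
pe_of_sector() {
    lba=$1          # bad sector's LBA on the whole disk
    part_start=$2   # first sector of the partition (from sfdisk -luS)
    pe_start=$3     # offset of the first extent inside the partition
    pe_size=$4      # physical extent size
    echo $(( (lba - part_start - pe_start) / pe_size ))
}

# 192K pe_start = 384 sectors; 4096 KB extents = 8192 sectors.
pe_of_sector 754238963 63 384 8192   # prints 92070
```

With these inputs the pe_start offset doesn't change the extent number, but it could for a sector near an extent boundary.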

This is where I think I end up completely departing the HOWTO, because my filesystem spans multiple physical volumes. Here's my general musings...

lvdisplay --maps /dev/data/srv tells me this about /dev/sdc1:

  Logical extent 272326 to 391559:
    Type                linear
    Physical volume     /dev/sdc1
    Physical extents    0 to 119233

So I'm inferring from this, given that I know I want physical extent 92070 of this device, that this corresponds to logical extent 92103 (again, I'm expressing this in LBA block size, so that's (272326 / 8192) + 92070).
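
For comparison, the lvdisplay --maps output lists both ranges in whole extents, so for a linear segment the mapping may simply be an offset, with no division by 8192 involved. A sketch under that assumption (le_of_pe is a made-up name, not an LVM tool):

```shell
#!/bin/sh
# Hypothetical helper: map a physical extent to a logical extent within
# one linear segment as reported by lvdisplay --maps.
# Assumes both ranges are counted in whole extents.
le_of_pe() {
    pe=$1        # physical extent on the physical volume
    seg_pe0=$2   # first physical extent of the segment
    seg_le0=$3   # first logical extent of the segment
    echo $(( seg_le0 + (pe - seg_pe0) ))
}

# Segment on /dev/sdc1: physical extents from 0, logical extents from 272326.
le_of_pe 92070 0 272326   # prints 364396
```

364396 at least falls inside the 272326 to 391559 range shown for /dev/sdc1, which 92103 does not.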

So now I supposedly know the logical extent (in LBA size) of the bad block. Now what? I think I'm wandering up too many layers, closer to the physical filesystem, but I'll continue wandering around...

So presumably at this point, I just want to convert the logical extent to an actual filesystem block number. Maybe the extent number divided by the ratio of the filesystem block size to the LBA block size? 92070 / (4096 / 512) = 11508 (rounded down). That feels awfully low though. The dd test suggests this is incorrect.
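
One possible way to finish the conversion, sketched in shell: turn the logical extent back into a sector offset within the logical volume, then divide by the number of sectors per filesystem block. The assumptions are loud ones: the filesystem starts at sector 0 of the logical volume, sectors are 512 bytes, the filesystem block size is 4096 bytes, and the example inputs (logical extent 364396, 1076 sectors into that extent) are what a plain linear mapping of the numbers above would give. fs_block is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: map (logical extent, sector offset inside it)
# to a filesystem block number on the logical volume.
# Assumes 512-byte sectors and a filesystem starting at LV sector 0.
fs_block() {
    le=$1        # logical extent number
    off=$2       # sector offset within that extent
    pe_size=$3   # extent size in sectors
    fs_bs=$4     # filesystem block size in bytes
    echo $(( (le * pe_size + off) * 512 / fs_bs ))
}

# Example inputs are assumptions, not values confirmed against the disk.
fs_block 364396 1076 8192 4096   # prints 373141638
```

If that holds, something like dd if=/dev/data/srv of=/dev/null bs=4096 count=1 skip=373141638 should hit the same read error as the raw sector does.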

At this point I'm feeling fairly lost. If someone out there is reading this, and they've done this before, I'd love to hear from you.

[23:08] [tech] [permalink]

Reflections whilst waiting for fsck, part 2

So it turns out if you wait long enough, you do get to piece the information together:

.
.
.
Too many illegal blocks in inode 209158151.
Clear inode? yes

Restarting e2fsck from the beginning...
/srv contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'denemo_0.7.7-3.1_hppa.deb' in /debian/pool/main/d/denemo (211845291) has deleted/unused inode 209158151.  Clear? yes
.
.
.

[16:59] [tech] [permalink]

Reflections whilst waiting for fsck

I'm watching paint dry while e2fsck does its thing on a ~2TB filesystem (the one with all the good stuff on it on mirror.linux.org.au).

I'd seen a spate of kernel errors during the week about "attempt to access beyond end of device" so I figured it was due for one.

Let's take this output for example:

apollock@disco:~$ sudo e2fsck -y /dev/data/srv
e2fsck 1.40-WIP (14-Nov-2006)
/srv contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 209158151 has illegal block(s).  Clear? yes

Illegal block #12 (1620503259) in inode 209158151.  CLEARED.
Illegal block #13 (2992116621) in inode 209158151.  CLEARED.
Illegal block #14 (1657577172) in inode 209158151.  CLEARED.
Illegal block #15 (1168774619) in inode 209158151.  CLEARED.
Illegal block #16 (993415032) in inode 209158151.  CLEARED.
Illegal block #18 (1611893880) in inode 209158151.  CLEARED.
Illegal block #20 (2939071693) in inode 209158151.  CLEARED.
Illegal block #21 (1714919190) in inode 209158151.  CLEARED.
Illegal block #22 (1450852455) in inode 209158151.  CLEARED.
Illegal block #23 (3482149179) in inode 209158151.  CLEARED.
Illegal block #24 (4143923374) in inode 209158151.  CLEARED.
Too many illegal blocks in inode 209158151.
Clear inode? yes

Restarting e2fsck from the beginning...
/srv contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
.
.
.

How much more work would it be to tell the administrator the name of the file associated with inode 209158151? That'd be a lot more useful to most mere mortals than the inode number. I suppose if the filesystem is in a really bad state, ascertaining that information may be difficult...

Time to play with debugfs while I continue watching paint dry...
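
A minimal sketch of the sort of lookup debugfs allows: its ncheck request prints the pathnames that refer to a given inode number. The wrapper name inode_to_path and the DEBUGFS override variable are made up for illustration. Whether the name is still recoverable once e2fsck has cleared the inode is not something this sketch can promise, and running it against a filesystem that e2fsck is busy modifying may give inconsistent answers.

```shell
#!/bin/sh
# Hypothetical helper: ask debugfs which path(s) refer to an inode.
# DEBUGFS is an illustrative override hook, not a debugfs feature.
inode_to_path() {
    inode=$1   # inode number reported by e2fsck
    dev=$2     # block device holding the ext2/ext3 filesystem
    ${DEBUGFS:-debugfs} -R "ncheck $inode" "$dev"
}

# Example (needs read access to the device; not run here):
#   inode_to_path 209158151 /dev/data/srv
```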

[16:12] [tech] [permalink]

On efficient shell scripting

Mike Hommey took some inefficient shell scripting to task.

I feel obligated to point out that more $file | wc | awk '{print $1}' is not the same as wc -l $file.

To get the same output, you're going to need one of:

cat $file | wc -l

or

wc -l $file | awk '{print $1}'

I suspect cat is going to have a faster startup time than awk... (On my system, gawk is an order of magnitude bigger than cat)
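
A third option avoids the extra process altogether: redirect the file into wc, so it has no file name to print. A small sketch, assuming a POSIX shell (count_lines is a made-up name; some non-GNU wc implementations pad the number with leading spaces):

```shell
#!/bin/sh
# Hypothetical helper: count lines without spawning cat or awk.
count_lines() {
    # With input redirected, wc never sees a file name to echo back.
    wc -l < "$1"
}

# Small demonstration on a temporary three-line file.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"
count_lines "$tmp"
rm -f "$tmp"
```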

[13:51] [tech] [permalink]