Linux: Handling Software RAID

Ideally, you should never need to manage or even notice your Software RAID array. You may, of course, lose a hard disk in a RAID1 array and need to replace the drive. Or you've transplanted an array from one workstation to another and need to recover data from it. Perhaps you're just curious and want more information on the status of your RAID array. In this second part of our 'Software RAID' series, I'll cover these topics and more.

Detecting existing arrays

On more than one occasion, I've booted my Gentoo LiveCD on a Linux workstation to recover a broken system configuration or a misbehaving GRUB entry, both on my own machines and on clients' machines. One thing the Gentoo LiveCD does not do is automatically detect Software RAID arrays and create the md devices. We can, however, use mdadm to scan for existing arrays:

# livecd ~ # mdadm -E --scan
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=e7a3fea9:320236f2:5c33b674:c387c6b1
ARRAY /dev/md1 level=raid0 num-devices=4 UUID=1e77394a:e489bcd8:81f98684:c9daeec5

Good, it sees the arrays I made in the first part of this series. I've noticed that, on some systems, the above command is enough for the arrays to be started and the devices created (they probably dump the array information to /etc/mdadm automatically). On my Gentoo CD, however, this is not the case, and I had to do the following:

# livecd ~ # mdadm --assemble --scan
mdadm: /dev/md0 has been started with 4 drives.
mdadm: /dev/md1 has been started with 4 drives.

There, now the /dev/mdX block devices exist.
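If a recovery script needs to act on only one of the detected arrays, the scan output shown earlier can be reduced to device/UUID pairs. The following is a minimal sketch, not part of mdadm itself: the helper name list_arrays is hypothetical, and it assumes each ARRAY line carries a UUID= field as in the output above.

```shell
# list_arrays: read "mdadm -E --scan" output on stdin and print
# one "<device> <uuid>" pair per ARRAY line.
# (Hypothetical helper; assumes a UUID=... field on each ARRAY line.)
list_arrays() {
    awk '$1 == "ARRAY" {
        uuid = ""
        # The UUID field position varies, so search every field after the device.
        for (i = 3; i <= NF; i++)
            if ($i ~ /^UUID=/) uuid = substr($i, 6)
        print $2, uuid
    }'
}
```

It would be used as "mdadm -E --scan | list_arrays". The UUID it prints is the kind of value that mdadm's --uuid option accepts when you want to assemble one specific array rather than everything found by the scan.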

Checking the consistency of an array

Most, if not all, hardware RAID controllers include a command-line interface (CLI) or other userland tools to check that the data on the arrays is correct: that RAID1 mirrors are actually mirrors, that no corrupt data is on a RAID5 array, and so on. In addition, when checking the consistency of an array, the md software RAID driver will scan for bad blocks on your hard disks. Actively scanning your drives for bad data helps prevent nasty surprises when you try to access a file.

The only caveat is that this requires a modern Linux kernel (2.6.16 or later). To start a data consistency check, run the following command as root on the appropriate Software RAID device (here, it's the RAID1 array we created in Part 1, /dev/md0):

# echo check > /sys/block/md0/md/sync_action

Once again, we can read the /proc/mdstat file to monitor the status of the check:

# watch -n 1 cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      488383936 blocks [2/2] [UU]
      [>....................]  check =  3.3% (16481792/488383936) finish=96.8min speed=81177K/sec
      bitmap: 0/233 pages [0KB], 1024KB chunk

unused devices: <none>
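For scripts or cron jobs, watching the whole file is less convenient than pulling out the single progress figure. Below is a rough sketch that extracts it for one device. The function name md_progress is hypothetical, and it assumes the progress line has the "check = 3.3%" shape shown above (the same pattern is assumed for resync, recovery and reshape).

```shell
# md_progress: print the progress of a running operation for one md device.
#   $1 = device name as it appears in mdstat (e.g. md0)
#   $2 = mdstat file to read (defaults to /proc/mdstat)
# Prints nothing if the device is idle or not listed.
md_progress() {
    awk -v dev="$1" '
        # The stanza for our device starts with "<dev> : ..."
        $1 == dev && $2 == ":" { in_dev = 1; next }
        # Any other unindented line ends the stanza
        in_dev && /^[^ \t]/ { in_dev = 0 }
        in_dev && match($0, /(check|resync|recovery|reshape) *= *[0-9.]+%/) {
            s = substr($0, RSTART, RLENGTH)
            gsub(/ +/, " ", s)   # collapse the padding between the fields
            print s
        }
    ' "${2:-/proc/mdstat}"
}
```

Calling "md_progress md0" while the check above is running would print the action and percentage on one line; the optional second argument only exists so the function can be tried against a saved copy of the file.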

When an inconsistent block is discovered, the kernel increments a counter. You can read this counter quite easily:

# cat /sys/block/md0/md/mismatch_cnt

Mismatches can be caused by a number of factors, and it's possible to receive false positives if write operations are occurring in swap or on a file while the check runs. These are harmless; you should be much more worried about mismatches due to physical damage to your hard drives, which can corrupt your data. If the mismatch_cnt value is greater than 0, you should consider repairing the array.
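To act on the counter from a script (for example, to send yourself a mail after a scheduled check), it can be wrapped in a small function. This is only a sketch: the name md_mismatch_report and its messages are made up for illustration, and the second argument exists purely so the function can be tried against a fake directory tree instead of the real /sys.

```shell
# md_mismatch_report: report whether an md device has recorded mismatches.
#   $1 = md device name (e.g. md0)
#   $2 = sysfs root (defaults to /sys)
# Returns 0 if the counter is 0, 1 if it is greater than 0,
# and 2 if the counter cannot be read.
md_mismatch_report() {
    mm_file="${2:-/sys}/block/$1/md/mismatch_cnt"
    if [ ! -r "$mm_file" ]; then
        echo "$1: cannot read $mm_file" >&2
        return 2
    fi
    mm_count=$(cat "$mm_file")
    if [ "$mm_count" -gt 0 ]; then
        echo "$1: $mm_count mismatches, consider a repair"
        return 1
    fi
    echo "$1: consistent (mismatch_cnt=0)"
}
```

A cron job could run "md_mismatch_report md0 || mail ..." once the check has finished; keep in mind the false positives described above before treating a non-zero count as proof of damage.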

Repairing a Software RAID Array

Like checking the array, repairing the array is quite easy:

# echo repair > /sys/block/md0/md/sync_action
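The repair runs in the background, so a script that wants to do something afterwards has to wait for it. The same sync_action file can be read back: it contains "idle" when no check or repair is running. Here is a minimal polling sketch; the function name md_wait_idle and the default 60-second interval are my own choices, not anything provided by the kernel or mdadm.

```shell
# md_wait_idle: block until an md device has no check/repair/resync running.
#   $1 = md device name (e.g. md0)
#   $2 = poll interval in seconds (defaults to 60)
#   $3 = sysfs root (defaults to /sys)
# Returns 2 if the sync_action file cannot be read.
md_wait_idle() {
    sa_file="${3:-/sys}/block/$1/md/sync_action"
    if [ ! -r "$sa_file" ]; then
        echo "$1: cannot read $sa_file" >&2
        return 2
    fi
    # sync_action reads "idle" once the running operation has finished.
    while [ "$(cat "$sa_file")" != "idle" ]; do
        sleep "${2:-60}"
    done
}
```

For example, "echo repair > /sys/block/md0/md/sync_action && md_wait_idle md0" returns only after the repair is done, at which point mismatch_cnt and /proc/mdstat reflect the finished operation.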

It's important to remember that even if your array is consistent after the repair operation is completed, data on disk can still be corrupted. While this is fairly uncommon (typically I just lose an entire disk), you should keep it in mind, especially if you're a system administrator.

Once the repair operation is completed, you should fsck the disk/partition at your earliest convenience. But once again, this won't fix corrupted data; it'll just make sure the filesystem is consistent.

To catch corrupted files, consider saving MD5 hashes of known-good files that rarely change. One of my preferred applications for doing this is a CLI tool called cfv. This can help you identify corrupt data, so that you can restore it from a backup, if possible.
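If cfv isn't installed, the same idea can be approximated with plain md5sum. The sketch below is not how cfv works internally; it simply assumes GNU coreutils and findutils (md5sum, sort -z, xargs -0 -r), and the function names make_manifest and verify_manifest are hypothetical.

```shell
# make_manifest: record an MD5 hash for every regular file under a directory.
#   $1 = directory to hash
#   $2 = manifest file to write (keep it outside $1 so it is not hashed itself)
make_manifest() {
    # Paths are stored relative to $1 so the tree can be verified after a move.
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum ) > "$2"
}

# verify_manifest: re-hash the files and compare against the manifest.
#   $1 = directory to check
#   $2 = manifest file written by make_manifest
# Prints only the files that fail and returns non-zero if any do.
verify_manifest() {
    ( cd "$1" && md5sum --quiet -c - ) < "$2"
}
```

Run "make_manifest /data/photos /root/photos.md5" while the files are known to be good, then "verify_manifest /data/photos /root/photos.md5" after a repair; both paths here are only examples. Any file reported as FAILED is a candidate for restoring from backup.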

Finally, the only way to truly be sure that your data is intact is to replace any damaged hardware, reformat the array, and replace any files detected as being corrupted. This is a fairly extreme measure, and since you'll probably be running hardware RAID on production systems, I'd recommend this only if you experience extreme corruption or are just looking for an excuse to reinstall your operating system.
