Software RAID 5 in Ubuntu/Debian with mdadm

Notes:

  1. I am performing these examples in VirtualBox, so the hard drive sizes will be much smaller than what you’ll have in reality, but this will serve as a good demonstration of how to perform the actions.
  2. RAID-5 requires a minimum of 3 drives, and all should be the same size. It provides the ability for one drive to fail without any data loss. Here’s a quick way to calculate how much space you’ll have when you’re complete.

Usable space = (number of drives – 1) * size of smallest drive
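
For example, with four 4TB drives in RAID-5: usable space = (4 – 1) * 4TB = 12TB, because one drive’s worth of capacity is consumed by the distributed parity.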

Required software

To create, format and resize the array, the only package we strictly need is mdadm, but we’ll also install ssh, parted and gdisk while we’re at it. To get started, let’s first switch to the root user, and then install the packages. Personally, I like to upgrade to a newer version of mdadm, but first I’ll show you the instructions for a standard install.

sudo -i
enter your password here...
apt-get update && apt-get upgrade -y
apt-get install mdadm ssh parted gdisk

Initial setup

Before we jump into creating the actual RAID array, I would suggest you put a partition on each drive that you plan to use in your array. This is not a requirement with mdadm, but I like to have the disks show up in fdisk as partitioned. In the past I would have shown you how to create the partitions with fdisk, but the MBR partition tables that fdisk traditionally works with can’t hold partitions greater than 2TB, and that rules out many modern hard drives.

Instead, I’ll show you how to create the partitions with parted using GPT labels. But first, let’s view a list of our available hard drives and partitions.

fdisk -l

This will output, for each drive you have, something along the lines of:

root@test:~# fdisk -l

Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00030001

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         996     7993344   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             996        1045      392193    5  Extended
/dev/sda5             996        1045      392192   82  Linux swap / Solaris

Disk /dev/sdb: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdd doesn't contain a valid partition table

Disk /dev/sde: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sde doesn't contain a valid partition table

This shows four 1GB drives with no partitions, plus /dev/sda, which is where the operating system is installed. The last four drives should be safe to use in our array. Next, let’s actually partition those four disks.

parted -a optimal /dev/sdb
GNU Parted 2.3
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt 
(parted) mkpart primary 1 -1
(parted) align-check                                                      
alignment type(min/opt)  [optimal]/minimal? optimal                       
Partition number? 1                                                       
1 aligned
(parted) quit 

Because parted was started with the -a optimal switch above, it aligns every partition it creates to the drive’s optimal I/O boundary (typically 1MiB, i.e. sector 2048 on drives with 512-byte sectors). As a result, the partition is properly aligned, as the align-check command shows above.
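
If you want to double-check the alignment yourself, you can print the partition table in sectors; with optimal alignment the partition will typically start at sector 2048.

parted /dev/sdb unit s print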

Since we are using GPT, we can use sgdisk to clone the partition table from /dev/sdb to the other three drives. This has the added benefit of leaving you with a backup of your disk partition table.

sgdisk --backup=table /dev/sdb
sgdisk --load-backup=table /dev/sdc
sgdisk --load-backup=table /dev/sdd
sgdisk --load-backup=table /dev/sde
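
One caveat with cloning: --load-backup copies the source disk’s GUIDs along with everything else. To keep each disk’s identifiers unique, you can randomize the GUIDs on the clones with sgdisk’s -G flag.

sgdisk -G /dev/sdc
sgdisk -G /dev/sdd
sgdisk -G /dev/sde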

Creating the Array

Now that our disks are partitioned correctly, it’s time to start building an mdadm RAID5 array. To create the array, we use the mdadm create flag. We also need to specify what RAID level we want, as well as how many devices and what they are. The following command will use 3 of our newly partitioned disks.

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sd[bcd]1

The verbose flag tells it to output extra information. In the above command I am creating a RAID-5 array at /dev/md0, using 3 partitions. The number of partitions you are using and their names may be different, so do not just copy and paste the command above without verifying your setup first. Note that a partition name is something like /dev/sdb1, whereas the drive name is something like /dev/sdb; the 1 refers to the partition number on the disk.
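
If you’re not sure which names your disks have been given, lsblk provides a quick sanity check before you run the create command.

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT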

RAID6

If you want to build a RAID6 array, it’s just as easy. For this example, I’ll throw in a few more example drives to make our array bigger.

mdadm --create --verbose /dev/md0 --level=6 --raid-devices=8 /dev/sd[bcdefghi]1
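
RAID6 keeps two drives’ worth of parity, so usable space = (number of drives – 2) * size of smallest drive; with our eight 1GB example drives that works out to 6GB.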

While the array is being built you can view its status in the file /proc/mdstat. Here the watch command comes in handy:

watch cat /proc/mdstat

This will output the contents of the file to the screen, refreshing every 2 seconds (by default). While the array is being built it will show how much of the “recovery” has been done, and an estimated time remaining. This is what it looks like when it’s building.

Every 2,0s: cat /proc/mdstat                                                                                Tue Dec 31 16:48:47 2013
 
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      7813770240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [=======>.............]  recovery = 35.0% (1368089744/3906885120) finish=346.5min speed=122077K/sec
 
unused devices: <none>

When it’s completed syncing, it should look like this. This process can take many hours depending on the size of the array you’re assembling.

Every 2.0s: cat /proc/mdstat                            Tue Nov 15 13:02:37 2011

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      2092032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

Now that we have set up the array, we need to edit the mdadm configuration file so that it knows how to assemble the array when the system reboots.

echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
echo "HOMEHOST fileserver" >> /etc/mdadm/mdadm.conf
echo "MAILADDR youruser@gmail.com" >> /etc/mdadm/mdadm.conf
mdadm --detail --scan | cut -d " " -f 4 --complement >> /etc/mdadm/mdadm.conf
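
For reference, the resulting mdadm.conf should look something like this (the UUID will be whatever your own mdadm --detail --scan reports; this one matches the example array shown below).

DEVICE partitions
HOMEHOST fileserver
MAILADDR youruser@gmail.com
ARRAY /dev/md0 metadata=1.2 UUID=2e46af23:bca95854:eb8f8d7c:3fb727ef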

Next, update the initramfs, so that the OS has access to your new mdadm array at boot time.

update-initramfs -u

You can view your new array like this.

mdadm --detail /dev/md0

It should look something like this.

/dev/md0:
        Version : 1.2
  Creation Time : Tue Nov 15 13:01:40 2011
     Raid Level : raid5
     Array Size : 2092032 (2043.34 MiB 2142.24 MB)
  Used Dev Size : 1046016 (1021.67 MiB 1071.12 MB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Nov 15 13:57:25 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : test:0
           UUID : 2e46af23:bca95854:eb8f8d7c:3fb727ef
         Events : 34

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      active sync   /dev/sdd1

You can see that the chunk size is 512K and the metadata version is 1.2.

Verify that your email is working like this. You’ll need to set up an [email server](http://zackreed.me/articles/39-send-system-email-with-gmail-and-ssmtp) first. This will ensure that you get alerts if something goes wrong on your system.

mdadm --monitor -m youruser@gmail.com /dev/md0 -t
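
For ongoing alerts you’ll also want mdadm’s monitor daemon running at boot. On Debian/Ubuntu this is controlled by /etc/default/mdadm; the exact contents vary by release, but it should include lines along these lines (these two are my assumption of the stock defaults).

START_DAEMON=true
DAEMON_OPTIONS="--syslog"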

Creating and mounting the filesystem

Now that the array is built, we need to format it. What filesystem you choose is up to you, but I would recommend ext4, unless your array will be bigger than 16TB, as that’s currently the maximum that older versions of e2fsprogs can create (see the note below). I will show you the quick way to apply a filesystem to your array first. If you want to optimize your filesystem performance, follow the next example…

Unoptimized

mkfs.ext4 /dev/md0

This will take a while, especially if your array is large. If you want to optimize your filesystem performance on top of mdadm, you’ll need to do a little work, or use this [calculator](http://uclibc.org/~aldot/mkfs_stride.html). Here’s how you do it.

Optimized

1. chunk size = 512kB (see the chunk size reported by mdadm --detail above)
2. block size = 4kB (recommended for large files, and most of the time)
3. stride = chunk size / block size; in this example: 512kB / 4kB = 128
4. stripe-width = stride * (number of data disks); RAID-5 spends one disk on parity, so in this example: 128 * (3 – 1) = 256

So, your optimized mkfs command would look like this.

mkfs.ext4 -b 4096 -E stride=128,stripe-width=256 /dev/md0
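
If your chunk size or disk count differs, the same arithmetic is easy to script; here’s a small sketch using the values from this example (adjust them for your array) that prints the matching mkfs command.

# Example values: adjust CHUNK_KB, BLOCK_KB and DISKS for your array
CHUNK_KB=512   # mdadm chunk size in kB
BLOCK_KB=4     # ext4 block size in kB
DISKS=3        # total number of disks in the RAID-5 array
STRIDE=$((CHUNK_KB / BLOCK_KB))
STRIPE_WIDTH=$((STRIDE * (DISKS - 1)))   # RAID-5 spends one disk on parity
echo "mkfs.ext4 -b $((BLOCK_KB * 1024)) -E stride=$STRIDE,stripe-width=$STRIPE_WIDTH /dev/md0"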

Note: as a caveat for using ext4 with volumes larger than 16TB, you’ll need a newer version of e2fsprogs to create a filesystem that supports 16TB+. Ubuntu 12.04 comes with e2fsprogs version 1.42, and this version supports creating a 64-bit filesystem (which can go beyond 16TB) like this.

mkfs.ext4 -O 64bit /dev/md0

If you chose to use ext2/3/4, you should also be aware of reserved space. By default, ext2/3/4 reserves 5% of the drive’s space, which only root is able to write to. This is done so that a user cannot fill the drive and prevent critical daemons from writing to it, but 5% of a large RAID array, which isn’t going to be written to by critical daemons anyway, is a lot of wasted space. I chose to set the reserved space to 0% using tune2fs:

tune2fs -m 0 /dev/md0
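
You can confirm the change took effect by listing the filesystem’s superblock settings.

tune2fs -l /dev/md0 | grep -i "reserved block count"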

Next we should add the array to the fstab, so that it will automatically be mounted when the system boots up. This can be done by editing the file /etc/fstab.

nano /etc/fstab

Your fstab should already contain a few entries (if it doesn’t, something is wrong!). At the bottom, add a line similar to the following:

/dev/md0	/storage	    ext4	defaults	0	0

*Press ctrl+x, and then y to save and exit the program.*
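
Device names like /dev/md0 are normally stable, but if you prefer, you can mount by filesystem UUID instead; blkid will show you the UUID to use.

blkid /dev/md0

The fstab line then becomes something like UUID=<uuid-from-blkid> /storage ext4 defaults 0 0.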

I chose to mount my array on /storage, but you may well wish to mount it somewhere else. As I said earlier, I chose ext4, but here you will need to enter whatever filesystem you picked. If the folder you chose doesn’t exist, you will need to create it like this.

mkdir /storage

Now, mount the array.

mount -a

This will mount anything mentioned in the fstab that isn’t currently mounted. Hopefully, your array is now available on /storage. Check your available space like this.

df -h /storage

It should look something like this…

Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              2.0G   35M  2.0G   2% /storage

You should now have a working RAID-5 array. Next, I would strongly suggest you read my other articles to set up email for monitoring, SMART information monitoring, spinning down disks, setting up a UPS battery backup, and other RAID array actions. Being able to cope with one drive failing is no use if you never notice the failure and let a second drive fail before replacing the first.
