Software RAID 5 in Ubuntu/Debian with mdadm
- I am performing these examples in VirtualBox, so the hard drive sizes will be much smaller than what you’ll have in reality, but this will serve as a good demonstration of how to perform the actions.
- RAID-5 requires a minimum of 3 drives, and all should be the same size. It allows one drive to fail without any data loss. Here’s a quick way to calculate how much space you’ll have when you’re done.
Usable space = (number of drives – 1) * size of smallest drive
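As a quick worked example of the formula above, here is a minimal sketch in shell. The function name raid5_usable is hypothetical, and it assumes you pass the drive count and the size of the smallest drive in whatever whole unit you prefer (GB, TB).

```shell
# Usable space = (number of drives - 1) * size of smallest drive
# raid5_usable is a hypothetical helper name, not part of mdadm.
raid5_usable() {
    local drives=$1 smallest=$2
    # RAID-5 needs at least 3 drives.
    if [ "$drives" -lt 3 ]; then
        echo "RAID-5 needs at least 3 drives" >&2
        return 1
    fi
    echo $(( (drives - 1) * smallest ))
}
```

For the three 1GB virtual drives used in this article, raid5_usable 3 1 prints 2, which matches the roughly 2GB array shown later.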
To create, format, and resize the array, the only package we strictly need is mdadm. To get started, let’s first switch to the root user, and then install mdadm along with ssh, parted, and gdisk (the gdisk package provides the sgdisk tool used later). Personally, I like to upgrade to a newer version of mdadm, but first I’ll show you the instructions for a standard install.
sudo -i
# enter your password when prompted
apt-get update && apt-get upgrade -y
apt-get install mdadm ssh parted gdisk
Before we jump into creating the actual RAID array, I would suggest you put a partition on each drive that you plan to use in your array. This is not a requirement with mdadm, but I like to have the disks show up in fdisk as partitioned. In the past I would have shown you how to create the partitions with fdisk, but the MBR partition tables that older versions of fdisk create don’t support partitions larger than 2TB, so that rules out many modern hard drives.
Instead, I’ll show you how to create the partitions with parted using GPT labels. But first, let’s view a list of our available hard drives and partitions with fdisk -l.
This will output, for each drive you have, something along the lines of:
root@test:~# fdisk -l

Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00030001

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         996     7993344   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             996        1045      392193    5  Extended
/dev/sda5             996        1045      392192   82  Linux swap / Solaris

Disk /dev/sdb: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdd doesn't contain a valid partition table

Disk /dev/sde: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sde doesn't contain a valid partition table
This shows four 1GB drives with no partition table, plus /dev/sda, where the operating system is installed. The last four drives should be safe to use in our array. Next, let’s actually partition those four disks, starting with /dev/sdb.
parted -a optimal /dev/sdb
GNU Parted 2.3
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) mkpart primary 1 -1
(parted) align-check
alignment type(min/opt)  [optimal]/minimal? optimal
Partition number? 1
1 aligned
(parted) quit
After being started with the -a optimal switch, parted aligns each partition it creates using the topology information the disk reports, so that partitions start on a multiple of the physical block size (4096 bytes on many modern drives) rather than on a cylinder boundary. As a result, the partition is properly aligned, as the align-check command shows above.
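If you want to sanity-check alignment by hand, the arithmetic is simple: multiply a partition’s starting sector by the logical sector size and see whether the result is a multiple of 4096 bytes. The sketch below assumes 512-byte logical sectors by default (as in the fdisk output above); is_4k_aligned is a hypothetical helper name, and the start sectors in the note are illustrative values, not read from a real disk.

```shell
# Succeeds if the byte offset of a partition's first sector is a
# multiple of 4096. is_4k_aligned is a hypothetical helper name.
# $1 = starting sector, $2 = logical sector size in bytes (default 512)
is_4k_aligned() {
    local start_sector=$1 sector_size=${2:-512}
    [ $(( (start_sector * sector_size) % 4096 )) -eq 0 ]
}
```

For instance, is_4k_aligned 2048 succeeds (2048 * 512 bytes is exactly 1MiB), while is_4k_aligned 63, the old DOS-style starting sector, fails.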
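Since we are using GPT, you can use sgdisk to clone the partition table from /dev/sdb to the other three drives. This has the added benefit of leaving you with a backup of your disk partition table. Because the clone also copies the disk and partition GUIDs, randomize them on each copy with sgdisk -G so that every drive keeps a unique identifier.
sgdisk --backup=table /dev/sdb
sgdisk --load-backup=table /dev/sdc
sgdisk --load-backup=table /dev/sdd
sgdisk --load-backup=table /dev/sde
sgdisk -G /dev/sdc
sgdisk -G /dev/sdd
sgdisk -G /dev/sde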
Creating the Array
Now that our disks are partitioned correctly, it’s time to start building an mdadm RAID-5 array. To create the array, we use mdadm’s --create flag. We also need to specify the RAID level, the number of devices, and which devices to use. The following command will use 3 of our newly partitioned disks.
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sd[bcd]1
The verbose flag tells it to output extra information. In the above command I am creating a RAID-5 array at /dev/md0, using 3 partitions. The number of partitions you are using, and their names may be different, so do not just copy and paste all of the command above without verifying your setup first. Note that the partition name is something like /dev/sdb1, whereas the drive name is something like /dev/sdb; the 1 refers to the partition number on the disk.
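One way to verify your setup before running the create command is to confirm that every partition you are about to hand to mdadm actually exists as a block device. This is only a minimal sketch; check_members is a hypothetical helper name, not an mdadm feature.

```shell
# Return non-zero if any argument is not an existing block device.
# check_members is a hypothetical helper name.
check_members() {
    local dev status=0
    for dev in "$@"; do
        if [ ! -b "$dev" ]; then
            echo "not a block device: $dev" >&2
            status=1
        fi
    done
    return $status
}
```

You could then guard the command with something like check_members /dev/sd[bcd]1 && mdadm --create ..., so that a mistyped or missing partition stops you before the array is created.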
If you wanted to build a RAID-6 array instead, it’s just as easy. For this example, I’ll throw in a few more example drives to make our array bigger.
mdadm --create --verbose /dev/md0 --level=6 --raid-devices=8 /dev/sd[bcdefghi]1
While the array is being built you can view its status in the file /proc/mdstat. Here the watch command comes in handy:
watch cat /proc/mdstat
This will output the contents of the file to the screen, refreshing every 2 seconds (by default). While the array is being built it will show how much of the “recovery” has been done, and an estimated time remaining. This is what it looks like when it’s building.
Every 2,0s: cat /proc/mdstat                    Tue Dec 31 16:48:47 2013

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1 sdc1 sdb1
      7813770240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [=======>.............]  recovery = 35.0% (1368089744/3906885120) finish=346.5min speed=122077K/sec

unused devices: <none>
This process can take many hours, depending on the size of the array you’re assembling. When it has finished syncing, it should look like this.
Every 2.0s: cat /proc/mdstat                    Tue Nov 15 13:02:37 2011

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1 sdc1 sdb1
      2092032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
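If you would rather script a check than watch the screen, the progress figure can be pulled out of /proc/mdstat with sed. This is a rough sketch that assumes the "recovery = NN.N%" (or "resync = NN.N%") format shown above; mdstat_progress is a hypothetical helper name.

```shell
# Print the recovery/resync percentage found in mdstat-style text.
# Prints nothing when no rebuild is in progress.
# $1 = file to read (defaults to /proc/mdstat)
mdstat_progress() {
    local file="${1:-/proc/mdstat}"
    # Match e.g. "recovery = 35.0%" and print just "35.0".
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+)%.*/\2/p' "$file"
}
```

Run against the first sample above it would print 35.0; once the array shows [UUU] with no progress line, it prints nothing.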
Now that we have set up the array, we need to edit the mdadm configuration file so that it knows how to assemble the array when the system reboots.
echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
echo "HOMEHOST fileserver" >> /etc/mdadm/mdadm.conf
echo "MAILADDR firstname.lastname@example.org" >> /etc/mdadm/mdadm.conf
mdadm --detail --scan | cut -d " " -f 4 --complement >> /etc/mdadm/mdadm.conf
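To see what the cut stage in the last command does, you can run it on a sample line. The line below is an assumed example of what mdadm --detail --scan prints for this array (built from the name and UUID shown elsewhere in this article); your real output may differ. strip_fourth_field is a hypothetical wrapper name.

```shell
# Assumed sample of one line of `mdadm --detail --scan` output.
sample='ARRAY /dev/md0 metadata=1.2 name=test:0 UUID=2e46af23:bca95854:eb8f8d7c:3fb727ef'

# Drop the 4th space-separated field and keep everything else.
# --complement is a GNU cut option, which is what Debian/Ubuntu ship.
strip_fourth_field() {
    cut -d " " -f 4 --complement
}

printf '%s\n' "$sample" | strip_fourth_field
```

In this sample the fourth field is the name= entry, so the line written to mdadm.conf identifies the array by device, metadata version, and UUID only.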
Next, update the initramfs with update-initramfs -u, so that the OS has access to your new mdadm array at boot time.
You can view your new array like this.
mdadm --detail /dev/md0
It should look something like this.
/dev/md0:
        Version : 1.2
  Creation Time : Tue Nov 15 13:01:40 2011
     Raid Level : raid5
     Array Size : 2092032 (2043.34 MiB 2142.24 MB)
  Used Dev Size : 1046016 (1021.67 MiB 1071.12 MB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Nov 15 13:57:25 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : test:0
           UUID : 2e46af23:bca95854:eb8f8d7c:3fb727ef
         Events : 34

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      active sync   /dev/sdd1
You can see that the chunk size is 512K and the metadata version is 1.2.
Verify that your email alerts are working like this. You’ll need to set up an [email server](http://zackreed.me/articles/39-send-system-email-with-gmail-and-ssmtp) first. This will ensure that you get alerts if something goes wrong on your system.
mdadm --monitor -m email@example.com /dev/md0 -t
Creating and mounting the filesystem
Now that the array is built, we need to format it. Which filesystem you choose is up to you, but I would recommend ext4, unless your array will be bigger than 16TB, as that is the maximum size older versions of e2fsprogs support. The quick way to apply a filesystem to your array is to run mkfs.ext4 /dev/md0. This will take a while, especially if your array is large.
If you want to optimize your filesystem performance on top of mdadm, you’ll need to do a little work, or use this [calculator](http://uclibc.org/~aldot/mkfs_stride.html). Here’s how you do it.
1. chunk size = 512kB (see the chunk size shown above)
2. block size = 4kB (recommended for large files, and most of the time)
3. stride = chunk size / block size; in this example, 512kB / 4kB = 128
4. stripe-width = stride * ((number of disks in the RAID-5 array) - 1); in this example, 128 * (3 - 1) = 256
So, your optimized mkfs command would look like this.
mkfs.ext4 -b 4096 -E stride=128,stripe-width=256 /dev/md0
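The arithmetic from the four steps above can be wrapped in a few lines of shell if you want to recompute it for a different chunk size or drive count. This is just a sketch of that calculation; ext4_raid5_opts is a hypothetical helper name, and it assumes the chunk and block sizes are given in kB and divide evenly.

```shell
# Compute the ext4 stride and stripe-width for an mdadm RAID-5 array.
# ext4_raid5_opts is a hypothetical helper name.
# $1 = chunk size in kB, $2 = block size in kB, $3 = number of drives
ext4_raid5_opts() {
    local chunk_kb=$1 block_kb=$2 drives=$3
    local stride=$(( chunk_kb / block_kb ))
    # One drive's worth of each stripe holds parity, hence drives - 1.
    local stripe_width=$(( stride * (drives - 1) ))
    echo "stride=${stride},stripe-width=${stripe_width}"
}
```

For this article’s array, ext4_raid5_opts 512 4 3 prints stride=128,stripe-width=256, which is the string passed to -E in the mkfs.ext4 command above.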
Note: to use ext4 on volumes larger than 16TB, you’ll need a version of e2fsprogs that can create a 64-bit filesystem. Ubuntu 12.04 comes with e2fsprogs version 1.42, and this version supports creating a 64-bit filesystem (which will support > 16TB) like this.
mkfs.ext4 -O 64bit /dev/md0
If you chose to use ext2/3/4, you should also be aware of reserved space. By default, ext2/3/4 reserves 5% of the drive’s space, which only root is able to write to. This is done so that a user cannot fill the drive and prevent critical daemons from writing to it, but 5% of a large RAID array, which isn’t going to be written to by critical daemons anyway, is a lot of wasted space. I chose to set the reserved space to 0% using tune2fs:
tune2fs -m 0 /dev/md0
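To put a number on how much space the default reservation costs, here is a small sketch of the arithmetic. reserved_space is a hypothetical helper name, the sizes are whole numbers in whatever unit you pass in, and the result is rounded down by integer division.

```shell
# Space held back for root at a given reserved percentage.
# reserved_space is a hypothetical helper name.
# $1 = filesystem size (any whole unit), $2 = reserved percent (default 5)
reserved_space() {
    local size=$1 percent=${2:-5}
    echo $(( size * percent / 100 ))
}
```

With an illustrative 8000GB array, reserved_space 8000 prints 400, so the default 5% would set aside about 400GB for root.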
Next, we should add the array to the fstab, so that it will automatically be mounted when the system boots up. This can be done by editing the file /etc/fstab, for example with nano /etc/fstab.
Your fstab should already contain a few entries (if it doesn’t something is wrong!). At the bottom add a line similar to the following:
/dev/md0 /storage ext4 defaults 0 0
*Press ctrl+x, and then y to save and exit the program.*
I chose to mount my array on /storage, but you may well wish to mount it somewhere else. As I said earlier, I chose to use ext4, but here you will need to enter whatever filesystem you chose earlier. If the folder you chose doesn’t exist, you will need to create it, in my case with mkdir /storage.
Now, mount the array with mount -a.
This will mount anything mentioned in the fstab that isn’t currently mounted. Hopefully, your array is now available on /storage. Check your available space like this.
df -h /storage
It should look something like this…
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        2.0G   35M  2.0G   2% /storage
You should now have a working RAID-5 array. Next, I would strongly suggest you read my other articles to set up email for monitoring, SMART information monitoring, spinning down disks, setting up a UPS battery backup, and other RAID array actions. Being able to cope with one drive failing is no use if you don’t notice when it fails and let a second fail before replacing the first.