Today I set up software raid10 on my server using ZFS as the file system. At first the task seemed daunting because of the unique nature of the file system, but as you become familiar with it you will realize just how easy and powerful it is. I urge you to read up on raid and ZFS, if you haven’t already, before you begin. I have placed the majority of the resources at the end of the post, but there are also links in the text as you read. If you have any questions, send me a message on Twitter or by other means and I will do what I can to help out.

As a note, I am running Ubuntu Server 14.04, though these instructions should work for the desktop version as well.



Setting up ZFS

The following commands install ZFS by first adding the ZFS PPA to your system, then updating the list of available software, and finally installing the needed packages.

# add-apt-repository ppa:zfs-native/stable
# apt-get update
# apt-get install spl-dkms zfs-dkms ubuntu-zfs

Note that the repository used here is the Native ZFS for Ubuntu PPA (64-bit systems only).
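
If you want to double-check that the packages themselves installed, a quick optional way is to list them with dpkg:

# dpkg -l | grep zfs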

Before moving on, we should check that the ZFS kernel module is installed and available:

# modprobe zfs
# dmesg | grep ZFS:
ZFS: Loaded module v0.6.4.1-1~trusty, ZFS pool version 5000, ZFS filesystem version 5
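
As an extra sanity check, the userland tools should respond as well. With no pools created yet, zpool status should simply report that none exist:

# zpool status
no pools available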


Create Storage Pool

According to the Arch wiki, it is neither necessary nor recommended to partition the drives before creating the ZFS filesystem, so I will not be partitioning my disks.

The pool in my setup will contain more than ten disks, so I will be using /dev/disk/by-path/ rather than /dev/disk/by-id/, as the ZFS on Linux developers recommend for pools containing more than ten disks.

To get a list of all drives by path, run:

# ls -lh /dev/disk/by-path/
lrwxrwxrwx 1 root root  9 Jun  7 12:16 pci-0000:04:00.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 10 Jun  5 18:57 pci-0000:04:00.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jun  5 18:57 pci-0000:04:00.0-scsi-0:0:0:0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jun  5 18:57 pci-0000:04:00.0-scsi-0:0:0:0-part5 -> ../../sda5
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:10:0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:11:0 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:12:0 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:13:0 -> ../../sde
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:14:0 -> ../../sdf
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:15:0 -> ../../sdg
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:16:0 -> ../../sdh
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:17:0 -> ../../sdi
lrwxrwxrwx 1 root root 10 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:17:0-part2 -> ../../sdi2
lrwxrwxrwx 1 root root 10 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:17:0-part3 -> ../../sdi3
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:18:0 -> ../../sdj
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:19:0 -> ../../sdk
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:20:0 -> ../../sdl
lrwxrwxrwx 1 root root 10 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:20:0-part2 -> ../../sdl2
lrwxrwxrwx 1 root root 10 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:20:0-part3 -> ../../sdl3
lrwxrwxrwx 1 root root  9 Jun  5 18:57 pci-0000:04:00.0-scsi-0:1:21:0 -> ../../sdm

That’s a lot of disks…
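
Several of these disks also show existing partitions (the -partN entries). Since we only care about the whole-disk paths, you can optionally hide the partition lines to make the list easier to scan:

# ls -lh /dev/disk/by-path/ | grep -v part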

To create a ZFS pool, the general command structure for a raidz (raid5-like) configuration is as follows:

# zpool create -f -m <mount> <pool> raidz <ids>

Where:

  • create: the subcommand to create the pool.
  • -f: Force creating the pool. This is to overcome the “EFI label error”.
  • -m: The mount point of the pool. If this is not specified, the pool will be mounted at /<pool>.
  • pool: The name of the pool.
  • raidz: The type of virtual device that will be created from the pool of devices. Raidz is a special implementation of raid5. See Jeff Bonwick’s Blog – RAID-Z for more information about raidz (a hypothetical example follows this list).
  • ids: The names of the drives or partitions to include in the pool. Get them from /dev/disk/by-path/.
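
To make that concrete, here is a hypothetical example (the pool name, mount point, and disk choice are made up for illustration) that would build a single raidz vdev from three of the paths listed above:

# zpool create -f -m /mnt/tank tank raidz pci-0000:04:00.0-scsi-0:1:10:0 pci-0000:04:00.0-scsi-0:1:11:0 pci-0000:04:00.0-scsi-0:1:12:0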

For my setup, though, I will be using raid10 (striped mirrors), because with the number of disks I have I don’t mind trading disk space for redundancy and increased read performance. I suggest reading up on raid types, if you have not already, before moving forward.

The general command structure for a raid10 pool is as follows:

# zpool create -f -m <mount> <pool> mirror <id1> <id2> mirror <id3> <id4>

You can add as many mirrors as you want.
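
You can also grow a striped-mirror pool later by adding another mirror vdev to it. A sketch with placeholder device names (new writes are striped across the new mirror, but existing data is not rebalanced):

# zpool add <pool> mirror <id5> <id6>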

Using the paths we found before, we can create a raid10 pool:

# zpool create -f -m /mnt/data bigdata mirror pci-0000:04:00.0-scsi-0:1:10:0 pci-0000:04:00.0-scsi-0:1:11:0 mirror pci-0000:04:00.0-scsi-0:1:12:0 pci-0000:04:00.0-scsi-0:1:13:0 mirror pci-0000:04:00.0-scsi-0:1:14:0 pci-0000:04:00.0-scsi-0:1:15:0 mirror pci-0000:04:00.0-scsi-0:1:16:0 pci-0000:04:00.0-scsi-0:1:17:0 mirror pci-0000:04:00.0-scsi-0:1:18:0 pci-0000:04:00.0-scsi-0:1:19:0 mirror pci-0000:04:00.0-scsi-0:1:20:0 pci-0000:04:00.0-scsi-0:1:21:0

There will be no output from the above command if there are no errors. We can verify the pool creation using:

# zpool status
  pool: bigdata
 state: ONLINE
  scan: none requested
config:

NAME                                STATE     READ WRITE CKSUM
bigdata                             ONLINE       0     0     0
  mirror-0                          ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:10:0  ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:11:0  ONLINE       0     0     0
  mirror-1                          ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:12:0  ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:13:0  ONLINE       0     0     0
  mirror-2                          ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:14:0  ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:15:0  ONLINE       0     0     0
  mirror-3                          ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:16:0  ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:17:0  ONLINE       0     0     0
  mirror-4                          ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:18:0  ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:19:0  ONLINE       0     0     0
  mirror-5                          ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:20:0  ONLINE       0     0     0
    pci-0000:04:00.0-scsi-0:1:21:0  ONLINE       0     0     0

errors: No known data errors

Everything looks good. We can run the following to see the available disk space:

$ df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/daenerys--vg-root  905G  3.8G  855G   1% /
none                           4.0K     0  4.0K   0% /sys/fs/cgroup
udev                           5.9G   12K  5.9G   1% /dev
tmpfs                          1.2G  1.5M  1.2G   1% /run
none                           5.0M     0  5.0M   0% /run/lock
none                           5.9G     0  5.9G   0% /run/shm
none                           100M     0  100M   0% /run/user
/dev/sda1                      236M   71M  153M  32% /boot
bigdata                        9.9T     0  9.9T   0% /mnt/data

Mmmm… 9.9T, I like the taste of that.
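
If you prefer ZFS-native reporting over df, zpool list and zfs list show the pool and dataset capacities as well (their accounting can differ slightly from what df reports):

# zpool list bigdata
# zfs list bigdata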

Resources:



Thanks for reading! If this post helped you, let me know on Twitter so I know to keep writing up stuff like this.