Arch Linux on ZFS Root With Separate Boot

Submitted by Shaun.Foulkes on Thu, 2016/12/29 - 20:36

This guide walks you through setting up Arch Linux with root on ZFS and a separate boot partition. The boot partition can use any filesystem of your choice, but I will be using BTRFS as the example.

ZFS is an enterprise-grade filesystem that provides RAID functionality without the need for a hardware RAID controller. ZFS will work behind a RAID controller, but it is best used with unpartitioned raw disks. Some of the nice features of ZFS include:

  • copy-on-write
  • virtually endless pool expansion
  • ability to share the whole pool's free space among multiple ZFS volumes
  • ability to set the record size on a volume-by-volume basis to minimize rewriting
  • snapshots
  • ability to send snapshots to another pool, machine or file
  • built-in compression that can be enabled per pool or volume (can save a considerable amount of space)
  • ability to add an SSD as a cache drive to improve performance and mitigate fragmentation

Things I used

  • 1x2TB 7200rpm HDD
  • 1x40GB SSD
  • Arch Linux ISO with ZFS added to it

Disk layout

  • 40GB SSD - 2GB for /boot formatted to BTRFS and the remaining space as an unformatted partition for ZFS cache
  • 2TB HDD - a fresh MBR was created with no partitions (ZFS can use raw disks)

Create an Arch Linux live USB with ZFS included. I followed the guide written here by John Ramsden. Once you have a working ArchZFS ISO, boot your system with it.

The remainder of this guide is based on John's related articles. However, I could not get his instructions to work, so I found another way to do the install. I have also chosen BTRFS for the boot partition: it lets me snapshot and roll back like ZFS, but it is natively supported, so errors can still be shown at boot if ZFS doesn't come up correctly.

Setting up the disks

The commands below assume the SSD is sda and the HDD is sdb.

# fdisk /dev/sda

  1. select "o" to create a new MBR
  2. choose "n" to create a new partition
  3. hit enter for the next 3 questions to accept the defaults, or answer them explicitly:
    • "p" for primary
    • "1" for the first partition
    • "2048" for First Sector
  4. enter "+2G" for Last Sector
  5. choose "n" for another new partition
  6. hit enter 4 times to accept all the defaults
  7. select "w" to write the changes to the disk

# fdisk /dev/sdb

  1. select "o" to create a new MBR
  2. choose "w" to write the changes
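
If you would rather script the two fdisk sessions above than answer the prompts, something like the following sketch might work. It is not what I did on my install. It assumes sfdisk is available on the live ISO, and the function names ssd_layout and hdd_layout are made up for this example.

```shell
# Sketch: describe both partition tables as sfdisk scripts instead of
# answering fdisk prompts. The function names are hypothetical.

# SSD: new MBR, a 2 GiB partition for /boot starting at sector 2048,
# and a second partition that takes the rest of the disk (ZFS cache).
ssd_layout() {
  cat <<'EOF'
label: dos
start=2048, size=2GiB, type=83
type=83
EOF
}

# HDD: new MBR with no partitions at all.
hdd_layout() {
  printf 'label: dos\n'
}
```

The output of each function would then be piped into sfdisk for the matching disk. Double check the device names first, because this overwrites the partition table:

# ssd_layout | sfdisk /dev/sda
# hdd_layout | sfdisk /dev/sdb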

Now we can format and create our BTRFS subvolume for /boot

# mkfs.btrfs -f /dev/sda1
# mount /dev/sda1 /mnt
# btrfs subvolume create /mnt/boot
# umount /mnt

Create your ZFS pool

Make sure to create your pools by disk id to avoid any import issues (/dev/disk/by-id/your-disk-id-here). I will be using system as my pool name.

# zpool create -m none system /dev/disk/by-id/device-id-for-hdd cache /dev/disk/by-id/device-id-for-ssd-part2

Feel free to drop the cache drive or add a mirror.
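
To find the by-id names that belong to sda2 and sdb, you can simply run ls -l /dev/disk/by-id and read the link targets. The small helper below is a sketch of the same lookup. The function name by_id_for is made up for this example, and the optional second argument only exists so it can be tried against a test directory.

```shell
# by_id_for DEVICE [DIR]
# Print every link in DIR (default: /dev/disk/by-id) that resolves to DEVICE.
by_id_for() {
  target=$(readlink -f "$1")
  dir=${2:-/dev/disk/by-id}
  for link in "$dir"/*; do
    [ -e "$link" ] || continue      # skip dangling links and empty dirs
    if [ "$(readlink -f "$link")" = "$target" ]; then
      printf '%s\n' "$link"
    fi
  done
}
```

A disk usually has more than one by-id name, so expect several lines per device:

# by_id_for /dev/sdb
# by_id_for /dev/sda2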

Re-import your pool

Next you will have to export the pool and re-import it under /mnt. In this configuration I found that I had to import the pool by its numeric id instead of its name, which is why the import command below uses the pool id. I believe this may be because the pool sits on a raw disk rather than on a partition.

# zpool export system
# zpool import   (get pool info for all available pools to import, get your pool's id)
# zpool import -d /dev/disk/by-id -R /mnt 199856487523469827
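
If you do not want to copy the id by hand, it can be pulled out of the zpool import listing. This is a minimal sketch that assumes the usual "pool:" and "id:" lines in that listing. The function name pool_id is made up for this example.

```shell
# pool_id NAME
# Read `zpool import` output on stdin and print the numeric id of pool NAME.
pool_id() {
  awk -v name="$1" '
    $1 == "pool:" { current = $2 }                  # remember the pool we are in
    $1 == "id:" && current == name { print $2 }     # print the id of the wanted pool
  '
}
```

With that, the last two commands above could be combined as:

# zpool import -d /dev/disk/by-id -R /mnt "$(zpool import | pool_id system)"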

Set your default pool options

# zfs set compression=lz4 system
# zfs set atime=on system
# zfs set relatime=on system

Setup system structure

First we create the root directory in ZFS. We will be using a setup that will allow for boot environments should you wish to use them.

# zfs create -o mountpoint=none system/ROOT
# zfs create -o mountpoint=/ system/ROOT/default
# zpool set bootfs=system/ROOT/default system

Now you have a root dataset, and it is mounted at /mnt. Next create /boot and mount your boot partition.

# mkdir /mnt/boot
# mount -o compress=lzo,subvol=boot /dev/sda1 /mnt/boot

Then we will create any other datasets you wish to have as separate directories, such as /home, /usr, /var, /tmp and so on. I have taken some suggestions from the Arch ZFS wiki. This step is optional.

# zfs create -o mountpoint=/home system/home
# zfs create -o setuid=off -o devices=off -o sync=disabled -o mountpoint=/tmp system/tmp
# zfs create -o xattr=sa -o mountpoint=/var system/var
# zfs create -o mountpoint=/usr system/usr

If you wish to have swap space, follow the guide here on the Arch wiki. The fstab further down assumes swap lives on a ZVOL named system/swap.

Install Arch Linux

We will install base, base-devel and a few other packages that I need or like to have. Vim is not needed, but it is my preferred editor.

# pacstrap -i /mnt base base-devel btrfs-progs grub vim

Create a fstab file

# genfstab -U -p /mnt > /mnt/etc/fstab

Then edit it 

# vim /mnt/etc/fstab

The generated file looks something like this:

# system/ROOT/default
system/ROOT/default    /               zfs             rw,relatime,xattr,noacl 0 0

# /dev/sda1
UUID=466f96c7-d74b-4309-8ee4-97e5dc6d7229       /boot           btrfs           rw,relatime,compress=lzo,ssd,space_cache,subvolid=262,subvol=/boot,subvol=boot  0 0

# system/home
system/home            /home           zfs             rw,relatime,xattr,noacl 0 0

# system/tmp
system/tmp             /tmp            zfs             rw,nosuid,nodev,relatime,xattr,noacl    0 0

# system/var
system/var             /var            zfs             rw,relatime,xattr,noacl 0 0

# system/usr
system/usr             /usr            zfs             rw,relatime,xattr,noacl 0 0

# /dev/zd0
/dev/zd0                   none            swap            discard         0 0

Modify it by deleting all the ZFS dataset entries, keeping only /boot and swap. The swap device needs to be changed from /dev/zd0 to /dev/zvol/system/swap. It should look more like this:

# /dev/sda1
UUID=466f96c7-d74b-4309-8ee4-97e5dc6d7229       /boot           btrfs           rw,relatime,compress=lzo,ssd,space_cache,subvol=boot  0 0

# /dev/zd0
/dev/zvol/system/swap                   none            swap            discard         0 0
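
The same edit can be done with a small filter instead of by hand. This is only a sketch: the function name fix_fstab is made up, it assumes the pool is called system as in this guide, and it leaves the comment lines of the removed entries in place.

```shell
# fix_fstab
# Read genfstab output on stdin and write the trimmed version to stdout:
#  - drop every entry whose filesystem type is zfs
#  - point the swap entry at /dev/zvol/system/swap instead of /dev/zd0
fix_fstab() {
  awk '
    $1 !~ /^#/ && $3 == "zfs" { next }
    $1 == "/dev/zd0" { sub(/^\/dev\/zd0/, "/dev/zvol/system/swap") }
    { print }
  '
}
```

It could be used in place of the plain genfstab call:

# genfstab -U -p /mnt | fix_fstab > /mnt/etc/fstab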

Configure the Ramdisk

Open /mnt/etc/mkinitcpio.conf and change the HOOKS

# vim /mnt/etc/mkinitcpio.conf

Change: 
HOOKS="base udev autodetect modconf block filesystems keyboard fsck"

To:
HOOKS="base udev autodetect modconf block keyboard zfs filesystems"
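
If you prefer not to open an editor, the HOOKS line can be replaced with sed. A minimal sketch, assuming the file has a single uncommented HOOKS= line. The function name set_zfs_hooks is made up for this example.

```shell
# set_zfs_hooks FILE
# Replace the uncommented HOOKS= line in FILE with the ZFS-aware hook list.
set_zfs_hooks() {
  sed -i 's/^HOOKS=.*/HOOKS="base udev autodetect modconf block keyboard zfs filesystems"/' "$1"
}
```

From the live environment that would be:

# set_zfs_hooks /mnt/etc/mkinitcpio.conf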

Add ZFS repo to the new install

Open pacman.conf

# vim /mnt/etc/pacman.conf

Add the ZFS repo in the Repositories section above all other repos

[archzfs]
Server = http://archzfs.com/$repo/x86_64
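
The repo can also be added without opening the file. The sketch below assumes [core] is the first repository in pacman.conf, as in a stock install, and inserts the [archzfs] section right above it. The function name add_archzfs_repo is made up for this example.

```shell
# add_archzfs_repo FILE
# Insert the [archzfs] section before the first [core] line in FILE.
add_archzfs_repo() {
  tmp=$(mktemp) || return 1
  awk '
    /^\[core\]/ && !added {
      print "[archzfs]"
      print "Server = http://archzfs.com/$repo/x86_64"
      print ""
      added = 1
    }
    { print }
  ' "$1" > "$tmp" && cat "$tmp" > "$1"   # cat keeps the file's owner and mode
  status=$?
  rm -f "$tmp"
  return $status
}
```

From the live environment that would be:

# add_archzfs_repo /mnt/etc/pacman.conf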

Chroot into the new install

We will chroot into the system, import and locally sign the pacman key for the ZFS repo, and install ZFS

# arch-chroot /mnt /bin/bash
# pacman-key -r 5E1ABF240EE7A126
# pacman-key --lsign-key 5E1ABF240EE7A126
# pacman -Syu zfs-linux

Finish setting up Arch Linux

Set your timezone, locale and hostname, regenerate the ramdisk to be safe, and set a password
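
For reference, these steps usually come down to a handful of files and commands. The sketch below only covers the file-writing part. The function name write_basic_config is made up, and the zone, locale and hostname in the usage line are placeholders to replace with your own.

```shell
# write_basic_config ROOT ZONE LOCALE HOSTNAME
# Write the timezone link, locale files and hostname below ROOT.
# Use / as ROOT when running inside the chroot.
write_basic_config() {
  root=${1%/}
  ln -sf "/usr/share/zoneinfo/$2" "$root/etc/localtime"
  printf '%s UTF-8\n' "$3" >> "$root/etc/locale.gen"   # enable the locale
  printf 'LANG=%s\n' "$3" > "$root/etc/locale.conf"
  printf '%s\n' "$4" > "$root/etc/hostname"
}
```

Inside the chroot, the whole step might then look like this:

# write_basic_config / Europe/London en_US.UTF-8 archzfs
# locale-gen
# mkinitcpio -p linux
# passwd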

Install grub and modify for ZFS

We will have to install GRUB and then modify grub.cfg by hand to boot ZFS, due to the way GRUB detects the ZFS install

# grub-install --target=i386-pc /dev/sda
# grub-mkconfig -o /boot/grub/grub.cfg

This will likely produce an error such as /usr/bin/grub-probe: error: failed to get canonical path of `/dev/ata-ST...'. The workaround is to create a symlink in /dev with the name grub-probe expects, pointing at the real device. If you are using multiple devices, this will have to be done for each one in the root pool

# ln -s /dev/sdb /dev/device-id-for-hdd
# ln -s /dev/sda2 /dev/device-id-for-ssd-part2
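
With more than a couple of devices in the pool, a loop saves some typing. This is a sketch of the same workaround. The function name link_pool_devices is made up, and the directory arguments are there so it can be tried outside of /dev first.

```shell
# link_pool_devices SRC_DIR DEST_DIR ID...
# For each ID, create DEST_DIR/ID as a symlink to the device that
# SRC_DIR/ID resolves to.
link_pool_devices() {
  src=$1
  dest=$2
  shift 2
  for id in "$@"; do
    ln -sf "$(readlink -f "$src/$id")" "$dest/$id"
  done
}
```

For the pool in this guide, that would be:

# link_pool_devices /dev/disk/by-id /dev device-id-for-hdd device-id-for-ssd-part2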

Now we have to edit the GRUB config file

# vim /boot/grub/grub.cfg

Now we need to modify the file to read the ZFS dataset correctly. In the section 10_linux find every instance of linux   /boot/vmlinuz-linux root=ZFS=/ROOT/default rw  quiet and change it

from:
linux   /boot/vmlinuz-linux root=ZFS=/ROOT/default rw  quiet

to:
linux   /boot/vmlinuz-linux zfs=system rw quiet

There should be 3 instances.
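
The replacement can also be done with sed, which has the bonus of counting the changed lines for you. A minimal sketch, assuming the kernel lines look exactly like the ones shown above. The function name patch_grub_cfg is made up for this example. Keep in mind that running grub-mkconfig again regenerates grub.cfg, so the edit has to be repeated afterwards.

```shell
# patch_grub_cfg FILE
# Rewrite the root= argument on the kernel lines in FILE and print how many
# lines now carry the zfs= argument.
patch_grub_cfg() {
  sed -i 's|root=ZFS=/ROOT/default rw  *quiet|zfs=system rw quiet|' "$1"
  grep -c 'zfs=system rw quiet' "$1"
}
```

Inside the chroot, this should print the number of patched lines:

# patch_grub_cfg /boot/grub/grub.cfg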

Unmount and reboot

Exit the system and unmount your new Arch system

# exit
# zfs umount -a
# zpool export system

Reboot

That's it. You can now reboot. If everything went OK, your system should boot, and keep booting cleanly across repeated reboots. If so, enjoy. Now you just need to set up snapshots, replication and data integrity checks. For that I would recommend looking at this article. For snapshots and replication I prefer Znapzend. It creates the snapshots, sends them to one or more other pools, manages the snapshots over time on both the original pool and the remote pools, and lets the replication run at an interval after each snapshot is taken.