Migrating to RAID + LVM
These are my notes from migrating a Debian Stretch system to RAID1 + LVM.
I found several guides for adding redundant storage to an existing system. However, none of them did quite what I wanted.
The specific commands are for a BIOS boot system. To use UEFI, they would need some adjustment. I include /boot inside the LVM.
Questions
- Can I put /boot on RAID and/or LVM?
- Can I switch between UEFI and BIOS boot?
- Can I put EFI System Partition on RAID?
- Do I need to avoid changing the hostname if I have MD RAID?
Q1: Can I put /boot on RAID and/or LVM?
GRUB2 claims support for /boot on LVM. The Debian wiki claims their installer can configure RAID1 + LVM, apparently including /boot. On the other hand, the Fedora documentation says they don’t support this. Perhaps Fedora have reasons to consider this a corner case which is not well supported. The same Fedora section does, however, imply support for /boot on RAID.
Setting it up on Debian, I didn’t see any warnings about issues or caveats. That said, if you ever want to use the GRUB2 command line to select and access an LVM volume, I found an undocumented bug which affects how you do this.
Q2: Can I switch between UEFI and BIOS boot?
I’m following the current Fedora installer, and using GPT disklabels even though this system uses BIOS boot.
When switching from UEFI to BIOS boot, GRUB will need a very small GPT partition of type “BIOS boot”. This replaces the 31 “reserved sectors” GRUB relies on with MBR partitioning. I use 1 MiB to keep things simple and aligned.
This means it’s not too hard to switch between grub-efi and grub-pc, if you need to in future.
If you want to be able to switch from grub-pc to grub-efi, just remember you’ll need to leave enough space for an EFI System Partition (ESP).
Q3: Can I put EFI System Partition on RAID?
To get UEFI to boot from a disk, you need an ESP. So how would you set this up for redundancy, when one of the RAID disks fails?
One answer is to depend on specific motherboard firmware support, aka fakeRAID. This is just software RAID where the specific format is also supported by the boot firmware. (Equally, if you already have a real hardware RAID which is supported by firmware, then you’re done.) As far as I could tell, no-one really likes fakeRAID, except the companies who sell it. I’d much rather stick with the reliable mdadm.
The other answer is to just clone the ESP to each disk. This assumes the ESP is not changed e.g. for kernel updates. I believe this works correctly on Debian.
It’s not so easy on Fedora, as they decided to put grub.cfg on the ESP, at the same time as Red Hat are promoting UEFI on servers. Technically this is more elegant and robust against other types of change. However, I haven’t seen any suggestions about replicating updates to multiple ESPs.
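If you do go the cloning route, it could be as simple as copying the partition contents and registering a second firmware boot entry. This is only a rough sketch under my own assumptions (the ESP is partition 1 on both sda and sdb, both partitions are the same size, and Debian’s default loader path is used), not something demonstrated in the rest of these notes:
# dd if=/dev/sda1 of=/dev/sdb1 bs=1M
# efibootmgr --create --disk /dev/sdb --part 1 --label 'debian (disk 2)' --loader '\EFI\debian\grubx64.efi'
Either way, remember the clone goes stale if anything later writes to the original ESP.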
Q4: Do I need to avoid changing the hostname if I have MD RAID?
The full “name” of an MD array is structured as hostname:devname. The hostname part should match any HOMEHOST in /etc/mdadm/mdadm.conf AND the same in the initrd. If not, auto-assembly will rename the array to a “remote” name (appending _0). If you mount based on the device name, e.g. /dev/md/0, this is what will break your boot.
Names can be updated manually to different values while assembling (-U homehost or -U name).
An example of updating the device name on Debian:
mdadm --stop /dev/md/debian
mdadm --assemble -U name --name lair /dev/md/lair /dev/vdb1
# write a new ARRAY line for "lair"
mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
# remove old ARRAY line for "debian"
vi /etc/mdadm/mdadm.conf
update-initramfs -u -k all
This can be adapted to change the homehost, by instead using hostname:devname as the name, or -U homehost --homehost hostname.
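For example, to change the homehost using the second form (newhost is just a placeholder for the machine’s new hostname):
mdadm --stop /dev/md/lair
mdadm --assemble -U homehost --homehost newhost /dev/md/lair /dev/vdb1
As before, refresh the ARRAY line in /etc/mdadm/mdadm.conf and rebuild the initramfs afterwards.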
Process
Outline:
- Copy the current system to a degraded RAID on the new disk.
- Boot the system from RAID.
- Very carefully double-check, then have the RAID swallow the old disk.
I tested this inside a VM. On a physical machine, the disks will show as sda instead of vda, etc.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 254:0 0 10G 0 disk
└─vda1 254:1 0 10G 0 part /
vdb 254:16 0 11G 0 disk
Step 1: Create degraded RAID on the new disk
Partition the extra disk with GPT:
# fdisk /dev/vdb
Welcome to fdisk (util-linux 2.29.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): g
Created a new GPT disklabel (GUID: 1C6EB8C4-B40A-4C9C-98AE-BA7E33F1A549).
Command (m for help): n
Partition number (1-128, default 1):
First sector (2048-23068638, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-23068638, default 23068638): +9G
Created a new partition 1 of type 'Linux filesystem' and of size 9 GiB.
Command (m for help): n
Partition number (2-128, default 2):
First sector (18876416-23068638, default 18876416):
Last sector, +sectors or +size{K,M,G,T,P} (18876416-23068638, default 23068638): +1M
Created a new partition 2 of type 'Linux filesystem' and of size 1 MiB.
Command (m for help): t
Partition number (1,2, default 2): 2
Hex code (type L to list all codes): L
1 EFI System C12A7328-F81F-11D2-BA4B-00A0C93EC93B
2 MBR partition scheme 024DEE41-33E7-11D3-9D69-0008C781F39F
3 Intel Fast Flash D3BFE2DE-3DAF-11DF-BA40-E3A556D89593
4 BIOS boot 21686148-6449-6E6F-744E-656564454649
...
Hex code (type L to list all codes): 4
Changed type of partition 'Linux filesystem' to 'BIOS boot'.
Command (m for help): p
Disk /dev/vdb: 11 GiB, 11811160064 bytes, 23068672 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 1C6EB8C4-B40A-4C9C-98AE-BA7E33F1A549
Device Start End Sectors Size Type
/dev/vdb1 2048 18876415 18874368 9G Linux filesystem
/dev/vdb2 18876416 18878463 2048 1M BIOS boot
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
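For reference, a non-interactive equivalent using sgdisk (from the gdisk package) should produce roughly the same layout; treat it as a sketch and print the table afterwards to check:
# apt-get install gdisk
# sgdisk -o /dev/vdb
# sgdisk -n 1:0:+9G -t 1:8300 -n 2:0:+1M -t 2:ef02 /dev/vdb
# sgdisk -p /dev/vdb
Here 8300 is sgdisk’s code for “Linux filesystem” and ef02 for “BIOS boot”, matching the interactive session above.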
Create a RAID1, in a degraded state as if one of the two disks had been removed:
# apt-get install mdadm
# mdadm --create /dev/md/lair --level=1 --raid-devices=2 missing /dev/vdb1
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/lair started.
# mdadm --detail /dev/md/lair
/dev/md/lair:
Version : 1.2
Creation Time : Mon Sep 4 09:51:57 2017
Raid Level : raid1
Array Size : 9428992 (8.99 GiB 9.66 GB)
Used Dev Size : 9428992 (8.99 GiB 9.66 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Mon Sep 4 09:51:57 2017
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : debian9-vm:lair (local to host debian9-vm)
UUID : 9dbcb6f5:988d6450:cbac9699:ee8bb59c
Events : 0
Number Major Minor RaidDevice State
- 0 0 0 removed
1 254 17 1 active sync /dev/vdb1
Enter the array in mdadm.conf:
# mdadm --detail --scan | tee --append /etc/mdadm/mdadm.conf
ARRAY /dev/md/lair metadata=1.2 name=debian9-vm:lair UUID=9dbcb6f5:988d6450:cbac9699:ee8bb59c
I configured mdadm with a monthly scrub, and a monitoring daemon. Finally, rebuild the initramfs now that we have updated mdadm.conf. Apparently this is necessary to activate the array at boot time. (There’s an alternative approach if you use Fedora’s dracut initramfs, which also ignores the homehost.)
# dpkg-reconfigure mdadm
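The scrub itself is driven by the mdadm package’s cron job. If you later want to verify that scrubbing works without waiting a month (once the array has both disks), you can trigger a check by hand through the kernel’s md interface; md127 is the array’s kernel name on this system, as lsblk shows later:
# echo check > /sys/block/md127/md/sync_action
# cat /proc/mdstat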
The monitoring daemon sends mail if something happens to the array (by default, to root). This isn’t necessarily going to reach you. For the time being, I redirected root’s mail to my normal user. Note that Debian’s MDA does not actually support delivering mail to root; it recommends that you always reconfigure your mail server to redirect it somewhere more sensible. The exim4 configuration asks if you want to change several other settings, but I left those all alone. Whenever you have a shell open, Debian will let you know if you have any unix mail.
# dpkg-reconfigure exim4-config
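If all you want is to change where root’s mail goes, I believe you can get the same effect by editing the aliases file directly, since Debian’s default exim4 configuration reads /etc/aliases (myuser below is a placeholder for your normal account):
# vi /etc/aliases
root: myuser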
Set up LVM on the RAID volume:
# pvcreate /dev/md/lair
Physical volume "/dev/md/lair" successfully created.
# vgcreate vg_lair /dev/md/lair
Volume group "vg_lair" successfully created
# lvcreate vg_lair -n lv_debian -L 8G
Logical volume "lv_debian" created.
# mkfs.ext4 /dev/mapper/vg_lair-lv_debian
mke2fs 1.43.4 (31-Jan-2017)
Creating filesystem with 2097152 4k blocks and 524288 inodes
Filesystem UUID: 5e90a94c-a5b3-4d14-a5d5-d20d6dd97e2a
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
Copy the / filesystem to the Logical Volume. (Note that I didn’t have any other filesystems.) Ideally you would do this from a live system (e.g. a live CD/USB), to avoid data changing in the middle of the copy! You can also shut the system down and enter “single user mode”: boot to the GRUB menu, press e to edit the entry, and add systemd.unit=rescue.target to the line that loads vmlinuz (the one starting with linux), before booting it.
# mount /dev/mapper/vg_lair-lv_debian /mnt
# time cp -ax /. /mnt/
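If you can’t take the system offline for the whole copy, an alternative I haven’t tested here is to do the bulk copy while running normally, then drop to rescue mode and repeat it with rsync, which only transfers the differences:
# rsync -aHAXx --delete / /mnt/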
Step 1.5: Adjust the copied system for booting
Chroot into the new system. Update /etc/fstab so it points to the new device. I use /dev/mapper/vg_lair-lv_debian just because it’s convenient; it should be equally fine to use a UUID. Make sure fstab doesn’t rely on any filesystems OR swap partitions on the old device.
# cd /mnt
# mount --bind /dev dev
# mount --bind /proc proc
# mount --bind /sys sys
# chroot .
# vi /etc/fstab
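For reference, the new root entry in fstab ends up looking something like this (with the usual Debian options):
/dev/mapper/vg_lair-lv_debian / ext4 errors=remount-ro 0 1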
Regenerate the GRUB configuration file on the new system, so it refers to the new device.
# update-grub
Apparently we can ignore this message:
/usr/sbin/grub-probe: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
as it just means the RAID is degraded. You will also see
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Now we leave the chroot:
# exit
Next, run os-prober on the original system, and the original GRUB will pick up the new configuration as an extra item in its boot menu!
# apt-get install os-prober
# update-grub
It is also possible to boot into the RAID manually using the GRUB prompt. This would avoid hacking around with chroot - or you might want to resort to this if something went wrong. This requires several more details, and a specific sequence to ensure that GRUB scans for LVM on the RAID array. For the latter, see “undocumented bug” above.
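For reference, the rough shape of a manual boot from the GRUB prompt is sketched below. The names are illustrative: use ls after loading the modules to see what GRUB actually calls the logical volume (something along the lines of (lvm/vg_lair-lv_debian)), and getting GRUB to rescan for LVM on top of the RAID is exactly where the bug mentioned above comes in:
grub> insmod part_gpt
grub> insmod mdraid1x
grub> insmod lvm
grub> ls
grub> set root=(lvm/vg_lair-lv_debian)
grub> linux /vmlinuz root=/dev/mapper/vg_lair-lv_debian
grub> initrd /initrd.img
grub> boot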
Step 2: Reboot into the new system.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 254:0 0 11G 0 disk
vdb 254:16 0 11G 0 disk
├─vdb1 254:17 0 9G 0 part
│ └─md127 9:127 0 9G 0 raid1
│ └─vg_lair-lv_debian 252:0 0 8G 0 lvm /
└─vdb2 254:18 0 1M 0 part
Install GRUB from within the new system. This is what populates the BIOS boot partition. At this point you could choose to only install it to the new disk (vdb), keeping the old GRUB (on vda). Then you would have two system disks, each independent of the other. (Except for what we’ve done to mdadm.conf… the Arch Wiki suggests updating that in the chroot step instead.)
# apt-get install os-prober
# update-grub
# dpkg-reconfigure grub-pc
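If you only wanted to touch the new disk and skip the debconf questions, running grub-install against it directly should achieve the same thing, although dpkg-reconfigure has the advantage of recording your choice of install devices, so future GRUB upgrades reinstall automatically.
# grub-install /dev/vdb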
If you had a swap partition which has been moved, you will need to regenerate Debian’s initramfs. Otherwise it will look for the hibernation resume image in the wrong place, and there may be a delay during boot.
# vi /etc/initramfs-tools/conf.d/resume
# update-initramfs -u -k all
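For illustration, that file just sets a RESUME variable; point it at wherever swap now lives, by device path or UUID. The lv_swap name below is hypothetical, since this example system has no swap LV:
# cat /etc/initramfs-tools/conf.d/resume
RESUME=/dev/mapper/vg_lair-lv_swap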
Step 3: Overwrite the old system with the RAID
You should double-check that the new system disk you created is independent. Try booting with the old disk removed before continuing. Then:
Replace the old disk’s partitioning with GPT, if necessary.
Create a partition for RAID, using the exact same size. You may be prompted to erase an old filesystem signature. Also create the necessary BIOS boot partition.
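One way to get an identically sized RAID partition is to replicate the new disk’s partition table onto the old disk and then randomise the GUIDs. A sketch with sgdisk follows; note the counter-intuitive argument order (the bare argument is the source, the -R argument is the destination) and triple-check it before running anything this destructive:
# sgdisk -R /dev/vda /dev/vdb
# sgdisk -G /dev/vda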
Add the RAID partition.
# mdadm --add /dev/md/lair /dev/vda1
mdadm: added /dev/vda1
# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 vda1[2] vdb1[1]
9428992 blocks super 1.2 [2/1] [_U]
[===========>.........] recovery = 57.4% (5421440/9428992) finish=1.4min speed=46151K/sec
unused devices: <none>
# cat /sys/class/block/md127/md/sync_speed_max
200000 (system)
# mdadm --detail /dev/md/lair
/dev/md/lair:
Version : 1.2
Creation Time : Mon Sep 4 09:51:57 2017
Raid Level : raid1
Array Size : 9428992 (8.99 GiB 9.66 GB)
Used Dev Size : 9428992 (8.99 GiB 9.66 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Tue Sep 5 12:39:53 2017
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 69% complete
Name : debian9-vm:lair (local to host debian9-vm)
UUID : 9dbcb6f5:988d6450:cbac9699:ee8bb59c
Events : 1035
Number Major Minor RaidDevice State
2 254 1 0 spare rebuilding /dev/vda1
1 254 17 1 active sync /dev/vdb1
Finally, make sure GRUB is installed on both disks.
# dpkg-reconfigure grub-pc
When you test booting with only one disk, remember you will then need to manually add the disk back again using mdadm --add.
Note that this is exactly the same command as before. It does
not check for a matching RAID superblock. It will immediately
start overwriting the entire partition. Type carefully!