What I will be detailing here is how to replace a failed root ZFS rpool disk on Proxmox. We have all had a disk fail at some point, and some systems are simpler to recover than others.
When replacing a root ZFS disk in Proxmox it's not as simple as just swapping the disk: you also need to partition it and install the bootloader.
First off, make sure you have a replacement disk of the same size or larger. Most systems support hot swap, but if yours doesn't you will need to shut down the server, so ensure you allow time for this, especially if you are running a production server with live VPS/containers.
First, check the status of your pool:
zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 00:03:38 with 0 errors on Sun May  8 00:27:39 2022
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            sda3    UNAVAIL      3   194     0
            sdb3    ONLINE       0     0     0

errors: No known data errors
As you can see, rpool is degraded and one disk, sda3, is UNAVAIL. If your failed disk is not already offline, take it offline before replacing it:

zpool offline rpool /dev/sda3
Once you have the disk offline you can replace it; if your machine does not support hot swap, shut down the server first.
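If you are not sure which physical drive to pull, one option is to check the serial number of the healthy disk and pull the one that doesn't match the label on the caddy; a quick sketch, assuming the smartmontools package is installed:

smartctl -i /dev/sdb | grep -i serial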
After the disk has been replaced, check that the new disk is visible; you can list your disks with:

fdisk -l

In this instance the new disk is sda, the same name as the disk that was replaced, but this is not always the case and it could be labelled sdc, sde, etc.
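If you want a more compact overview than fdisk, lsblk can also help confirm the device name; the new disk should show up with no partitions under it. A sketch (available columns can vary between versions):

lsblk -o NAME,SIZE,TYPE,MODEL,SERIAL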
1. As Proxmox uses GPT, we need to copy the partition table layout with the right tool, sgdisk. Copy the partition layout from the good disk, sdb, to the new disk, sda (sgdisk -R):

sgdisk /dev/sdb -R /dev/sda
2. Next we need to randomize the disk GUIDs. This may not always be required, but I feel it is good practice to do anyway (sgdisk -G):

sgdisk -G /dev/sda
The operation has completed successfully.
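Before moving on it's worth double checking that the new disk now carries the same three partitions as the good disk; sgdisk can print the table:

sgdisk -p /dev/sda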
3. You now need to resilver/rebuild the ZFS rpool onto the new disk. Once you start the resilver process it's advised that you don't reboot your machine until it has completed. You can check the status with `zpool status`; once the rebuild has completed, move on to the next step.
zpool replace -f rpool /dev/sda3 /dev/sda3
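Resilvering can take a while on large pools. If you want to keep an eye on progress without retyping the command, one option is to poll it with watch:

watch -n 30 zpool status rpool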
4. Use proxmox-boot-tool to format and initialise the EFI partition, which the Proxmox installer sets up as partition 2 on its bootable disks.
proxmox-boot-tool format /dev/sda2
proxmox-boot-tool init /dev/sda2
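You can confirm the new ESP has been picked up by running:

proxmox-boot-tool status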
You should now have a healthy rpool.
If you need to expand the pool because you have replaced both disks with larger disks of the same size, you will need to set autoexpand on and then use parted to expand the partitions.
Set autoexpand on:
zpool set autoexpand=on rpool
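You can verify the property took effect with:

zpool get autoexpand rpool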
Install parted if you don't already have it on the system:
apt install parted
Use parted to expand the ZFS partition. We will start with sdb; once complete, the same needs to be done on sda.
parted /dev/sdb
GNU Parted 3.4
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: ATA WDC WD1005FBYZ-0 (scsi)
Disk /dev/sdb: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  1049kB  1031kB                     bios_grub
 2      1049kB  538MB   537MB   fat32              boot, esp
 3      538MB   120GB   119GB   zfs
(parted) resizepart 3 100%
(parted) quit
Once you have done this to both of the rpool disks you can check whether the size of the pool has expanded by running `zpool list`. Here we went from 120GB disks to 1TB disks:
zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   931G  77.6G   853G        -         -     1%     8%  1.00x  ONLINE  -
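If the extra space doesn't show up straight away after resizing both partitions, you can tell ZFS to expand each device in place; a sketch, assuming your rpool devices are sda3 and sdb3 as above:

zpool online -e rpool /dev/sda3
zpool online -e rpool /dev/sdb3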
Hope this helps you.