ZFS volume manipulations and best practices

I am a strong advocate of ZFS. I have been using ZFS for six years now and it has been a wonderful experience. In that time I have primarily used consumer hard drives and SSDs, some of them paired with bad firmware or poorly written HBA drivers. Many drives have failed, including one tragic night when three drives gave up the ghost.

Surprisingly, I have had only one incident of data loss in that time, which I chalk up to my own ignorance of pool recovery and a misconfigured UPS shutdown command. In short, a power failure, combined with a host holding a lock on a block storage ZVol, caused a kernel panic whenever the pool was imported. In hindsight, I probably could have recovered the data had I imported the pool in read-only mode and dd-ed the data to a file, or rolled the pool back to an earlier transaction group.

Recently I have been restructuring my pools and building a machine for a full offsite replication of my data. Testing that backup has involved moving ZVols, a task that is not thoroughly documented in the FreeBSD man pages.

The point of this article is to describe some of the gotchas of managing ZVols, in the hope that it spares you the same pain points.

Setting a user-defined ZFS property for the LU device-id

I highly recommend setting a user-defined property on each iSCSI/FC ZVol dataset to keep track of the LU device-id associated with it. The reason is that the device-id is not stored with the ZVol; it is kept on the OS drive. This becomes a problem if you lose your OS drive, or if you move the ZVol to another pool with zfs send | zfs recv and then want to present the LU from its new location to the same hosts. For example, ESXi will not recognize the same block storage device under a new device-id.

First grab the LU device-id value:

root@solaris:~# stmfadm list-lu -v | egrep "LU|Data File"
LU Name: 600144F010010600000065677FFF1234
Data File : /dev/zvol/rdsk/tank/zvols/esxi-shared-storage

or

root@freebsd:~ # ctladm devlist
LUN Backend       Size (Blocks)   BS Serial Number    Device ID       
  0 block         401152851968                        600144F010010600000065677FFF1234

Then set the user-defined property:

epijunkie@anyOS:~# zfs set lu:device-id=600144F010010600000065677FFF1234 tank/zvols/esxi-shared-storage

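Command explanation: lu:device-id is simply a name chosen for this purpose. ZFS requires user-defined property names to contain a colon (:), which is what distinguishes them from native properties; the value is the device-id copied in the previous step.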

Check to see if the property is set:

epijunkie@anyOS:~# zfs get all tank/zvols/esxi-shared-storage
NAME                             PROPERTY              VALUE                             SOURCE
...<REDACTED>...
tank/zvols/esxi-shared-storage   lu:device-id          600144F010010600000065677FFF1234  local

The last line shows the new property. This property travels with the dataset in a zfs send | zfs recv transfer when the -R flag is used on the zfs send side.
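On a Solaris host with many LUs, the two steps above can be combined. The sketch below is a hypothetical helper, lu_to_zfs_set, that reads the output of stmfadm list-lu -v and prints (but does not run) one zfs set command per ZVol-backed LU. It assumes the output format shown above, an LU Name: line followed by a Data File line, and skips LUs whose backing store is not under /dev/zvol/rdsk/.

```shell
#!/bin/sh
# Hypothetical helper: turn `stmfadm list-lu -v` output (on stdin) into
# "zfs set lu:device-id=..." commands. It only prints the commands so they
# can be reviewed before being run.
lu_to_zfs_set() {
    awk '
        # "LU Name: <GUID>": remember the GUID of the current LU.
        /LU Name:/ { guid = $NF }
        # "Data File : <path>": emit a command if the path is a ZVol.
        /Data File/ {
            path = $NF
            # Strip the Solaris ZVol prefix to get the dataset name;
            # sub() returns 0 (so nothing is printed) for non-ZVol files.
            if (guid != "" && sub(/^\/dev\/zvol\/rdsk\//, "", path))
                printf "zfs set lu:device-id=%s %s\n", guid, path
            guid = ""
        }
    '
}
```

For example, stmfadm list-lu -v | lu_to_zfs_set > set-lu-props.sh writes the commands to a file that can be reviewed and then run with sh set-lu-props.sh.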

Snapshots

This was a real kick-in-the-ass-by-reality moment: I took a snapshot of a ZVol and noticed that the space it consumed had drastically increased. A ZVol created without the -s (sparse) flag carries a refreservation equal to its size, and after a snapshot ZFS must still guarantee that every reserved block can be overwritten. This means that if your ZVol is created with 5GB of space and has 2GB written to it, it will consume 7GB of pool space once you create a snapshot. The same applies to any dataset with a refreservation set.
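The arithmetic above can be checked with a small sketch. It is a simplified model that ignores metadata, compression and any existing snapshots; the function names are hypothetical, and the inputs are meant to be the exact refreservation and referenced values reported by zfs get -Hp -o value.

```shell
#!/bin/sh
# Simplified model (assumption): ignores metadata, compression and existing
# snapshots. Arguments are byte counts (or any consistent unit).
#   $1 = refreservation, $2 = referenced

# Space charged to the ZVol right after a snapshot: the data itself plus the
# whole refreservation, because every block is now shared with the snapshot
# and must still be overwritable.
zvol_used_after_snapshot() {
    echo $(( $1 + $2 ))
}

# Extra pool space consumed by taking the snapshot: the smaller of the data
# written and the refreservation.
zvol_snapshot_extra() {
    if [ "$2" -lt "$1" ]; then echo "$2"; else echo "$1"; fi
}
```

With the 5GB/2GB example from the text, zvol_used_after_snapshot 5 2 prints 7 and zvol_snapshot_extra 5 2 prints 2.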

This seems obvious now that it has been pointed out, but it is a harsh reality when you try to snapshot a ZVol and cannot. I became aware of this behavior when attempting to snapshot a ZVol that consumed most of the capacity of a flash-based pool. Which brings us to the next topic.

Moving a ZVol using dd

Typically, when you want to move a ZVol from one pool to another, the best method is zfs send | zfs recv. However, there are at least two scenarios where this is not possible: when moving a ZVol from a Solaris pool to an OpenZFS pool, and when a snapshot cannot be taken, for example because of space constraints. While zfs send | zfs recv works across pools with different zpool versions (1-37 and 5000), it does not work across ZFS filesystem versions (1-6). This is a problem when sending a dataset from Solaris, which is on ZFS filesystem version 6, to an OpenZFS-based system, which is on version 5.

Because no snapshot is taken in this situation, turn off all services that can modify the ZVol before copying it; otherwise different parts of the copy will capture the data at different points in time, which will surely cause issues.

root@solaris:~# svcadm disable stmf

or

root@freebsd:~ # service ctld stop

Next, write the ZVol contents to a gzipped file for transport.

epijunkie@oldhost:~# dd if=/dev/zvol/rdsk/tank/zvols/esxi-shared-storage bs=1048576 | gzip > esxi-shared-storage.zvol.img.raw.gz

The destination volume must be the same size as the source or larger. This is not a requirement of ZFS or dd; rather, the host consuming the block device will likely break when it attempts to write to a sector it thought existed. Take note that a 3T ZVol created on Solaris is not the same size as a 3T ZVol created on FreeBSD. The -p flag displays the exact values instead of the human-friendly equivalents.

root@oldhost:~# zfs get -p volsize tank/zvols/esxi-shared-storage
NAME                               PROPERTY VALUE        SOURCE
tank/zvols/esxi-shared-storage     volsize  401152851968 local

On the destination machine, first create a ZVol.

root@newhost:~# zfs create -V 401152851968 NEWtank/zvols/esxi-shared-storage
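Because an undersized destination only fails later, on the consuming host, it is worth comparing the exact byte counts before running dd. As a minimal sketch, check_volsize is a hypothetical helper that takes the -p volsize of the source and of the destination and refuses to continue when the destination is smaller.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the destination volume is at least as
# large as the source. Arguments are exact byte counts, as printed by
# `zfs get -Hp -o value volsize <dataset>`.
#   $1 = source volsize, $2 = destination volsize
check_volsize() {
    if [ "$2" -lt "$1" ]; then
        echo "destination is $(( $1 - $2 )) bytes smaller than the source" >&2
        return 1
    fi
    return 0
}
```

For example: check_volsize 401152851968 "$(zfs get -Hp -o value volsize NEWtank/zvols/esxi-shared-storage)" && echo "safe to copy".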

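Decompress the file and write it to the ZVol with dd. On FreeBSD the device node is /dev/zvol/NEWtank/zvols/esxi-shared-storage rather than the Solaris-style /dev/zvol/rdsk/ path shown here.

root@newhost:~# gunzip -c esxi-shared-storage.zvol.img.raw.gz | dd of=/dev/zvol/rdsk/NEWtank/zvols/esxi-shared-storage obs=1048576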
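The two dd steps above can be wrapped in a pair of functions. This is a sketch rather than the author's procedure: zvol_export and zvol_import are hypothetical names, the 1 MiB block sizes are an assumption chosen for throughput, and gzip -t is added to catch an archive damaged in transit before anything is written to the destination ZVol.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the dd | gzip transfer described above.

# $1 = source device (or file), $2 = output .gz archive
zvol_export() {
    # Read the raw volume in 1 MiB blocks and compress it. In POSIX sh the
    # function returns gzip's status, so a dd read error shows up on stderr.
    dd if="$1" bs=1048576 | gzip -c > "$2"
}

# $1 = input .gz archive, $2 = destination device (or file)
zvol_import() {
    # Refuse to touch the destination if the archive is damaged.
    gzip -t "$1" || return 1
    # Because bs= is not used, dd collects the variable-sized reads from the
    # pipe into full 1 MiB output blocks (only the last one may be shorter).
    gunzip -c "$1" | dd ibs=65536 obs=1048576 of="$2"
}
```

For example, zvol_export /dev/zvol/rdsk/tank/zvols/esxi-shared-storage esxi-shared-storage.zvol.img.raw.gz on the old host, then zvol_import esxi-shared-storage.zvol.img.raw.gz /dev/zvol/NEWtank/zvols/esxi-shared-storage on a FreeBSD destination.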

Next, re-create the LU using the same device-id as before.

root@newsolaris:~# stmfadm create-lu --lu-prop guid=600144F010010600000065677FFF1234 /dev/zvol/rdsk/NEWtank/zvols/esxi-shared-storage

or

root@newfreebsd:~ # ctladm create -b block -l 0 -d 600144F010010600000065677FFF1234 -o file=/dev/zvol/NEWtank/zvols/esxi-shared-storage
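If the device-id was saved in the lu:device-id user property, it can be read back instead of copied by hand: on the old host with zfs get -H -o value lu:device-id tank/zvols/esxi-shared-storage after a dd copy (the freshly created destination ZVol does not have the property), or on the new dataset after a zfs send -R | zfs recv. The sketch below assumes that property name; lu_recreate_cmd is a hypothetical helper that prints the matching stmfadm or ctladm command for review rather than running it. zfs get prints - for an unset property, so that case is rejected.

```shell
#!/bin/sh
# Hypothetical helper: print the command that re-creates an LU for a ZVol
# using a saved device-id.
#   $1 = solaris | freebsd
#   $2 = device-id, e.g. from `zfs get -H -o value lu:device-id <dataset>`
#   $3 = ZVol dataset name
lu_recreate_cmd() {
    # `zfs get` prints "-" for an unset user property.
    if [ -z "$2" ] || [ "$2" = "-" ]; then
        echo "no saved device-id for $3" >&2
        return 1
    fi
    case "$1" in
        solaris)
            printf 'stmfadm create-lu --lu-prop guid=%s /dev/zvol/rdsk/%s\n' "$2" "$3" ;;
        freebsd)
            # No -l here (assumption): let CTL pick the LUN number.
            printf 'ctladm create -b block -d %s -o file=/dev/zvol/%s\n' "$2" "$3" ;;
        *)
            echo "unsupported OS: $1" >&2
            return 1 ;;
    esac
}
```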

Mounting a zpool contained on a ZVol

Recently a misconfiguration on a virtual FreeBSD machine left it unbootable. Fortunately, this machine was already on my to-do list to migrate to a jail, so this just expedited the process. The VM had recently been converted to bhyve, so I imported its pool on the host and copied over the needed data. Importing a zpool from a ZVol is not straightforward; even pointing the import command at the directory does not find the pool by default. This is where the volmode property comes into play. There are 4 user-settable modes for a volume: default (follow the system-wide vfs.zfs.vol.mode sysctl), geom (expose the volume as a full GEOM provider, so the partitions on it are probed), dev (expose only a character device, with no partition probing), and none (do not expose the volume outside ZFS).
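To see which volumes are likely to hide their contents from zpool import, the volmode of every volume can be listed with zfs list -H -o name,volmode -t volume. The hypothetical filter below reads that tab-separated output and prints the volumes set to dev or none; volumes on default follow the vfs.zfs.vol.mode sysctl and are not flagged by this sketch.

```shell
#!/bin/sh
# Hypothetical filter: reads `zfs list -H -o name,volmode -t volume` output
# (name<TAB>volmode per line) and prints volumes whose partitions will not be
# probed (volmode=dev) or that are not exposed at all (volmode=none).
zvols_not_geom() {
    awk -F '\t' '$2 == "dev" || $2 == "none" { print $1 " (volmode=" $2 ")" }'
}
```

For example: zfs list -H -o name,volmode -t volume | zvols_not_geom.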

As mentioned before, this particular VM was troublesome, and its pool did not import using zpool import -d /dev/zvol/tank/vm-zvol -f -R /mnt/rpool rpool as it should have. Not wanting to reboot the system, and being okay with the changes not being saved to the ZVol, I opted to clone the ZVol after explicitly setting its volmode to geom. Note that a clone inherits its properties from its parent dataset (tank here), not from the origin snapshot, so it is safest to also pass volmode=geom to zfs clone directly.

root@host:~# zfs set volmode=geom tank/vm-zvol
root@host:~# zfs snapshot tank/vm-zvol@troublesome
root@host:~# zfs clone -o volmode=geom tank/vm-zvol@troublesome tank/vm-zvol-clone

Now zpool import recognizes the ZVol as an available pool to import. Import it with an alternate root of /mnt/temp-vm so as not to clobber the existing root.

root@host:~# zpool import -d /dev/zvol/tank/vm-zvol-clone -f -R /mnt/temp-vm rpool

Conclusion

With the development of bhyve, which can use ZVols as block storage backing for virtual machines, I suspect ZVol manipulations will become more common among users. Hopefully this article spares you some pain points, or at least makes you aware of them.

References

[1] FreeBSD man pages, zfs(8)