Automatic ZFS Snapshots and Backups
Posted by Dave Eddy on Dec 05 2015 - tags: tech
I recently made a new storage server to replace my old one
to keep up with my growing space requirements (I think 40T should hold me over
for a while!). I store all of my movies, music, tv shows, etc. on it, as well
as all of my backups. All of my laptops and desktop computers also back up to
this server using rsync.
While it’s all stored on SmartOS using the ZFS filesystem in a raid setup that can handle 2 or more drive failures without data loss, it still worries me because it is all stored in one physical location: my closet. If there is a fire or some other disaster like that, all of my data could potentially be lost.
To remedy this, I’ve repurposed the server I replaced (my old storage server) to be an off-site backup server that is used solely for ZFS receive. This server now runs FreeBSD, which you can read about in a separate blog post.
Automatic Snapshots
Before diving into my off-site backup solution, the first thing to talk about is how I handle automatic ZFS snapshots, and also removing snapshots as they get too old.
zfs-snapshot-all
https://github.com/bahamas10/zfs-snapshot-all
Recursively snapshot all zpools
I use this program to snapshot all zpools on my new storage server automatically in cron. My crontab looks something like this:
1 0 * * * /opt/custom/bin/zfs-snapshot-all automated_daily >> /var/log/autosnap.log 2>&1
2 0 * * 0 /opt/custom/bin/zfs-snapshot-all automated_weekly >> /var/log/autosnap.log 2>&1
3 0 1 * * /opt/custom/bin/zfs-snapshot-all automated_monthly >> /var/log/autosnap.log 2>&1
4 0 1 1 * /opt/custom/bin/zfs-snapshot-all automated_yearly >> /var/log/autosnap.log 2>&1
Snapshots for each pool are created daily, weekly, monthly, and yearly, with a
name like automated_daily_$EPOCH.
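To make the naming scheme concrete, here is a minimal sketch of a recursive snapshot of every pool. It is not the source of zfs-snapshot-all: the function name snapshot_all is made up for illustration, and it assumes that `date +%s` prints seconds since the epoch on your system.

```shell
# Sketch: recursively snapshot every pool as <pool>@<prefix>_<epoch>.
# snapshot_all is a hypothetical name, not part of zfs-snapshot-all.
# Assumes `date +%s` prints seconds since the epoch on this system.
snapshot_all() {
    local prefix="$1"
    local name
    name="${prefix}_$(date +%s)"
    # -H drops the header, so every output line is just a pool name
    zpool list -H -o name | while read -r pool; do
        echo "snapshotting $pool@$name"
        # -r also snapshots every descendant dataset under the same name
        zfs snapshot -r "$pool@$name"
    done
}

# Usage: snapshot_all automated_daily
```

Because the same epoch is used for every pool in a run, all snapshots from one run share a name, which makes them easy to match by prefix later.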
zfs-prune-snapshots
https://github.com/bahamas10/zfs-prune-snapshots
Remove snapshots from one or more zpools that match given criteria
I use this program to remove snapshots as they get too old automatically in cron. My crontab looks something like this:
11 0 * * * /opt/custom/bin/zfs-prune-snapshots -p automated_daily_ 7d >> /var/log/autosnap.log 2>&1
12 0 * * 0 /opt/custom/bin/zfs-prune-snapshots -p automated_weekly_ 4w >> /var/log/autosnap.log 2>&1
13 0 1 * * /opt/custom/bin/zfs-prune-snapshots -p automated_monthly_ 12M >> /var/log/autosnap.log 2>&1
14 0 1 1 * /opt/custom/bin/zfs-prune-snapshots -p automated_yearly_ 10y >> /var/log/autosnap.log 2>&1
This removes daily snapshots after 7 days, weekly snapshots after 4 weeks, monthly snapshots after 12 months, and yearly snapshots after 10 years.
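The age check behind those durations can be sketched roughly as below. This is my own illustration, not code from zfs-prune-snapshots: the function names are hypothetical, and it assumes a month means 30 days and a year means 365 days.

```shell
# Convert a duration such as 7d, 4w, 12M or 10y to seconds.
# Assumption: a month is 30 days and a year is 365 days.
duration_to_seconds() {
    local n="${1%?}"        # numeric part: everything but the last character
    local unit="${1#"$n"}"  # unit: the last character
    case $unit in
        s) echo "$n" ;;
        m) echo $((n * 60)) ;;
        h) echo $((n * 3600)) ;;
        d) echo $((n * 86400)) ;;
        w) echo $((n * 604800)) ;;
        M) echo $((n * 2592000)) ;;
        y) echo $((n * 31536000)) ;;
        *) echo "unknown unit: $unit" >&2; return 1 ;;
    esac
}

# Succeed if a snapshot created at $1 (epoch seconds) is older than the
# duration $2. $3 is the current time and defaults to now.
is_expired() {
    local created="$1" max_age now="${3:-$(date +%s)}"
    max_age=$(duration_to_seconds "$2") || return 2
    [ $((now - created)) -gt "$max_age" ]
}

# Usage: is_expired 1449173284 7d && echo "old enough to destroy"
```

The creation time could come from something like `zfs list -Hp -o name,creation -t snapshot`, where `-p` should print the creation time as epoch seconds; I'd double check that output on your platform before wiring it up to `zfs destroy`.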
Off-site Backups
With the above 2 programs, snapshots are created and managed automatically on my new storage server; however, all of the data still lives on the same physical machine.
The final program is used to ZFS send data from my new storage server to be received on my old storage server, which lives at @papertigers's house.
zincrsend
https://github.com/bahamas10/zincrsend
Incremental ZFS send/recv backup script
NOTE: At the time of this writing, this script (due to laziness) takes configuration data inside the source itself.
At a high level, it works in 3 steps for every dataset you want to back up.
- Create snapshot locally
- Find latest remote snapshot
- Incremental send from latest remote snapshot -> new local snapshot
If no snapshot is found on the remote end, a full ZFS send is performed.
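The three steps and the full-send fallback above can be sketched as a single function. This is a simplified illustration and not the zincrsend source: the function name, the argument order, and the exact zfs flags (`-R`, `-I`, `recv -Fuv`) are my assumptions here, and it expects passwordless ssh to the remote host.

```shell
# Sketch of one incremental backup pass for a single dataset.
# backup_dataset is a hypothetical name; flags are assumptions, not
# necessarily what zincrsend itself uses.
#   $1 = remote host (ssh), $2 = local dataset, $3 = remote dataset
backup_dataset() {
    local host="$1" src="$2" dst="$3"
    local snap latest
    snap="zincrsend_$(date +%s)"

    # 1. create the snapshot locally (recursively)
    zfs snapshot -r "$src@$snap" || return 1

    # 2. find the latest remote snapshot: newest first, keep the first line;
    #    this stays empty if the remote dataset does not exist yet
    latest=$(ssh "$host" zfs list -H -o name -t snapshot -d 1 -S creation "$dst" 2>/dev/null | head -1)

    # 3. incremental send from the latest remote snapshot to the new local
    #    one, or a full send if the remote has nothing yet
    if [ -n "$latest" ]; then
        zfs send -R -I "@${latest#*@}" "$src@$snap"
    else
        zfs send -R "$src@$snap"
    fi | ssh "$host" zfs recv -Fuv "$dst"
}

# Usage: backup_dataset backup.example.com goliath/public paper/public
```

The `${latest#*@}` expansion strips the dataset name so that only the snapshot name is used as the incremental base, which is what lets the local and remote pools have different names.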
Finally, when the script finishes, I have a pushover notification sent to my phone to let me know how long it took, and if it was successful. An example run looks like:
# ./zincrsend
starting on Fri Dec 4 11:16:00 UTC 2015
processing dataset: goliath/public
creating snapshot locally: goliath/public@zincrsend_1449227760
latest remote snapshot: paper/public@zincrsend_1449173284
zfs sending (incremental) @zincrsend_1449173284 -> goliath/public@zincrsend_1449227760 to paper/public
receiving incremental stream of goliath/public@zincrsend_1449227760 into paper/public@zincrsend_1449227760
received 312B stream in 1 seconds (312B/sec)
processing dataset: goliath/minecraft
creating snapshot locally: goliath/minecraft@zincrsend_1449227763
latest remote snapshot: paper/minecraft@zincrsend_1449173286
zfs sending (incremental) @zincrsend_1449173286 -> goliath/minecraft@zincrsend_1449227763 to paper/minecraft
receiving incremental stream of goliath/minecraft@zincrsend_1449227763 into paper/minecraft@zincrsend_1449227763
received 312B stream in 1 seconds (312B/sec)
processing dataset: goliath/backups
creating snapshot locally: goliath/backups@zincrsend_1449227766
latest remote snapshot: paper/backups@zincrsend_1449173288
zfs sending (incremental) @zincrsend_1449173288 -> goliath/backups@zincrsend_1449227766 to paper/backups
receiving incremental stream of goliath/backups@zincrsend_1449227766 into paper/backups@zincrsend_1449227766
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of goliath/backups/dave@zincrsend_1449227766 into paper/backups/dave@zincrsend_1449227766
received 217MB stream in 367 seconds (607KB/sec)
receiving incremental stream of goliath/backups/skye@zincrsend_1449227766 into paper/backups/skye@zincrsend_1449227766
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of goliath/backups/dad@zincrsend_1449227766 into paper/backups/dad@zincrsend_1449227766
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of goliath/backups/web@zincrsend_1449227766 into paper/backups/web@zincrsend_1449227766
received 3.16MB stream in 5 seconds (647KB/sec)
script ran for ~6 minutes (384 seconds)
pushover sent!
---------------------------------
Conclusion
Because I run these scripts on SmartOS, I have to recreate the crontab upon a reboot, as all of that data is ephemeral. To do this, I have a “boot” service that runs a shell script every time the server is restarted. I create a persistent directory to hold the script:
mkdir /opt/custom/boot
The service is made up of 2 files: an SMF manifest and the script it runs.
/opt/custom/smf/smartos-boot.xml
<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle type='manifest' name='export'>
<service name='smartos/boot' type='service' version='0'>
<create_default_instance enabled='true'/>
<single_instance/>
<dependency name='net-physical' grouping='require_all' restart_on='none' type='service'>
<service_fmri value='svc:/network/physical'/>
</dependency>
<dependency name='filesystem' grouping='require_all' restart_on='none' type='service'>
<service_fmri value='svc:/system/filesystem/local'/>
</dependency>
<exec_method name='start' type='method' exec='/opt/custom/boot/method' timeout_seconds='0'/>
<exec_method name='stop' type='method' exec=':true' timeout_seconds='60'/>
<property_group name='startd' type='framework'>
<propval name='duration' type='astring' value='transient'/>
</property_group>
<stability value='Unstable'/>
<template>
<common_name>
<loctext xml:lang='C'>SmartOS boot script</loctext>
</common_name>
</template>
</service>
</service_bundle>
/opt/custom/boot/method
mkdir -p /opt/custom/root
[[ -d /root && ! -L /root ]] && rm -rf /root
[[ ! -L /root ]] && ln -s /opt/custom/root /root
crontab <<-EOF
0 11 * * * date >> /var/log/autosnap.log 2>&1
1 11 * * * /opt/custom/bin/zfs-snapshot-all automated_daily >> /var/log/autosnap.log 2>&1
2 11 * * 0 /opt/custom/bin/zfs-snapshot-all automated_weekly >> /var/log/autosnap.log 2>&1
3 11 1 * * /opt/custom/bin/zfs-snapshot-all automated_monthly >> /var/log/autosnap.log 2>&1
4 11 1 1 * /opt/custom/bin/zfs-snapshot-all automated_yearly >> /var/log/autosnap.log 2>&1
5 11 * * * /opt/custom/bin/zfs-prune-snapshots -p automated_daily_ 7d >> /var/log/autosnap.log 2>&1
6 11 * * 0 /opt/custom/bin/zfs-prune-snapshots -p automated_weekly_ 4w >> /var/log/autosnap.log 2>&1
7 11 1 * * /opt/custom/bin/zfs-prune-snapshots -p automated_monthly_ 12M >> /var/log/autosnap.log 2>&1
8 11 1 1 * /opt/custom/bin/zfs-prune-snapshots -p automated_yearly_ 10y >> /var/log/autosnap.log 2>&1
9 11 * * * /opt/custom/bin/zfs-prune-snapshots -p zincrsend_ 1M >> /var/log/zincrsend.log 2>&1
10 11 * * * /opt/custom/bin/zincrsend >> /var/log/zincrsend.log 2>&1
EOF
The first 3 lines of the script effectively create a persistent /root directory, and the rest of it installs a crontab that:
- creates automatic snapshots
- deletes old snapshots
- sends data to the off-site backup
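The manifest and method script above can also be checked by hand without rebooting. As far as I know SmartOS imports manifests found in /opt/custom/smf at boot, so the sketch below is only for testing; check_boot_service is a hypothetical helper name, and the default paths are the ones used in this post.

```shell
# Sketch: validate and import the manifest by hand, then show the service.
# check_boot_service is a hypothetical helper, not part of SmartOS.
check_boot_service() {
    local manifest="${1:-/opt/custom/smf/smartos-boot.xml}"
    local method="${2:-/opt/custom/boot/method}"
    chmod +x "$method" || return 1           # the start method must be executable
    svccfg validate "$manifest" || return 1  # check the manifest before loading it
    svccfg import "$manifest" || return 1    # (re)load the service definition
    svcs -l smartos/boot                     # show state and dependencies
}

# Usage: check_boot_service && crontab -l
```

If the service comes up online, `crontab -l` should list the entries from the heredoc in the method script.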