Feature request: Prune (or skip on creation) empty snapshots #728

Debilski opened this issue Aug 14, 2023 · 5 comments

@Debilski

We are regularly snapshotting hundreds of people’s home folders and other data, eventually giving us between 10000 (still fine) and 100000 (things are getting much slower) snapshots on one of our storage systems.

While on its own this is not much of a problem, with that many snapshots zfs list eventually takes a long time to complete (and I presume some of zrepl's features that need a global refresh will also take longer). (And some badly written shell-autocompletion scripts suddenly need 2 minutes to fill in a letter.)

In our case a lot of these file systems do not change regularly (especially not during nighttime hours), so we could probably get rid of more than 2/3 of these snapshots by simply pruning empty snapshots that do not contain any new information.

Would it be feasible to have something like this built into zrepl, either as an additional pruning rule or already at snapshot-creation time (just don't create a new snapshot when nothing has changed)?

@Halfwalker

I use a threshold hook script to decide whether or not to snapshot. It looks at a configurable property to determine whether at least that many bytes have been written to the dataset.

For example, on the root dataset I have it set to 100M - I don't really care if I miss changes smaller than that. Other datasets are set to lower values. So every 15 minutes when zrepl snaps, that script prevents some/many snaps from happening. Unfortunately they show up as errors, but that's OK.

#!/bin/bash

# Checks the data-written threshold of a zfs dataset for use with zfs-auto-snapshot
# Returns 0 if the amount written has reached the threshold (take a snapshot)
# Returns 1 if the amount written is below the threshold (skip the snapshot)
# If no threshold is set, it defaults to 2M (arbitrary)

# Set the threshold in bytes like this:
# zfs set com.sun:snapshot-threshold=6000000 pool/dataset

# Enable auto-snapshots with
# zfs set com.sun:auto-snapshot=true pool/dataset

NAME=$1
WRITTEN=$(zfs get -Hpo value written "${NAME}")
THRESH=$(zfs get -Hpo value com.sun:snapshot-threshold "${NAME}")

# If no threshold is set, default to 2 MB
if [ "${THRESH}" = "-" ]; then
    echo "     ${NAME} No threshold, setting to 2M"
    THRESH=2000000
fi

if [ "${WRITTEN}" -gt "${THRESH}" ]; then
    echo -n "SNAP"
    RC=0
else
    echo -n "----"
    RC=1
fi

echo " ${NAME} Threshold = ${THRESH}, Written = ${WRITTEN}"
exit $RC
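
For reference, a rough sketch of how such a hook is wired into a zrepl job config (the path is a placeholder, and the script above would need to take the dataset name from zrepl's hook environment, e.g. ZREPL_FS, rather than from $1):

snapshotting:
  type: periodic
  prefix: zrepl_
  interval: 15m
  hooks:
    - type: command
      path: /etc/zrepl/hooks/snapshot-threshold.sh  # placeholder path to the script above
      err_is_fatal: true  # a non-zero exit then prevents the snapshot, logged as an error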

@Debilski (Author)

Ah, great. Didn’t think of using hooks. That sounds like a reasonable solution for now.

@problame added this to Inbox in Snapshot Management via automation Aug 25, 2023
@problame (Member)

@Halfwalker putting the threshold logic into a hook is a cool idea.

Just for completeness, I assume you use err_is_fatal=true in the zrepl configuration for your hook?


Looking at the OpenZFS source code: the written property is cheap to get; the accounting for it is done proactively by the kernel anyway.

We could definitely get it as part of the zfs list that we already do inside the snapshotter anyway.

The only question is how to hook it up in the config.

I have some longer-term plans for how to evolve snapshot management, but in the meantime, maybe we can either

  1. extend the hooks framework to allow pre-snapshot-hooks to indicate that the snapshot should be skipped, or
  2. introduce the notion of a skip decider into the common snapshotter code.

(1) is pretty much self-explanatory. It seems quite general, but I'd prefer capturing common use cases through YAML configuration instead of shell hooks.

Hence, an outline of (2) (a rough config sketch follows the list):

The skip decider would run for each filesystem that passes the filesystems filter.
There can be multiple skip deciders that are composable.

  • leaf:
    • command skip decider to implement arbitrary logic
    • written threshold (make your use case built-in)
    • snapshot_age threshold (make a snapshot if the last one still present is older than X)
  • composition:
  • and: logical AND of two skip deciders
  • or: logical OR of two skip deciders
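
To make that concrete, a rough sketch of what the YAML could look like (every key below is hypothetical; none of this exists in zrepl today):

snapshotting:
  type: periodic
  prefix: zrepl_
  interval: 15m
  # hypothetical: evaluated per filesystem right before taking the snapshot
  skip_decider:
    type: and                 # skip only if all child deciders vote to skip
    deciders:
      - type: written_threshold
        min_written: 100M     # skip if less than 100M was written since the last snapshot
      - type: snapshot_age
        max_age: 24h          # ...unless the newest snapshot is already older than 24h
      - type: command
        path: /etc/zrepl/hooks/skip-check.sh  # arbitrary logic via exit code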

Let me know what you think!

@Halfwalker

Yup, I currently use err_is_fatal=true in the zrepl job config for the hook.

I do like the idea of formalizing things in the YAML, but also with the capability to shell out to handle arbitrary conditions. From your description above, it looks like one would define several leaf configs, and then the decider chain would list them in order of operation, potentially linking some of them together with logical operators.

That sounds cool, flexible.

You could possibly use the same mechanism for the push jobs. I can think of several possibilities (a rough sketch follows the list):

  • Only push if a specific interface is available (i.e. don't push over laptop wifi, only hardwired from a dock)
  • Only push if the target is on the local network
  • Only push between certain hours
  • Check conditions on the push target (loadavg, network utilization, etc.)
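
Completely made up, just to illustrate the shape such a thing could take:

push_decider:
  type: and
  deciders:
    - type: command
      path: /etc/zrepl/hooks/network-check.sh  # e.g. only when docked / target on the local network
    - type: time_window
      between: "22:00-06:00"                   # only push during these hours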

@reefland

When you use err_is_fatal=true, zrepl considers it an error condition and generates log noise that can hide real issues. I opened #715 asking for a way to have hooks report that a snapshot was not taken without that being treated as an error.

If you think 2/3 of your snapshots can be skipped this way, that's a lot of errors flooding your log monitoring.
