Feature request: Prune (or skip on creation) empty snapshots #728

Debilski opened this issue Aug 14, 2023 · 5 comments

@Debilski

We are regularly snapshotting hundreds of people’s home folders and other data, eventually giving us between 10000 (still fine) and 100000 (things are getting much slower) snapshots on one of our storage systems.

While on its own this is not much of a problem, with that many snapshots zfs list eventually takes a long time to complete (and I presume some of zrepl's features that need a global refresh will also take longer). (And some badly written shell-autocompletion scripts suddenly need 2 minutes to fill in a letter.)

In our case a lot of these file systems do not change regularly (especially not during nighttime hours), so we could probably get rid of more than 2/3 of these snapshots by simply pruning empty snapshots that do not contain any new information.

Would it be feasible to have something like this built into zrepl, either as an additional pruning rule or already at snapshot-creation time (just don't create a new snapshot when nothing has changed)?

@Halfwalker

I use a threshold hook script to decide whether or not to snapshot. It looks at a configurable property to determine whether at least that many bytes have been written to the dataset.

For example, on the root dataset I have it set to 100M - I don't really care if I miss changes smaller than that. Other datasets are set to lower values. So every 15 minutes when zrepl snaps, that script prevents some/many snaps from happening. Unfortunately they show up as errors, but that's OK.

#!/bin/bash

# Checks the data-written threshold of a zfs dataset for use with zfs-auto-snapshot
# Returns 0 if the amount written has reached the threshold (take a snapshot)
# Returns 1 if the amount written is below the threshold (skip the snapshot)
# If no threshold is set, it defaults to 2M (arbitrary)

# Set the threshold in bytes like this:
# zfs set com.sun:snapshot-threshold=6000000 pool/dataset

# Enable auto-snapshots with
# zfs set com.sun:auto-snapshot=true pool/dataset

NAME=$1
WRITTEN=$(zfs get -Hpo value written "${NAME}")
THRESH=$(zfs get -Hpo value com.sun:snapshot-threshold "${NAME}")

# If no threshold is set, default to 2 MB
if [ "${THRESH}" = "-" ]; then
    echo "     ${NAME} No threshold, setting to 2M"
    THRESH=2000000
fi

if [ "${WRITTEN}" -gt "${THRESH}" ]; then
    echo -n "SNAP"
    RC=0
else
    echo -n "----"
    RC=1
fi

echo " ${NAME} Threshold = ${THRESH}, Written = ${WRITTEN}"
exit $RC
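
For reference, a rough sketch of how such a hook is wired into a zrepl job config (the path is a placeholder, and the script above would need to take the dataset name from zrepl's hook environment, e.g. ZREPL_FS, rather than from $1):

snapshotting:
  type: periodic
  prefix: zrepl_
  interval: 15m
  hooks:
    - type: command
      path: /etc/zrepl/hooks/snapshot-threshold.sh  # placeholder path to the script above
      err_is_fatal: true  # a non-zero exit then prevents the snapshot, logged as an error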

@Debilski (Author)

Ah, great. Didn’t think of using hooks. That sounds like a reasonable solution for now.

@problame added this to Inbox in Snapshot Management via automation Aug 25, 2023
@problame (Member)

@Halfwalker putting the threshold logic into a hook is a cool idea.

Just for completeness, I assume you use err_is_fatal=true in the zrepl configuration for your hook?


Looking at the OpenZFS source code: the written property is cheap to get; the accounting for it is done proactively by the kernel anyway.

We could definitely get it as part of the zfs list that we already do inside the snapshotter anyway.

The only question is how to hook it up in the config.

I have some longer-term plans for how to evolve snapshot management, but in the meantime, maybe we can either

  1. extend the hooks framework to allow pre-snapshot-hooks to indicate that the snapshot should be skipped, or
  2. introduce the notion of a skip decider into the common snapshotter code.

(1) is pretty much self-explanatory. It seems quite general, but I'd prefer capturing common use cases through YAML configuration instead of shell hooks.

Hence, an outline of (2) (a rough config sketch follows the list):

The skip decider would run for each filesystem that passes the filesystems filter.
There can be multiple skip deciders that are composable.

  • leaf:
    • command skip decider to implement arbitrary logic
    • written threshold (make your use case built-in)
    • snapshot_age threshold (make a snapshot if the last one still present is older than X)
  • composition:
  • and: logical AND of two skip deciders
  • or: logical OR of two skip deciders
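
To make that concrete, a rough sketch of what the YAML could look like (every key below is hypothetical; none of this exists in zrepl today):

snapshotting:
  type: periodic
  prefix: zrepl_
  interval: 15m
  # hypothetical: evaluated per filesystem right before taking the snapshot
  skip_decider:
    type: and                 # skip only if all child deciders vote to skip
    deciders:
      - type: written_threshold
        min_written: 100M     # skip if less than 100M was written since the last snapshot
      - type: snapshot_age
        max_age: 24h          # ...unless the newest snapshot is already older than 24h
      - type: command
        path: /etc/zrepl/hooks/skip-check.sh  # arbitrary logic via exit code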

Let me know what you think!

@Halfwalker

Yup, I currently use err_is_fatal=true in the zrepl job config for the hook.

I do like the idea of formalizing things in the YAML, but also with the capability to shell out to handle arbitrary conditions. From your description above, it looks like one would define several leaf configs, and then the decider chain would list them in order of operation, potentially linking some of them together with logical operators.

That sounds cool, flexible.

You could possibly use the same mechanism for the push jobs. I can think of several possibilities (a rough sketch follows the list):

  • Only push if a specific interface is available (i.e. don't push over laptop wifi, only hardwired from a dock)
  • Only push if the target is on the local network
  • Only push between certain hours
  • Check conditions on the push target (loadavg, network utilization, etc.)
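
Completely made up, just to illustrate the shape such a thing could take:

push_decider:
  type: and
  deciders:
    - type: command
      path: /etc/zrepl/hooks/network-check.sh  # e.g. only when docked / target on the local network
    - type: time_window
      between: "22:00-06:00"                   # only push during these hours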

@reefland

When you use err_is_fatal=true, zrepl considers it an error condition and generates log noise that can hide real issues. I opened #715 asking for a way to have hooks report that a snapshot was not taken without that being treated as an error.

If you think 2/3 of your snapshots can be skipped this way, that's a lot of errors flooding your log monitoring.
