Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune/Retention Policy #3

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from
Draft

Conversation

ngharo
Copy link
Contributor

@ngharo ngharo commented Feb 28, 2020

Closes #2

This is a work in progress.

This uses a super simple algorithm to determine with snapshots to keep, see listKeepers function.

The PruneSnapshots command currently does not perform any destructive operations on your ZFS datasets. The debug logging is how I've been testing. I want to write some tests but I wanted to get this out there for more discussion around the listKeepers function.

(Side note: I don't know why commit f9b040a is showing up in this PR since it's already in master ???)

@ngharo ngharo mentioned this pull request Feb 28, 2020
@ngharo
Copy link
Contributor Author

ngharo commented Feb 28, 2020

I'm still figuring out tests so i've been testing by setting retention periods in my globals and running the prune command with -v for debug logging.

zackup prune [host [...]] [flags]
ngharo@datalord:~$ ./zackup prune apu2.ngha.ro -v
[Feb 27 23:58:04.509] DEBUG commands: increase log level (debug)
[Feb 27 23:58:04.510]  INFO commands: config tree read root=/etc/zackup
[Feb 27 23:58:04.593] DEBUG app: keeping snapshot bucket=daily snapshot={Ds:tank/data/backup/apu2.ngha.ro@2020-02-27T09:15:59Z Time:2020-02-27 09:15:59 +0000 UTC}
[Feb 27 23:58:04.593] DEBUG app: keeping snapshot bucket=daily snapshot={Ds:tank/data/backup/apu2.ngha.ro@2020-02-26T08:41:00Z Time:2020-02-26 08:41:00 +0000 UTC}
...

@dmke
Copy link
Member

dmke commented Feb 28, 2020

I don't know how I yet feel about string-ish duration types when we have time.Duration in Go, but I also haven't looked into the code. I'll do that later.

Regarding the "Update documentation to match reality" commit

This is probably a side effect of me rebasing your last PR (#1) onto master (instead of merging it). To resolve this (on your end, in this PR), you need to do something like this:

$# add this repo as "upstream"
$ git remote add upstream [email protected]:digineo/zackup

$# verify remotes
$ git remote -v
origin	[email protected]:ngharo/zackup (fetch)
origin	[email protected]:ngharo/zackup (push)
upstream	[email protected]:digineo/zackup (fetch)
upstream	[email protected]:digineo/zackup (push)

$# fetch changes from upstream
$ git fetch upstream master
From github.com:digineo/zackup
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> upstream/master

$# where are we now?
$ git status
On branch feature/prune
nothing to commit, working tree clean

$# rebase onto upstream master. this will replace f9b040a with 9398c53
$ git rebase upstream/master 
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
First, rewinding head to replay your work on top of it...
Applying: Laying out snapshot pruning functionality
Applying: Loop over each bucket when finding keepers

$# do a similar thing with your master branch
$ git checkout master
$ git reset --hard upstream/master

$# update your repository
$ git push --force-with-lease origin

Copy link
Member

@dmke dmke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments, which requires a bit of refactoring.

For now, I've avoided to review the listKeepers function, because I believe it will look quite different after you've addressed the other comments. I'll also need some time to think about the patterns variable.

app/prune.go Outdated Show resolved Hide resolved
app/prune.go Outdated Show resolved Hide resolved
app/prune.go Outdated Show resolved Hide resolved
config/job.go Outdated Show resolved Hide resolved
app/prune.go Outdated Show resolved Hide resolved
Comment on lines +130 to +131
for _, ss := range strings.Fields(o.String()) {
ts, err := time.Parse(time.RFC3339, strings.Split(ss, "@")[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string.Fields(o.String()) and strings.Split produce quite a few allocations. We can avoid them with something like this:

s := bufio.NewScanner(o)
for s.Scan() {
  name := s.Bytes()
  at := bytes.IndexRune(line, '@')
  if at < 0 {
    // malformed snapshot name
    continue
  }

  ts, err := time.Parse(time.RFC3339, string(line[at]:) // might be off-by-one
  ...
}

if err := s.Err(); err != nil {
  // could not parse the output
  log(...)
  return nil
}

This parses the output line-by-line (with a bufio.Scanner), and converts only the part after the @ in each line to a string.

app/prune.go Outdated Show resolved Hide resolved
@dmke
Copy link
Member

dmke commented Mar 2, 2020

I've been thinking about the patterns map for a bit now, and I've concluded, that this is not an elegant solution; mainly due to the copying values from and to different maps. I do agree though, that using (time.Time).Format is a nifty trick. But since Go implementation has no format verb for ISO week (as you've notices), it makes the implementation too bit clunky.

What I'd prefer to have instead is a list of (interval, keepCount) tuples to define the retention policy. This brings a few other changes with it. I hope I've covered everything here, but please feel free to ask if you have questions :-)

First, we want/need a few constants to translate the RetentionConfig into a list of buckets:

const (
  daily   = 24*time.Hour
  weekly  = 7*daily
  monthly = 30*daily
  yearly  = 360*daily
)

type bucket struct {
  interval time.Duration
  keep     *int
}

func PruneSnapshots(job *config.JobConfig) {
  // keep sorted by interval (or call sort.StringSlice for safety)
  buckets := []bucket{
    {daily, job.Retention.Daily},
    {weekly, job.Retention.Weekly},
    {monthly, job.Retention.Monthly},
    {yearly, job.Retention.Yearly},
  }

  // ...
}

Then, instead of a function listKeepers([]snapshot, string, *int) []snapshot , I'd make []bucket a distinct type and add an apply method:

// A retentionPolicy is an ordered list of buckets, which define a
// keep-or-delete policy for a set of snapshots.
type retentionPolicy []bucket

// apply partitions the given input snapshot list into two new slices: one
// containing snapshots to keep and one containing snapshots to delete.
// You need to pass a reference time for the policy's buckets to anchor on.
func (rp retentionPolicy) apply(now time.Time, snapshots []snapshot) (toKeep, toDelete []snapshot) {
  // iterate over rp and snapshots and sort snapshots[i] in either toKeep or
  // toDelete, depending on weather snapshots[i].Time < now.Add(rp[j].interval)

  // A naïve implementation could collect a list of candidates per entry in rp
  // and define toKeep as the unique set of all candidates (and toDelete as
  // snapshots - toKeep). This is easy to implement in more functional languages,
  // but a bit of a hassle in Go.
}

This would reduce PruneSnapshots() to:

func PruneSnapshots(job *config.JobConfig) {
  policy := retentionPolicy{
    {daily, job.Retention.Daily},
    // ...
  }

  snapshots := listSnapshots(job.Host)
  _, del := policy.apply(time.Now(), snapshots)

  deleteSnapshots(del) // TODO
}

Notes on testing:

  • passing time.Now() as reference allows to either read static test fixtures or vary the timestamp
  • returning both a list of snapshots to delete and to not delete allows to test for correctness + completeness
    • (returning only one of those would also require finding a better suited name than apply ;-))
  • we can test the matching of snapshots to buckets independently of running zfs list
    • I'd go so far and move the parsing of zfs list into a separate function and test that function against test fixtures
A table-driven test as sample
func TestRetentionPolicy_apply(t *testing.T) {
  now := time.Now()
  const D = 24*time.Hour
  intptr := func(i int) *int { return &i }

  cases := map[string]struct{
    subject  retentionPolicy
    input    []snapshot
    // only the names of input
    keepNames, deleteName []string
  }{
    "test demo": {
      subject:     retentionPolicy{{D, intptr(2)}},
      input:       []snapshot{{"a", now}, {"b", now.Add(-1*D}, {"c", now.Add(-2*D}},
      keepNames:   []string{"a", "b"},
      deleteNames: []string{"c"},
    },
    // ... other more non-trivial cases ...
  }

  for name := range cases {
    tc := cases[name]
    t.Run(name, func(t *testing.T) {
      keep, del := tc.subject.apply(now, tc.input)
      // assertEqual(tc.toKeep, [s.name for s in keep])
      // assertEqual(tc.toDelete, [s.name for s in del])
    }
  }
}

In another test, we could define a single retentionPolicy instance + a single snapshot slice, and vary the value of now.

@dmke dmke added the enhancement New feature or request label Mar 2, 2020
@dmke dmke changed the title Feature/prune Prune/Retention Policy Mar 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pruning old snapshots
2 participants