zrepl pull job fails after restarting service, also failures not shown in prometheus #757

Open
deajan opened this issue Oct 23, 2023 · 2 comments

deajan commented Oct 23, 2023

Hello,

Playing around with zrepl 0.6.1 on two RHEL 9 systems.
I've set up a source job that gets pulled via the TLS transport (TLS over TCP).

So far so good. After some hours, I had to stop zrepl on the puller side (via systemctl stop zrepl).
When I started zrepl again, I got lots of errors:

On the source side, I only got one fairly normal error, logged when I shut down zrepl on the puller side:

Oct 20 18:04:39 source.local systemd[1]: Starting zrepl daemon...
Oct 20 18:04:39 source.local systemd[1]: Started zrepl daemon.
Oct 23 01:25:01 source.local zrepl[62571]: [target_lac][rpc.data][8t1w$K31T$K31T.ffm2]: cannot write send stream err="frameconn: shutting down"

On the puller side, I have been getting lots and lots of errors ever since:

Oct 23 18:34:30 puller.local systemd[1]: Starting zrepl daemon...
Oct 23 18:34:30 puller.local systemd[1]: Started zrepl daemon.
Oct 23 18:44:46 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$V1xB$V1xB.96fi.YKL6.SHJ3]: zfs receive failed local_fs="tank/backups/backups/abcd_stash_mobile" proto_fs="ice/backups/abcd_stash_mobile" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash_mobile has been modified\nsince most recent snapshot\n" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}"
Oct 23 18:44:46 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$V1xB$V1xB.96fi.YKL6]: receive request failed (might also be error on sender) filesystem="ice/backups/abcd_stash_mobile" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash_mobile has been modified\nsince most recent snapshot\n" rr="Filesystem:\"ice/backups/abcd_stash_mobile\" To:{Name:\"zrepl_2023-10-20T17:04:39.975Z\" Guid:1866063764418384462 CreateTXG:944611 Creation:\"2023-10-20T19:04:39+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}" errType="*zfs.ZFSError"
Oct 23 18:44:47 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$3uav$3uav.3Bbt.q7V/.SaFp]: zfs receive failed proto_fs="ice/backups/abcd_stash/dataset4" local_fs="tank/backups/backups/abcd_stash/dataset4" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset4 has been modified\nsince most recent snapshot\n" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}"
Oct 23 18:44:47 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$3uav$3uav.3Bbt.q7V/]: receive request failed (might also be error on sender) filesystem="ice/backups/abcd_stash/dataset4" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset4 has been modified\nsince most recent snapshot\n" errType="*zfs.ZFSError" rr="Filesystem:\"ice/backups/abcd_stash/dataset4\" To:{Name:\"zrepl_2023-10-20T17:04:39.920Z\" Guid:12584830747523361689 CreateTXG:944610 Creation:\"2023-10-20T19:04:39+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}"
Oct 23 18:44:47 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$Bkmo$Bkmo.rkps.+rn2.Xeeh]: zfs receive failed opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/dataset3 has been modified\nsince most recent snapshot\n" local_fs="tank/backups/backups/dataset3" proto_fs="ice/backups/dataset3"
Oct 23 18:44:47 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$Bkmo$Bkmo.rkps.+rn2]: receive request failed (might also be error on sender) err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/dataset3 has been modified\nsince most recent snapshot\n" errType="*zfs.ZFSError" filesystem="ice/backups/dataset3" rr="Filesystem:\"ice/backups/dataset3\" To:{Name:\"zrepl_2023-10-20T17:04:40.475Z\" Guid:2425343861036177484 CreateTXG:944617 Creation:\"2023-10-20T19:04:40+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}"
Oct 23 18:44:47 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$d0Af$d0Af.d8mq.hD0R.cPU4]: zfs receive failed local_fs="tank/backups/backups" proto_fs="ice/backups" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups has been modified\nsince most recent snapshot\n"
Oct 23 18:44:47 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$d0Af$d0Af.d8mq.hD0R]: receive request failed (might also be error on sender) err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups has been modified\nsince most recent snapshot\n" errType="*zfs.ZFSError" rr="Filesystem:\"ice/backups\" To:{Name:\"zrepl_2023-10-20T17:04:40.229Z\" Guid:1628105291511277732 CreateTXG:944616 Creation:\"2023-10-20T19:04:40+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}" filesystem="ice/backups"
Oct 23 18:44:48 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$EP8Q$EP8Q.Th79.CLwM.R0cy]: zfs receive failed proto_fs="ice/backups/abcd_stash/dataset1" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" local_fs="tank/backups/backups/abcd_stash/dataset1" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset1 has been modified\nsince most recent snapshot\n"
Oct 23 18:44:48 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$EP8Q$EP8Q.Th79.CLwM]: receive request failed (might also be error on sender) filesystem="ice/backups/abcd_stash/dataset1" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset1 has been modified\nsince most recent snapshot\n" errType="*zfs.ZFSError" rr="Filesystem:\"ice/backups/abcd_stash/dataset1\" To:{Name:\"zrepl_2023-10-20T17:04:39.857Z\" Guid:4655906848194918181 CreateTXG:944609 Creation:\"2023-10-20T19:04:39+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}"
Oct 23 18:44:48 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$L3gF$L3gF.TSV3.25Q7.64PZ]: zfs receive failed proto_fs="ice/backups/abcd_stash" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash has been modified\nsince most recent snapshot\n" local_fs="tank/backups/backups/abcd_stash"
Oct 23 18:44:48 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$L3gF$L3gF.TSV3.25Q7]: receive request failed (might also be error on sender) err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash has been modified\nsince most recent snapshot\n" filesystem="ice/backups/abcd_stash" errType="*zfs.ZFSError" rr="Filesystem:\"ice/backups/abcd_stash\" To:{Name:\"zrepl_2023-10-20T17:04:40.076Z\" Guid:2137059722432639515 CreateTXG:944613 Creation:\"2023-10-20T19:04:40+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}"
Oct 23 18:44:48 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$7TOQ$7TOQ.WGRw.aARu.KWgP]: zfs receive failed err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset2 has been modified\nsince most recent snapshot\n" local_fs="tank/backups/backups/abcd_stash/dataset2" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" proto_fs="ice/backups/abcd_stash/dataset2"
Oct 23 18:44:48 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$7TOQ$7TOQ.WGRw.aARu]: receive request failed (might also be error on sender) err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset2 has been modified\nsince most recent snapshot\n" filesystem="ice/backups/abcd_stash/dataset2" errType="*zfs.ZFSError" rr="Filesystem:\"ice/backups/abcd_stash/dataset2\" To:{Name:\"zrepl_2023-10-20T17:04:40.125Z\" Guid:5552329214498588795 CreateTXG:944614 Creation:\"2023-10-20T19:04:40+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}"
Oct 23 18:44:48 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$+7dH.9Ff2.xN7o.Kndp]: most recent error in this attempt attempt_number="0" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset2 has been modified\nsince most recent snapshot\n"
Oct 23 18:44:48 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$+7dH.9Ff2.xN7o.Kndp]: most recent error cannot be solved by reconnecting, aborting run attempt_number="0"
Oct 23 18:54:45 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$kAT6$kAT6.mLSS.WOWu.G/nH]: zfs receive failed proto_fs="ice/backups/abcd_stash_mobile" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash_mobile has been modified\nsince most recent snapshot\n" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" local_fs="tank/backups/backups/abcd_stash_mobile"
Oct 23 18:54:45 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$kAT6$kAT6.mLSS.WOWu]: receive request failed (might also be error on sender) filesystem="ice/backups/abcd_stash_mobile" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash_mobile has been modified\nsince most recent snapshot\n" errType="*zfs.ZFSError" rr="Filesystem:\"ice/backups/abcd_stash_mobile\" To:{Name:\"zrepl_2023-10-20T17:04:39.975Z\" Guid:1866063764418384462 CreateTXG:944611 Creation:\"2023-10-20T19:04:39+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}"
Oct 23 18:54:46 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$e3jN$e3jN.eKpr.gkcY.zFPA]: zfs receive failed err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset4 has been modified\nsince most recent snapshot\n" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" proto_fs="ice/backups/abcd_stash/dataset4" local_fs="tank/backups/backups/abcd_stash/dataset4"
Oct 23 18:54:46 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$e3jN$e3jN.eKpr.gkcY]: receive request failed (might also be error on sender) err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset4 has been modified\nsince most recent snapshot\n" errType="*zfs.ZFSError" rr="Filesystem:\"ice/backups/abcd_stash/dataset4\" To:{Name:\"zrepl_2023-10-20T17:04:39.920Z\" Guid:12584830747523361689 CreateTXG:944610 Creation:\"2023-10-20T19:04:39+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}" filesystem="ice/backups/abcd_stash/dataset4"
Oct 23 18:54:46 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$fsaJ$fsaJ.6nRs.tNN3.uuvs]: zfs receive failed opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" proto_fs="ice/backups/dataset3" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/dataset3 has been modified\nsince most recent snapshot\n" local_fs="tank/backups/backups/dataset3"
Oct 23 18:54:46 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$fsaJ$fsaJ.6nRs.tNN3]: receive request failed (might also be error on sender) filesystem="ice/backups/dataset3" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/dataset3 has been modified\nsince most recent snapshot\n" errType="*zfs.ZFSError" rr="Filesystem:\"ice/backups/dataset3\" To:{Name:\"zrepl_2023-10-20T17:04:40.475Z\" Guid:2425343861036177484 CreateTXG:944617 Creation:\"2023-10-20T19:04:40+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}"
Oct 23 18:54:46 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$eC++$eC++.4A9n.cQ/F.6nfX]: zfs receive failed proto_fs="ice/backups" local_fs="tank/backups/backups" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups has been modified\nsince most recent snapshot\n" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}"
Oct 23 18:54:46 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$eC++$eC++.4A9n.cQ/F]: receive request failed (might also be error on sender) errType="*zfs.ZFSError" filesystem="ice/backups" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups has been modified\nsince most recent snapshot\n" rr="Filesystem:\"ice/backups\" To:{Name:\"zrepl_2023-10-20T17:04:40.229Z\" Guid:1628105291511277732 CreateTXG:944616 Creation:\"2023-10-20T19:04:40+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}"
Oct 23 18:54:47 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$Kcrr$Kcrr.PaU5.M4or.7rpX]: zfs receive failed err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset2 has been modified\nsince most recent snapshot\n" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" local_fs="tank/backups/backups/abcd_stash/dataset2" proto_fs="ice/backups/abcd_stash/dataset2"
Oct 23 18:54:47 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$Kcrr$Kcrr.PaU5.M4or]: receive request failed (might also be error on sender) rr="Filesystem:\"ice/backups/abcd_stash/dataset2\" To:{Name:\"zrepl_2023-10-20T17:04:40.125Z\" Guid:5552329214498588795 CreateTXG:944614 Creation:\"2023-10-20T19:04:40+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}" filesystem="ice/backups/abcd_stash/dataset2" errType="*zfs.ZFSError" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset2 has been modified\nsince most recent snapshot\n"
Oct 23 18:54:47 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$dj4t$dj4t.0JQw.jkwK.ksVB]: zfs receive failed proto_fs="ice/backups/abcd_stash/dataset1" opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" local_fs="tank/backups/backups/abcd_stash/dataset1" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset1 has been modified\nsince most recent snapshot\n"
Oct 23 18:54:47 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$dj4t$dj4t.0JQw.jkwK]: receive request failed (might also be error on sender) err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash/dataset1 has been modified\nsince most recent snapshot\n" rr="Filesystem:\"ice/backups/abcd_stash/dataset1\" To:{Name:\"zrepl_2023-10-20T17:04:39.857Z\" Guid:4655906848194918181 CreateTXG:944609 Creation:\"2023-10-20T19:04:39+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}" filesystem="ice/backups/abcd_stash/dataset1" errType="*zfs.ZFSError"
Oct 23 18:54:47 puller.local zrepl[49445]: [pull_tls][endpoint][zyCd$+7dH$gHzb$gHzb.zeXF.nRVM.2H8L]: zfs receive failed opts="zfs.RecvOptions{RollbackAndForceRecv:false, SavePartialRecvState:true, InheritProperties:[]property.Property{}, OverrideProperties:map[property.Property]string{}}" local_fs="tank/backups/backups/abcd_stash" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash has been modified\nsince most recent snapshot\n" proto_fs="ice/backups/abcd_stash"
Oct 23 18:54:47 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$gHzb$gHzb.zeXF.nRVM]: receive request failed (might also be error on sender) rr="Filesystem:\"ice/backups/abcd_stash\" To:{Name:\"zrepl_2023-10-20T17:04:40.076Z\" Guid:2137059722432639515 CreateTXG:944613 Creation:\"2023-10-20T19:04:40+02:00\"} ClearResumeToken:true ReplicationConfig:{protection:{Initial:GuaranteeResumability Incremental:GuaranteeResumability}}" filesystem="ice/backups/abcd_stash" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash has been modified\nsince most recent snapshot\n" errType="*zfs.ZFSError"
Oct 23 18:54:47 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$+7dH.9Ff2.Ieb1.edAh]: most recent error in this attempt attempt_number="0" err="zfs exited with error: exit status 1\nstderr:\ncannot receive incremental stream: destination tank/backups/backups/abcd_stash has been modified\nsince most recent snapshot\n"
Oct 23 18:54:47 puller.local zrepl[49445]: [pull_tls][repl][zyCd$+7dH$+7dH.9Ff2.Ieb1.edAh]: most recent error cannot be solved by reconnecting, aborting run attempt_number="0"
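
Since the same failure repeats for every dataset on each 10-minute run, it can help to reduce the journal output to the distinct receiver datasets involved. The following is a minimal sketch, not part of zrepl: the function name `modified_datasets` and the shortened `SAMPLE` lines are made up for illustration, and it assumes the log lines keep the `local_fs="..."` field and the "has been modified" wording shown above.

```python
import re

# Receiver-side dataset name as it appears in zrepl's "zfs receive failed" lines.
LOCAL_FS = re.compile(r'local_fs="([^"]+)"')


def modified_datasets(journal_text: str) -> list[str]:
    """Return the sorted, de-duplicated receiver datasets that zfs rejected as modified."""
    found = set()
    for line in journal_text.splitlines():
        # Only the endpoint lines carry local_fs; the repl lines repeat the same error.
        if "zfs receive failed" in line and "has been modified" in line:
            match = LOCAL_FS.search(line)
            if match:
                found.add(match.group(1))
    return sorted(found)


# Shortened, hypothetical sample lines in the same shape as the journal output above.
SAMPLE = "\n".join([
    'zrepl[49445]: [pull_tls][endpoint][x]: zfs receive failed '
    'local_fs="tank/backups/backups/dataset3" '
    'err="destination tank/backups/backups/dataset3 has been modified"',
    'zrepl[49445]: [pull_tls][endpoint][y]: zfs receive failed '
    'err="destination tank/backups/backups has been modified" '
    'local_fs="tank/backups/backups"',
    'zrepl[49445]: [pull_tls][repl][z]: most recent error cannot be solved '
    'by reconnecting, aborting run',
])

for name in modified_datasets(SAMPLE):
    print(name)
```

In practice the text would come from the journal of the zrepl unit rather than from an inline sample; the regular expression deliberately ignores field order, because the key/value pairs appear in a different order on each line.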

As a side note, I've set up Prometheus to monitor zrepl, and the dashboard doesn't show any errors, which IMO is quite alarming:

[Screenshot of the monitoring dashboard; image not preserved]

Relevant configuration:
Source side /etc/zrepl/zrepl.yml:

global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn

jobs:

# Source job for target LAC
  - name: target_lac
    type: source
    serve:
      # NPF: 2023/10/20: We'll keep tls/tcp since it provides better error handling and detection according to author
      type: tls
      listen: :12345
      ca: /etc/zrepl/puller.local.crt
      cert: /etc/zrepl/source.local.crt
      key: /etc/zrepl/source.local.key
      client_cns:
        - puller.local
    filesystems:
      'backups<': true # all filesystems
    send:
      # NPF: we don't use pool encryption
      # Don't make encrypted: false because of placeholder filesystem encryption handling is unspecified in receiver config
      #encrypted: false
      bandwidth_limit:
        max: 250 MiB
    # Snapshots are handled by the separate snap job
    snapshotting:
      type: periodic
      interval: 1h
      prefix: zrepl_
      # (second, optional) minute hour day-of-month month day-of-week
      # This example takes snapshots daily at 3:00.
      # Timestamp format that is used as snapshot suffix.
      # Can be any of "dense" (default), "human", "iso-8601", "unix-seconds" or a custom Go time format (see https://go.dev/src/time/format.go)
      timestamp_format: iso-8601

Puller side /etc/zrepl/zrepl.yml:

global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn

  monitoring:
    - type: prometheus
      listen: ':9811'
      listen_freebind: true # optional, default false
jobs:
- name: pull_tls
  type: pull
  connect:
    type: tls
    address: "mysourceserver.tld:12345"
    ca: "/etc/zrepl/source.local.crt"
    cert: "/etc/zrepl/puller.local.crt"
    key: "/etc/zrepl/puller.local.key"
    server_cn: "source.local"
  root_fs: "tank/backups"
  interval: 10m
  recv:
    # NPF: we don't use pool encryption
    bandwidth_limit:
      max: 250 MiB
    placeholder:
      encryption: off
  pruning:
    keep_sender:
    - type: not_replicated
    - type: last_n
      count: 10
    - type: grid
      grid: 1x1h(keep=all) | 24x1h | 14x1d
      regex: "zrepl_.*"
    keep_receiver:
    - type: grid
      grid: 1x1h(keep=all) | 24x1h | 35x1d | 6x30d
      regex: "zrepl_.*"

I'm fairly new to zrepl, so there might be something I missed. But so far, just restarting the service made all replications fail, which terrifies me. Any suggestions?

Thanks.

@deajan deajan changed the title zrepl pull job after restarting service, also failures not shown in prometheus zrepl pull job fails after restarting service, also failures not shown in prometheus Oct 23, 2023

deajan commented Oct 24, 2023

Well, I've gone through the existing issues and found that mounting the replicas and running a single ls -l is enough to stop zrepl from working, since atime gets modified, and that is obviously what I did ;)
I've since rolled the datasets back to their latest snapshots, and replication works again.
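
For anyone hitting the same error, the rollback can be sketched as follows. This is only an illustration under stated assumptions: `rollback_command` is a hypothetical helper, it only builds the command string and does not run anything, and it assumes the newest snapshot on the receiver is the one zrepl last replicated. Rolling back discards everything written to the replica after that snapshot, so the printed command should be reviewed before it is run.

```python
import shlex


def rollback_command(dataset: str, snapshot_listing: str) -> str:
    """Build a `zfs rollback` command for the newest snapshot of `dataset`.

    `snapshot_listing` is expected to be the output of
        zfs list -H -t snapshot -o name -s creation -d 1 <dataset>
    which prints one snapshot name per line, oldest first.
    """
    lines = [line.strip() for line in snapshot_listing.splitlines()]
    # Keep only snapshots that belong to this exact dataset, not to its children.
    own = [name for name in lines if name.startswith(dataset + "@")]
    if not own:
        raise ValueError(f"no snapshots found for {dataset}")
    # The listing is sorted by creation time, so the last entry is the newest.
    return "zfs rollback " + shlex.quote(own[-1])


# Hypothetical listing; the snapshot names follow the zrepl_ prefix used in this thread.
listing = (
    "tank/backups/backups/dataset3@zrepl_2023-10-20T15:04:40.475Z\n"
    "tank/backups/backups/dataset3@zrepl_2023-10-20T16:04:40.475Z\n"
)
print(rollback_command("tank/backups/backups/dataset3", listing))
```

Rolling back to the most recent snapshot does not need `-r`, since no newer snapshots have to be destroyed.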

My two cents:

  1. It would be nice if the recv placeholder could set readonly=on and atime=off to prevent these issues.
  2. There's still a problem with the Prometheus support, since nothing on the dashboard shows that replication is failing. Could this be improved?
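
Regarding the second point, one way to check what the dashboard could be showing is to list the metric names the daemon actually exports. The sketch below makes several assumptions: that the metrics are served under `/metrics` on the `:9811` listener configured above, that zrepl's own metrics share a `zrepl_` prefix, and the helper names `fetch_metrics` and `zrepl_metric_names` are made up. It does not assume any particular metric exists.

```python
from urllib.request import urlopen


def fetch_metrics(url: str = "http://localhost:9811/metrics") -> str:
    """Download the Prometheus text exposition from the (assumed) metrics endpoint."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def zrepl_metric_names(exposition_text: str) -> list[str]:
    """Return the sorted, distinct metric names that start with `zrepl_`."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is either `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith("zrepl_"):
            names.add(name)
    return sorted(names)


# On the puller host, with the daemon running:
#     for name in zrepl_metric_names(fetch_metrics()):
#         print(name)
```

If none of the listed names relate to replication errors, the dashboard has nothing to plot; if such a metric does exist, the gap is more likely in the dashboard panels than in zrepl.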


deajan commented Jan 12, 2024

Any thoughts on this?
