Encrypted send to untrusted target not quite working ... pretty sure it's a simple config issue #742
Sorry for the delayed reply. I think I understand the problem. They don't have snapshots because they're placeholder filesystems. And so, the replication planner code bails out early here: zrepl/replication/logic/replication_logic.go Lines 343 to 359 in 2d8c369
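To illustrate the pre-fix behavior, here is a simplified, hypothetical Go sketch (not the actual zrepl source; the `Filesystem` type and function name are made up): any sender-side filesystem with zero versions fails planning, whether or not it is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
)

// Filesystem is a hypothetical, simplified stand-in for the sender-side
// filesystem info the planner works with.
type Filesystem struct {
	Path          string
	Versions      []string // snapshot/bookmark names
	IsPlaceholder bool
}

// planStepsPreFix models the old behavior: any filesystem without
// versions fails planning, placeholders included.
func planStepsPreFix(fs Filesystem) error {
	if len(fs.Versions) == 0 {
		return errors.New("sender does not have any versions")
	}
	return nil // would plan send steps here
}

func main() {
	// A placeholder has no snapshots, so planning fails for it.
	placeholder := Filesystem{Path: "bondi3/zrepl/halflapp", IsPlaceholder: true}
	fmt.Println(planStepsPreFix(placeholder)) // prints the planner error users saw
}
```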
Would you be willing to test a fix for this? You'd download a zrepl binary from the GitHub CI and replace your distro's zrepl binaries with it. Also, can you try this workaround: what if you change the config like this:

```diff
  - type: source
    name: laptop_source
    filesystems:
-     "bondi3/zrepl<": true
+     "bondi3/zrepl/halflapp/halflapp/home/alice": true
```
Before this PR, when chaining replication from A => B => C, if B had placeholders and the `filesystems` filter included these placeholders, we'd incorrectly fail the planning phase with error `sender does not have any versions`. The non-placeholder child filesystems of these placeholders would then fail to replicate because of the initial-replication-dependency tracking that we do, i.e., their parent failed to initially replicate, hence they fail to replicate as well (`parent(s) failed during initial replication`).

We can do better than that because we have the information whether a sender-side filesystem is a placeholder. This PR makes the planner act on that information. The outcome is that placeholders are replicated as placeholders (albeit the receiver remains in control of how these placeholders are created, i.e., `recv.placeholders`).

The mechanism to do it is:

1. Don't plan any replication steps for filesystems that are placeholders on the sender.
2. Ensure that, if a receiving-side filesystem exists, it is indeed a placeholder.

Check (2) may seem overly restrictive, but the goal here is not just to mirror all non-placeholder filesystems, but also to mirror the hierarchy.

TODO:

- test with user
- regression test

fixes #742
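Sketched as hypothetical Go (the names are made up; this is not the PR's actual code), the two checks above amount to:

```go
package main

import "fmt"

// SenderFS and ReceiverFS are hypothetical simplified views of the
// endpoints' filesystem listings.
type SenderFS struct {
	Path          string
	IsPlaceholder bool
}

type ReceiverFS struct {
	Exists        bool
	IsPlaceholder bool
}

// planFilesystem models the fixed planner logic:
// (1) sender-side placeholders get no replication steps;
// (2) if the receiver-side filesystem exists, it must itself be a
//     placeholder, so that the hierarchy (not just the data) is mirrored.
func planFilesystem(s SenderFS, r ReceiverFS) (steps int, err error) {
	if s.IsPlaceholder {
		if r.Exists && !r.IsPlaceholder {
			return 0, fmt.Errorf("%s: receiver has a non-placeholder filesystem where sender has a placeholder", s.Path)
		}
		return 0, nil // replicated "as a placeholder": nothing to send
	}
	return 1, nil // a real filesystem: plan its send steps
}

func main() {
	steps, err := planFilesystem(
		SenderFS{Path: "bondi3/zrepl", IsPlaceholder: true},
		ReceiverFS{Exists: true, IsPlaceholder: true},
	)
	fmt.Println(steps, err)
}
```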
Untested fix in: #744. Please try it for both initial replication and also a few incremental replication runs after the initial replication. Binaries available from CircleCI (click the appropriate
Just tried the new binary on widey - no luck :( You did mean for it to be replaced on the pull/target system, right? I picked it from

To be sure, here are the md5sums of the original 0.6.1 (zrepl.orig) and the new test one (zrepl).

It gives the exact same error
The planner runs on the active side of the replication setup. Just to make sure you deployed the binary correctly: you need to
My PR adds debug log messages.
Yup - stopped the service, then in
Here's the syslog looking for placeholder (didn't find "sender filesystem")
Hm, that's not helpful. Please run
on the middle node of your replication chain, i.e., the bondi host. And just to make sure, please confirm your setup looks like this
Placeholders on bondi - only looking at the
The setup is almost right above ... ryzen2 pushes to bondi (since it's almost always on - main workstation)
Ok, so, setup is
I pushed a new commit to the PR. Please wait for CI to finish and download the new artifacts, then try the replication again.
Nice :) Fast work, mikey likes it. Same CircleCI URL as above? I don't know CircleCI - github-actions/gitlab-CI/Drone yes, but not CircleCI
The 1.21 one is part of the first stage, i.e., in the group of 3 boxes, the one in the middle.
Ah, I forgot to mention: with the commit that I added, it's necessary to update the sending side as well, i.e., you need to deploy the updated binary to both
Ah, OK. Done. Errors out, but different ...
Hm, yeah, the root_fs dataset

The following should do the trick:

```diff
  - type: source
    name: laptop_source
    filesystems:
-     "bondi3/zrepl": true
+     "bondi3/zrepl": false
+     "bondi3/zrepl<": true
```

If that doesn't work, try

```diff
  - type: source
    name: laptop_source
    filesystems:
-     "bondi3/zrepl": true
+     "bondi3/zrepl/halflapp<": true
+     "bondi3/zrepl/ryzen2<": true
```

Doc context: https://zrepl.github.io/configuration/filter_syntax.html#pattern-filter (I haven't thought about

Please report back which one worked.
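To make the filter semantics concrete, here is a small Go model of my reading of the pattern-filter docs (a sketch, not zrepl's actual implementation): `p<` matches `p` and everything below it, `p` matches only `p`, and among all matching patterns the most specific one decides, with an exact pattern beating a subtree pattern for the same prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether a single zrepl-style pattern applies to a
// dataset path: "p<" matches p and every dataset below it; "p" matches
// only p itself.
func matches(pattern, path string) bool {
	if strings.HasSuffix(pattern, "<") {
		prefix := strings.TrimSuffix(pattern, "<")
		return path == prefix || strings.HasPrefix(path, prefix+"/")
	}
	return path == pattern
}

// included applies a filter map: the most specific matching pattern
// wins; an exact pattern beats a subtree pattern on the same prefix.
func included(filter map[string]bool, path string) bool {
	bestLen, bestExact, found, result := -1, false, false, false
	for pattern, include := range filter {
		if !matches(pattern, path) {
			continue
		}
		exact := !strings.HasSuffix(pattern, "<")
		plen := len(strings.TrimSuffix(pattern, "<"))
		if plen > bestLen || (plen == bestLen && exact && !bestExact) {
			bestLen, bestExact, found, result = plen, exact, true, include
		}
	}
	return found && result
}

func main() {
	filter := map[string]bool{
		"bondi3/zrepl":  false, // exclude the placeholder root itself
		"bondi3/zrepl<": true,  // include everything below it
	}
	fmt.Println(included(filter, "bondi3/zrepl"))          // root: excluded
	fmt.Println(included(filter, "bondi3/zrepl/halflapp")) // child: included
}
```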
Bingo ! That was it - replicating the halflapp dataset now. I'll post here if the ryzen2 one fails
OK, the ryzen2 one seemed to backup fine, so that's good. I set up a timer to trigger backups, came back after a while to see zrepl was stuck in planning stage for over an hour. bondi seemed to have trouble -
I don't know if it's related.
Point of interest - it's definitely the replication from ryzen2 to bondi that kicks that spl
Well, nice to hear that it works. But this is definitely a ZFS bug; zrepl just uses the CLI, and that should never cause panics. We may still need to work around the issue. What's the OS and ZFS version?
Doh - just remembered, I had NOT replaced the

bondi is ubuntu 18.04, using zfs from the jonathan repo
ryzen2 is ubuntu 22.04, also with zfs from jonathan
Panic on bondi looks like this
I think it's something in the dataset doing it ... I moved back to the original
That seems to have done the trick ... Changed the bondi config to point to a new dataset. Replication from both ryzen2 and halflapp to bondi worked fine with the new binary. Replication from bondi to widey is ongoing, and seems to be working fine. So something got mucked up in the original

Question - can multiple
Sort of answering my own question above ... From the docs https://zrepl.github.io/quickstart/fan_out_replication.html it appears that each pull client wants its own source job on the server. But the config seems to indicate that you can have multiple client_cns in the job.

I noticed with 2 clients (widey and another one, wideload) pulling at different times from bondi, one or the other would start failing with missing snapshots. Which was odd, since it was only ryzen2 that was managing the pruning on bondi.

Refresher ...
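For reference, a hedged sketch of the fan-out layout the quickstart describes: one source job per pulling client, each restricted to that client's CN. Job names, ports, and the elided TLS fields below are hypothetical, not taken from this thread's configs.

```yaml
jobs:
  # one source job per pulling client (hypothetical names)
  - type: source
    name: source_widey
    serve:
      type: tls
      listen: ":8888"
      client_cns: ["widey"]
      # ca/cert/key omitted
    filesystems:
      "bondi3/zrepl<": true
    snapshotting:
      type: manual
  - type: source
    name: source_wideload
    serve:
      type: tls
      listen: ":8889"
      client_cns: ["wideload"]
      # ca/cert/key omitted
    filesystems:
      "bondi3/zrepl<": true
    snapshotting:
      type: manual
```

With separate jobs, each client gets its own snapshot/hold bookkeeping on the server, which avoids two pullers racing each other's pruning.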
Experimenting with a 2nd box wideload to ALSO pull those backups. Kind of another question ... The pull jobs on widey and wideload had guarantee_incremental set. But I switched it to guarantee_resumability. Now there seems to be a persistent error (yet replication works)
What's the right way to clean this up ?
Encrypted send to untrusted target not quite working ...

I have a backup dataset bondi3/zrepl on a server bondi that receives laptop backups. The dataset is encrypted, and laptop sends are plain, so they're re-encrypted on bondi. For offsite backups, I want to replicate this to a backup dataset wider/zrepl on remote server widey. Should be fairly simple ...

The two laptop sub-datasets under bondi3/zrepl - both look like this

bondi zrepl.yml source job

widey zrepl.yml pull job

Except the two laptop backups (halflapp and ryzen2) fail ... this is the status on the remote widey