-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Allow zero repo-archive-retention #1926
base: integration
Are you sure you want to change the base?
Conversation
… when backup is running
# Conflicts: # src/common/lock.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did an initial review of this and there are some issues we'll need to address, see comments.
src/command/archive/push/file.c
Outdated
cfgOptionIdxName(cfgOptRepoRetentionArchive, repoData->repoIdx)); | ||
destinationCopy[repoListIdx] = false; | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want a warning here? Since this is an intentional setting I don't think we want to fill the log with warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently not report any missing wal file in the info command (nor in the verify command the last time I tried). So perhaps not warning but perhaps detail (just like we have for the copied files in the backup command)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to detail in 3b806c6.
This seems important to note that the .backup files aren't pushed either 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.backup files are pretty useless so I'm OK with that. One problem: this detail messages is not going to make it into the main log. It will only show up when logging with --log-subprocess. This is why the warnings are passed back to the main process to be output there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that's weird because it's not what I saw during the tests (explicitly setting --no-log-subprocess
) 🤔
P00 INFO: archive-push command begin 2.44dev: [pg_wal/000000010000000000000016] --exec-id=37899-96f2d7d8 --log-level-console=detail --log-level-file=detail --no-log-subprocess --pg1-path=/var/lib/pgsql/15/main/data --process-max=2 --repo1-cipher-pass=<redacted> --repo2-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo2-cipher-type=aes-256-cbc --repo1-host=172.18.0.2 --repo2-host=172.18.0.2 --repo1-host-user=pgbackrest --repo2-host-user=pgbackrest --repo1-path=/repo1 --repo2-path=/repo2 --repo1-retention-archive=0 --stanza=ro8pg
P00 DETAIL: '000000010000000000000016' skipped for repo1, because option 'repo1-retention-archive' is set to 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has to do with the way local processes are forked off in unit tests. It works differently when running pgbackrest for real, i.e. you won't get the logs for the subprocess in the main log.
@@ -2238,6 +2243,9 @@ cmdBackup(void) | |||
cfgOptionGroupName(cfgOptGrpRepo, cfgOptionGroupIdxDefault(cfgOptGrpRepo))); | |||
} | |||
|
|||
// Store the repo idx inside the lock file | |||
lockWriteDataP(lockTypeBackup, .repoIdx = VARUINT(cfgOptionGroupIdxDefault(cfgOptGrpRepo))); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not going to work as expected for remote backups since updates to the lock file are only written locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that's annoying 😢
I admit I didn't tested on repo-hosts yet. But the all point of this is to only push to the repo where the backup is running (so offline repos don't affect backups on online repos) 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to push the lock file to the pg host where archiving is happening or it will never archive while archive-retention=0.
@@ -198,75 +198,80 @@ cfgLoadUpdateOption(void) | |||
// If archive retention is valid for the command, then set archive settings | |||
if (cfgOptionValid(cfgOptRepoRetentionArchive)) | |||
{ | |||
// For each possible repo, check and adjust the settings as appropriate | |||
for (unsigned int optionIdx = 0; optionIdx < cfgOptionGroupIdxTotal(cfgOptGrpRepo); optionIdx++) | |||
// !!! NEED CHECK HERE THAT IF ARCHIVE RETENTION IS 0 THEN ARCHIVE-CHECK MUST BE ENABLED WHEN VALID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a data loss scenario where repo-archive-retention=0 on the pg host and archive-check=n on the repo host. In this case the backup lock will be released too early and WAL segments will probably be lost.
This will be tricky because we somehow need to figure out what the settings for the archive-push process are, not the settings that the backup remote gets. I have thought about this for a bit and I don't see a way to solve it except "don't do that."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, it makes sense to require archive-check to be enabled imo 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but this check needs to be done across hosts so that makes it pretty complicated. In other words we need to make sure that archive-check is enabled on the repo host when repo-retention-archive is enabled on the pg host. I don't see any obvious way to do this.
Hi, can i please get an update on this patch . |
@vkaplan40 This patch requires a new locking scheme and I have not come up with anything I'm happy with. I'm planning to look at locking again in the next release or two, but no guarantees. |
There are some valid use-cases where someone might want to disable archiving for a specific repository.
With
repo-retention-archive
set to 0, the archives are pushed only when the check or backup commands are running. That required to add a check lock.For the multi-repo context, I added the repoIdx into the backup lock, using that info to skip archiving on disabled repos (except the one where the backup is running). Choosing this path could let us add more info to the lock file later if we want.
Imho, since backupStart happens after the lock update inside the backup command, archive-push wouldn't skip archiving the starting wal but I might not be imaginative enough 🤔
Previously the expire command kept all archives after the newest backup. If i.e. the check command is executed, we'd have a broken chain of archives in the disabled repo. So I've updated the expire command to be able to remove those.
If this seems like a good approach, I'll gladly add the tests 😉
Many thanks in advance for the review and advice 😊