Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Allow zero repo-archive-retention #1926

Open
wants to merge 11 commits into
base: integration
Choose a base branch
from

Conversation

pgstef
Copy link
Member

@pgstef pgstef commented Nov 16, 2022

There are some valid use-cases where someone might want to disable archiving for a specific repository.

With repo-retention-archive set to 0, the archives are pushed only when the check or backup commands are running. That required to add a check lock.

For the multi-repo context, I added the repoIdx into the backup lock, using that info to skip archiving on disabled repos (except the one where the backup is running). Choosing this path could let us add more info to the lock file later if we want.

Imho, since backupStart happens after the lock update inside the backup command, archive-push wouldn't skip archiving the starting wal but I might not be imaginative enough 🤔

Previously the expire command kept all archives after the newest backup. If i.e. the check command is executed, we'd have a broken chain of archives in the disabled repo. So I've updated the expire command to be able to remove those.

If this seems like a good approach, I'll gladly add the tests 😉

Many thanks in advance for the review and advice 😊

Copy link
Member

@dwsteele dwsteele left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an initial review of this and there are some issues we'll need to address, see comments.

src/build/config/config.yaml Show resolved Hide resolved
src/command/archive/push/file.c Outdated Show resolved Hide resolved
cfgOptionIdxName(cfgOptRepoRetentionArchive, repoData->repoIdx));
destinationCopy[repoListIdx] = false;
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a warning here? Since this is an intentional setting I don't think we want to fill the log with warnings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently not report any missing wal file in the info command (nor in the verify command the last time I tried). So perhaps not warning but perhaps detail (just like we have for the copied files in the backup command)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to detail in 3b806c6.
This seems important to note that the .backup files aren't pushed either 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.backup files are pretty useless so I'm OK with that. One problem: this detail messages is not going to make it into the main log. It will only show up when logging with --log-subprocess. This is why the warnings are passed back to the main process to be output there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's weird because it's not what I saw during the tests (explicitly setting --no-log-subprocess) 🤔

P00   INFO: archive-push command begin 2.44dev: [pg_wal/000000010000000000000016] --exec-id=37899-96f2d7d8 --log-level-console=detail --log-level-file=detail --no-log-subprocess --pg1-path=/var/lib/pgsql/15/main/data --process-max=2 --repo1-cipher-pass=<redacted> --repo2-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo2-cipher-type=aes-256-cbc --repo1-host=172.18.0.2 --repo2-host=172.18.0.2 --repo1-host-user=pgbackrest --repo2-host-user=pgbackrest --repo1-path=/repo1 --repo2-path=/repo2 --repo1-retention-archive=0 --stanza=ro8pg

P00 DETAIL: '000000010000000000000016' skipped for repo1, because option 'repo1-retention-archive' is set to 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to do with the way local processes are forked off in unit tests. It works differently when running pgbackrest for real, i.e. you won't get the logs for the subprocess in the main log.

@@ -2238,6 +2243,9 @@ cmdBackup(void)
cfgOptionGroupName(cfgOptGrpRepo, cfgOptionGroupIdxDefault(cfgOptGrpRepo)));
}

// Store the repo idx inside the lock file
lockWriteDataP(lockTypeBackup, .repoIdx = VARUINT(cfgOptionGroupIdxDefault(cfgOptGrpRepo)));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not going to work as expected for remote backups since updates to the lock file are only written locally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's annoying 😢
I admit I didn't tested on repo-hosts yet. But the all point of this is to only push to the repo where the backup is running (so offline repos don't affect backups on online repos) 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to push the lock file to the pg host where archiving is happening or it will never archive while archive-retention=0.

src/build/help/help.xml Show resolved Hide resolved
@@ -198,75 +198,80 @@ cfgLoadUpdateOption(void)
// If archive retention is valid for the command, then set archive settings
if (cfgOptionValid(cfgOptRepoRetentionArchive))
{
// For each possible repo, check and adjust the settings as appropriate
for (unsigned int optionIdx = 0; optionIdx < cfgOptionGroupIdxTotal(cfgOptGrpRepo); optionIdx++)
// !!! NEED CHECK HERE THAT IF ARCHIVE RETENTION IS 0 THEN ARCHIVE-CHECK MUST BE ENABLED WHEN VALID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a data loss scenario where repo-archive-retention=0 on the pg host and archive-check=n on the repo host. In this case the backup lock will be released too early and WAL segments will probably be lost.

This will be tricky because we somehow need to figure out what the settings for the archive-push process are, not the settings that the backup remote gets. I have thought about this for a bit and I don't see a way to solve it except "don't do that."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, it makes sense to require archive-check to be enabled imo 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this check needs to be done across hosts so that makes it pretty complicated. In other words we need to make sure that archive-check is enabled on the repo host when repo-retention-archive is enabled on the pg host. I don't see any obvious way to do this.

src/command/archive/push/file.c Outdated Show resolved Hide resolved
@vkaplan40
Copy link

Hi, can i please get an update on this patch .
This feature is crucial to my long term backup setup.

@dwsteele
Copy link
Member

Hi, can i please get an update on this patch.

@vkaplan40 This patch requires a new locking scheme and I have not come up with anything I'm happy with. I'm planning to look at locking again in the next release or two, but no guarantees.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants