Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signal decoupling #462

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

calistoc
Copy link
Contributor

@calistoc calistoc commented Apr 8, 2021

Changed zrepl cli signaling decoupling the signals for each task, snapshot, replicate, prune and reset.

$ zrepl [snapshot|replicate|prune|reset] JOB

Added the same signals in the zrepl status GUI.

Every signal will just do what it says, nothing more, as an example the snapshot signal will just do snapshots and won’t trigger replication nor pruning.

Should address #451

It took me a while to figure out how zrepl works with the signaling and I had to change quite a few files, hopefully I haven’t missed anything.
Let me know what you think.

@problame
Copy link
Member

First of all, thanks for iterating on this.

I think @InsanePrawn should have communicated more clearly what we were working on when we reverted the original signal snapshot commit from the status-v2 PR in #452 .
We have a system in mind where signalling a job (or an activity within it, e.g. 'take snapshot' or 'trigger replication') returns a token that can be used to poll the daemon for completion. E.g.:

$ zrepl signal JOB [snapshot|replication]
b8cf4462-3960-4754-86e0-6f927c3fee52
$ zrepl wait b8cf4462-3960-4754-86e0-6f927c3fee52

The signal command would have a --wait flag that does the polling internally, like so:

$ zrepl signal --wait JOB [snapshot|replication]

The current work on this feature is in branch https://github.com/zrepl/zrepl/tree/trigger-wait-reset-rearchitecture .
It includes the decoupling of the signals by sub-activity of the job, i.e. snapshot, replication, prune-sender, prune-receiver.
Work has stalled because

  • I'm swamped with work on my thesis ATM
  • There are some implementation level issues, e.g., whether we want to have even more fine-grained signals in the future, e.g., on the file system levle (zrepl signal JOB replication data/set/name).

Sadly, the branch is not exactly in a state where it can be picked up by someone other than myself.

Since you are obviously interested in this topic, maybe we (= @calistoc @InsanePrawn and myself) should find some time to discuss how we proceed on this? (Feel free to join the IRC chat, freenode #zrepl )

@calistoc
Copy link
Contributor Author

I think @InsanePrawn should have communicated more clearly what we were working on when we reverted the original signal snapshot commit from the status-v2 PR in #452 .

Yes I read it but I didn’t know where you were with the reworking
Considering I was missing the snapshot signal I added it again for myself plus some more changes, then I thought to share it in case it could be useful to you and others

The current work on this feature is in branch https://github.com/zrepl/zrepl/tree/trigger-wait-reset-rearchitecture .
It includes the decoupling of the signals by sub-activity of the job, i.e. snapshot, replication, prune-sender, prune-receiver.

That looks great!

Work has stalled because

* I'm swamped with work on my thesis ATM

* There are some implementation level issues, e.g., whether we want to have even more fine-grained signals in the future, e.g., on the file system levle (`zrepl signal JOB replication data/set/name`).

If you know this will take some time you may consider my PR as a step forward to the same direction, otherwise just reject it and no worries from my side :)

@problame
Copy link
Member

If you know this will take some time you may consider my PR as a step forward to the same direction, otherwise just reject it and no worries from my side :)

I'll have to think a little more about whether the work-in-progress will be a strict subset of the externally visible behavior. If it's a strict subset we might as well do it. If not then I'd rather let the PR sit open and set some artificial deadline for the work-in-progress :D

Every signal will just do what it says, nothing more, as an example the snapshot signal will just do snapshots and won’t trigger replication nor pruning.

The question whether a manually-triggered snapshot invocation should trigger invocation was an open design question with the work-in-progress. Did you have a specific reason to not trigger replicatione? Can you think of use cases that would need it? What do you think of zrepl signal JOB snapshot --trigger-replication={yes,no} (In the work-in-progress design, this would effectively add payload to signals).

@calistoc
Copy link
Contributor Author

The question whether a manually-triggered snapshot invocation should trigger invocation was an open design question with the work-in-progress. Did you have a specific reason to not trigger replicatione? Can you think of use cases that would need it?

Not many to be honest, let say you need some additional snapshots but the network is overloaded so having automatic replication could create network issues.

Anyhow talking about cli commands I just thought that being very specific about what is triggered would give more flexibility, that imho would mean more use cases could be addressed even those we still can’t think of.

What do you think of zrepl signal JOB snapshot --trigger-replication={yes,no} (In the work-in-progress design, this would effectively add payload to signals).

This looks good but I would prefer the following, especially with scripting:

zrepl signal -–wait snapshot JOB
zrepl signal -–wait replicate JOB

I really like the idea of the --wait with signals, once it will be implemented of course, also my PR would benefit of the --wait as it would be better to be sure some tasks are completed before running other signals with the same JOB and snapshots.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants