Add graceful-restart support #2386
base: main
Conversation
Note to discuss:
It seems it fails, and the speaker/frr-reloader retries to apply it.
The above might be related to FRRouting/frr#8403
In my test I achieved zero downtime if I have graceful restart enabled globally (test: at runtime I delete the speaker and check whether the client loses connectivity). The second command, which enables the F-bit, is only available in global mode, not per neighbor. Therefore a solution where we add graceful-restart at the neighbor level and the F-bit globally (because it is not safe to just have that on by default) seems not optimal. @fedepaol how much do you object to having it global (one option to enable graceful restart, one to toggle the F-bit), plus the option to opt out per neighbor? Any other ideas?
FRRouting/frr#15880 — I created this issue upstream, even though we are not really blocked on it (we will get an error, but it does not look like it has an impact).
Won't the global F-bit affect only peers with gr enabled? If so, doesn't sound too terrible to always enable it. |
My preference would be to make it configurable, to be on the safe side. I cannot find an argument against always having it (or defaulting it to on), but I have no idea for which case this option exists. Should we always have it in the config? On the same topic, shouldn't we allow tuning the timer? The default is 120 sec, which seems long (ideally we need it to be < holdtime).
I'd try to keep the UX as good as we can, which means not having to set an extra parameter. We can either set the global when at least one peer needs GR, or just always enable it. I am leaning towards the second, as we'll have to deal with peers without GR and the global anyway, so if it doesn't work we'll have a problem.
You mean the graceful restart timer? That should be meaningful only for the helper side, I guess (but we may need it in frr-k8s, where we might be helpers).
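For reference, a sketch of the FRR configuration the thread is weighing: graceful restart and the F-bit enabled globally, with a per-neighbor opt-out. The AS numbers and neighbor address are illustrative, and the exact final set of knobs was still under discussion:

```
router bgp 64512
 ! global graceful restart plus the F-bit (preserve-fw-state)
 bgp graceful-restart
 bgp graceful-restart preserve-fw-state
 bgp graceful-restart restart-time 120
 neighbor 10.0.0.1 remote-as 64513
 ! hypothetical per-neighbor opt-out discussed above
 neighbor 10.0.0.1 graceful-restart-disable
```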
cc @oribon |
Force-pushed from 14ff603 to 0b43ca8
api/v1beta2/bgppeer_types.go
Outdated
// known routes while the routing protocol information is being restored.
// Needed for FRR mode only.
// +optional
GracefulRestart bool `json:"gracefulRestart,omitempty"`
nit: EnableGracefulRestart
I can change that, sure
done
nit on the commit subject: this commit does more than adding the field to the API; we are adding the support end to end (which is totally fine, but let's change the subject to "add graceful restart support").
I have three commits, and the e2e test is added in its own commit, no?
done
err := ConfigUpdater.Update(resources)
Expect(err).NotTo(HaveOccurred())

for _, c := range FRRContainers {
I think it'd make sense to have a real e2e test where we shut down a speaker and we check the traffic is still flowing.
First thing that comes to my mind is, focus on a node, exclude it from the daemonset selector, have a pod running only there with etp=local, ensure the traffic flows when the speaker is down
It's probably easier if we raise the graceful restart timers of the peered neighbors, so we'll have control
Not sure, because in the end we test whether FRR supports the functionality. Nevertheless, we can implement that; is there any similar test?
Why do you think we should raise the timers (the default is 120 sec)?
Let's start with the default and see if it's enough then
Force-pushed from 72e3c27 to d65bb88
api/v1beta2/bgppeer_types.go
Outdated
// Needed for FRR mode only.
// +optional
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="EnableGracefulRestart cannot be changed after creation, because GR requires restarting the BGP session"
nit: "Needed" should be more like "Supported" here?
Also, I'd put the point about "this field is immutable because it requires a restart of the BGP session" in the description, and keep the validation message as "EnableGracefulRestart cannot be changed after creation".
sounds good, done
internal/config/validation.go
Outdated
@@ -41,6 +41,9 @@ func DiscardFRROnly(c ClusterResources) error {
	if p.Spec.ConnectTime != nil {
		return fmt.Errorf("peer %s has connect time set on native bgp mode", p.Spec.Address)
	}
	if p.Spec.EnableGracefulRestart {
		return fmt.Errorf("peer %s has enabled GR (GracefulRestart) flag set on native bgp mode", p.Spec.Address)
Suggested change:
- return fmt.Errorf("peer %s has enabled GR (GracefulRestart) flag set on native bgp mode", p.Spec.Address)
+ return fmt.Errorf("peer %s has GracefulRestart flag set on native bgp mode", p.Spec.Address)
done
Force-pushed from 50f6988 to 363fa20
Force-pushed from 63944e7 to a7cbd33
Force-pushed from 95e4970 to e57aee6
e2etest/bgptests/bgp.go
Outdated
eg, ctx := errgroup.WithContext(ctx)
for _, c := range FRRContainers {
	validateService(svc, allNodes.Items, c)
	cc := c // https://go.dev/doc/faq#closures_and_goroutines
can you use cc above (or just c := c) and use c? Seems a bit less confusing.
done (probably not needed anymore though or will be done)
e2etest/pkg/metallb/metallb.go
Outdated
ret := true
for _, p := range pods {
	ret = ret && k8s.PodIsReady(p)
can you just exit early if the pod is not ready?
done
e2etest/pkg/metallb/metallb.go
Outdated
}

f := func(context.Context) (bool, error) {
	pods, err := SpeakerPods(cs)
this may be quick enough to pick up the pods being deleted (and see them as ready). We should check that the names changed, or add a non-existing node selector to the daemonset in order to guarantee a full restart.
TODO (I added a time.Sleep(5) for the time being)
done
e2etest/pkg/metallb/metallb.go
Outdated
@@ -32,6 +32,38 @@ func init() {
	}
}

func RestartSpeakerPods(cancel context.CancelFunc, cs clientset.Interface) error {
This will return nil the moment we see the pods restarted (bugs aside).
I'd keep this func context-unaware and let it do what it promises (restart the pods), and handle the async part from the calling site.
made it non-blocking as we discussed
e2etest/pkg/metallb/metallb.go
Outdated
}
}

f := func(context.Context) (bool, error) {
this can be embedded as anonymous function in the call below. No need to assign it to a variable.
done, but is there a drawback to having it assigned to a variable?
e2etest/bgptests/bgp.go
Outdated
})
}

err = metallb.RestartSpeakerPods(cancel, cs)
I'd try to simplify this, because it adds complexity and is a bit hard to follow.
How about restarting the pods, running an eventually assertion that checks all the pods are restarted, and inside the eventually body we check that the service keeps being accessible?
What do you mean by "are restarted"? Are you suggesting blocking in a function that checks the speakers are restarted, and then continuing to an Eventually ginkgo body that loops over all external FRR docker containers in serial (validateServiceNoWait)?
refactored
e2etest/bgptests/metrics.go
Outdated
@@ -61,6 +62,21 @@ var _ = ginkgo.Describe("BGP metrics", func() {
})

ginkgo.BeforeEach(func() {
	clientconfig := k8sclient.RestConfig()
why was this moved?
e2etest/bgptests/bgp.go
Outdated
@@ -98,9 +101,25 @@ var _ = ginkgo.Describe("BGP", func() {
})

ginkgo.BeforeEach(func() {

	clientconfig := k8sclient.RestConfig()
	var err error
why was this moved?
I had to do that, because otherwise it fails like https://github.com/metallb/metallb/actions/runs/9267228538/job/25493436433 (a wip commit that I reverted)
There is some state that changes because I delete the speaker containers, so that logic must go per test, not per suite.
Ah, I see the reason. The FRR provider holds a reference to the speaker / frr-k8s pods.
This is a bit confusing though: FRRProvider is global, we initialize it two different times, and we use it in other files as well. I have the feeling this might bite us in the future (for example if we add a new context in a new file).
How about giving the FRRProvider a Refresh() method that updates the speakers, so we re-load the proper speakers after any test that restarts them?
Either that, or we fetch the speakers every time we call Execute. That shouldn't be terrible either, as it fetches them from the local cache.
I could go either way; the second seems simpler. Do you mean changing here:
func (f frrModeProvider) FRRExecutorFor(ns, name string) (executor.Executor, error) {
done
Force-pushed from 7459922 to f93f754
e2etest/bgptests/bgp.go
Outdated
Expect(metallb.RestartSpeakerPodsNoWait(cs)).NotTo(HaveOccurred())

Eventually(func() error {
	c := func() error {
why do you need to assign to a variable? Can't you just iterate over the containers and assert from there?
I find it easier to read, but no strong opinion; I will change it.
done
e2etest/bgptests/bgp.go
Outdated
}

if err := c(); err != nil {
	if !errors.Is(err, ErrStaleRoute) {
why is staleroute skipped?
During the control plane reboot (when the peer has started its graceful restart timers), the routes are stale and that is okay (the happy path), so we should ignore them during that time. We should NOT ignore them when we generally check that the svc is validated.
e2etest/bgptests/bgp.go
Outdated
// }
// }()

Expect(metallb.RestartSpeakerPodsNoWait(cs)).NotTo(HaveOccurred())
I'd stick with the err := and the assert on the next line pattern.
done
BGP graceful restart, as defined in RFC 4724, provides the mechanisms that allow a BGP speaker to continue to forward data packets along known routes while the routing protocol information is being restored. This allows the DaemonSet to be updated without routes being retracted on the peer side. When GR is enabled, we set the F-bit (preserve-fw-state) by default. We make gracefulRestart immutable according to https://kubernetes.io/blog/2022/09/29/enforce-immutability-using-cel/#immutablility-after-first-modification Signed-off-by: karampok <[email protected]>
Signed-off-by: karampok <[email protected]>
Force-pushed from 80948aa to f59de1e
- Restarts speaker pods in a non-blocking way
- Waits for the speakers to be ready while monitoring for downtime

BDD description looks like:
BGP GracefulRestart, when speakers restart and GR is enabled, the dataplane should keep working
BGP GracefulRestart, when speakers restart and GR is disabled, the dataplane should have a downtime

Signed-off-by: karampok <[email protected]>
@fedepaol I have completed the last commit, adding the e2e test (plus the reverse, to see a failure without graceful restart). I think I addressed all your comments. Let me know if I missed something. Thanks
/kind feature
What this PR does / why we need it:
Covers issue #2368
Special notes for your reviewer:
Release note: