Add graceful-restart support #2386
base: main
Conversation
Note to discuss:
It seems it fails, and the speaker/frr-reloader retries to apply it.
The above might be related to FRRouting/frr#8403
In my test I achieved zero downtime if I have graceful restart enabled globally (test: at runtime I delete the speaker and check whether the client loses connectivity). The second command, which enables the F-bit, is only available in global mode, not per neighbor. Therefore a solution where we add graceful-restart at the neighbor level and the F-bit globally (because it is not safe to just have that on by default) seems not optimal. @fedepaol how much do you object to having it global (one option to enable graceful restart, one to toggle the F-bit), plus the option to opt out per neighbor? Any other ideas?
FRRouting/frr#15880 — I created this issue upstream, even though we are not really blocked on it (we will get an error, but it does not look like it has an impact).
Won't the global F-bit affect only peers with gr enabled? If so, doesn't sound too terrible to always enable it. |
My preference would be to make it configurable, to be on the safe side. I cannot find an argument against always having it (or defaulting it to on), but I have no idea for which case this option exists. Should we always have it in the config? On the same topic, shouldn't we allow tuning the timer? The default is 120 sec, which seems long (ideally we need it to be < holdtime).
I'd try to keep the UX as good as we can, which means not having to set an extra parameter. We can either set the global when at least one peer needs GR, or just always enable it. I am leaning towards the second, as we'll have to deal with peers without GR and the global anyway, so if it doesn't work we'll have a problem.
You mean the graceful restart timer? That should be meaningful only for the helper side, I guess (but we may need it in frr-k8s, where we might be helpers).
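For reference, a sketch of the FRR configuration the thread is weighing: graceful restart and the F-bit enabled globally, with a per-neighbor opt-out. The AS numbers and neighbor address are illustrative, and the exact final set of knobs was still under discussion:

```
router bgp 64512
 ! global graceful restart plus the F-bit (preserve-fw-state)
 bgp graceful-restart
 bgp graceful-restart preserve-fw-state
 bgp graceful-restart restart-time 120
 neighbor 10.0.0.1 remote-as 64513
 ! hypothetical per-neighbor opt-out discussed above
 neighbor 10.0.0.1 graceful-restart-disable
```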
cc @oribon |
Force-pushed from 14ff603 to 0b43ca8
api/v1beta2/bgppeer_types.go
Outdated
// known routes while the routing protocol information is being restored.
// Needed for FRR mode only.
// +optional
GracefulRestart bool `json:"gracefulRestart,omitempty"`
nit: EnableGracefulRestart
I can change that, sure
done
nit on the commit subject: this commit does more than adding the field to the API; we are adding the support end to end (which is totally fine, but let's change the subject to "add graceful restart support").
I have three commits, and the e2e test is added in its own commit, no?
done
err := ConfigUpdater.Update(resources)
Expect(err).NotTo(HaveOccurred())

for _, c := range FRRContainers {
I think it'd make sense to have a real e2e test where we shut down a speaker and we check the traffic is still flowing.
First thing that comes to my mind is, focus on a node, exclude it from the daemonset selector, have a pod running only there with etp=local, ensure the traffic flows when the speaker is down
It's probably easier if we raise the graceful restart timers of the peered neighbors, so we'll have control
Not sure, because in the end we test whether FRR supports the functionality. Nevertheless, we can implement that; is there any similar test?
Why do you think we should raise the timers (the default is 120 sec)?
Let's start with the default and see if it's enough then
Force-pushed from 72e3c27 to d65bb88
api/v1beta2/bgppeer_types.go
Outdated
// Needed for FRR mode only.
// +optional
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="EnableGracefulRestart cannot be changed after creation, because GR requires restarting the BGP session"
nit: "Needed" should be more like "Supported" here?
Also, I'd put the point about "this field is immutable because it requires a restart of the BGP session" in the description, and keep the validation message as "EnableGracefulRestart cannot be changed after creation".
sounds good, done
internal/config/validation.go
Outdated
@@ -41,6 +41,9 @@ func DiscardFRROnly(c ClusterResources) error {
	if p.Spec.ConnectTime != nil {
		return fmt.Errorf("peer %s has connect time set on native bgp mode", p.Spec.Address)
	}
	if p.Spec.EnableGracefulRestart {
		return fmt.Errorf("peer %s has enabled GR (GracefulRestart) flag set on native bgp mode", p.Spec.Address)
Suggested change:
- return fmt.Errorf("peer %s has enabled GR (GracefulRestart) flag set on native bgp mode", p.Spec.Address)
+ return fmt.Errorf("peer %s has GracefulRestart flag set on native bgp mode", p.Spec.Address)
done
Force-pushed from 50f6988 to 363fa20
Force-pushed from 63944e7 to a7cbd33
Force-pushed from 95e4970 to e57aee6
e2etest/bgptests/bgp.go
Outdated
eg, ctx := errgroup.WithContext(ctx)
for _, c := range FRRContainers {
	validateService(svc, allNodes.Items, c)
	cc := c // https://go.dev/doc/faq#closures_and_goroutines
can you use cc above (or just c := c) and use c? Seems a bit less confusing.
done (probably not needed anymore though or will be done)
e2etest/pkg/metallb/metallb.go
Outdated
ret := true
for _, p := range pods {
	ret = ret && k8s.PodIsReady(p)
can you just exit early if the pod is not ready?
done
e2etest/pkg/metallb/metallb.go
Outdated
}

f := func(context.Context) (bool, error) {
	pods, err := SpeakerPods(cs)
this may be quick enough to pick up the pods being deleted (and see them as ready). We should check that the names changed, or add a non-existing node selector to the daemonset in order to guarantee a full restart.
TODO (I added a time.Sleep(5) for the time being)
done
e2etest/pkg/metallb/metallb.go
Outdated
@@ -32,6 +32,38 @@ func init() {
	}
}

func RestartSpeakerPods(cancel context.CancelFunc, cs clientset.Interface) error {
This will return nil the moment we see the pods restarted (bugs aside).
I'd keep this func context-unaware and let it do what it promises (restart the pods), and handle the async part from the calling site.
made it non-blocking as we discussed
e2etest/pkg/metallb/metallb.go
Outdated
}
}

f := func(context.Context) (bool, error) {
this can be embedded as anonymous function in the call below. No need to assign it to a variable.
done, but is there a drawback to having it assigned to a variable?
e2etest/bgptests/bgp.go
Outdated
})
}

err = metallb.RestartSpeakerPods(cancel, cs)
I'd try to simplify this, because it adds complexity and is a bit hard to follow.
How about restarting the pods, running an eventually assertion that checks all the pods are restarted, and inside the eventually body we check that the service keeps being accessible?
What do you mean by "are restarted"? Are you suggesting blocking in a function that checks the speakers are restarted, and then continuing to an Eventually ginkgo body that loops over all external FRR docker containers in serial (validateServiceNoWait)?
refactored
e2etest/bgptests/metrics.go
Outdated
@@ -61,6 +62,21 @@ var _ = ginkgo.Describe("BGP metrics", func() {
})

ginkgo.BeforeEach(func() {
	clientconfig := k8sclient.RestConfig()
why was this moved?
e2etest/bgptests/bgp.go
Outdated
@@ -98,9 +101,25 @@ var _ = ginkgo.Describe("BGP", func() {
})

ginkgo.BeforeEach(func() {

	clientconfig := k8sclient.RestConfig()
	var err error
why was this moved?
I had to do that, because otherwise it fails like https://github.com/metallb/metallb/actions/runs/9267228538/job/25493436433 (a wip commit that I reverted)
There is some state that changes because I delete the speaker containers, so that logic must go per test, not per suite.
Ah, I see the reason. The FRR provider holds a reference to the speaker / frr-k8s pods.
This is a bit confusing though: FRRProvider is global, we initialize it two different times, and we use it in other files as well. I have the feeling this might bite us in the future (for example if we add a new context in a new file).
How about giving the FRRProvider a Refresh() method that updates the speakers, so we re-load the proper speakers after any test that restarts them?
Either that, or we fetch the speakers every time we call Execute. That shouldn't be terrible either, as it fetches them from the local cache.
I could go either way; the second seems simpler. Do you mean changing here:
func (f frrModeProvider) FRRExecutorFor(ns, name string) (executor.Executor, error) {
done
Force-pushed from 7459922 to f93f754
e2etest/bgptests/bgp.go
Outdated
Expect(metallb.RestartSpeakerPodsNoWait(cs)).NotTo(HaveOccurred())

Eventually(func() error {
	c := func() error {
why do you need to assign to a variable? Can't you just iterate over the containers and assert from there?
I find it easier to read, but no strong opinion; I will change it.
done
e2etest/bgptests/bgp.go
Outdated
}

if err := c(); err != nil {
	if !errors.Is(err, ErrStaleRoute) {
why is staleroute skipped?
During the control plane reboot (when the peer has started its graceful restart timers), the routes are stale and that is okay (the happy path), so we should ignore them during that time. We should NOT ignore them when we generally check that the svc is validated.
e2etest/bgptests/bgp.go
Outdated
// }
// }()

Expect(metallb.RestartSpeakerPodsNoWait(cs)).NotTo(HaveOccurred())
I'd stick with the err := and the assert on the next line pattern.
done
BGP graceful restart, as defined in RFC 4724, provides the mechanisms that allow a BGP speaker to continue to forward data packets along known routes while the routing protocol information is being restored. This allows the DaemonSet to be updated without routes being retracted on the peer side. When GR is enabled, we set the F-bit (preserve-fw-state) by default. We make gracefulRestart immutable according to https://kubernetes.io/blog/2022/09/29/enforce-immutability-using-cel/#immutablility-after-first-modification Signed-off-by: karampok <[email protected]>
Signed-off-by: karampok <[email protected]>
Force-pushed from 80948aa to f59de1e
- Restarts speaker pods in a non-blocking way
- Waits for the speakers to be ready while monitoring for downtime

BDD description looks like:
BGP GracefulRestart, when speakers restart and GR is enabled, the dataplane should keep working
BGP GracefulRestart, when speakers restart and GR is disabled, the dataplane should have a downtime

Signed-off-by: karampok <[email protected]>
@fedepaol I have completed the last commit, adding the e2e test (plus the reverse, to see a failure without graceful restart). I think I addressed all your comments. Let me know if I missed something. Thanks
/kind feature
What this PR does / why we need it:
Covers issue #2368
Special notes for your reviewer:
Release note: