Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #1739 breaks localPref for different prefix lengths #2345

Open
8 tasks done
busterswt opened this issue Mar 27, 2024 · 11 comments
Open
8 tasks done

Fix for #1739 breaks localPref for different prefix lengths #2345

busterswt opened this issue Mar 27, 2024 · 11 comments

Comments

@busterswt
Copy link

busterswt commented Mar 27, 2024

MetalLB Version

0.13.11

Deployment method

Not relevant

Main CNI

Calico

Kubernetes Version

1.26.10

Cluster Distribution

Platform9

Describe the bug

When upgrading from MetalLB v0.13.7 to v0.13.11, our team experienced an issue when applying BGPAdvertisements that resulted in the following error:

Error from server (invalid local preference 100: local preferernce 0 was already set for the same type of BGP update. Check existing BGP advertisements with common pools and aggregation lengths): error when creating "bgpadvertisement1.yaml": admission webhook "bgpadvertisementvalidationwebhook.metallb.io" denied the request: invalid local preference 100: local preferernce 0 was already set for the same type of BGP update. Check existing BGP advertisements with common pools and aggregation lengths

This message/fix was introduced in #1820 as a result of the issue described in #1739. That issues provides an example of two BGPAdvertisements using the same aggregationLength of /32 and different localPref values. However, the upstream documentation at https://metallb.universe.tf/configuration/_advanced_bgp_configuration/ claims that different localPref values can be used when the aggregationLength is different.

To Reproduce

To reproduce the issue, one could apply a configuration similar to that described in https://metallb.universe.tf/configuration/_advanced_bgp_configuration/.. Our local configuration looks like the following:

[host]# cat  bgpadvertisement1.yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvertisement1
  namespace: metallb-system
spec:
  aggregationLength: 32
  ipAddressPools:
  - default
  communities:
    - 65535:65282
  localPref: 100

[host]# cat  bgpadvertisement2.yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvertisement2
  namespace: metallb-system
spec:
  aggregationLength: 27
  ipAddressPools:

Applying both results in the following:

Error from server (invalid local preference 100: local preferernce 0 was already set for the same type of BGP update. Check existing BGP advertisements with common pools and aggregation lengths): error when creating "bgpadvertisement1.yaml": admission webhook "bgpadvertisementvalidationwebhook.metallb.io" denied the request: invalid local preference 100: local preferernce 0 was already set for the same type of BGP update. Check existing BGP advertisements with common pools and aggregation lengths

Expected Behavior

Currently, the code compares localPref values and fails if they're not equal.

https://github.com/metallb/metallb/blob/main/internal/config/config.go#L841-L850

If localPref values can actually be different based on the aggregationLength of the advertisement (same pool), we'd expect no error. If this is not intended behavior, then the documentation at https://metallb.universe.tf/configuration/_advanced_bgp_configuration/ needs to be updated.

Additional Context

N/A

I've read and agree with the following

  • I've checked all open and closed issues and my request is not there.
  • I've checked all open and closed pull requests and my request is not there.

I've read and agree with the following

  • I've checked all open and closed issues and my issue is not there.
  • This bug is reproducible when deploying MetalLB from the main branch
  • I have read the troubleshooting guide and I am still not able to make it work
  • I checked the logs and MetalLB is not discarding the configuration as not valid
  • I enabled the debug logs, collected the information required from the cluster using the collect script and will attach them to the issue
  • I will provide the definition of my service and the related endpoint slices and attach them to this issue
@busterswt busterswt added the bug label Mar 27, 2024
@busterswt
Copy link
Author

Collect logs attached.
metallb_report.tgz

@busterswt
Copy link
Author

Correction on our configuration, copy and paste error.

[host]# cat  bgpadvertisement1.yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvertisement1
  namespace: metallb-system
spec:
  aggregationLength: 32
  ipAddressPools:
  - default
  communities:
    - 65535:65282
  localPref: 100

[host]# cat  bgpadvertisement2.yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvertisement2
  namespace: metallb-system
spec:
  aggregationLength: 27
  ipAddressPools:
  - default

@oribon
Copy link
Member

oribon commented Apr 1, 2024

thanks a lot for raising this!
I can confirm this is easily reproducible and I think it is indeed a bug.
What I think is happening is that we don't return here (advertisementsAreCompatible) early:

	if adv.AggregationLength != newAdv.AggregationLength && adv.AggregationLengthV6 != newAdv.AggregationLengthV6 {
		return true
	}

since both of the AggregationLengthV6 are equal (unspecified) even though you might not care about v6 (I guess not since your pool is v4 only), and we return false later (on something valid for the wrong reasons).

Can you try only as a hack for now to specify a different aggregationLengthV6 to your advs as well and see if it reaches a good state?

@busterswt
Copy link
Author

@oribon Thanks for the response! Let me work with my team and see if we can test this workaround in the next few days.

@oribon
Copy link
Member

oribon commented Apr 10, 2024

For some clarity when someone takes this: an advertisement for a service (together with its aggregation length) should have at most one localpref defined. The bug mentioned here happens because in the advertisementsAreCompatible function we look at both the v4 and v6 lengths in any case, even when the relevant ips are only v4 -> when the v6 length isn't specified for both advs they'll be equal (the default) and the first conditional of the function doesn't help.
We need to make sure that we consider only the right aggregation length(s) when validating.

@pratik705
Copy link

[host]# cat bgpadvertisement1.yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvertisement1
  namespace: metallb-system
spec:
  aggregationLength: 32
  ipAddressPools:
  - default
  localPref: 100

[host]# cat bgpadvertisement2.yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvertisement2
  namespace: metallb-system
spec:
  aggregationLength: 27
  aggregationLengthV6: 124
  ipAddressPools:
  - default
[host]# kubectl  apply -f bgpadvertisement1.yaml
bgpadvertisement.metallb.io/bgpadvertisement1 created

[host]# kubectl  apply -f bgpadvertisement2.yaml
bgpadvertisement.metallb.io/bgpadvertisement2 created

@oribon With the above configuration, I am able to apply the BGP advertisement with different 'aggregationLength/aggregationLengthV6' while specifying 'localPref' for a single resource sharing the same IP address pool (default).

Can you please help confirm if the above configuration is valid? Should two BGP advertisements sharing the same address pool can contain the same 'localPref'?

@AlinaSecret
Copy link
Contributor

I would be happy to work on this one:)

@oribon
Copy link
Member

oribon commented Apr 10, 2024

Can you please help confirm if the above configuration is valid? Should two BGP advertisements sharing the same address pool can contain the same 'localPref'?

@pratik705 yeah it seems fine (as a hack) assuming the pool is v4 only.

@AlinaSecret thanks! assigning this to you :)

@pratik705
Copy link

@oribon Just want to share my observarions:

If I set localPref to same value then it works even with same aggregationLengthV6.[1]

[host]# cat bgpadvertisement1.yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvertisement1
  namespace: metallb-system
spec:
  aggregationLength: 32
  aggregationLengthV6: 128
  ipAddressPools:
  - default
  localPref: 100

[host]# cat bgpadvertisement2.yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvertisement2
  namespace: metallb-system
spec:
  aggregationLength: 27
  aggregationLengthV6: 128
  ipAddressPools:
  - default
  localPref: 100
[host]# kubectl  apply -f bgpadvertisement1.yaml
bgpadvertisement.metallb.io/bgpadvertisement1 created
[host]# kubectl  apply -f bgpadvertisement2.yaml
bgpadvertisement.metallb.io/bgpadvertisement2 created

@oribon
Copy link
Member

oribon commented Apr 10, 2024

@oribon Just want to share my observarions:

If I set localPref to same value then it works even with same aggregationLengthV6.[1]

That is intended, what shouldn't work is specifying different localprefs for the same advertisement (combined with its aggregation length). What this bug is saying is that a configuration is denied even though the aggregation lengths are different (the v4 ones which are the only ones that matter in this case) - see my other comment or lmk if it's still not clear :)

@pratik705
Copy link

Thanks for the info @oribon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants