Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve HAVE_FIB_IFINDEX probe #27642

Open
julianwiedmann opened this issue Aug 23, 2023 · 7 comments · May be fixed by #32466
Open

Improve HAVE_FIB_IFINDEX probe #27642

julianwiedmann opened this issue Aug 23, 2023 · 7 comments · May be fixed by #32466
Assignees
Labels
area/loadbalancing Impacts load-balancing and Kubernetes service implementations kind/enhancement This would improve or streamline existing functionality. pinned These issues are not marked stale by our issue bot. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/loader Impacts the loading of BPF programs into the kernel.

Comments

@julianwiedmann
Copy link
Member

julianwiedmann commented Aug 23, 2023

#27622 lifted a limitation that requires Cilium-managed native devices (where bpf_host is attached) to have an ifindex <= MAX_UINT16.

This limitation no longer applies on systems with a kernel that has the HAVE_FIB_IFINDEX feature. Currently this feature is only detected on kernels >= 5.10. But the relevant patch was actually backported to older kernels. It would be good to also detect the feature on those older kernels, to further reduce the impact of the MAX_UINT16 limitation.

Quoting @borkmann

this can be further refined by not relying
on the asm.FnRedirectPeer helper presence but by actually doing a runtime
BPF program probe so that stable kernels can even be covered.

When this is fixed, we should also add HAVE_FIB_IFINDEX to the complexity test configs for the relevant kernels.

@julianwiedmann julianwiedmann added kind/enhancement This would improve or streamline existing functionality. sig/loader Impacts the loading of BPF programs into the kernel. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/loadbalancing Impacts load-balancing and Kubernetes service implementations labels Aug 23, 2023
@borkmann
Copy link
Member

borkmann commented Aug 23, 2023

The probe would roughly be as follows: create a new netns, add a route in there, assign a BPF prog to the interface in the netns which does a fib lookup, ensure the neighbor resolution of the fib lookup fails but the route lookup succeeds and finally test whether the fib_params->ifindex output is non-zero while the input was zero.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Oct 23, 2023
Copy link

github-actions bot commented Nov 6, 2023

This issue has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 6, 2023
@julianwiedmann julianwiedmann reopened this Nov 6, 2023
@julianwiedmann julianwiedmann added pinned These issues are not marked stale by our issue bot. and removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels Nov 6, 2023
@ti-mo
Copy link
Contributor

ti-mo commented Dec 14, 2023

The probe would roughly be as follows: create a new netns, add a route in there, assign a BPF prog to the interface in the netns which does a fib lookup, ensure the neighbor resolution of the fib lookup fails but the route lookup succeeds and finally test whether the fib_params->ifindex output is non-zero while the input was zero.

@borkmann Would there be a way to test this using prog_run that doesn't involve sending a packet? Our netns tooling isn't great at the moment, it still shells out to ip2 and I'd like to avoid that in new code. Reworking it first will take time and will be annoying to backport.

Triggering a prog by sending a packet is going to be error-prone, and we'd need to write the retcode to a map instead of receiving it from prog_run.

Maybe we can find something that works from the hostns? I understand we need to induce a failed fib lookup and a successful route lookup on the same ifindex? If we only need to read out fib_params->ifindex after the call, why do we need the route lookup?

@julianwiedmann
Copy link
Member Author

The probe would roughly be as follows: create a new netns, add a route in there, assign a BPF prog to the interface in the netns which does a fib lookup, ensure the neighbor resolution of the fib lookup fails but the route lookup succeeds and finally test whether the fib_params->ifindex output is non-zero while the input was zero.

@borkmann Would there be a way to test this using prog_run that doesn't involve sending a packet? Our netns tooling isn't great at the moment, it still shells out to ip2 and I'd like to avoid that in new code. Reworking it first will take time and will be annoying to backport.

I would hope we can do without sending a packet. Just calling bpf_fib_lookup() with a bpf_fib_lookup struct that matches the route setup in the netns.

Triggering a prog by sending a packet is going to be error-prone, and we'd need to write the retcode to a map instead of receiving it from prog_run.

Maybe we can find something that works from the hostns? I understand we need to induce a failed fib lookup and a successful route lookup on the same ifindex? If we only need to read out fib_params->ifindex after the call, why do we need the route lookup?

We basically need to control that the routing succeeds, but the kernel can't provide a L2 resolution for the next-hop. Controlling that from hostns would be difficult I believe.

@msune
Copy link
Contributor

msune commented May 10, 2024

Team,

I've been looking into this issue as my first contrib. I think the probe is almost done here (still need to check ret value of lookup to make sure it fails). It's follows a similar approach to what Paul did in another probe.

On this:

ensure the neighbor resolution of the fib lookup fails but the route lookup succeeds and finally test whether the fib_params->ifindex output is non-zero while the input was zero.

I think ifindex can't be 0 (see). The probe needs to create a tap iface or something, and inject the pkt from a valid ifindex (e.g. ifindex=lo) and check that the call returns the params with ifindex=tap.

Let me know if my understanding is correct, or I am missing s/t.

@julianwiedmann
Copy link
Member Author

julianwiedmann commented May 10, 2024

I think ifindex can't be 0 (see). The probe needs to create a tap iface or something, and inject the pkt from a valid ifindex (e.g. ifindex=lo) and check that the call returns the params with ifindex=tap.

Ack, I think that's correct. The bpf_fib_lookup struct we pass in must provide a valid ifindex. So the check needs to be whether the ifindex field is updated after the failed FIB lookup.

msune added a commit to msune/cilium that referenced this issue May 10, 2024
Previously, `HaveFibIfindex` probe checked the presence of
commit d1c362e1dd68 in the Linux kernel indirectly, by verifying
the presence of the redirect helpers (which were merged alongside).

Since `d1c362e1dd68` has been backported to older kernels - but
not helpers -, this commit improves `HaveFibIfindex` detection
by running a live probe.

Fixes: cilium#27642

Signed-off-by: Marc Suñé <[email protected]>
msune added a commit to msune/cilium that referenced this issue May 10, 2024
Previously, `HaveFibIfindex` probe checked the presence of
commit d1c362e1dd68 in the Linux kernel indirectly, by verifying
the presence of the redirect helpers (which were merged alongside).

Since `d1c362e1dd68` has been backported to older kernels - but
not helpers -, this commit improves `HaveFibIfindex` detection
by running a live probe.

Fixes: cilium#27642

Signed-off-by: Marc Suñé <[email protected]>
msune added a commit to msune/cilium that referenced this issue May 10, 2024
Previously, `HaveFibIfindex` probe checked the presence of
commit d1c362e1dd68 in the Linux kernel indirectly, by verifying
the presence of the redirect helpers (which were merged alongside).

Since `d1c362e1dd68` has been backported to older kernels - but
not helpers -, this commit improves `HaveFibIfindex` detection
by running a live probe.

Fixes: cilium#27642

Signed-off-by: Marc Suñé <[email protected]>
@msune msune linked a pull request May 10, 2024 that will close this issue
msune added a commit to msune/cilium that referenced this issue May 10, 2024
Previously, `HaveFibIfindex` probe checked the presence of
commit d1c362e1dd68 in the Linux kernel indirectly, by verifying
the presence of the redirect helpers (which were merged alongside).

Since `d1c362e1dd68` has been backported to older kernels - but
not helpers -, this commit improves `HaveFibIfindex` detection
by running a live probe.

Fixes: cilium#27642

Signed-off-by: Marc Suñé <[email protected]>
@julianwiedmann julianwiedmann changed the title Improve HAVE_FIB_INDEX probe Improve HAVE_FIB_IFINDEX probe May 13, 2024
msune added a commit to msune/cilium that referenced this issue May 13, 2024
Previously, `HaveFibIfindex` probe checked the presence of
commit d1c362e1dd68 in the Linux kernel indirectly, by verifying
the presence of the redirect helpers (which were merged alongside).

Since `d1c362e1dd68` has been backported to older kernels - but
not helpers -, this commit improves `HaveFibIfindex` detection
by running a live probe.

Fixes: cilium#27642

Signed-off-by: Marc Suñé <[email protected]>
msune added a commit to msune/cilium that referenced this issue May 13, 2024
Previously, `HaveFibIfindex` probe checked the presence of
commit d1c362e1dd68 in the Linux kernel indirectly, by verifying
the presence of the redirect helpers (which were merged alongside).

Since `d1c362e1dd68` has been backported to older kernels - but
not helpers -, this commit improves `HaveFibIfindex` detection
by running a live probe.

Fixes: cilium#27642

Signed-off-by: Marc Suñé <[email protected]>
msune added a commit to msune/cilium that referenced this issue May 14, 2024
Previously, `HaveFibIfindex` probe checked the presence of
commit d1c362e1dd68 in the Linux kernel indirectly, by verifying
the presence of the redirect helpers (which were merged alongside).

Since `d1c362e1dd68` has been backported to older kernels - but
not helpers -, this commit improves `HaveFibIfindex` detection
by running a live probe.

Fixes: cilium#27642

Signed-off-by: Marc Suñé <[email protected]>
msune added a commit to msune/cilium that referenced this issue May 14, 2024
Previously, `HaveFibIfindex` probe checked the presence of
commit d1c362e1dd68 in the Linux kernel indirectly, by verifying
the presence of the redirect helpers (which were merged alongside).

Since `d1c362e1dd68` has been backported to older kernels - but
not helpers -, this commit improves `HaveFibIfindex` detection
by running a live probe.

Fixes: cilium#27642

Signed-off-by: Marc Suñé <[email protected]>
msune added a commit to msune/cilium that referenced this issue May 15, 2024
Previously, `HaveFibIfindex` probe checked the presence of
commit d1c362e1dd68 in the Linux kernel indirectly, by verifying
the presence of the redirect helpers (which were merged alongside).

Since `d1c362e1dd68` has been backported to older kernels - but
not helpers -, this commit improves `HaveFibIfindex` detection
by running a live probe.

Fixes: cilium#27642

Signed-off-by: Marc Suñé <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/loadbalancing Impacts load-balancing and Kubernetes service implementations kind/enhancement This would improve or streamline existing functionality. pinned These issues are not marked stale by our issue bot. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/loader Impacts the loading of BPF programs into the kernel.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants