Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loader: do ELF substitutions in memory #32059

Merged
merged 5 commits into from
May 17, 2024
Merged

Conversation

lmb
Copy link
Contributor

@lmb lmb commented Apr 18, 2024

pkg/bpf: allow overriding constants during load

Signed-off-by: Lorenz Bauer <[email protected]>

loader: remove loader.ELFSubstitutions

Seems like a needless abstraction, remove it.

Signed-off-by: Lorenz Bauer <[email protected]>

loader: make EndpointHash infallible

The loader exports a method to turn an endpoint into a hash which is meant
to change any time the BPF needs to be regenerated. This method is currently
fallible because of a work around needed because hash.Hash is not easily
copied. This in turn causes a bunch of awkward error handling where there is
none needed. It also means that the endpoint code in theory has to deal with 
the case of datapathRegenCtx.bpfHeaderfileHash being empty.

Replace the fallible implementation with an infallible one. The new
implementation doesn't produce the same hashes as the old one, but this
doesn't matter since the semantics are unchanged. The hashes are used to
identify templates which aren't reused across agent restarts.

Signed-off-by: Lorenz Bauer <[email protected]>

loader: stop linking template.o into ep.StateDir()

Compilation of an endpoint program creates a template.o symlink in the
endpoint state directory. This is problematic since we really should be
creating the file in the "next" state directory. Overall, the loader should
not fiddle around in the endpoints state directory.

Replace the symlink with template.txt in the endpoint state directory, which
contains the hash of the used template. Importantly, we write this file from
the endpoint code, not from the loader.

Move the only user of template.o to template.txt.

Signed-off-by: Lorenz Bauer <[email protected]>

loader: remove ELF substitution logic

In theory, we need to recompile the BPF for an endpoint any time its
configuration changes. Compiling is fairly expensive, so a mechanism called
ELF templating / substitution was developed to avoid recompilation in the
most common cases.

1. bpf_lxc.c or bpf_host.c are compiled with a "generic"
  ep_config.h that contains placeholders into
  template.o.

2. For each endpoint, we replace all the placeholders in
  template.o and write a bpf_lxc.o, etc.

3. Each time we need to reload the datapath bpf_lxc.o is read
  from disk.

Replacing the placeholders in step (2) works by parsing the ELF and
manipulating the string table and data sections directly.

I believe that this approach was chosen when the loader still relied on
iproute2 to do all the loading. Now we use cilium/ebpf and the templating
mechanism is becoming a problem. It is very hard to follow how it works and
it impedes other refactoring.

Replace ELF templating with manipulation of ebpf.CollectionSpec directly.

Signed-off-by: Lorenz Bauer <[email protected]>

remove pkg/elf

There are no users of the package left in the code base. Remove it.

Signed-off-by: Lorenz Bauer <[email protected]>

@lmb lmb added sig/loader Impacts the loading of BPF programs into the kernel. release-note/misc This PR makes changes that have no direct user impact. labels Apr 18, 2024
@lmb
Copy link
Contributor Author

lmb commented Apr 18, 2024

/test

@lmb
Copy link
Contributor Author

lmb commented Apr 18, 2024

/test

@lmb lmb changed the title loader: remove ELF substition logic loader: do ELF substitutions in memory Apr 22, 2024
@lmb
Copy link
Contributor Author

lmb commented Apr 22, 2024

/test

@lmb
Copy link
Contributor Author

lmb commented Apr 23, 2024

/ci-clustermesh

@lmb
Copy link
Contributor Author

lmb commented Apr 23, 2024

/ci-clustermesh

@lmb
Copy link
Contributor Author

lmb commented Apr 23, 2024

/ci-clustermesh

@lmb
Copy link
Contributor Author

lmb commented Apr 23, 2024

/test

@lmb lmb force-pushed the pr/lmb/elf-substitution branch 5 times, most recently from 2ef558c to 5f456b0 Compare April 29, 2024 12:48
@lmb
Copy link
Contributor Author

lmb commented Apr 29, 2024

/test

@lmb
Copy link
Contributor Author

lmb commented Apr 29, 2024

/ci-gingko

@lmb
Copy link
Contributor Author

lmb commented Apr 29, 2024

/ci-ginkgo

@lmb lmb force-pushed the pr/lmb/elf-substitution branch from 9e99b7c to 69f37e7 Compare May 8, 2024 10:36
@lmb
Copy link
Contributor Author

lmb commented May 8, 2024

/ci-ginkgo

@lmb lmb force-pushed the pr/lmb/elf-substitution branch from 69f37e7 to 0765300 Compare May 9, 2024 10:58
@lmb
Copy link
Contributor Author

lmb commented May 9, 2024

/test

@lmb lmb force-pushed the pr/lmb/elf-substitution branch from 0765300 to 30a1c65 Compare May 9, 2024 12:25
@lmb
Copy link
Contributor Author

lmb commented May 9, 2024

/test

@lmb lmb requested review from derailed and qmonnet May 10, 2024 08:48
Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmb Nice work and clean up!

@lmb lmb marked this pull request as draft May 10, 2024 16:07
@lmb
Copy link
Contributor Author

lmb commented May 10, 2024

Putting this back into draft. #32439 needs to go in first, so that I can add a debug tool.

@lmb lmb force-pushed the pr/lmb/elf-substitution branch from 34f8451 to 7ee9341 Compare May 13, 2024 17:31
@lmb lmb mentioned this pull request May 13, 2024
@lmb lmb force-pushed the pr/lmb/elf-substitution branch from 7ee9341 to 63a5e4d Compare May 16, 2024 09:05
@lmb
Copy link
Contributor Author

lmb commented May 16, 2024

/test

Copy link
Contributor

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Left a few nits

pkg/bpf/collection.go Outdated Show resolved Hide resolved
pkg/bpf/collection.go Outdated Show resolved Hide resolved
pkg/bpf/collection_test.go Outdated Show resolved Hide resolved
pkg/bpf/collection_test.go Outdated Show resolved Hide resolved
pkg/datapath/loader/netlink.go Show resolved Hide resolved
@lmb lmb force-pushed the pr/lmb/elf-substitution branch from 63a5e4d to 2ef17ee Compare May 16, 2024 12:50
lmb added 5 commits May 16, 2024 14:31
Seems like a needless abstraction, remove it.

Signed-off-by: Lorenz Bauer <[email protected]>
Compilation of an endpoint program creates a template.o
symlink in the endpoint state directory. This is problematic
since we really should be creating the file in the "next"
state directory. Overall, the loader should not fiddle
around in the endpoints state directory.

Replace the symlink with template.txt in the endpoint state
directory, which contains the hash of the used template.
Importantly, we write this file from the endpoint code,
not from the loader.

Move the only user of template.o to template.txt.

Signed-off-by: Lorenz Bauer <[email protected]>
In theory, we need to recompile the BPF for an endpoint any time
its configuration changes. Compiling is fairly expensive, so
a mechanism called ELF templating / substitution was developed to
avoid recompilation in the most common cases.

1. bpf_lxc.c or bpf_host.c are compiled with a "generic"
   ep_config.h that contains placeholders into
   template.o.

2. For each endpoint, we replace all the placeholders in
   template.o and write a bpf_lxc.o, etc.

3. Each time we need to reload the datapath bpf_lxc.o is read
   from disk.

Replacing the placeholders in step (2) works by parsing the ELF
and manipulating the string table and data sections directly.

I believe that this approach was chosen when the loader still
relied on iproute2 to do all the loading. Now we use cilium/ebpf
and the templating mechanism is becoming a problem. It is very hard
to follow how it works and it impedes other refactoring.

Replace ELF templating with manipulation of ebpf.CollectionSpec
directly.

Signed-off-by: Lorenz Bauer <[email protected]>
There are no users of the package left in the code base. Remove it.

Signed-off-by: Lorenz Bauer <[email protected]>
@lmb lmb force-pushed the pr/lmb/elf-substitution branch from 2ef17ee to 348cd86 Compare May 16, 2024 13:33
@lmb
Copy link
Contributor Author

lmb commented May 16, 2024

/test

@ti-mo ti-mo marked this pull request as ready for review May 16, 2024 15:10
@lmb
Copy link
Contributor Author

lmb commented May 16, 2024

@christarazi @qmonnet could you take a look please? This is finally ready!

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ci-structure LGTM

@aanm aanm added this pull request to the merge queue May 17, 2024
Merged via the queue into cilium:main with commit 03b614f May 17, 2024
64 checks passed
@lmb lmb deleted the pr/lmb/elf-substitution branch May 17, 2024 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/misc This PR makes changes that have no direct user impact. sig/loader Impacts the loading of BPF programs into the kernel.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants