RFC: Add remediation for BPF ABBA deadlock bug caused by hash map access #3144
Comments
The things I'm not keen on in the proposed bpftrace mitigations are:
Yeah, agreed. But if there were a way to feature probe, then at least we could delete the mitigation later.
Previously we only banned kretprobes from using banned kernel functions, but a bpftrace script at Meta saw a crash from using one of these functions in kfunc/kretfunc. This change prevents any probe from attaching to one of these functions. Issue: bpftrace#3144
It is worth keeping in mind that the bpf lock change being made upstream does not prevent ABBA deadlocks; it only backs out of them after the code fails to acquire a lock. It remains to be seen whether the approach of placing the entire bpftrace script in the same recursion prevention domain will be faster or slower than relying on the deadlock detection code in the new bpf lock code.
Would this workaround prevent tracing of recursive functions except at the topmost level?
No, it's only limited to a small subset of functions. Here is the PR: #3206
Weren't the deadlocks caused by attaching a probe to a locking function ( |
An issue was recently discovered via a bpftrace script in which a kfunc was attached on the same path used to access a BPF_MAP_TYPE_HASH map, which created an ABBA deadlock in the kernel and a subsequent crash, e.g.
@spinlock_start
was being accessed by another bpf prog on a different CPU. Example stack trace:
This issue is being addressed in a bpf kernel patch. However, because the patch is going into a later kernel release, we should consider adding a temporary fix for this in bpftrace.
One possible option, suggested by Alexei, is for a bpftrace script to create a per-CPU variable that is tested when a functional block is called. If it is already set, we exit early (and increment a missed counter?). If it is not yet set, we set it, run the functional block, and clear it at exit time. We could do this conditionally if we detect that the prog is accessing a non-per-CPU map type.
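To make the per-CPU guard idea concrete, here is a minimal userspace model in plain C. It is only a sketch of the control flow, not bpftrace codegen or BPF code: `NR_CPUS`, `in_probe`, `missed`, and `run_guarded` are hypothetical names, the CPU is passed in explicitly rather than looked up, and the "functional block" is modelled as a function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

/* Hypothetical per-CPU state: one busy flag and one missed counter per CPU. */
static bool in_probe[NR_CPUS];
static uint64_t missed[NR_CPUS];

typedef void (*probe_body_fn)(int cpu);

/* Run `body` unless a guarded block is already running on this CPU.
 * Returns true if the body ran, false if it was skipped as a re-entry. */
static bool run_guarded(int cpu, probe_body_fn body)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (in_probe[cpu]) {
        missed[cpu]++;        /* already inside a block: exit early */
        return false;
    }
    in_probe[cpu] = true;     /* set the flag */
    body(cpu);                /* run the functional block */
    in_probe[cpu] = false;    /* clear the flag at exit */
    return true;
}

static int nested_ran; /* counts how often the leaf body actually ran */

static void leaf_body(int cpu)
{
    (void)cpu;
    nested_ran++;
}

/* Models a block whose map access fires a second probe on the same CPU. */
static void reentrant_body(int cpu)
{
    run_guarded(cpu, leaf_body);
}
```

Calling `run_guarded(0, reentrant_body)` runs the outer block, skips the nested one, and leaves `missed[0]` at 1. Note that in this model the guard only stops re-entry on the same CPU; it says nothing about how two different CPUs interact.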
Another option would be to block the use of certain kfuncs/kprobes if bpftrace can detect that these progs access non-per-CPU maps in a script where other progs access the same map (which may end up being more code).
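A rough sketch of what such a check could look like, again as standalone C rather than actual bpftrace code. Everything here is an assumption for illustration: the `probe` and `map_use` structs, the `should_block` helper, and the placeholder attach target `hypothetical_lock_fn` (not a real kernel function or an entry in bpftrace's banned list).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum map_kind { MAP_HASH, MAP_PERCPU_HASH };

struct map_use {
    const char *name;       /* map name as written in the script */
    enum map_kind kind;
};

struct probe {
    const char *target;     /* function the probe attaches to */
    const struct map_use *maps;
    size_t nmaps;
};

/* Placeholder list of attach points considered risky (hypothetical). */
static const char *const risky_targets[] = { "hypothetical_lock_fn" };

static bool is_risky_target(const char *target)
{
    for (size_t i = 0; i < sizeof(risky_targets) / sizeof(risky_targets[0]); i++)
        if (strcmp(target, risky_targets[i]) == 0)
            return true;
    return false;
}

/* True if any probe other than `self` also uses the map called `name`. */
static bool shared_with_other(const struct probe *probes, size_t n,
                              size_t self, const char *name)
{
    for (size_t p = 0; p < n; p++) {
        if (p == self)
            continue;
        for (size_t m = 0; m < probes[p].nmaps; m++)
            if (strcmp(probes[p].maps[m].name, name) == 0)
                return true;
    }
    return false;
}

/* Block probe i if it attaches to a risky target and touches a
 * non-per-CPU map that another probe in the script also touches. */
static bool should_block(const struct probe *probes, size_t n, size_t i)
{
    assert(i < n);
    if (!is_risky_target(probes[i].target))
        return false;
    for (size_t m = 0; m < probes[i].nmaps; m++) {
        const struct map_use *mu = &probes[i].maps[m];
        if (mu->kind != MAP_PERCPU_HASH &&
            shared_with_other(probes, n, i, mu->name))
            return true;
    }
    return false;
}
```

The check needs a view of every prog in the script and every map each one touches, which is one reason this option may end up being more code than the per-CPU guard.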
This issue is meant to be a place to discuss possible solutions and the priority of this fix.