Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bwmap: Reconcile cilium_throttle with StateDB reconciler #32438

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented May 9, 2024

Store the per-endpoint EDT settings in Table[bwmap.Edt] and reconcile it to the cilium_throttle BPF map with the StateDB reconciler.

This is the first BPF map being converted over to this style and aims to serve as an example.

This improves resilience (failed operations are retried, and the map re-synced periodically, similar to WithCache) and simplifies usage as error handling (and health reporting) is moved to the reconciler and the users only need to care about populating Table[Edt] (though the Status can still be queried if reconciliation result is of interest).

The reconciliation status of the entries can inspected with "cilium-dbg statedb bandwidth-edts":

root@gke-jussi-test-20981-default-pool-a2b97336-6rhl:/home/cilium# cilium statedb bandwidth-edts
EndpointID   BitsPerSecond   TimeHorizonDrop   Status
3923         10k             2000000000        Done (18m ago)

And the status of reconciliation is reported as part of module health ("cilium-dbg status --all-health"):

$ cilium-dbg status --all-health
...
          ├── maps
          │   └── bwmap
          │       ├── job-reconciler-loop                         [OK] OK, 1 objects (19m, x5)
          │       └── timer-job-pressure-metric-throttle          [OK] OK (16.218µs) (3s, x1)

With this pattern we can also start using BPF batch update as the reconciler can decide when and how to perform the operation.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 9, 2024
@joamaki
Copy link
Contributor Author

joamaki commented May 9, 2024

/ci-ginkgo

pkg/maps/bwmap/cell.go Outdated Show resolved Hide resolved
@joamaki
Copy link
Contributor Author

joamaki commented May 9, 2024

/test

@joamaki
Copy link
Contributor Author

joamaki commented May 9, 2024

/test

@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label May 10, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 10, 2024
@joamaki joamaki changed the title [DRAFT] bwmap: Reconcile cilium_throttle with StateDB reconciler bwmap: Reconcile cilium_throttle with StateDB reconciler May 10, 2024
@joamaki
Copy link
Contributor Author

joamaki commented May 10, 2024

/test

@joamaki joamaki marked this pull request as ready for review May 10, 2024 17:36
@joamaki joamaki requested review from a team as code owners May 10, 2024 17:36
@joamaki
Copy link
Contributor Author

joamaki commented May 10, 2024

@ti-mo @lmb especially interested in your review on this and how we could perhaps simplify pkg/bpf and remove e.g. WithCache.

@joamaki
Copy link
Contributor Author

joamaki commented May 13, 2024

/test

// cell.Invoke(
// bpf.RegisterTablePressureMetricsJob[MyObj, myBPFMap],
// )
func RegisterTablePressureMetricsJob[Obj any, Map mapPressureMetricsOps](g job.Group, db *statedb.DB, table statedb.Table[Obj], m Map) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to live in package bpf? Seems like it only calls public methods.

Copy link
Contributor Author

@joamaki joamaki May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm where else could it live? This is the package where we have the pressure metrics stuff and this is a companion utility for NewMapOps, so it makes sense to live next to it. Open for suggestions though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another option would be in pkg/metrics next to NewBPFMapPressureGauge, centralizing the logic for map pressure metrics.

@lmb
Copy link
Contributor

lmb commented May 13, 2024

simplify pkg/bpf and remove e.g. WithCache.

I'd love that 😀 Main problem is manpower, figuring out why WithCache was added in the first place and whether the assumptions from back then still hold!

@joamaki
Copy link
Contributor Author

joamaki commented May 13, 2024

simplify pkg/bpf and remove e.g. WithCache.

I'd love that 😀 Main problem is manpower, figuring out why WithCache was added in the first place and whether the assumptions from back then still hold!

Every syscall the agent makes needs a reconciliation loop to retry failed operations (no matter how rare we think failure is). WithCache() was the mechanism to accomplish that (and for the ability to dump "undumpable" maps) for cases where the BPF map operation was not part of some larger reconciliation mechanism. This is a full replacement for WithCache for these cases, though with the semantic difference that we don't need to store the "raw" keys and values and store the more high-level representation. I would expect most maps to be reconciled in isolation, with the exceptions being at least the policy and lbmap. And the point with using StateDB and its reconciler is to have unified and well tested reconciliation mechanism with common metrics, rate limiting, resynchronization, health reporting, cilium-dbg inspection etc. that we'd use, where applicable, for BPF maps, BPF programs, sysctl, etcd updates, and so on.

I don't see a hurry for this move away from WithCache though. I wanted to do this for one map first so we have example usage that can be applied to new maps so they don't end up lacking proper retry & error handling.

@joamaki
Copy link
Contributor Author

joamaki commented May 13, 2024

/test

@joamaki
Copy link
Contributor Author

joamaki commented May 14, 2024

/test

@joamaki joamaki requested a review from lmb May 15, 2024 11:31
@joamaki
Copy link
Contributor Author

joamaki commented May 15, 2024

/test

@joamaki
Copy link
Contributor Author

joamaki commented May 16, 2024

Copy link
Contributor

@gentoo-root gentoo-root left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not well familiar with this area of code, but I did my best to understand the new design, and it makes sense to me in general.

const (
metricOpCreate = "create"
metricOpUpdate = "update"
metricOpLookup = "lookup"
metricOpDelete = "delete"
metricOpGetNextKey = "getNextKey"
)

const (
tablePressureMetricsInterval = 30 * time.Second // Interval for updating the pressure gauge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to make it configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question to ask in general. Here my reasoning is that the pressure metric is a slow indicator for the map pressure so its exact value doesn't matter as long as it's "reasonable" at a human time scale. It's probably not pulled very often either. I can't imagine a need for actually configuring this to be a lower value in production, except maybe for demo purposes? My preference would be to avoid making things configurable that clearly don't need to be as it increases the overall complexity of the software.

return ops
}

// Delete implements reconciler.Operations.
func (ops *mapOps[KV]) Delete(ctx context.Context, txn statedb.ReadTxn, entry KV) error {
return ops.m.Delete(entry.BinaryKey())
return ops.m.m.Delete(entry.BinaryKey())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does something guarantee that ops.m.m != nil at Delete, Prune and Update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this is now putting the responsibility on the user of NewMapOps to actually take care that the map is opened. MapOps should at the very least validate that this is the case and panic with a clear explanation, or it can keep trying until the map is opened. I'll think about this next week when I'm back...

}
} else {
err = bwmap.SilentDelete(e.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line made me think: with the new reconciliation mechanism, there is no way to specify ignoreMissing, and we always increase the error metric whenever statedb tries to remove a non-existent entry. While the latter makes sense to me, probably we'd also need to increase the error metric ourselves at DeleteBandwidthLimit if it were a regular Delete (not SilentDelete) here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! Excellent observation. I thought I had used SilentDelete in ops_linux.go, but now that I checked it I hadn't, so now it'll forever retry deletions if the entry isn't there, which obviously is wrong. I'll fix this!

I think in the general case we're probably fine with ignoring the error from a delete of a non-existing entry. Though of course we could make it configurable, but not quite sure when we'd want to retry failed deletes (though we can log a warning perhaps?).

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs changes look good! I left a few non-blocking questions and feedback regarding the usage of statedb here to understand the new pattern. It feels pretty intuitive and I appreciate how it takes care of a lot underneath the hood.

pkg/datapath/linux/bandwidth/bandwidth.go Show resolved Hide resolved
pkg/maps/bwmap/cell.go Show resolved Hide resolved
pkg/bpf/ops_linux.go Show resolved Hide resolved
pkg/maps/bwmap/cell.go Show resolved Hide resolved
pkg/datapath/linux/bandwidth/bandwidth.go Show resolved Hide resolved
Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for CLI and endpoint changes

Helper for a timer job that updates the pressure gauge for
a BPF map where its desired state is in a table.

Since in some cases we don't want to open a map if a feature
is not enabled, I've added a IsOpen() method to Map and the
timer callback will do nothing until the map is opened.

Usage:

  // Define an object suitable for use with NewMapOps
  type MyBPFObject struct {
    Key uint32
	Value uint64
  }
  func (MyBPFObject) BinaryKey() encoding.BinaryMarshaler {}
  func (MyBPFObject) BinaryValue() encoding.BinaryMarshaler {}

  // Wrapper type for *bpf.Map to give it a unique name.
  type myBPFMap struct {
	*bpf.Map
  }

  var Cell = cell.Module("my-bpf-map", "...",
    // ... provide & register Table[MyBPFObject]
	// ... provide myBPFMap
	// ... set up reconciler (see ops_linux_test.go)

	cell.Invoke(
		bpf.RegisterTablePressureMetricsJob[MyBPFObject, myBPFMap],
	),
  )

Signed-off-by: Jussi Maki <[email protected]>
The *ebpf.Map inside *bpf.Map is only created when the map is opened,
so we cannot use it from the NewMapOps constructor, but only when
the application has started. There's no use-case yet for using the
MapOps directly with an *ebpf.Map, so store the *bpf.Map and drop
the NewMapOps2.

Signed-off-by: Jussi Maki <[email protected]>
Store the per-endpoint EDT settings in Table[bwmap.Edt] and
reconcile it to the cilium_throttle BPF map with the StateDB
reconciler.

This is the first BPF map being converted over to this style
and aims to serve as an example. The overall point is to use
a more general mechanisms for recilience and status rather than
ones specific to pkg/bpf.

This improves resilience (failed operations are retried, and
the map re-synced periodically, similar to WithCache()).
The reconciliation status of the entries can inspected with
"cilium-dbg statedb bandwidth-edts" and the failed reconciliation
is reported as part of module health ("cilium-dbg status --all-health")

The replacements to existing functionality in pkg/bpf.Map map
as follows:

- WithEvents => cilium-dbg statedb bandwidth-ets --watch=100ms
- WithCache => Table[bwmap.Edt] & reconciler
- WithPressureMetric => RegisterTableMetricsJob

Signed-off-by: Jussi Maki <[email protected]>
@joamaki
Copy link
Contributor Author

joamaki commented May 23, 2024

/test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants