feat: Add support for Hubble control plane in Retina agent #432

Open
wants to merge 51 commits into
base: main

anubhabMajumdar
Contributor

@anubhabMajumdar anubhabMajumdar commented Jun 3, 2024

Description

This PR adds support for Hubble control plane in Retina agent. This is being done in the most backward compatible way possible.

I am adding a new subcommand called hubble-control-plane, which will start Hubble instead of the existing control plane.

$ ./retina -h
Start Retina Agent

Usage:
  retina-agent [flags]
  retina-agent [command]

Available Commands:
  completion           Generate the autocompletion script for the specified shell
  help                 Help about any command
  hubble-control-plane Start Hubble control plane

Flags:
      --config string                      config file (default "/retina/config/config.yaml")
      --health-probe-bind-address string   The address the probe endpoint binds to. (default ":18081")
  -h, --help                               help for retina-agent
      --leader-elect                       Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.
      --metrics-bind-address string        The address the metric endpoint binds to. (default ":18080")

Use "retina-agent [command] --help" for more information about a command.
$
$
$ ./retina
Starting Retina Agent
starting Retina daemon with legacy control plane
...
$
$
$ ./retina hubble-control-plane -h
Start Hubble control plane

Usage:
  retina-agent hubble-control-plane [flags]
  retina-agent hubble-control-plane [command]

Available Commands:
  hive        Inspect the hive

Flags:
      --cluster-name string                        name of the cluster (default "default")
...
$
$
$ ./retina hubble-control-plane --config-dir .
ts=2024-06-13T18:20:47.888Z level=info caller=hubble/daemon_main.go:276 msg="Traces telemetry initialized with zapai" version= appInsightsID=
time="2024-06-13T18:20:47Z" level=info msg=Invoked duration="483.869µs" function="pprof.init.func1 (pkg/pprof/cell.go:49)" subsys=hive
time="2024-06-13T18:20:47Z" level=info msg=Invoked duration="77.011µs" function="gops.registerGopsHooks (pkg/gops/cell.go:38)" subsys=hive
time="2024-06-13T18:20:47Z" level=info msg=Invoked duration=10.503879ms function="github.com/microsoft/retina/cmd/hubble.init.func3 (cmd/hubble/daemon.go:68)" subsys=hive
time="2024-06-13T18:20:47Z" level=info msg="&{{ 0}  [] 0s false true false false false false}" subsys=agent-config
time="2024-06-13T18:20:47Z" level=info msg="configuring telemetry" app-insights-id= retina-version= subsys=telemetry
time="2024-06-13T18:20:47Z" level=info msg="telemetry disabled" subsys=telemetry
ts=2024-06-13T18:20:47.902Z level=info caller=metrics/metrics.go:169 msg="Metrics initialized"
...

Changes made

  • Adopting Hive for dependency injection
  • Moving to Cobra for CLI
  • controller/main.go is now just the entry point of the command
  • retina/cmd now houses rootCmd (starts Retina as is) and hubble (starts the Hubble control plane)
  • Packaging the Hubble CLI in the agent image (Dockerfile changes)
  • Adding new YAML files to install Retina with Hubble
  • Moved the current YAML files under deploy/legacy
  • Fixed the links in the docs
  • Updated the Cilium version to pull in upstream commits needed for starting Hubble
  • Updated init to add a step that creates Cilium dirs (this also happens for the current control plane, but it consumes no resources; it just creates an empty directory)
  • All new packages under pkg contain the business logic required to run Hubble (node reconciler, Hubble control plane, IPCache, etc.)
  • Minor changes to test/e2e to support change to deployment directory (deploy -> deploy/legacy)

Related Issue

#418

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Retina with Hubble

image

Retina

image

Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@anubhabMajumdar anubhabMajumdar added type/enhancement New feature or request lang/go The Go Programming Language area/controllers labels Jun 3, 2024
@anubhabMajumdar anubhabMajumdar added this to the Hubble milestone Jun 3, 2024
@anubhabMajumdar anubhabMajumdar requested a review from a team as a code owner June 3, 2024 22:11
cmd/root.go (resolved)
cmd/legacy/subcmd.go (outdated, resolved)
@anubhabMajumdar anubhabMajumdar marked this pull request as draft June 4, 2024 18:11
anubhabMajumdar and others added 7 commits June 4, 2024 21:22
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
fix import

replace usage of manifest to add legacy

replace usage of manifest to add legacy

remove unused file

update legacy manifest reference

fix directories

update path to use single folders

add new make targets

update images and tag references
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
@anubhabMajumdar anubhabMajumdar changed the title Refactor controller to allow adding hubble control plane entrypoint feat: Refactor controller to allow adding hubble control plane entrypoint Jun 10, 2024
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
@anubhabMajumdar anubhabMajumdar changed the title feat: Refactor controller to allow adding hubble control plane entrypoint feat: Add support for Hubble control plane in Retina agent Jun 13, 2024
This adds some rudimentary error wrapping with fmt.Errorf, then renames
a few errors to match the errXXX convention. Finally, addNode has been
augmented to return an error and the callsite modified to handle that
error.
Both of these two had no good migration path documented, so these have
both been disabled with a TODO added to address them.
timraymond and others added 22 commits June 14, 2024 11:28
This is the correct capitalization for abbreviations in Go.
This switch statement has a default clause that will effectively ignore
the cases that we don't care about, so this linter isn't providing value
here.
A linter pointed out that these assertion values were reversed. The
expected value should be first.
The cells don't really have any meaningful logic, since they're
quasi-declarative. Given this, it's not really clear how you would even
meaningfully wrap errors returned here. Consequently, we just disable
the lint.
"opts" was unused, so a linter complained. It's replaced with "_"
instead.
If a value reads 3 * time.Minute, it's obvious that it's 3 minutes.
Constantizing this value just makes it less readable.
The linter complained about these two in particular.
This linter suggests an ordering of imports that is contrary to the
following logical ordering we've discussed as a team:

import (
  // stdlib

  // module-local imports

  // third-party dependencies
)

In particular, it mixes up the module-local imports and the third-party
dependencies. Consequently, this PR disables it for the repo.
One instance of this is just inherently long... partly due to a linter
disablement. The other one is a long function signature, which is still
a problem. At least it's broken out onto multiple lines so it's a little
easier to deal with, but tbh I'm not crazy about doing that with
function / method signatures in Go.
The linter identified two instances of this. One of them makes sense to
leave since we don't have an easy way to make a granular commit to get
just that commented-out code back.
Calling os.Exit directly should be done with great care, since it
doesn't allow deferred functions to run. The linter spotted this here.
The solution is to change the cobra command to accept an error return
and return an error instead.
This function returns an error that is provably always nil, so there's
no need to return it, handle it, etc.
The instances that it found here were all obvious usages from looking at
the context. The numbers were not magic in any way.
Generally these are fine, with how errors are typically used.
This feels a little silly, but I can see that for a large number of
constants it's probably worthwhile. Fixing.
Two instances of unwrapped errors were found by the linter. This makes
sure that they are wrapped.
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Contributor

@huntergregory huntergregory left a comment

Looks awesome. I mainly had nitpicks about dashboards and deleting unused bits

Makefile (outdated, resolved)
deploy/legacy/graphana/dashboards/README.md (outdated, resolved)
deploy/legacy/graphana/dashboards/clusters.json (outdated, resolved)
deploy/legacy/graphana/dashboards/simplify-grafana.go (outdated, resolved)
pkg/shared/config/type.go (outdated, resolved)
pkg/shared/telemetry/type.go (outdated, resolved)
pkg/monitoragent/cell_linux.go (resolved)
cmd/hubble/subcmd_linux.go (outdated, resolved)
Signed-off-by: Anubhab Majumdar <[email protected]>
pkg/config/hubble_config.go (outdated, resolved)
pkg/plugin/dropreason/kprobe_bpfel_x86.o (outdated, resolved)

func Cmd(agentHive *hive.Hive) *cobra.Command {
	hubbleCmd := &cobra.Command{
		Use: "hubble-control-plane",
Member

Thoughts on making this a verb, then maybe a noun? Or maybe a flag on the parent process? It seems easier to deprecate flags later than subcommands.
e.g. retina --use-hubble-control-plane

Contributor Author

It cannot be a flag: it's very different from the current control plane and requires all new flags and setup. IMO, a new subcommand makes sense. And because it's a subcommand, I kept it a noun.
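
To make the trade-off in this thread concrete, here is a small sketch that uses only the standard library flag package rather than cobra, so it is not how the PR wires things up. It shows a subcommand owning a flag set separate from the parent's. The --cluster-name and --config flags and their defaults come from the help output in the description; the dispatch function and the returned strings are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// dispatch picks a control plane from the first argument and parses only the
// flags that belong to it. Hypothetical helper for illustration.
func dispatch(args []string) (string, error) {
	if len(args) > 0 && args[0] == "hubble-control-plane" {
		// The subcommand gets its own flag set, independent of the parent's.
		fs := flag.NewFlagSet("hubble-control-plane", flag.ContinueOnError)
		fs.SetOutput(io.Discard) // keep usage text out of the example output
		clusterName := fs.String("cluster-name", "default", "name of the cluster")
		if err := fs.Parse(args[1:]); err != nil {
			return "", fmt.Errorf("parsing hubble-control-plane flags: %w", err)
		}
		return "hubble control plane, cluster=" + *clusterName, nil
	}

	// No subcommand: fall through to the existing (legacy) control plane.
	fs := flag.NewFlagSet("retina-agent", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	config := fs.String("config", "/retina/config/config.yaml", "config file")
	if err := fs.Parse(args); err != nil {
		return "", fmt.Errorf("parsing retina-agent flags: %w", err)
	}
	return "legacy control plane, config=" + *config, nil
}

func main() {
	msg, err := dispatch(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(msg)
}
```

With a single --use-hubble-control-plane flag on the parent, both sets of flags would have to share one namespace; with a subcommand, passing --cluster-name to the legacy path is rejected as an unknown flag.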

pkg/k8s/watcher_linux.go (resolved)
pkg/monitoragent/monitoragent_linux.go (resolved)
pkg/monitoragent/monitoragent_linux.go (resolved)
pkg/monitoragent/monitoragent_linux.go (resolved)
.golangci.yaml (resolved)
pkg/hubble/hubble_linux.go (outdated, resolved)
Member

@matmerr matmerr left a comment

🛰️

Contributor

@huntergregory huntergregory left a comment

It looks like the UT files are missing in deploy/hubble/grafana/ (can copy from deploy/legacy/grafana/)

@@ -527,4 +527,4 @@ quick-deploy-hubble:

 .PHONY: simplify-dashboards
 simplify-dashboards:
-	cd deploy/legacy/grafana/dashboards/ && go test . -tags=dashboard,simplifydashboard -v
+	go test -tags=dashboard,simplifydashboard -v ./deploy/...
Contributor

The test logic actually modifies JSON files in the current directory. Could we cd into both directories?

Labels
area/controllers lang/go The Go Programming Language type/enhancement New feature or request

5 participants