-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Span tracing #589
base: main
Are you sure you want to change the base?
Span tracing #589
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot.
Should trace snapshots be collected upon server exit, client requests or external signal to admin ports?
Not sure about this one. NVIDIAs profiling tooling was pretty nice (https://docs.nvidia.com/cuda/profiler-users-guide/index.html#flush-profile-data). Maybe we could take some inspiration from it.
Should
clippy-tracing
be a preinstalled expectation or configured/added in nix?
We should certainly add it to the flake. Crane (the cargo wrapper we use) probably supports various cargo extensions already (https://github.com/ipetkov/crane?tab=readme-ov-file#features). We might be able to add clippy-tracing there somehow.
Should we checkin the tracing macro and create linting that auto lints (fix) the source tree when instrumentation is missing?
Yes that seems like the way to go. This would also give us a good starting point to remove regular tracing::info
calls etc over time in favor of the instrument
approach.
I'm a bit confused by the enable_tracing
feature. There is the tracing
library and the tracing
log level. Libraries like tokio
use a feature called tracing
to turn the dependency on the tracing
crate on/off. AFAIU we always want to have the library enabled though as it's also the thing that handles the critical
, error
, info
, warn
, debug
and tracing
log levels.
Wouldn't the runtime performance impact only be noticeable when the tracing
or debug
log levels are enabled? But that wouldn't have to be a feature and instead would just be a config option like RUST_LOG=tracing
. Or maybe I'm misunderstanding some inner workings or the tracing
crate?
If the chrome_tracing
layer is just a subscriber I also wonder whether it would have any noticeable runtime impact. If it did, would it make sense to have a chrome_tracing
feature instead?
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), pre-commit-checks, ubuntu-20.04, ubuntu-22.04
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by the
enable_tracing
feature.
Didn't want to conflate other features with span tracing for nativelink, https://gist.github.com/adam-singer/fe59163849cc232a7ba47d0befe48b0d seems reasonable enough that we could use tracing
instead of a specific feature/label for nativelink.
Wouldn't the runtime performance impact only be noticeable when the
tracing
ordebug
log levels are enabled?
Was thinking more of the layer and guard since it can be defined before enabling tokio console. The annotations shouldn't cause any performance impact, we could move to a chrome_tracing
feature/label.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), pre-commit-checks, ubuntu-20.04, ubuntu-22.04
f2efaa8
to
a580094
Compare
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
a580094
to
1a1fd1d
Compare
1a1fd1d
to
7ad96cd
Compare
Description
Adding feature to enable/disable builds of nativelink to include trace spans.
Trace spans can be use for instrumenting historical call paths of the tokio workers. Implementation provided shunts enabling/disabling at the cfg/features flags level. Enabling this feature would have performance runtime impact and should be used in debugging/introspection scenarios. Applying the instrumentation updates the source tree and not expected to be checked in (atm). The span format uses chrome tracing json and can be viewed in
chrome://tracing
(or https://ui.perfetto.dev/). Trace snapshots are collected upon sever exit.Output traces upon server exit are dumped in the cwd of the nativelink bin in the format
trace-*.json
Example
Open Questions
Should trace snapshots be collected upon server exit, client requests or external signal to admin ports?
Should
clippy-tracing
be a preinstalled expectation or configured/added in nix?Should we checkin the tracing macro and create linting that auto lints (fix) the source tree when instrumentation is missing?
Fixes # (issue)
Type of change
Please delete options that are not relevant.
not work as expected)
How Has This Been Tested?
Please also list any relevant details for your test configuration
Checklist
bazel test //...
passes locallygit amend
see some docsThis change is