Consistent Use of Context #642

Open
devigned opened this issue May 28, 2021 · 7 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@devigned
Contributor

devigned commented May 28, 2021

Is your feature request related to a problem?/Why is this needed
Much of the context usage in the project is ineffectual. For example, the Azure DiskClient.Get accepts a context, but the context it receives provides little value. As you can see here, a call to AttachDisk first calls getContextWithCancel, which simply constructs a context.Background with cancellation, and that cancellation is only invoked in a defer around the call to AzureDisk.Get. As a result, the cancellation offers no way to cancel processing further down the tree of execution.
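Roughly, the pattern looks like this (a simplified sketch of what's described above, with hypothetical type names, not the exact code):

```go
package sketch

import "context"

// diskGetter is a stand-in for the Azure DiskClient interface.
type diskGetter interface {
	Get(ctx context.Context, diskName string) (interface{}, error)
}

type controller struct {
	disksClient diskGetter
}

// getContextWithCancel constructs a fresh background context, so any deadline
// or cancellation the caller may already carry is discarded.
func getContextWithCancel() (context.Context, context.CancelFunc) {
	return context.WithCancel(context.Background())
}

// AttachDisk only calls cancel in a defer, after Get has already returned, so
// the cancellation never interrupts any in-flight work further down the stack.
func (c *controller) AttachDisk(diskName string) error {
	ctx, cancel := getContextWithCancel()
	defer cancel()

	_, err := c.disksClient.Get(ctx, diskName) // ctx is effectively context.Background()
	return err
}
```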

Describe the solution you'd like in detail
Context should be used to time-bound execution hierarchies, and it works best when it starts at the root of execution. For example, a root context could be created at the CLI entry point, with OS signals establishing the initial lifespan of that root context.
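Something along these lines at the entry point (a minimal sketch, not a proposal for the exact wiring):

```go
package main

import (
	"context"
	"os/signal"
	"syscall"
)

func main() {
	// Root context tied to the process lifetime: SIGINT/SIGTERM cancels it,
	// and that cancellation propagates to every context derived from it.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	run(ctx) // everything downstream receives and respects this ctx
}

func run(ctx context.Context) {
	// ... wire up controllers, clients, and reconcile loops with ctx ...
	<-ctx.Done()
}
```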

Further, any I/O-dependent task should expose a context parameter. If a function makes a network request or performs any other long-running activity, it should require a context and have it passed in from higher up in the call hierarchy.
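For example (a hypothetical helper, shown only to illustrate the signature shape):

```go
package sketch

import (
	"context"
	"io"
	"net/http"
)

// fetchDisk takes ctx as its first parameter rather than creating its own, so
// a deadline or cancellation set higher up in the call hierarchy bounds the
// request made here as well.
func fetchDisk(ctx context.Context, url string) ([]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}
```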

Additional context
By settling on a consistent approach to passing context, the application will also be much easier to instrument with tracing. For example, Cluster API Provider Azure uses OpenTelemetry to create a distributed trace of the entire reconciliation loop for all resources: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/8a44c6d6f7b394b7929c3ae5b0e79730609c6cf1/controllers/azurecluster_controller.go#L114-L120. This makes it easy to observe what the operator is doing and provides insight when trying to diagnose issues. In fact, in CAPZ we export the traces to Application Insights to visualize the running operator. We'd love to be able to do something similar in Cloud Provider Azure.
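Once ctx flows through the call tree, each layer can open a child span and the whole reconcile shows up as a single distributed trace. A rough sketch using the OpenTelemetry Go API (hypothetical function and tracer name, not the cloud-provider-azure code):

```go
package sketch

import (
	"context"

	"go.opentelemetry.io/otel"
)

// attachDisk opens a span that nests under whatever span the caller's ctx
// already carries, so the operation appears in the distributed trace.
func attachDisk(ctx context.Context, diskName string) error {
	ctx, span := otel.Tracer("cloud-provider-azure").Start(ctx, "controller.AttachDisk")
	defer span.End()

	// ... call into clients with ctx so their spans nest under this one ...
	_ = ctx
	return nil
}
```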

/cc @craiglpeters

@feiskyer
Member

Thanks @devigned for reporting the issue and sharing the ideas to provide better observability. Would you like to contribute the OpenTelemetry support?

@devigned
Contributor Author

devigned commented Jun 1, 2021

Definitely a possibility. I'm discussing this with @ritazh.

The first thing I wanted to do was open the issue and start the discussion. I suspect an enhancement doc would be appropriate for such a change.

The major items that would need to be changed:

  • Pipe context.Context from the process entry point down through the call hierarchy to any function that performs I/O or any operation that should cancel gracefully.
  • Add OpenTelemetry instrumentation throughout the call hierarchy
  • Document the usage

wdyt?

@feiskyer
Member

feiskyer commented Jun 2, 2021

@devigned thanks, LGTM

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 5, 2021
@feiskyer feiskyer removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 6, 2021
@feiskyer
Member

feiskyer commented Sep 6, 2021

@devigned are you still available to work on this?

@devigned
Contributor Author

devigned commented Sep 7, 2021

Yes, but there are a couple of things ahead of this. I think I'll be able to start it toward the end of the month.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 6, 2021
@feiskyer feiskyer added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 9, 2021