scheduler-perf: create any object from YAML #122419
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 41bfa9f to 76fbf73
/retest
Force-pushed from 76fbf73 to 616621f
Force-pushed from 616621f to 1dd2d01
/hold
To avoid automatic merge on LGTM.
/assign
// The type of that context is still context.Context because replacing
// it with TContext breaks tests which use `WithCancel`.
//
// TODO(pohly): change all of that code together with changing the return type.
I have a follow-up commit that I intend to submit in a separate PR which does that across the code base:
$ git log -p -n1 20a10317bbb1f81580f532d85aa94c46539233c2 | diffstat
pkg/controller/disruption/disruption_test.go | 12
pkg/kubelet/config/config_test.go | 60 ---
test/integration/apiserver/apiserver_test.go | 39 --
test/integration/apiserver/certreload/certreload_test.go | 20 -
test/integration/apiserver/flowcontrol/concurrency_test.go | 9
test/integration/apiserver/flowcontrol/concurrency_util_test.go | 11
test/integration/apiserver/max_json_patch_operations_test.go | 11
test/integration/apiserver/max_request_body_bytes_test.go | 46 +-
test/integration/apiserver/openapi/openapi_enum_test.go | 8
test/integration/apiserver/openapi/openapiv3_test.go | 14
test/integration/apiserver/podlogs/podlogs_test.go | 11
test/integration/apiserver/watchcache_test.go | 34 -
test/integration/auth/accessreview_test.go | 26 -
test/integration/auth/auth_test.go | 74 +---
test/integration/auth/bootstraptoken_test.go | 8
test/integration/auth/dynamic_client_test.go | 9
test/integration/auth/rbac_test.go | 48 +-
test/integration/auth/selfsubjectreview_test.go | 22 -
test/integration/auth/svcaccttoken_test.go | 78 ++--
test/integration/controlplane/synthetic_controlplane_test.go | 7
test/integration/daemonset/daemonset_test.go | 18 -
test/integration/defaulttolerationseconds/defaulttolerationseconds_test.go | 10
test/integration/disruption/disruption_test.go | 92 ++---
test/integration/dualstack/dualstack_endpoints_test.go | 30 -
test/integration/dualstack/dualstack_test.go | 180 ++++------
test/integration/evictions/evictions_test.go | 88 ++--
test/integration/examples/webhook_test.go | 12
test/integration/framework/test_server.go | 1
test/integration/network/services_test.go | 60 +--
test/integration/scheduler_perf/scheduler_test.go | 2
test/integration/serviceaccount/service_account_test.go | 56 +--
test/integration/servicecidr/allocator_test.go | 20 -
test/integration/util/util.go | 9
test/utils/ktesting/ktesting.go | 8
34 files changed, 447 insertions(+), 686 deletions(-)
Overall the code becomes shorter. The changes are like this one:
diff --git a/pkg/controller/disruption/disruption_test.go b/pkg/controller/disruption/disruption_test.go
index 3e9fb4e4fd3..223ac8dc3d8 100644
--- a/pkg/controller/disruption/disruption_test.go
+++ b/pkg/controller/disruption/disruption_test.go
@@ -1538,12 +1538,10 @@ func TestStalePodDisruption(t *testing.T) {
}
for name, tc := range cases {
t.Run(name, func(t *testing.T) {
- _, ctx := ktesting.NewTestContext(t)
- ctx, cancel := context.WithCancel(ctx)
- defer cancel()
- dc, _ := newFakeDisruptionControllerWithTime(ctx, now)
- go dc.Run(ctx)
- if _, err := dc.coreClient.CoreV1().Pods(tc.pod.Namespace).Create(ctx, tc.pod, metav1.CreateOptions{}); err != nil {
+ tCtx := ktesting.Init(t)
+ dc, _ := newFakeDisruptionControllerWithTime(tCtx, now)
+ go dc.Run(tCtx)
+ if _, err := dc.coreClient.CoreV1().Pods(tc.pod.Namespace).Create(tCtx, tc.pod, metav1.CreateOptions{}); err != nil {
t.Fatalf("Failed to create pod: %v", err)
}
dc.clock.Sleep(tc.timePassed)
@@ -1552,7 +1550,7 @@ func TestStalePodDisruption(t *testing.T) {
}
diff := ""
if err := wait.Poll(100*time.Millisecond, wait.ForeverTestTimeout, func() (bool, error) {
- pod, err := dc.kubeClient.CoreV1().Pods(tc.pod.Namespace).Get(ctx, tc.pod.Name, metav1.GetOptions{})
+ pod, err := dc.kubeClient.CoreV1().Pods(tc.pod.Namespace).Get(tCtx, tc.pod.Name, metav1.GetOptions{})
if err != nil {
t.Fatalf("Failed getting updated pod: %v", err)
}
The decision to use tCtx instead of ctx becomes part of our "coding style". It would work to not rename the variable, but then it's less clear when a ctx is a context.Context and when it is a ktesting.TContext. I prefer using tCtx.
Didn't review the last commit yet.
// createResourceClassOpType customizes createOp for creating a ResourceClass.
👍
create := func() error {
	mapping, err := restMapper.RESTMapping(gk, gv.Version)
	if err != nil {
		// Cached mapping might be stale, refresh on next try.
Can we directly return here? Because RESTMapping already does the same thing.
We have to call restMapper.Reset() before returning. Otherwise restMapper will never refresh itself.
Calling Reset() here is harmless, but doesn't RESTMapping already cover your scenario? It calls Reset when it fails to find the resource. (Admittedly, I haven't used this function much...)
func (d *DeferredDiscoveryRESTMapper) RESTMapping(gk schema.GroupKind, versions ...string) (m *meta.RESTMapping, err error) {
del, err := d.getDelegate()
if err != nil {
return nil, err
}
m, err = del.RESTMapping(gk, versions...)
if err != nil && !d.cl.Fresh() {
d.Reset()
m, err = d.RESTMapping(gk, versions...)
}
return
}
Note the !d.cl.Fresh(): this depends on the discovery.CachedDiscoveryInterface implementation detecting "stale" data. If I remember correctly, it doesn't do that just because some result is not found, which is why the Reset in the code above was needed.
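The interplay can be illustrated with a small self-contained sketch. The types below are toy stand-ins modeled on DeferredDiscoveryRESTMapper, not the real client-go implementation: the built-in retry only fires when the cache already reports itself as stale, so a caller that knows the data may be outdated must call Reset() itself.

```go
package main

import (
	"errors"
	"fmt"
)

// Toy stand-in for the cached discovery client (illustrative names only).
type cache struct {
	fresh bool
	data  map[string]string // kind -> resource
}

func (c *cache) Fresh() bool { return c.fresh }
func (c *cache) Reset()      { c.fresh = false } // mark stale; next lookup refetches

type mapper struct{ cl *cache }

var errNotFound = errors.New("no match for kind")

func (m *mapper) lookup(kind string) (string, error) {
	if r, ok := m.cl.data[kind]; ok {
		return r, nil
	}
	return "", errNotFound
}

// RESTMapping mimics DeferredDiscoveryRESTMapper: it retries after a refresh,
// but only when the cache already reports itself as stale (!Fresh()).
func (m *mapper) RESTMapping(kind string) (string, error) {
	r, err := m.lookup(kind)
	if err != nil && !m.cl.Fresh() {
		m.refresh()
		r, err = m.lookup(kind)
	}
	return r, err
}

func (m *mapper) refresh() {
	// Simulate refetching discovery data from the server.
	m.cl.data["ResourceClass"] = "resourceclasses"
	m.cl.fresh = true
}

func main() {
	m := &mapper{cl: &cache{fresh: true, data: map[string]string{}}}
	// Cache claims to be fresh but is stale: the built-in retry does not fire.
	_, err := m.RESTMapping("ResourceClass")
	fmt.Println("first try:", err)
	// A caller-side Reset marks the cache stale...
	m.cl.Reset()
	// ...so the next RESTMapping call refreshes and retries.
	r, err := m.RESTMapping("ResourceClass")
	fmt.Println("after Reset:", r, err)
}
```

This is why the comment thread concludes that the explicit restMapper.Reset() in the loop is needed: without it, a "fresh" cache that simply lacks the new resource would fail forever.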
createNamespacesOpcode     operationCode = "createNamespaces"
createPodsOpcode           operationCode = "createPods"
createPodSetsOpcode        operationCode = "createPodSets"
createResourceClaimsOpcode operationCode = "createResourceClaims"
Can we remove this as well? Or can't it be created from YAML?
This needs additional logic which goes beyond "create one object from YAML", therefore this has to stay.
@@ -674,6 +674,11 @@ func RunBenchmarkPerfScheduling(b *testing.B, outOfTreePluginRegistry frameworkr
if !enabled(*perfSchedulingLabelFilter, append(tc.Labels, w.Labels...)...) {
	b.Skipf("disabled by label filter %q", *perfSchedulingLabelFilter)
}
tCtx := ktesting.Init(b)
if !*useTestingLog {
	// Use the normal global logger instead of the per-test logger initialized by Init.
Can we move this into Init? Users won't care much about the implementation.
No, because if we don't use per-test logging, then it has to be done here by reverting the default behavior of ktesting.Init, which is per-test logging.
I probably misunderstood the question: yes, this could be done in ktesting.Init if it had options. So far, this is the only usage where per-test output wasn't desirable, but perhaps there'll be others. I'll add a WithPerTestOutput(enabled bool) and use that here.
I prefer not to put a seldom-used option into the main ktesting package. Therefore I created "ktesting/initoption" and placed PerTestOutput there. Because it's a package specifically for functional options, there is no need for the With prefix. That is used instead for the functions which create a new context.
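The functional-options style being discussed can be sketched in a few lines. The names below (Init, InitOption, PerTestOutput, the config struct) are illustrative stand-ins modeled on the description above, not the real ktesting API:

```go
package main

import "fmt"

// config holds the settings that Init applies; illustrative only.
type config struct{ perTestOutput bool }

// InitOption mirrors the idea of a dedicated options package: plain option
// constructors without a "With" prefix.
type InitOption func(*config)

// PerTestOutput enables or disables the per-test logger.
func PerTestOutput(enabled bool) InitOption {
	return func(c *config) { c.perTestOutput = enabled }
}

// Init applies options on top of the defaults; per-test output is the default.
func Init(opts ...InitOption) config {
	c := config{perTestOutput: true}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Println(Init().perTestOutput)                     // default behavior
	fmt.Println(Init(PerTestOutput(false)).perTestOutput) // the benchmark case
}
```

The benefit of this pattern is that rarely used knobs stay out of the main package's surface while Init keeps a stable signature.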
@kerthcet: can you perhaps take another look? I just pushed.
ctx, cancel := context.WithTimeout(ctx, 30*time.Minute)
b.Cleanup(cancel)
timeout := 30 * time.Minute
tCtx = ktesting.WithTimeout(tCtx, timeout, fmt.Sprintf("timed out after the %s per-test timeout", timeout))
tCtx = ktesting.WithTimeout(tCtx, timeout, fmt.Sprintf("timed out after the %s per-test timeout", timeout))
tCtx = ktesting.WithTimeout(tCtx, timeout, fmt.Sprintf("timed out after the %s per-test", timeout))
This would lead to a message saying "timed out after the 30s per-test".
I think we should keep "per-test timeout" in the message.
case <-ctx.Done():
	tb.Fatalf("op %d: %v", opIndex, ctx.Err())
case <-tCtx.Done():
	tCtx.Fatalf("op %d: %v", opIndex, context.Cause(tCtx))
Q: why not implement the same method?
The same method as what? Sorry, I don't follow.
Sorry, I mixed it up with tCtx; forget this.
// No alpha APIs (overrides api/all=true in https://github.com/kubernetes/kubernetes/blob/d647d19f6aef811bace300eec96a67644ff303d4/staging/src/k8s.io/apiextensions-apiserver/pkg/cmd/server/testing/testserver.go#L136),
// except for DRA API group when needed.
runtimeConfig := []string{"api/alpha=false"}
if enabledFeatures[features.DynamicResourceAllocation] {
Specifying the features is ugly, but we can improve that later since it's not related to this PR.
util.StartFakePVController(ctx, client, informerFactory)
runGC := util.CreateGCController(ctx, tb, *cfg, informerFactory)
runNS := util.CreateNamespaceController(ctx, tb, *cfg, informerFactory)
_, informerFactory := util.StartScheduler(tCtx, tCtx.Client(), cfg, config, outOfTreePluginRegistry)
We have several duplicated parameters here.
Yes, there's definitely room for future refactoring/simplification of test/integration/util to get rid of redundant parameters like this one here. I'm gradually going to do more of those, but wanted to start small(ish) initially.
go runGC()
go runNS()
go runResourceClaimController()

return informerFactory, client, dynClient
return informerFactory, tCtx
I'm wondering whether we could avoid returning the ctx here, but that definitely needs some refactoring of TContext.
mustSetupCluster gets a TContext without Kubernetes clients and creates a new TContext which has them, so returning that here is okay.
TContext instances are intentionally immutable, i.e. you cannot change an instance by changing some of its attributes. That is the same design that logr.Logger and context.Context use. I think it makes sense and has some nice properties (no race conditions although there is no locking, prevents callees from changing something that is used by their caller, "composing" contexts by wrapping a base context).
Makes a lot of sense to me.
Force-pushed from 3a86200 to 74c8366
The new TContext interface combines a normal context and the testing interface, then adds some helper methods. The context gets canceled when the test is done, but that can also be requested earlier via Cancel. The intended usage is to pass a single `tCtx ktesting.TContext` parameter around in all helper functions that get called by a unit or integration test. Logging is also more useful: Log[f] and Fatal[f] output is prefixed with "[FATAL] ERROR: " to make it stand out more from regular log output. If this approach turns out to be useful, it could be extended further (for example, with a per-test timeout) and might get moved to a staging repository to enable usage of it in other staging repositories. To allow other implementations besides testing.T and testing.B, a custom ktesting.TB interface gets defined with the methods expected from the actual implementation. One such implementation can be ginkgo.GinkgoT().
ktesting.TContext combines several different interfaces. This makes the code simpler because fewer parameters need to be passed around. An intentional side effect is that the apiextensions client interface becomes available, which makes it possible to use CRDs. This will be needed for future DRA tests. Support for CRDs depends on starting the apiserver via k8s.io/kubernetes/cmd/kube-apiserver/app/testing because only that enables the CRD extensions. As discussed on Slack, the long-term goal is to replace the in-tree StartTestServer with the one in staging, so this is going in the right direction.
With a dynamic client and a rest mapper it is possible to load arbitrary YAML files and create the objects defined by them. This is simpler than adding specific Go code for each supported type. Because the version now matters, the incorrect versions in the DRA YAMLs were found and fixed.
Force-pushed from 74c8366 to da0c9a9
Only one question, otherwise LGTM for scheduler.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kerthcet, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Sorry, I thought I'd have time to review this during my holiday, but things always get out of control.
now := time.Now()

cancelCtx, cancel := context.WithCancelCause(ctx)
after := time.NewTimer(timeout)
Should we use the sooner of the two deadlines (the one below), rather than the given timeout?
There's no need for that, because if the underlying ctx has a deadline sooner than the new timeout, then the resulting context will get canceled via the underlying ctx; wrapping it doesn't remove that. Below we just need to be more careful because the Deadline implementation is the one provided by the new context.
if deadlineOK {
	if deadline, ok := deadlineTB.Deadline(); ok {
		timeLeft := time.Until(deadline)
		timeLeft -= CleanupGracePeriod
What will happen if timeLeft is negative here? And why not add the CleanupGracePeriod here? That way, on timeout we'd have an additional grace period for cleanup.
What will happen if timeLeft is negative here?
withTimeout and its underlying time.NewTimer allow a negative time left; the timer simply triggers immediately.
Why not add the cleanupGracePeriod here? That way, on timeout we'd have an additional grace period for cleanup.
We need to stop the normal test code CleanupGracePeriod before the final deadline (hence the subtraction here). Then, if the test code reacts immediately to the context cancellation, there will be CleanupGracePeriod left for the registered cleanup callbacks.
cancelCtx, cancel := context.WithCancelCause(ctx)
after := time.NewTimer(timeout)
stopCtx, stop := context.WithCancel(ctx) // Only used internally, doesn't need a cause.
Any special reason to have another cancel context here? cancelCtx seems to be enough. I'm okay leaving it; I just want to know whether there's a reason I'm missing.
Without stopCtx, the anonymous function below would block only on after.C. It could also block on the parent ctx.Done(), but both of these have the same problem: they don't trigger when the test is complete, which would leave the goroutine running after the test is done.
We need the anonymous function for the situation where the timeout triggers while the test is still running.
For scheduling, we have some refactoring work left, e.g. removing the duplicated parameters; otherwise, LGTM. For ktesting, I just have several questions. Would you like someone else to take another look, or shall we just go forward since it's only used in scheduler-perf right now? @pohly
@kerthcet: your review of ktesting was very thorough, thanks! Assuming that I have answered your question sufficiently, I suggest that we simply move ahead and merge it without further reviews. Your formal lgtm will be sufficient.
/lgtm
LGTM label has been added. Git tree hash: 61d0dba393d9824bb7d220e9ff8a6ae0a42d0274
Let's merge it.
/hold cancel
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
With a dynamic client and a rest mapper it is possible to load arbitrary YAML
files and create the object defined by it. This is simpler than adding specific
Go code for each supported type.
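The create-from-manifest flow can be sketched with the standard library alone. Everything here is an illustrative stand-in: JSON replaces YAML (both decode to the same map shape), a plain map replaces the RESTMapper, and instead of calling a dynamic client the function just reports the request path it would use.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// manifest is a hypothetical object of the kind this PR creates from YAML.
const manifest = `{
  "apiVersion": "resource.k8s.io/v1alpha2",
  "kind": "ResourceClaim",
  "metadata": {"name": "claim-0", "namespace": "test"}
}`

// restMapping maps "apiVersion, Kind" to the REST resource name, standing in
// for what a RESTMapper derives from discovery data.
var restMapping = map[string]string{
	"resource.k8s.io/v1alpha2, ResourceClaim": "resourceclaims",
}

func createFromManifest(doc string) (string, error) {
	var obj map[string]any
	if err := json.Unmarshal([]byte(doc), &obj); err != nil {
		return "", err
	}
	apiVersion, _ := obj["apiVersion"].(string)
	kind, _ := obj["kind"].(string)
	resource, ok := restMapping[apiVersion+", "+kind]
	if !ok {
		return "", fmt.Errorf("no mapping for %s %s", apiVersion, kind)
	}
	meta, _ := obj["metadata"].(map[string]any)
	name, _ := meta["name"].(string)
	ns, _ := meta["namespace"].(string)
	// A real implementation would issue the create via the dynamic client;
	// here we just build the request path it would target.
	path := strings.Join([]string{"/apis", apiVersion, "namespaces", ns, resource, name}, "/")
	return path, nil
}

func main() {
	path, err := createFromManifest(manifest)
	fmt.Println(path, err)
}
```

This also shows why the apiVersion now matters: the mapping lookup is keyed on it, which is how the incorrect versions in the DRA YAMLs surfaced.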
Special notes for your reviewer:
This is a more flexible alternative to #122391.
Because the version now matters, the incorrect versions in the DRA YAMLs were found and fixed.
Does this PR introduce a user-facing change?
/assign @kerthcet