WaitForPodsReady: Store last requeued count and time #2175
Comments
What about just reverting that? Would it be enough?
I would be worried about adding more fields that could just cause confusion. But maybe it isn't too bad. WDYT @mimowo?
@alculquicondor Sorry for the confusion. Actually, we need to reset the
I think we need to reset the Now, if we clear If the message is not enough, because some automation wants to consume the information about the number of retries, then I think the proposal with
I like the idea of just updating the condition message. The users just need a quick signal to understand what happened.
My motivation is to provide a machine-readable state. So, I would prefer to have @alculquicondor @mimowo Regarding this idea, WDYT?
Do you have a concrete use case where this information would be parsed by automation? If not, then it can introduce some form of keeping the information structured (like
I'm not sure. Wouldn't we clear the
Actually, if we need this information structured I would be leaning towards using
+1 to not reuse, but I still want to know why automation would need to know all of these details, as opposed to just a reason in the Evicted condition.
In the platform engineering context, the admins (SWE/Ops/SRE) often develop and provide common platforms across the company to users (Researcher/DS/ML Engineer). So, we often provide an in-house CLI and console wrappers that allow users to operate Jobs (Create/List/Delete). Therefore, I would like to provide a machine-readable API via Workload resources.
In that case, let's go with the dedicated API
As you are pointing out here, it seems that we cannot avoid resetting the So, if @alculquicondor and @mimowo are ok, I would like to add a dedicated API (
/assign
In addition to the
Yeah, I also think it would be worth a dedicated reason.
What would you like to be added:
Depends on #2174.
Since #2063, the workload controller resets .status.requeueState.requeueAt if the requeueAt exceeds the current time. Also, we will reset .status.requeueState.count as well to fix the bug reported in #2174.
So, I would like to propose dedicated APIs that do not involve scheduling, like this:
Why is this needed:
As initially designed, requeueState is responsible for storing the last requeued time and count, and for notifying the users as well. But, to avoid the race condition, we dropped/will drop that functionality from those APIs (.status.requeueState) in the Workload.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.