Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAX_DURATION option does not count the Pod Status Start time #77

Open
rogerioefonseca opened this issue Dec 10, 2021 · 3 comments
Open

Comments

@rogerioefonseca
Copy link

rogerioefonseca commented Dec 10, 2021

The option MAX_DURATION option does not count the Pod Status Start time but instead the Pod Start time.

I deployed a pod with an entry point that Evicts it after 10 minutes using the command below:
sleep 600; apt update; apt install curl -y; while true; do curl http://some.url --output some.file; done

The pod reaper is configured with a MAX_DURATION of 5 minutes, POD_STATUSES with Evicted, and running every 1 minute.

image

I was expecting to see the pod-reaper reaping the Evicted pod at minute 15 of the POD, but instead, the POD was reaped right away at minute 11.
image

I took a look in the code and it is using the Pod Status start time, but looks like it is getting the first status Start time and not the pod-reaper configured.
https://github.com/target/pod-reaper/blob/master/rules/duration.go#L33

@rogerioefonseca rogerioefonseca changed the title Multiple Configuration is not Working. MAX_DURATION option does not count the Pod Status Start time Dec 10, 2021
@rogerioefonseca
Copy link
Author

haha, I tried to debug but I'm not familiar with go so, now, I'm not able to investigate and help further more.

@brianberzins
Copy link
Collaborator

Interesting case! Given the configuration, I would have expected it to be removed at that 11 minute mark. So in this case the human readable "5 minutes after it gets evicted" doesn't translate well. I think that might require a different rule from what I originally had in mind with MAX_DURATION (which going from memory was designed to be "how long has this pod existed").

It might mean I need to implement a rule for MAX_POD_STATUS_DURATION.
I'm honestly not sure off the top of me head if that information is readily available in the pod, but if it is, the rule would be pretty easy to add.

On the side here, I'm working to rebuild the CI/CD system for this project based on some changes that Docker made that basically broke the CD system for me. Trying to get all that worked out ASAP! Then this'll be high on my "fix or implement new!"

@brianberzins
Copy link
Collaborator

Looking into this, I'm actually a bit surprised to find that there isn't (at least an obvious) timestamp about when pods shift phases at the top level. Right now, the check for eviction uses pod.Status.Reason and the duration check is looking at pod.Status.StartTime (both off the PodStatus object). When looking around for any sort of transition timestamp, I the only other timestamps I found were in an array (not 1:1 mapping) of PodCondition structures. That would make it pretty difficult to specify a rule around because it would probably have to be really specifically "when the pod status is X and there is a PodConditionStatus of Y and that transition happened over Z time ago" feels like it would be difficult to generalize in a consumable way.

Really surprised I didn't see something like "lastPodStatusChangedTime" or something like that...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants