
Add --condition #276

Draft · felipecrs wants to merge 10 commits into master

Conversation

@felipecrs commented Aug 25, 2023

Usage:

stern . --condition=ready=false

TODOs:

  • Only start tailing logs for pods matching --condition
  • Stop tailing logs for pods that no longer match --condition

Fixes #213

@felipecrs (Author) commented Aug 25, 2023

I'm giving #213 a go because I really need it. But I'm struggling with the second TODO. Is there a chance someone can help me?

PS: code is not yet ready for final review, but it's working. Also, I have almost zero experience with Golang.

@felipecrs changed the title from "Add --condition condition-name[=condition-value]" to "Add support for --condition" on Aug 25, 2023
@superbrothers (Member)

I will look into it this weekend.

@felipecrs (Author)

Thanks a lot @superbrothers!

@superbrothers (Member)

It's not finished yet, but I have a working patch: 816c896. I won't have much time this weekend, but I will be able to work on it after that.

You can try it out as soon as you build it.

@felipecrs (Author) commented Sep 8, 2023

@superbrothers, thanks a lot, you are the best!

However, it's not working quite as I expected. Let me also provide a testing scenario:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  serviceName: test
  template:
    metadata:
      labels:
        app: test
    spec:
      terminationGracePeriodSeconds: 1
      containers:
      - name: test
        image: bash
        command:
          - bash
          - -c
          - |
            count=0
            while true; do
              echo "Hello World: $count"
              count=$((count+1))
              if [[ $count -eq 10 ]]; then
                touch /tmp/healthy
              fi
              sleep 1
            done
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 1
EOF

This pod becomes ready in about 11 seconds.

Start the stern process with:

./dist/stern '.*' --condition ready=false --tail 0

Stern correctly filters out the pod if the pod is already ready, but Stern does not stop tailing the logs when the pod becomes ready:

(screen recording: WindowsTerminal_lMhjxZZGe5.mp4)

@felipecrs (Author)

Ok, I think I know what's going on: K8s does not seem to emit an event for when the pod becomes ready. I'm investigating what can be done about it.

@felipecrs (Author) commented Sep 8, 2023

Checking kubectl wait's source code, it looks like they invoke an additional watcher for each resource until the condition is met:

https://github.com/kubernetes/kubectl/blob/0266cec8bce880387a33c4d948673749defeb0bc/pkg/cmd/wait/wait.go#L530

I wonder if and how we could do something similar:

If the pod initially matches the condition and gets added, we invoke this additional watcher for that pod until the opposite condition is met. Once the opposite condition is met, we add the pod to deleted.

(I already tried to tinker with it myself without much success)
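
For illustration, here is a minimal sketch of that per-pod watcher idea, assuming a client-go clientset; the function name and wiring are hypothetical and not taken from Stern's or kubectl's code:

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForReadyStatus watches a single pod until its Ready condition reaches
// the wanted status, similar in spirit to kubectl wait's per-resource watcher.
func waitForReadyStatus(ctx context.Context, client kubernetes.Interface, ns, name string, want corev1.ConditionStatus) error {
	w, err := client.CoreV1().Pods(ns).Watch(ctx, metav1.ListOptions{
		FieldSelector: "metadata.name=" + name,
	})
	if err != nil {
		return err
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		pod, ok := ev.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		for _, c := range pod.Status.Conditions {
			if c.Type == corev1.PodReady && c.Status == want {
				// The opposite condition is met; the caller could now treat
				// the pod as deleted and stop tailing its logs.
				return nil
			}
		}
	}
	return fmt.Errorf("watch closed before %s/%s reached Ready=%s", ns, name, want)
}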

@tksm (Contributor) commented Sep 9, 2023

Hi,

It appears that Stern has detected the events, but they could be filtered out by targetFilter.shouldAdd().

The following patch may resolve the issue, but I've noticed the behavior of --condition ready=false can be somewhat confusing. When we delete pods, the pod's Ready status reverts to False, causing the logs to restart from the beginning. Additionally, the Ready status can change due to other reasons.

diff --git a/stern/stern.go b/stern/stern.go
index c492519..dcece50 100644
--- a/stern/stern.go
+++ b/stern/stern.go
@@ -222,7 +222,7 @@ func Run(ctx context.Context, config *Config) error {
 					if !ok {
 						continue
 					}
-					cancel.(func())()
+					cancel.(context.CancelFunc)()
 				case <-nctx.Done():
 					return nil
 				}
diff --git a/stern/target.go b/stern/target.go
index 553e8c2..cbc6788 100644
--- a/stern/target.go
+++ b/stern/target.go
@@ -136,8 +136,14 @@ OUTER:
 			Container: c.Name,
 		}
 
+		if !conditionFound {
+			visitor(t, false)
+			f.forget(string(pod.UID))
+			continue
+		}
+
 		if f.shouldAdd(t, string(pod.UID), c) {
-			visitor(t, conditionFound)
+			visitor(t, true)
 		}
 	}
 }

The demo below illustrates that the logs restart when the pod is deleted (at 00:18).

(asciicast recording)

@felipecrs (Author)

Wow, it looks to be working indeed as per your demo. Can't wait to try your patch out.

BTW the behavior you described is indeed misleading.

That would not happen though when using --tail=0, which is my use case (only capture live logs, not past logs).

I will think about what can be done to circumvent this.

Maybe Stern could remember up to which point it has tailed logs for a given pod, and resume from that point when it starts tailing the pod again.
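
As a rough sketch of that idea (the tracker type and field names are hypothetical, not existing Stern code), something like this could record how far each pod has been tailed and build log options that resume from there:

import (
	"sync"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// resumeTracker remembers the timestamp of the last log line printed per pod,
// so a later tail can resume from that point instead of replaying history.
type resumeTracker struct {
	mu       sync.Mutex
	lastSeen map[string]metav1.Time // pod UID -> timestamp of the last printed line
}

func (r *resumeTracker) record(uid string, t metav1.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.lastSeen[uid] = t
}

// logOptions returns PodLogOptions that resume after the last recorded line.
func (r *resumeTracker) logOptions(uid string, base corev1.PodLogOptions) corev1.PodLogOptions {
	r.mu.Lock()
	defer r.mu.Unlock()
	if t, ok := r.lastSeen[uid]; ok {
		base.SinceTime = &t
		base.Timestamps = true // timestamps are needed to keep updating lastSeen
	}
	return base
}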

@tksm (Contributor) commented Sep 9, 2023

@felipecrs Oh, sorry. I missed the --tail=0 option. Thank you for pointing it out.

As you mentioned, Stern behaves as expected when using --tail=0.

I believe it would be better to automatically set --tail=0 when the --condition option is specified in order to avoid confusion.

(asciicast recording)

@felipecrs (Author)

@tksm really amazing, indeed it works like a charm. I added both @superbrothers' commit and your patch to this PR now.

Something I believe I need to do before merging is to fix the condition syntax for the config file. I left it as a string there, which makes it exactly the same as the CLI, but perhaps I should convert it to an object like:

- condition: ready=false
+ condition:
+   name: ready
+   value: false

What do you think? Personally, I don't mind. Even the first one looks good to me.

Another thing to think about is the concern @tksm raised above about not using --tail=0. I proposed one solution above, but I'm not sure it's worth implementing. Another option is to simply acknowledge this caveat somewhere, maybe in the README.

@tksm (Contributor) commented Sep 10, 2023

I'm glad to hear it worked as expected. 🎉

Config file

I believe that the first one (condition: ready=false) is better because it maintains consistency with other options. Additionally, we do not need to implement anything in that case.

--tail=0 option

For the initial implementation, I think that the caveat in README.md and the error message when --tail=0 is not set are sufficient. Maybe we should avoid automatically setting --tail=0 for future compatibility.

# Raise an error when `--tail=0` is not set
$ ./dist/stern '.*' --condition ready=false
Error: Currently, --condition and --no-follow=false only work with --tail=0
Usage:
  stern pod-query [flags]
diff --git a/cmd/cmd.go b/cmd/cmd.go
index 35e3060..2c70027 100644
--- a/cmd/cmd.go
+++ b/cmd/cmd.go
@@ -130,6 +130,9 @@ func (o *options) Validate() error {
 	if o.selector != "" && o.resource != "" {
 		return errors.New("--selector and the <resource>/<name> query can not be set at the same time")
 	}
+	if o.condition != "" && !o.noFollow && o.tail != 0 {
+		return errors.New("Currently, --condition and --no-follow=false only work with --tail=0")
+	}
 
 	return nil
 }

@felipecrs (Author)

I'm sorry @tksm, but why tie enforcing --tail=0 to --no-follow=false? Maybe you meant --no-follow=true, where it makes more sense: with --no-follow=true, no live logs would be shown, only past logs, so filtering only live logs with --tail=0 makes no sense.

I changed it in the commit.

@felipecrs (Author) commented Sep 10, 2023

I also just realized that --condition=ready=false will filter out pods which do not have a health check, like Jobs.

For example, with --condition=ready=false, logs for a pod like the one below will not be shown:

apiVersion: batch/v1
kind: Job
metadata:
  name: test-two
spec:
  completions: 1
  template:
    metadata:
      labels:
        app: test-two
    spec:
      terminationGracePeriodSeconds: 1
      restartPolicy: OnFailure
      containers:
      - name: test-two
        image: bash
        command:
          - bash
          - -c
          - |
            count=0
            while true; do
              echo "World Hello: $count"
              count=$((count+1))
              if [[ $count -eq 10 ]]; then
                exit 1
              fi
              sleep 1
            done

Similarly, for a pod which does not have a readinessProbe:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-two
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-two
  serviceName: test-two
  template:
    metadata:
      labels:
        app: test-two
    spec:
      terminationGracePeriodSeconds: 1
      containers:
      - name: test-two
        image: bash
        command:
          - bash
          - -c
          - |
            count=0
            while true; do
              echo "World Hello: $count"
              count=$((count+1))
              if [[ $count -eq 10 ]]; then
                exit 0
              fi
              sleep 1
            done

My use case (#213) was to filter out only healthy pods, and simple conditions like that cannot fulfil my needs.

The only solution I have thought of to circumvent this is --only-condition-pods-with-readiness. But:

  1. The flag name is ugly as hell (don't you think? or do you have better suggestions?)
  2. I could just simplify this whole thing to --only-unhealthy-pods and drop --condition entirely.

Feedback is appreciated.
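
For illustration, a minimal sketch of the check such a flag could perform (the helper name is hypothetical and mirrors the proposed flag; it is not part of this PR's code):

import corev1 "k8s.io/api/core/v1"

// hasReadinessProbe reports whether any container in the pod defines a
// readinessProbe; a flag like --only-condition-pods-with-readiness could
// apply the ready=false condition only to pods where this returns true.
func hasReadinessProbe(pod *corev1.Pod) bool {
	for _, c := range pod.Spec.Containers {
		if c.ReadinessProbe != nil {
			return true
		}
	}
	return false
}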

@felipecrs (Author) commented Sep 10, 2023

Another thing, @tksm: I think allowing people to specify a --tail greater than 0 can be useful. Imagine a situation like:

I want to display logs only for pods which are not ready, but also show their last 3 lines from before they became unhealthy.

@felipecrs changed the title from "Add support for --condition" to "Add --condition" on Sep 10, 2023
@felipecrs (Author) commented Sep 10, 2023

Ok, now it fulfils my use case:

stern . --condition=ready=false --tail=0 --only-condition-pods-with-readiness

Please let me know what you think about it. And, as always, thanks for the help.

@felipecrs (Author)

OK, I have been soak-testing this internally throughout the day, and the results are really good for my use case.

I would like to ask the maintainers to give me the go ahead on the current solution (--only-condition-pods-with-readiness) before I start cleaning up the PR and working on fixing the tests.

@tksm (Contributor) commented Sep 13, 2023

@felipecrs Let me clarify the conditions under which you want to view pod logs in your use case.

I assume you do NOT want to view logs when:

  1. The pod has at least one readinessProbe OR ReadinessGate, AND
  2. the pod.status.conditions[] has Ready=True.

Please inform me if there are any additional conditions.

In that case, I believe specifying these conditions using flags might be overly complex.

I came up with another approach with --template.

It may be possible to address this use case with --template, by adding a reference to the latest pod as a template variable and a helper function like podIsReady, instead of adding new flags. What are your thoughts on this approach?

Although templates can be complex, this approach offers the flexibility to handle a wide range of use cases. You can use --template-file if you want to store the template in a separate file.

./dist/stern . --template '
{{- $pod := .Reference.GetPod -}}
{{- $hasReadinessGates := gt (len $pod.Spec.ReadinessGates) 0 -}}
{{- $hasReadinessProbe := false}}
{{- range $c := $pod.Spec.Containers -}}
  {{- if $c.ReadinessProbe -}}
    {{- $hasReadinessProbe = true -}}
  {{- end -}}
{{- end -}}
{{- if not (and (or $hasReadinessProbe $hasReadinessGates) (podIsReady $pod)) -}}
{{- .Message -}}{{"\n"}}
{{- end -}}
'

@felipecrs (Author)

Isn't that a very costly operation? Checking the pod per log line as opposed to only per event?

@felipecrs (Author)

But yes, this set of conditions satisfies my use case, which can be described at a high level as something like:

I want Stern to tail logs from all pods which are not healthy.

If the pod does not have a health checker (readiness probe), then I want to tail its logs since I can't tell whether it's healthy or not.

More precisely, I want to run Stern as a sidecar process to my helm upgrade command within my CI/CD pipeline, and these conditions allow tracking the helm installation as it happens much more sensibly than just showing all pod logs no matter what.

In my use case, I have around 50 pods to tail, but I install them incrementally (one helm chart at a time). Filtering out pods which are healthy ensures I'm not showing logs for pods which were installed by previous helm installations.

That said, I believe more people could leverage the same approach.

If you really believe we should not be too narrow in the implementation, maybe we can invent some kind of --condition-template, which would allow running a template that evaluates to "true" or "false" to decide whether the pod should be tailed. In that case, I believe not even the podIsReady helper function needs to exist, as it can easily be implemented in the template itself.
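
As a rough sketch of how such a --condition-template could be evaluated (the function name and data layout are hypothetical, not an agreed design): render the template against the pod and tail the pod only if the trimmed output is "true".

import (
	"bytes"
	"strings"
	"text/template"

	corev1 "k8s.io/api/core/v1"
)

// podMatchesConditionTemplate renders a hypothetical --condition-template
// with the pod as template data and interprets the trimmed output as a
// boolean that decides whether the pod's logs should be tailed.
func podMatchesConditionTemplate(tmpl *template.Template, pod *corev1.Pod) (bool, error) {
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, map[string]interface{}{"pod": pod}); err != nil {
		return false, err
	}
	return strings.TrimSpace(buf.String()) == "true", nil
}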

@tksm (Contributor) commented Sep 13, 2023

Thanks for your detailed explanation!

I understand the use case, and it sounds worth implementing. @superbrothers What do you think?

I prefer creating a dedicated flag for this use case to improve clarity, but I find the term unhealthy in --only-unhealthy-pods somewhat ambiguous. How about using a more specific name like --exclude-readiness-succeeded?

@felipecrs (Author)

I don't mind going the --condition-template approach BTW. :)

@tksm (Contributor) commented Sep 14, 2023

If you don't mind using a complex template, the --condition-template approach looks better since it can be used in a wide range of use cases.

For your reference, if you had a pod variable in a condition template, the template for your use case would look something like this:

{{- $podIsReady := false -}}
{{- range $c := $pod.Status.Conditions -}}
  {{- if and (eq $c.Type "Ready") (eq $c.Status "True") -}}
    {{- $podIsReady = true -}}
  {{- end -}}
{{- end -}}

{{- $hasReadinessProbe := false}}
{{- range $c := $pod.Spec.Containers -}}
  {{- if $c.ReadinessProbe -}}
    {{- $hasReadinessProbe = true -}}
  {{- end -}}
{{- end -}}

{{- $hasReadinessGates := gt (len $pod.Spec.ReadinessGates) 0 -}}

{{- not (and (or $hasReadinessProbe $hasReadinessGates) ($podIsReady))}}

It would be great if it had debug functionality, such as the ability to print a rendered template. For example, I debug my templates by printing variables like below when using --template:

{{- if $skip -}}
[skip] {{.Message}}
{{- else -}}
{{colorGreen .Message}}
{{- end }} / ready={{$podIsReady}}, probe={{$hasReadinessProbe}}, gates={{$hasReadinessGates}}
{{- "\n" -}}
(screenshot of the rendered output)

@guettli (Contributor) commented Apr 23, 2024

BTW, I wrote a small tool to check all conditions in a cluster: https://github.com/guettli/check-conditions. PRs are welcome.
