
[loki.source.kubernetes] restart of an alloy pod doubles the amount of logs in the log volume #876

Open
ToonTijtgat2 opened this issue May 16, 2024 · 3 comments
Labels: bug (Something isn't working)


@ToonTijtgat2

What's wrong?

I'm running loki.source.kubernetes in cluster mode with autoscaling enabled. Everything works as expected until one of the pods restarts or an additional pod is added; from that moment, the log volume shown in Grafana Explore is doubled, even though the number of log lines listed in the logs section stays the same. So according to the logs section the logs are not ingested twice, but the log volume can no longer be trusted because its values have doubled.

Log volume before an Alloy pod restarts:
[screenshot]
Log volume after an Alloy pod has restarted:
[screenshot]

It seems that each pod only knows about its own tailed targets, and that when it restarts it forgets where it left off and starts sending all the logs again.

Could it be that I missed something? Or is this maybe a bug in the component?

Thanks for checking.

Steps to reproduce

Deploy Alloy in cluster mode (StatefulSet).
Send logs using the loki.source.kubernetes component, then restart the pods (see the trimmed-down sketch below).
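
For reference, a trimmed-down sketch of the setup (the relabeling rules are omitted here; the full version is in the Configuration section below, and the push URL is redacted):

      discovery.kubernetes "podstolog" {
        role = "pod"
      }

      loki.source.kubernetes "podlogging" {
        targets = discovery.kubernetes.podstolog.targets
        forward_to = [loki.write.lokimandev.receiver]

        clustering {
          enabled = true
        }
      }

      loki.write "lokimandev" {
        endpoint {
          url = "https://xxx/loki/api/v1/push"
        }
      }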

System information

Kubernetes

Software version

Grafana Alloy v1.1.0

Configuration

// Discover all pods in the cluster through the Kubernetes API.
discovery.kubernetes "podstolog" {
  role = "pod"
}

// Drop the Istio sidecar containers and map Kubernetes metadata to log labels.
discovery.relabel "podstologdropistio" {
  targets = discovery.kubernetes.podstolog.targets

  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    regex = "istio-proxy|istio-init"
    action = "drop"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label = "container_name"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_node_name"]
    target_label = "host"
  }
  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label = "namespace_name"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label = "pod_name"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_label_app"]
    target_label = "app"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_label_version"]
    target_label = "version"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_label_configuration_version"]
    target_label = "config_version"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_label_database_version"]
    target_label = "database_version"
  }
}

// Tail the pod logs through the Kubernetes API; clustering spreads the
// discovered targets across the Alloy instances.
loki.source.kubernetes "podlogging" {
  targets = discovery.relabel.podstologdropistio.output
  forward_to = [loki.relabel.dropnotneededlabel.receiver]

  clustering {
    enabled = true
  }
}

// Drop labels that are not needed downstream.
loki.relabel "dropnotneededlabel" {
  forward_to = [loki.write.lokimandev.receiver]

  rule {
    action = "labeldrop"
    regex = "job|instance"
  }
}

// Push the logs to Loki.
loki.write "lokimandev" {
  endpoint {
    url       = "https://xxx/loki/api/v1/push"
    tenant_id = "logging-testalloy-dev"
  }
}

Logs

No response

@ToonTijtgat2 (Author)

FYI, I also tried using a persistent volume in the hope that the tailing state would be saved there and that the component would pick up where it left off, but the effect on the log volume is the same with persistent volumes.

@ToonTijtgat2 (Author)

The same effect happens when you run the components in cluster mode: if you restart one of the pods, I assume its load is passed to another pod, but the position file is not known to that pod, so it seems to just start from the beginning again. This causes a lot of noise in Loki and memory pressure in the Alloy pods. (See the annotated block below.)
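
If I understand the clustering docs correctly, each instance owns a hash-determined subset of the discovered targets, and ownership is reshuffled when a peer joins or leaves; nothing in that handover appears to carry a tail position. The relevant block from my configuration, annotated with that assumption:

      loki.source.kubernetes "podlogging" {
        targets = discovery.relabel.podstologdropistio.output
        forward_to = [loki.relabel.dropnotneededlabel.receiver]

        clustering {
          // Assumption: each Alloy instance tails only the targets that hash
          // to it; when a peer restarts, its targets move to other instances,
          // which seem to start tailing without the previous instance's
          // position.
          enabled = true
        }
      }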

@ToonTijtgat2 (Author)

Could it be that the /data storage should be shared across all the pods, so that every pod can write to the same position file? That way, if one pod stops, another pod in the cluster could take over its load and start where the previous one left off.

Can you please check/advise?
