[loki.source.kubernetes] restart of an alloy pod doubles the amount of logs in the log volume #876
Comments
FYI, I also tried using a persistent volume in the hope that the tailing state would be saved there and that the component would pick up where it left off, but the effect on the log volume is the same with persistent volumes.
The same effect happens when you run the components in cluster mode. If you restart one of the pods, I assume the load is passed to another pod, but the position file is not known by that other pod, so it seems it just starts from the beginning again, causing a lot of churn in Loki and in the memory of the Alloy pods.
Could it be that the /data storage should be shared across all the pods, so that every pod can write to the same position file? That way, if one pod stops, another pod in the cluster could take over the load and start where the first one left off. Can you please check/advise?
What's wrong?
I'm running loki.source.kubernetes in cluster mode with auto scaling enabled. Everything works as expected until one of the pods restarts or an additional pod is added. From that moment, the log volume shown in Grafana Explore is doubled (however, the number of logs listed in the logs section stays the same). This means logs are not ingested twice according to the logs section, but the log volume panel can no longer be trusted, because its values have doubled.
Amount of logs before an Alloy pod restarts:
[Screenshot: log volume in Grafana Explore before the restart]
Amount of logs after an Alloy pod has restarted:
[Screenshot: log volume in Grafana Explore after the restart]
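To verify whether logs were actually ingested twice (rather than just displayed twice in the volume panel), the histogram can be cross-checked against an explicit count query. This is a hypothetical check; the stream selector below is illustrative and not taken from the issue:

```logql
# Count log lines per minute for a given selector; compare this against
# the volume histogram Grafana Explore renders for the same selector.
sum(count_over_time({namespace="my-app"}[1m]))
```

If this count matches the pre-restart baseline while the volume panel shows double, the discrepancy is in the volume metadata rather than in duplicated log lines.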
It seems that a pod only knows its own tailing positions, and if it restarts, it forgets where it left off and starts sending all the logs again.
Could it be that I missed something, or is this maybe a bug in the component?
Thanks for checking.
Steps to reproduce
Deploy Alloy in cluster mode (StatefulSet).
Send logs using the loki.source.kubernetes component, and restart the pods.
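For reference, a minimal Alloy configuration sketch for this setup might look like the following. The component labels, stream selector, and Loki endpoint URL are illustrative assumptions, not taken from the issue (the reporter did not attach their configuration):

```river
// Discover pods via the Kubernetes API (label "pods" is illustrative).
discovery.kubernetes "pods" {
  role = "pod"
}

// Tail pod logs through the Kubernetes API; distribute targets across
// the cluster members when clustering is enabled.
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]

  clustering {
    enabled = true
  }
}

// Push to Loki (endpoint URL is a placeholder).
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```

Restarting one pod of the StatefulSet (or scaling the set up) while this pipeline is running should reproduce the doubled log volume described above.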
System information
Kubernetes
Software version
Grafana Alloy v1.1.0
Configuration
Logs
No response