Ever-growing WAL folder #14002
What did you do?
I have Prometheus version 2.46 installed with the following configuration.
I deleted the WAL folder because a replay took more than 7 hours to complete:
ts=2024-04-26T11:40:10.556Z caller=head.go:792 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=4m27.932583409s wal_replay_duration=7h15m17.527240298s wbl_replay_duration=205ns total_replay_duration=7h23m9.09340072s
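To show where the replay time goes, here is a minimal sketch that pulls the `*_duration` fields out of the log line above and converts them to seconds. The function names `parse_go_duration` and `replay_durations` are my own helpers, not part of Prometheus, and the parser only handles the unit suffixes that Go's duration strings use.

```python
import re

# Log line copied from the Prometheus output above.
LOG_LINE = (
    'ts=2024-04-26T11:40:10.556Z caller=head.go:792 level=info component=tsdb '
    'msg="WAL replay completed" checkpoint_replay_duration=4m27.932583409s '
    'wal_replay_duration=7h15m17.527240298s wbl_replay_duration=205ns '
    'total_replay_duration=7h23m9.09340072s'
)

# Seconds per unit for the suffixes that appear in Go duration strings.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text):
    """Convert a Go-style duration such as '7h15m17.5s' to seconds (hypothetical helper)."""
    pos = 0
    total = 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unexpected characters in duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a duration: {text!r}")
    return total


def replay_durations(line):
    """Return every key ending in '_duration' from a logfmt-style line, in seconds."""
    return {
        key: parse_go_duration(value)
        for key, value in re.findall(r"(\w+_duration)=(\S+)", line)
    }


if __name__ == "__main__":
    durations = replay_durations(LOG_LINE)
    share = durations["wal_replay_duration"] / durations["total_replay_duration"]
    for key, seconds in durations.items():
        print(f"{key}: {seconds:.1f}s")
    print(f"WAL share of total replay: {share:.1%}")
```

On this log line the WAL segment replay accounts for roughly 98% of the total replay time, so the checkpoint replay is not the bottleneck.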
There is enough memory and CPU on the machine where Prometheus is running.
These are the stats from Prometheus:
I am unable to find the right configuration to stop the Prometheus WAL from growing forever. When I deleted the WAL folder, it contained files as old as 20 days.
Is there anything wrong in my configuration?
The WAL has currently grown to 140GB in 4 hours.
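For anyone who wants to reproduce these numbers, below is a rough sketch of how the WAL folder size and the age of its oldest file could be measured. It assumes the WAL lives in a `wal` subdirectory of the TSDB data directory; the path `/prometheus/wal` is only a placeholder, and `wal_summary` and `growth_rate_mb_per_s` are hypothetical helper names. The growth rate calculation assumes decimal units (1 GB = 1000 MB).

```python
import os
import time


def wal_summary(wal_dir, now=None):
    """Walk wal_dir and report file count, total bytes, and oldest file age in hours."""
    now = time.time() if now is None else now
    file_count = 0
    total_bytes = 0
    oldest_mtime = None
    for root, _dirs, files in os.walk(wal_dir):
        for name in files:
            stat = os.stat(os.path.join(root, name))
            file_count += 1
            total_bytes += stat.st_size
            if oldest_mtime is None or stat.st_mtime < oldest_mtime:
                oldest_mtime = stat.st_mtime
    oldest_age_hours = None if oldest_mtime is None else (now - oldest_mtime) / 3600.0
    return {"files": file_count, "bytes": total_bytes, "oldest_age_hours": oldest_age_hours}


def growth_rate_mb_per_s(size_gb, hours):
    """Average write rate in MB/s for size_gb written over the given number of hours."""
    return size_gb * 1000.0 / (hours * 3600.0)


if __name__ == "__main__":
    wal_dir = "/prometheus/wal"  # placeholder path, adjust to the actual data directory
    if os.path.isdir(wal_dir):
        print(wal_summary(wal_dir))
    # 140 GB in 4 hours, as reported above.
    print(f"{growth_rate_mb_per_s(140, 4):.2f} MB/s")
```

With the figures above, 140GB in 4 hours works out to an average of a little under 10 MB/s written to the WAL.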
An upgrade is planned, but would it solve this problem?
Thanks
What did you expect to see?
I expected the WAL to be retained for only about 2 hours.
What did you see instead? Under which circumstances?
The WAL folder is not cleaned up.
System information
Linux 6.1.58+ x86_64
Prometheus version
2.46
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response