[Feature]: Graceful handling of WAL disk space exhaustion #4521

Closed
2 tasks done
leonardoce opened this issue May 13, 2024 · 2 comments · Fixed by #4404

@leonardoce
Contributor

Is there an existing issue already for this feature request/idea?

  • I have searched for an existing issue, and could not find anything. I believe this is a new feature request to be evaluated.

What problem is this feature going to solve? Why should it be added?

PostgreSQL will cleanly shut down when there's no space left for WAL files.
The operator will perceive that condition as a failure of the primary instance, leading to a failover.

Failing over will not help because every other replica will have the same error condition.

Describe the solution you'd like

Having the operator automatically fence the primary would help because it would prevent any automatic failover from happening and would give the user time to increase the WAL disk space, fixing the root cause of the issue.
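For illustration, fencing in CloudNativePG is driven by the `cnpg.io/fencedInstances` annotation on the Cluster resource. Its value is a JSON list of instance names, and `*` fences every instance. The following minimal Go sketch shows how an operator could compute the new annotation value when fencing the primary. The helper `addFencedInstance` and the instance name `cluster-example-1` are hypothetical, and actually patching the Cluster object is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fencedInstancesAnnotation is the annotation CloudNativePG reads to decide
// which instances are fenced (value: JSON list of instance names).
const fencedInstancesAnnotation = "cnpg.io/fencedInstances"

// addFencedInstance is a hypothetical helper: given the current annotation
// value ("" if unset), it returns a value that also fences instanceName.
func addFencedInstance(existing, instanceName string) (string, error) {
	var fenced []string
	if existing != "" {
		if err := json.Unmarshal([]byte(existing), &fenced); err != nil {
			return "", fmt.Errorf("invalid %s value: %w", fencedInstancesAnnotation, err)
		}
	}
	for _, name := range fenced {
		if name == instanceName || name == "*" {
			// Already fenced, explicitly or via the "*" wildcard.
			return existing, nil
		}
	}
	fenced = append(fenced, instanceName)
	out, err := json.Marshal(fenced)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Hypothetical primary instance name.
	value, err := addFencedInstance("", "cluster-example-1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s\n", fencedInstancesAnnotation, value)
}
```

Because fencing only adds a name to an annotation, undoing it once the WAL volume has been enlarged is just a matter of removing that name again.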

Describe alternatives you've considered

Automatically scaling up the disk space would work, too, but I feel this is a different and broader topic.

Additional context

No response

Backport?

No

Are you willing to actively contribute to this feature?

Yes

Code of Conduct

  • I agree to follow this project's Code of Conduct

@gbartolini
Contributor

Closes #3775

@mnencia mnencia removed the triage Pending triage label May 27, 2024
@NiharDudam

Hi team, are we also planning to track disk statistics in cluster CR?

As a consumer, I would want to track disk usage, warn my customers when the disk fills up past a certain threshold, and then let the CloudNativePG operator shut down PostgreSQL when the disk actually fills up.
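Tracking usage against a threshold, as suggested above, comes down to comparing total and available blocks on the volume that hosts the WAL. The following is a minimal sketch that assumes a Linux container where `syscall.Statfs` is available. The 80% threshold, the default path, and the helper names `usedPercent` and `shouldWarn` are hypothetical and not part of CloudNativePG.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// usedPercent returns the percentage of the filesystem that is not available
// to unprivileged processes (blocks reserved for root count as used).
func usedPercent(totalBlocks, availBlocks uint64) float64 {
	if totalBlocks == 0 || availBlocks >= totalBlocks {
		return 0
	}
	return 100 * float64(totalBlocks-availBlocks) / float64(totalBlocks)
}

// shouldWarn reports whether usage has reached the warning threshold.
func shouldWarn(percent, threshold float64) bool {
	return percent >= threshold
}

func main() {
	// Assumption: pass the WAL volume mount point as the first argument.
	path := "/"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		fmt.Fprintln(os.Stderr, "statfs:", err)
		os.Exit(1)
	}

	const warnThreshold = 80.0 // hypothetical threshold, in percent
	pct := usedPercent(st.Blocks, st.Bavail)
	fmt.Printf("%s: %.1f%% used\n", path, pct)
	if shouldWarn(pct, warnThreshold) {
		fmt.Printf("warning: usage at or above %.0f%%\n", warnThreshold)
	}
}
```

Block size is not needed here because both counts use the same unit, so the ratio is unaffected.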

leonardoce added a commit that referenced this issue Jun 4, 2024
PostgreSQL will shut down cleanly when there is not enough disk space to
store WAL files.

The operator did not recognize this condition and, since the primary
failed, was performing a failover to the most advanced replica. This
action will not fix the underlying issue.

Only a manual disk resize, initiated by the user, can ultimately lead to
a fully working PostgreSQL cluster.

This patch makes the instance manager recognize this condition and
report it to the operator. Upon detecting it, the operator will not
trigger a failover and will instead set a phase describing the situation.

After the PVCs are resized, the cluster will resume working correctly.

Closes: #4521

Signed-off-by: Leonardo Cecchi <[email protected]>
Signed-off-by: Francesco Canovai <[email protected]>
Signed-off-by: Armando Ruocco <[email protected]>
Signed-off-by: Jaime Silvela <[email protected]>
Signed-off-by: Gabriele Bartolini <[email protected]>
Co-authored-by: Leonardo Cecchi <[email protected]>
Co-authored-by: Francesco Canovai <[email protected]>
Co-authored-by: Armando Ruocco <[email protected]>
Co-authored-by: Jaime Silvela <[email protected]>
Co-authored-by: Gabriele Bartolini <[email protected]>
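The decision described in the commit message can be sketched as follows. This is a simplified illustration, not the code merged for this issue. The `InstanceReport` type, its `WALDiskFull` flag, and the `wait-for-resize` action name are hypothetical stand-ins for whatever the instance manager reports and whatever phase the operator sets.

```go
package main

import "fmt"

// InstanceReport is a hypothetical, simplified view of what an instance
// manager could report to the operator about the primary.
type InstanceReport struct {
	Name        string
	Running     bool
	WALDiskFull bool // hypothetical: PostgreSQL stopped because WAL space ran out
}

// Action is what the operator decides to do about the primary.
type Action string

const (
	ActionNone          Action = "none"
	ActionFailover      Action = "failover"
	ActionWaitForResize Action = "wait-for-resize" // hypothetical phase name
)

// decide chooses the operator's reaction to the primary's reported state.
func decide(primary InstanceReport) Action {
	switch {
	case primary.Running:
		return ActionNone
	case primary.WALDiskFull:
		// Every replica is expected to hit the same condition, so promoting
		// one would not help: wait for the user to resize the PVCs instead.
		return ActionWaitForResize
	default:
		// Any other primary failure keeps the usual failover behavior.
		return ActionFailover
	}
}

func main() {
	reports := []InstanceReport{
		{Name: "cluster-example-1", Running: true},
		{Name: "cluster-example-1", WALDiskFull: true},
		{Name: "cluster-example-1"},
	}
	for _, r := range reports {
		fmt.Printf("%s running=%t walDiskFull=%t -> %s\n",
			r.Name, r.Running, r.WALDiskFull, decide(r))
	}
}
```

The key design point is the order of the checks. Disk exhaustion is tested before the generic failure case, so a primary that stopped for lack of WAL space never falls through to a failover.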
@gbartolini gbartolini modified the milestones: 1.23.2, 1.24.0 Jun 4, 2024