Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Disable django disk usage health check #7180

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

Podesta
Copy link

@Podesta Podesta commented Nov 25, 2023

Motivation and context

Currently, CVAT's server completely stops working once disk usage on the drive reaches over 90% usage. This is simply a default setting, that has never been changed: heatlh-check docs. As it is a django thing, it is not documented anywhere on CVAT docs.

Not only it should not be CVAT's responsibility to check disk usage, the check is not very useful, as it checks for percentage use. So even though I may have terabytes of free disk on my server, it still breaks it if I have over 90% usage.

There are several issues related to this, to name a few:
Fixes #5449
Related #5724, #5685, #6664, #6757, #6564

Maybe it makes sense to reintroduce a similar feature in the future, but as is, I share @azhavoro view here, that it creates more problems than it solves. But to be sensible I believe it should at least:

  1. Do not fail silently to the end user.
  2. Trigger a warning instead of effectively blocking the server.
  3. Check for both percent disk free and total disk free.

How has this been tested?

Images have been successfully built and used with the changes.

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@ValV
Copy link

ValV commented Jan 5, 2024

But is it necessary to completely disable max disk usage check? I agree that 90% is kinda inadequate threshold, but isn't it possible to set to some reasonable value, say, 99% (or in MB)?

My point is that such a health check is a good indicator that the host requires manual intervention. In scenario where we have a simple VM with only CVAT deployed, we can detect lack of disk space just in time. Assuming disk size check is disabled, we can detect the problem too late (when we have 0% of free space), so we may fail to log in to the VM to clean the disk.

@Podesta
Copy link
Author

Podesta commented Jan 29, 2024

I believe it's mostly about whether or not that should be cvat task to monitor disk usage. As far as I understand, that's just a default config that have been brought over from django by default. I don't think there was a conscious decision by the cvat team to say, we should block the server if someone uses over 90% of the disk.

But yeah, I agree that a better logic, taking into account both percentage usage and absolute free disk, would be a far better solution than the current implementation. I don't think django allows for that though.

As it stands, the current implementation, I see it as an issue. Nearly weekly there is someone that gets bitten by this.

Copy link

@gapsong gapsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@bsekachev bsekachev mentioned this pull request May 1, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CVAT fails health check using >90% disk
3 participants