Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disk read and write too fast, resulting in server lag, how to reduce read and write speed #1267

Open
frankst-debug opened this issue Oct 18, 2022 · 0 comments

Comments

@frankst-debug
Copy link

❓ Questions and Help

I use two gpu to run code on vqav2 dataset using movie_mcan model, the gpu memory is not enough so the batch_size is set to 16, but every time I run the code will cause the server abnormal lag, I use sar -d 3 5 to check the disk read and write, I found that the read speed is very fast, how to improve this problem, when the lag I can't do any operation.
This is my training code
CUDA_VISIBLE_DEVICES=2,3 mmf_run config=projects/movie_mcan/configsqa2/defaults.yaml model=movie_mcan dataset=vqa2 run_type=train env.cache_dir=/data/students/zzj/ env.data_dir=/data/students/zzj/ training.batch_size=16

Here are the read and write speeds
8Q2OJIC30({P~ Y2(XJ08QD
51AFL@1@S9U~RZH48G5P7AD

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant