Why Decoder consumes 24 CPU cores even if using "mixed" device #5274
Comments
Hi @mengwanguc, thank you for reaching out. Please read this blog post to learn in detail how image decoding is accelerated.

Thanks @JanuszL! This makes a lot of sense!

Hi @JanuszL, sorry, I have a follow-up question. I observed that when I use more CPU threads, the GPU memory consumed by DALI also increases, even though I use the same batch size. For example, with batch size 64 and 4 CPU threads, DALI's GPU memory consumption keeps increasing and levels off at 2.4 GB. Is this expected? If so, why does it happen, given that the batch size is the same?

It comes from how DALI uses the nvJPEG library. We create one decoder instance per CPU thread to improve the performance of the serial CPU part of decoding. Each instance uses both CPU and GPU memory, so even with the same batch size, memory consumption grows with the number of threads when you use the hybrid image decoder.
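The key point above is that the footprint scales with the thread count, not only with the batch size. As a toy illustration of that linear scaling (the per-instance figure below is invented for illustration, not a measured DALI number):

```python
def estimated_decoder_memory_mb(num_threads, per_instance_mb=600):
    """Toy model: one nvJPEG decoder instance per CPU thread,
    each holding its own CPU/GPU scratch buffers.
    per_instance_mb is a made-up illustrative figure."""
    return num_threads * per_instance_mb

print(estimated_decoder_memory_mb(4))   # 2400
print(estimated_decoder_memory_mb(64))  # 38400
```

Under this model, doubling the thread count roughly doubles the decoder's memory footprint even at a fixed batch size.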
Describe the question.
Hi,
I'm using DALI to preprocess the ImageNet data. I have all my data cached in memory, and I want to test the performance of GPU preprocessing.
I'm using batch size 256 and num_threads=64 (I make the thread count large enough that the CPU is not the bottleneck).
My pipeline only has a reader and a decoder, and I don't train any model.
I'm using device='cpu' for the reader and device='mixed' for the decoder, as advised by the documentation: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/decoder_examples.html
However, I found that DALI is consuming 2400% CPU usage, which means 24 CPU cores.
I'm surprised because I thought DALI would offload all of the decoding to the GPU, so the CPU usage should be low.
I can see it is indeed offloaded to the GPU, as the GPU utilization is ~54%.
But I don't understand what is consuming so much CPU.
I understand that some memory copies can take CPU time, e.g. copying data from pageable memory to pinned memory, but I don't think that would consume 2400% CPU.
This is my pipeline:
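(The pipeline snippet did not survive in this page capture. A minimal sketch of a reader-plus-hybrid-decoder pipeline matching the description above, assuming DALI's `@pipeline_def` API and a placeholder dataset path, might look like the following; it is a reconstruction, not the reporter's actual code.)

```python
# Hypothetical reconstruction -- the original snippet was not captured here.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn

@pipeline_def(batch_size=256, num_threads=64, device_id=0)
def decode_pipeline():
    # "/path/to/imagenet" is a placeholder, not the reporter's actual path
    jpegs, labels = fn.readers.file(file_root="/path/to/imagenet", name="Reader")
    # device="mixed": the serial JPEG parsing runs on CPU threads,
    # while the heavy decode work is offloaded to the GPU
    images = fn.decoders.image(jpegs, device="mixed")
    return images, labels

pipe = decode_pipeline()
pipe.build()
images, labels = pipe.run()
```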
Thanks!