Seems that images are not being cached in RAM when cache=ram #9871

Closed
1 task done
PacificDou opened this issue Apr 8, 2024 · 6 comments
Labels
fixed Bug has been resolved question Further information is requested Stale

Comments

@PacificDou
Contributor

PacificDou commented Apr 8, 2024

Search before asking

Question

When initializing the trainer, the cache option switches caching on or off, either in RAM or on disk; the acceptable values are ram/True, disk, and False. https://docs.ultralytics.com/modes/train/#train-settings

For example, DetectionTrainer initializes a YOLODataset object and then wraps it in an InfiniteDataLoader.
The cache parameter is set when the YOLODataset object, which inherits from BaseDataset, is initialized.

In BaseDataset's constructor, the image dataset will be cached to RAM/Disk.

With cache=ram, load_image is called for each image: it loads the image into RAM and clears the oldest buffer entry if len(self.buffer) >= self.max_buffer_length.
With cache=disk, cache_images_to_disk is called for each image: it loads the image into RAM and saves it as a NumPy file.

Thus, with cache=ram, only about self.max_buffer_length images remain in RAM (in the buffer) after BaseDataset is initialized, not the whole dataset.
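To illustrate the reasoning above, here is a toy model of the eviction behavior as I understand it. This is only a sketch: ToyDataset and its load_image are hypothetical stand-ins written for this issue, not the actual BaseDataset code, and the "image" is just a placeholder bytes object.

```python
class ToyDataset:
    """Hypothetical stand-in that mimics the buffer logic described above."""

    def __init__(self, n_images: int, max_buffer_length: int):
        self.ims = [None] * n_images          # per-image RAM slots
        self.buffer = []                      # indices of recently loaded images
        self.max_buffer_length = max_buffer_length

    def load_image(self, i: int) -> None:
        self.ims[i] = b"pixels"               # placeholder for the decoded image
        self.buffer.append(i)
        # Evict the oldest entry once the buffer reaches its cap
        if len(self.buffer) >= self.max_buffer_length:
            j = self.buffer.pop(0)
            self.ims[j] = None

    def resident(self) -> int:
        """Number of images still held in RAM."""
        return sum(im is not None for im in self.ims)


ds = ToyDataset(n_images=10, max_buffer_length=4)
for i in range(10):                           # the "cache to RAM" pass
    ds.load_image(i)
print(ds.resident(), "of", len(ds.ims), "images are still in RAM")
```

In this toy model the number of resident images never exceeds max_buffer_length, regardless of how many images were loaded during the caching pass.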

In addition, because of the following statement in BaseDataset's constructor, the buffer size is capped at 1000. So if a dataset has more than 1000 images (and there is sufficient RAM), we still cannot benefit from reduced disk I/O.

self.max_buffer_length = min((self.ni, self.batch_size * 8, 1000)) if self.augment else 0
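As a quick check of what this expression evaluates to, the sketch below wraps it in a standalone helper. The function name max_buffer_length and the sample values for ni and batch_size are mine, chosen only for illustration; the expression itself is copied from the line above.

```python
def max_buffer_length(ni: int, batch_size: int, augment: bool) -> int:
    """Same expression as in the constructor, with self.* passed as arguments."""
    return min((ni, batch_size * 8, 1000)) if augment else 0


# Illustrative values only: (number of images, batch size, augment)
for ni, bs, aug in [(5000, 16, True), (5000, 256, True), (50, 16, True), (5000, 16, False)]:
    print(f"ni={ni}, batch_size={bs}, augment={aug} -> {max_buffer_length(ni, bs, aug)}")
```

Note that the batch_size * 8 term can make the effective cap much smaller than 1000 for typical batch sizes.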

Additional

No response

@PacificDou PacificDou added the question Further information is requested label Apr 8, 2024
@glenn-jocher
Member

@PacificDou thanks for raising an issue. This was resolved recently in 8.1.45 in #9828.

@glenn-jocher glenn-jocher added the fixed Bug has been resolved label Apr 8, 2024
@PacificDou
Contributor Author

@glenn-jocher That's great!

As a follow-up, do you think it would be a good idea to cache images in RAM up to a user-specified limit?
For example, if the user specifies 4GB, then we only cache the dataset up to 4GB.

In the current version there are only two options because of the check_cache_ram function: either load the whole dataset into RAM or give up on RAM caching. It would be more flexible if users could specify the maximum amount of RAM to use.

If you agree with this idea, I can try to submit a PR later for this.
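A minimal sketch of the idea, assuming per-image sizes in bytes can be estimated up front. The helper indices_within_budget and its parameters are hypothetical and do not exist in the library; a real implementation would need to hook into the dataset's caching code.

```python
from typing import List, Sequence


def indices_within_budget(image_nbytes: Sequence[int], limit_bytes: int) -> List[int]:
    """Return the leading image indices whose cumulative size fits in limit_bytes."""
    cached: List[int] = []
    used = 0
    for i, nbytes in enumerate(image_nbytes):
        if used + nbytes > limit_bytes:
            break                             # budget exhausted: the rest stays on disk
        cached.append(i)
        used += nbytes
    return cached


# Illustrative numbers: ten images of 1 MB each with a 4.5 MB budget
sizes = [1_000_000] * 10
to_cache = indices_within_budget(sizes, limit_bytes=4_500_000)
print(f"cache {len(to_cache)} of {len(sizes)} images in RAM, read the rest from disk")
```

Images outside the returned list would simply be read from disk on each access, so training still works when the dataset is larger than the budget.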

@glenn-jocher
Member

@PacificDou hi there! 😊

That's indeed an interesting thought! Allowing users to specify a maximum RAM limit for caching images sounds like a valuable enhancement to offer more flexibility and control. It could especially benefit users with limited resources or those managing large datasets.

If you're up for it, we'd certainly welcome a PR on this feature. Your contribution could make a big difference to users looking for that sweet spot between performance and resource usage. Just ensure that the implementation is user-friendly and integrates smoothly with the existing setup.

Looking forward to seeing your ideas come to life in a PR! 🚀

@PacificDou
Contributor Author

Hi @glenn-jocher, here is the PR that adds memory cache limit control: #10258

@glenn-jocher
Member

@PacificDou, thanks for submitting the PR! 🌟 We'll review it shortly to ensure everything aligns with our vision for flexible and efficient data handling. This addition could indeed provide a valuable improvement for users working with diverse datasets and hardware configurations.

Stay tuned for feedback or further instructions! 🛠️


👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label May 24, 2024
@github-actions github-actions bot closed this as not planned Jun 5, 2024