
[REQUEST] TFDatasetMultiShotMemorySampler for custom datasets #300

Open
Lunatik00 opened this issue Nov 21, 2022 · 1 comment

Comments

@Lunatik00
Contributor

Hi, I am testing different dataflows for training. I have compared using the sampler against a dataset pipeline (built with tf.keras.utils.image_dataset_from_directory), and I found that the two approaches end up with very different maximum batch sizes on the same GPU: around 20 with one versus over 30 with the other. The dataset pipeline supports the larger batches, but it does not divide the data well per batch. So I want to try using a dataset as the input for the memory sampler, but the current function is built to only download a named (non-custom) dataset. I will try modifications to make it work, but I don't think my code will be generic, and I haven't used overloaded functions before, so I am leaving this as a request that should be simple to implement.
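
For reference, the directory-based pipeline I am comparing against looks roughly like this (the path and image size are placeholders):

```python
import tensorflow as tf

# Placeholder directory layout: one subfolder per class, e.g. data/class_a/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/",                # placeholder path
    image_size=(224, 224),  # placeholder size
    batch_size=32,
)
```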

@owenvallis
Collaborator

Hi Lunatik00, apologies for the slow response. We currently support loading custom data using the MultiShotMemorySampler. The data is loaded into memory and sampled across the classes so that each batch is constructed correctly. However, some datasets, e.g., larger image datasets, can be too large to hold in memory.
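
For example, a minimal sketch with custom in-memory data (the array shapes and batch parameters here are illustrative):

```python
import numpy as np
from tensorflow_similarity.samplers import MultiShotMemorySampler

# Illustrative custom data: 1000 examples across 10 classes.
x = np.random.rand(1000, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=1000)

# The sampler holds x/y in memory and draws class-balanced batches of
# classes_per_batch * examples_per_class_per_batch examples.
sampler = MultiShotMemorySampler(
    x,
    y,
    classes_per_batch=10,
    examples_per_class_per_batch=4,
)
```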

Fortunately, we just had a recent PR that adds support for loading examples from disk, see here. You'll need to pass the paths to your examples as the x input, and the load function will then take each path and load the example from disk when constructing the batches.
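
A minimal sketch of that pattern; note that the loader-hook parameter name used below (load_example_fn) is an assumption on my part, so please check the linked PR for the exact signature, and the paths are placeholders:

```python
import tensorflow as tf
from tensorflow_similarity.samplers import MultiShotMemorySampler

def load_img(path):
    # Load and decode a single example from disk at batch-construction time.
    data = tf.io.read_file(path)
    img = tf.io.decode_image(data, channels=3, expand_animations=False)
    return tf.image.resize(img, (224, 224)) / 255.0

# x holds only lightweight file paths; images stay on disk until sampled.
x = ["data/class_a/img_0.jpg", "data/class_b/img_1.jpg"]  # placeholder paths
y = [0, 1]

sampler = MultiShotMemorySampler(
    x,
    y,
    classes_per_batch=2,
    load_example_fn=load_img,  # assumed parameter name; see the linked PR
)
```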

Hopefully this helps, but let me know if you run into issues.
