Webdataset reader behavior with many sources #5429
Hi @evgeniishch, thank you for reaching out.
Abstracting away sharding (where each pipeline is assigned a separate, non-overlapping shard of the data), reading is done sequentially within each pipeline.
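The sketch below is a minimal pure-Python model of "sharding plus sequential reading". It assumes that the samples of all tar files are concatenated in list order and that each shard is one contiguous, non-overlapping slice of that sequence. The file names, sample counts, and helpers (`flatten_sources`, `shard_range`, `read_shard`) are hypothetical. This is not DALI code.

```python
from typing import List, Sequence, Tuple

# Hypothetical sample representation: (source file name, index within that file).
Sample = Tuple[str, int]


def flatten_sources(sources: Sequence[Tuple[str, int]]) -> List[Sample]:
    """Concatenate the samples of all sources in list order (no interleaving)."""
    return [(name, i) for name, count in sources for i in range(count)]


def shard_range(total: int, shard_id: int, num_shards: int) -> range:
    """Assumed layout: shard `shard_id` gets one contiguous, non-overlapping slice."""
    start = total * shard_id // num_shards
    end = total * (shard_id + 1) // num_shards
    return range(start, end)


def read_shard(sources: Sequence[Tuple[str, int]], shard_id: int,
               num_shards: int) -> List[Sample]:
    """Samples one pipeline would visit, in order, with shuffling disabled."""
    samples = flatten_sources(sources)
    r = shard_range(len(samples), shard_id, num_shards)
    return samples[r.start:r.stop]


# Three hypothetical tar files with 4 samples each, split across 2 pipelines.
sources = [("a.tar", 4), ("b.tar", 4), ("c.tar", 4)]
for shard_id in range(2):
    # shard 0: all of a.tar, then the first half of b.tar
    # shard 1: the second half of b.tar, then all of c.tar
    print(shard_id, read_shard(sources, shard_id, num_shards=2))
```

Under these assumptions, a single shard can span a file boundary, and within a shard the files are visited one after another.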
DALI uses an internal buffer of fixed size (…
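To reason about the random_shuffle question, one can model a generic fixed-size shuffle buffer that is fed by the sequential stream described above. The sketch assumes a common fill-then-replace scheme: fill the buffer, then emit a random buffered sample and refill its slot with the next sample read. This is an illustration, not DALI's implementation. The buffer size, file names, and sample counts are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shuffle_buffer(stream: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Generic fixed-size shuffle buffer (illustrative, not DALI's implementation).

    Fill the buffer with the first `buffer_size` items, then for every further
    item emit a uniformly chosen buffered item and put the new item in its slot.
    When the stream ends, emit what is left in random order.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be positive")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)  # initial fill
            continue
        j = rng.randrange(buffer_size)
        yield buffer[j]
        buffer[j] = item  # refill the freed slot from the sequential stream
    rng.shuffle(buffer)
    yield from buffer


# Three hypothetical tar files read one after another, 1000 samples each.
stream = [(name, i) for name in ("a.tar", "b.tar", "c.tar") for i in range(1000)]
out = list(shuffle_buffer(stream, buffer_size=100, seed=42))

# The first 100 outputs can only come from the first 200 samples read,
# so every one of them is from a.tar.
print(Counter(src for src, _ in out[:100]))  # → Counter({'a.tar': 100})
# Inspect a window around the a.tar/b.tar boundary to see how the files mix.
print(Counter(src for src, _ in out[950:1050]))
```

In this model, the buffer only holds recently read samples. When the buffer is much smaller than a single tar file, almost all draws come from the file currently being read, and neighboring files mix only near their boundaries.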
Describe the question.
nvidia.dali.fn.readers.webdataset supports reading from multiple tar files, specified as a list of paths.
How is reading from multiple sources performed? Are all sources read sequentially one after another?
What happens when the random_shuffle parameter is set to True? Are samples drawn into the buffer from one source, or from all sources with some distribution?
Thank you