💡[feat] Adding an option to specify docker volumes besides bind_mounts. #5799

Open
chjz1024 opened this issue Jan 23, 2023 · 1 comment
Labels
feature Feature requests

Comments


chjz1024 commented Jan 23, 2023

Describe the problem

Currently, an experiment must specify the bind_mounts option in order to reuse existing files (such as datasets) on the host. However, every agent must then hold an identical copy of those files at the path given by host_path to guarantee that each experiment behaves the same way. This is painful and error-prone as the number of agents grows, even when an NFS service is deployed to keep the data consistent: for example, bind-mounting a subdirectory of an existing NFS share seems to raise a permission problem.

The root cause is that the bind_mounts option cannot specify a storage driver. In addition, many modern storage solutions, such as Hadoop's HDFS and the object storage services of AWS, GCP, Azure, and Alibaba Cloud, need a custom driver before they can be mounted as a normal file system inside the container. Docker volumes support this through the --mount option. For example, the local volume driver can be given NFS options on the command line: docker run --mount 'type=volume,src=<VOLUME-NAME>,dst=<CONTAINER-PATH>,volume-driver=local,volume-opt=type=nfs,volume-opt=device=<nfs-server>:<nfs-path>,"volume-opt=o=addr=<nfs-address>,vers=4,soft,timeo=180,bg,tcp,rw"' <image> <command>.

With this, we would no longer need to manually mount the NFS share on each agent, download the files in each experiment, or modify existing DataLoaders. It would also become possible to use an existing cloud storage service like ordinary files, with the quality of service handled by the storage driver instead.
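For illustration, here is a minimal Python sketch that assembles the --mount value from the command above. It relies on the fact that Docker splits a --mount value as CSV, which is why the o= field (whose NFS options are themselves comma-separated) has to be wrapped in double quotes. The function names and the example server, export path, and address are hypothetical.

```python
def csv_field(field: str) -> str:
    """Quote one --mount field the way a CSV reader expects."""
    if "," in field or '"' in field:
        # Wrap in double quotes and double any embedded quote characters.
        return '"' + field.replace('"', '""') + '"'
    return field


def nfs_volume_mount(volume: str, target: str, server: str, export: str,
                     address: str, options: list[str]) -> str:
    """Build a --mount value for a local-driver volume backed by NFS."""
    fields = [
        "type=volume",
        f"src={volume}",
        f"dst={target}",
        "volume-driver=local",
        "volume-opt=type=nfs",
        f"volume-opt=device={server}:{export}",
        # The NFS mount options are comma-separated, so this field gets quoted.
        "volume-opt=o=" + ",".join([f"addr={address}", *options]),
    ]
    return ",".join(csv_field(f) for f in fields)


if __name__ == "__main__":
    # Hypothetical server, export path, and address.
    spec = nfs_volume_mount("datasets", "/data", "nfs.example.com",
                            "/exports/datasets", "10.0.0.10",
                            ["vers=4", "soft", "timeo=180", "bg", "tcp", "rw"])
    print(spec)
```

If the value is passed as a single argv element (for example via subprocess with a list), the outer single quotes from the shell version of the command are not needed.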

Describe the solution you'd like

I haven't read the source code of this project, but I'm guessing the config is translated into raw docker run commands? If so, simply adding a translation to --mount should work.
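One way the translation guessed at above could look, as a minimal Python sketch. It makes no claim about how Determined actually starts containers. It maps bind_mounts entries (host_path from the issue; container_path and read_only are assumed here) to --mount type=bind, and adds a hypothetical volumes list whose keys (name, container_path, driver, driver_opts) are invented for illustration and are not part of any real config schema. Rather than packing driver options into --mount, it first creates a named volume with docker volume create --opt, so each option is its own argument and needs no CSV quoting.

```python
import shlex
from typing import Any


def volume_create_cmd(vol: dict[str, Any]) -> list[str]:
    """argv for `docker volume create`, built from one hypothetical `volumes` entry."""
    cmd = ["docker", "volume", "create", "--driver", vol.get("driver", "local")]
    for key, value in vol.get("driver_opts", {}).items():
        # Each --opt is a separate argv element, so commas need no quoting.
        cmd += ["--opt", f"{key}={value}"]
    cmd.append(vol["name"])
    return cmd


def mount_args(config: dict[str, Any]) -> list[str]:
    """docker run --mount arguments for bind mounts and hypothetical named volumes.

    Paths containing commas or quotes would need CSV quoting; not handled here.
    """
    args: list[str] = []
    for bm in config.get("bind_mounts", []):
        spec = f"type=bind,src={bm['host_path']},dst={bm['container_path']}"
        if bm.get("read_only"):
            spec += ",readonly"
        args += ["--mount", spec]
    for vol in config.get("volumes", []):
        # Refer to the volume by name; its driver options were set at creation.
        args += ["--mount", f"type=volume,src={vol['name']},dst={vol['container_path']}"]
    return args


if __name__ == "__main__":
    # Hypothetical config; "volumes" is not an existing option.
    config = {
        "bind_mounts": [{"host_path": "/mnt/ssd", "container_path": "/scratch",
                         "read_only": True}],
        "volumes": [{"name": "datasets", "container_path": "/data",
                     "driver": "local",
                     "driver_opts": {"type": "nfs",
                                     "device": ":/exports/datasets",
                                     "o": "addr=10.0.0.10,vers=4,rw"}}],
    }
    for vol in config["volumes"]:
        print(shlex.join(volume_create_cmd(vol)))
    print(shlex.join(["docker", "run", *mount_args(config), "my-image"]))
```

Named volumes created with the local driver exist per Docker daemon, so in this sketch the creation step would still run once on each agent, but it would be automated instead of a manual NFS mount.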

Describe alternatives you've considered

The biggest problem for me is how to easily use existing cloud storage services, so a specific solution for a common storage service such as NFS would also be acceptable.

Additional context

Could an option to mount tmpfs be added as well?
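A tmpfs mount could follow the same --mount pattern. The sketch below uses Docker's tmpfs-size (in bytes) and tmpfs-mode (octal) mount options; the helper name is hypothetical, and tmpfs mounts are only available for Linux containers.

```python
from typing import List, Optional


def tmpfs_mount(target: str, size_bytes: Optional[int] = None,
                mode: Optional[int] = None) -> List[str]:
    """Return docker run arguments for an in-memory tmpfs mount at `target`."""
    fields = ["type=tmpfs", f"dst={target}"]
    if size_bytes is not None:
        # Without tmpfs-size, Docker does not cap the size of the mount.
        fields.append(f"tmpfs-size={size_bytes}")
    if mode is not None:
        # tmpfs-mode is written in octal, e.g. 1777.
        fields.append(f"tmpfs-mode={mode:o}")
    return ["--mount", ",".join(fields)]


if __name__ == "__main__":
    # A 1 GiB scratch area, world-writable with the sticky bit set.
    print(tmpfs_mount("/scratch", 1 << 30, 0o1777))
```

Because tmpfs lives in the agent's memory, a size cap keeps a runaway job from exhausting RAM on the host.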

chjz1024 added the feature (Feature requests) label Jan 23, 2023
@rb-determined-ai
Member

This is a reasonable feature request, I'll make an internal ticket for it.

But I also want to point out that we normally don't recommend using network-mounted filesystems for training, for performance reasons.
