VCK integration proposal #123
Comments
Need to figure out a shared file storage for all the learner pods (required for many distributed learning methods) and a way to store the model results for our users. So will this shared file storage requirement be satisfied by the PVC work? Or do you explicitly need NFS under the covers?
Many distributed learning methods require shared file storage to sync with the other workers. Currently all our workers mount the same input and result bucket, so that requirement is satisfied. However, with VCK, which pulls the data to a HostPath, each K8s node will have its own path for the input and result directories. So we need to figure out a shared place to store the result files and any other files that must be shared among all the workers. With the PVC work, this could definitely be solved for the NFS use case because it is mounted with a PV. However, for
Integration Proposal
volumemanage
for VCKvolumemanage
resource and monitor it for completion before executing the training job workload. For more details, please refer to https://github.com/IBM/FfDL/blob/vck-patch/etc/examples/vck-integration.md
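
To illustrate the "monitor it for completion before executing the training job workload" step, here is a minimal Python sketch of a polling loop. It is not taken from FfDL or VCK: the function name `wait_for_completion`, the `get_state` callback, and the state strings `"Completed"` and `"Failed"` are all assumptions made for illustration. In a real integration, `get_state` would presumably read the status of the VCK resource from the Kubernetes API.

```python
import time
from typing import Callable


def wait_for_completion(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll a resource until it reports completion.

    get_state is a hypothetical callback returning the resource's current
    state as a string. The state names "Completed" and "Failed" are
    assumptions for this sketch, not a confirmed VCK contract.

    Returns True once the state is "Completed", False if the timeout
    elapses first, and raises RuntimeError if the state is "Failed".
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "Completed":
            return True
        if state == "Failed":
            raise RuntimeError("data resource reported a failed state")
        if clock() >= deadline:
            return False
        # Wait before polling again so the API server is not hammered.
        sleep(interval_s)
```

A caller would start the training job workload only when this function returns `True`, and would surface an error to the user on `False` or on the exception. Passing `sleep` and `clock` as parameters is a design choice that keeps the loop testable without real delays.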