VCK integration proposal #123

Tomcli · 2018-08-01T22:21:48Z

Integration Proposal

Implement a new module that handles creating the VolumeManager resource for VCK.
Insert logic to provision the VolumeManager resource and monitor it for completion before executing the training job workload.
To make it more elastic, we need to come up with an algorithm for how many data replicas each job needs, then create labels/tags that allow users to reuse the same dataset volume.
Need to figure out a shared file storage for all the learner pods (required for many distributed learning methods) and a way to store the model results for our users.
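
As a rough illustration of the "provision, then monitor for completion" step, here is a minimal Python sketch of a polling gate in front of the training job. It is not FfDL or VCK code: `get_state` and `start_job` are hypothetical callables supplied by the caller, and the state names `"Running"` and `"Failed"` are assumptions that should be checked against what VCK actually reports.

```python
import time
from typing import Callable

# Assumed state names; verify against the states VCK really reports.
READY_STATE = "Running"
FAILED_STATE = "Failed"


def wait_for_volume(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the volume resource is ready.

    Returns True when the ready state is seen, False on failure or timeout.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == READY_STATE:
            return True
        if state == FAILED_STATE:
            return False
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


def run_training_job(
    get_state: Callable[[], str],
    start_job: Callable[[], None],
    **wait_kwargs,
) -> bool:
    """Start the job only after the dataset volume is ready."""
    if not wait_for_volume(get_state, **wait_kwargs):
        return False
    start_job()
    return True
```

In a real integration `get_state` would read the resource status from the Kubernetes API; injecting it (along with `sleep` and `clock`) keeps the gating logic testable without a cluster.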

For more details, please refer to https://github.com/IBM/FfDL/blob/vck-patch/etc/examples/vck-integration.md
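
The proposal leaves the replica algorithm and the labeling scheme open. Purely as a starting point for discussion, the sketch below shows one possible shape for both. The heuristic (one replica per learner, capped by the number of nodes) and the label key `ffdl-dataset-id` are hypothetical and come from neither FfDL nor VCK. The cap assumes each replica lands on a different node.

```python
import hashlib


def replica_count(num_learners: int, num_nodes: int) -> int:
    """Hypothetical heuristic: one replica per learner, capped by node count."""
    if num_learners < 1 or num_nodes < 1:
        raise ValueError("num_learners and num_nodes must be positive")
    return min(num_learners, num_nodes)


def dataset_labels(source_type: str, source_url: str) -> dict:
    """Derive a stable label so jobs using the same dataset can find its volume.

    The label key is a made-up name for illustration.
    """
    key = f"{source_type}|{source_url}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Kubernetes label values are limited to 63 characters;
    # 16 hex characters stays well inside that limit.
    return {"ffdl-dataset-id": digest[:16]}
```

Because the label is derived only from the source type and URL, two jobs that train on the same dataset compute the same label and could select an existing volume instead of caching the data again.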


animeshsingh · 2018-08-03T19:32:36Z

> Need to figure out a shared file storage for all the learner pods (required for many distributed learning methods) and a way to store the model results for our users.

So will this shared file storage be satisfied by the PVC work? Or do you explicitly need NFS under the covers?

Tomcli · 2018-08-03T20:46:52Z

Many distributed learning methods require shared file storage to sync with the other workers. Currently all our workers mount the same input and result buckets, so that requirement is satisfied. However, because VCK pulls the data to a HostPath, each K8s node will have its own path for the input and result directories. So we need to figure out a shared place to store the result files and any other files that must be shared among all the workers.

With the PVC work, this could definitely be solved for the NFS use case because it is mounted through a PV. However, for S3 or Pachyderm using VCK we still have the same issue, since VCK technically creates replicas in the HostPath for the files (which can come from multiple sources) that you want to cache.
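
To make the HostPath problem concrete, here is a small self-contained Python simulation. It touches no cluster: temporary directories stand in for storage, and the names `node-a`, `node-b`, and `checkpoint.txt` are invented for illustration. One backing directory models an NFS-backed PV; one directory per node models HostPath caching.

```python
import tempfile
from pathlib import Path


def result_visible_to_other_learner(shared: bool) -> bool:
    """Return True if a file written by learner 0 is visible to learner 1."""
    with tempfile.TemporaryDirectory() as root:
        if shared:
            # One backing directory for both learners, like an NFS-backed PV.
            dirs = [Path(root) / "shared", Path(root) / "shared"]
        else:
            # One directory per node, like a HostPath on two different nodes.
            dirs = [Path(root) / "node-a", Path(root) / "node-b"]
        for d in dirs:
            d.mkdir(exist_ok=True)
        # Learner 0 writes a result file into its own view of the directory.
        (dirs[0] / "checkpoint.txt").write_text("step=100")
        # Learner 1 looks for the same file in its view.
        return (dirs[1] / "checkpoint.txt").exists()
```

With `shared=True` the second learner sees the file; with `shared=False` it does not. That second case is what learners on different nodes would hit when results are written under a per-node HostPath.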

Tomcli added the enhancement New feature or request label Aug 1, 2018
