example failed: examples/tensorflow/criteo_deeprec/manual_job.yaml #1136
Comments
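I have fixed this example in PR #1141. You can follow the steps below.
The job will have the following Pods.
Once chief-0 and worker-0 are running, you can manually scale up by adding one worker.
You should then see a new worker-1.
If that does not work, check whether the image of the master pod elasticjob-deepctr-manual-scale-dlrover-master is registry.cn-hangzhou.aliyuncs.com/intell-ai/dlrover:master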
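The image check suggested above can be sketched with kubectl's jsonpath output. This is a minimal sketch, not the maintainers' procedure: the namespace `dlrover` is an assumption (override `NAMESPACE` if the job runs elsewhere), and the helper names `pod_image` and `check_master_image` are hypothetical. The pod name and expected image are taken from the comment above.

```shell
# Assumption: the job runs in the "dlrover" namespace; override NAMESPACE otherwise.
NAMESPACE="${NAMESPACE:-dlrover}"
MASTER_POD="elasticjob-deepctr-manual-scale-dlrover-master"
EXPECTED_IMAGE="registry.cn-hangzhou.aliyuncs.com/intell-ai/dlrover:master"

# Hypothetical helper: print the image(s) of all containers in the given pod.
pod_image() {
  kubectl -n "$NAMESPACE" get pod "$1" -o jsonpath='{.spec.containers[*].image}'
}

# Hypothetical helper: compare the master pod's image with the expected one.
# Returns 0 on a match and 1 otherwise.
check_master_image() {
  actual="$(pod_image "$MASTER_POD")"
  if [ "$actual" = "$EXPECTED_IMAGE" ]; then
    echo "OK: master image is $actual"
  else
    echo "MISMATCH: master image is '$actual', expected '$EXPECTED_IMAGE'"
    return 1
  fi
}
```

After sourcing the snippet, run `check_master_image` against the cluster. A mismatch suggests the master is running an image other than the one named in the comment.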
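Environment
Problem
I ran
kubectl apply -f examples/tensorflow/criteo_deeprec/manual_job.yaml
but the worker Pods never appeared. Only a single master is present, and several thousand scanPlan records have appeared,
all of which are empty:
How can I verify the elasticity of a TensorFlow job, whether manual or automatic?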