horovodrun errors in jenkins automation server #3419
chengchen666 asked this question in Q&A
Hi @Magnus1cheng! I think the shell [...] So a way to fix this problem would be to change your Jenkins pipeline so it doesn't expose that Groovy or Java variable as an environment variable.
My command is
horovodrun -np 2 --verbose pytest test_torch.py::TorchTests::test_horovod_allreduce
On my local machine, everything runs fine. But when I run it on our CI automation server (Jenkins), it fails with
[0]<stderr>:/bin/sh: library.jenkins_pipeline_shared_lib.version=sr: command not found
After adding the --verbose option, I found that horovodrun just generates another command with more detailed information, including environment variables. Somehow, some Jenkins information was included as well. Part of the verbose command is the following:
RUN_DISPLAY_URL=http://10.115.0.218:8081/job/UT_horovod/43/display/redirect PWD=/opt/tests/horovod/examples/pytorch CUDA_VISIBLE_DEVICES=0,1 RUN_TESTS_DISPLAY_URL='http://10.115.0.218:8081/job/UT_horovod/43/display/redirect?page=tests' SHLVL=3 HOME=/root CI=true library.jenkins_pipeline_shared_lib.version=sr JENKINS_SERVER_COOKIE=durable-72e46c56f30b7247ea650c524cdfaad51de6091a3a5c90b8fa8399ef8c1b05f4 EXECUTOR_NUMBER=0 WORKSPACE_TMP=/home/workspace/UT_horovod@tmp PACKAGE_URL='' NODE_LABELS='219_chip bi_daily_test' PYTHONPATH=/opt/apps/local/lib64/python3/dist-packages
So the part library.jenkins_pipeline_shared_lib.version=sr in the generated command is not handled properly by the shell. I also tried some settings on my CI server but still couldn't hide it. So my question is: how did horovodrun come to include that, and how can I modify or filter such variables in Horovod?
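For illustration, here is a minimal Python sketch of one possible workaround: strip variables whose names are not valid shell identifiers before launching the command. A POSIX shell only treats NAME=value as an assignment when NAME consists of letters, digits, and underscores and does not start with a digit, so the dotted Jenkins name is parsed as a command name, which matches the "command not found" error above. The helpers split_env and run_with_clean_env are hypothetical names, not part of Horovod or Jenkins, and whether filtering the launcher's environment is enough depends on how horovodrun builds its per-rank command, which is an assumption here.

```python
import os
import re
import subprocess
import sys

# POSIX shell variable names: letters, digits and underscores,
# not starting with a digit. Names with dots do not qualify.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_env(env):
    """Split env into (exportable, rejected) dicts by variable name validity."""
    exportable, rejected = {}, {}
    for name, value in env.items():
        target = exportable if VALID_NAME.fullmatch(name) else rejected
        target[name] = value
    return exportable, rejected


def run_with_clean_env(cmd, env=None):
    """Run cmd with only shell-safe variables (hypothetical helper, not a Horovod API)."""
    exportable, rejected = split_env(dict(os.environ) if env is None else env)
    for name in sorted(rejected):
        print(f"dropping variable with non-shell-safe name: {name}", file=sys.stderr)
    # check=False: the caller inspects returncode instead of catching an exception.
    return subprocess.run(cmd, env=exportable, check=False)


# Possible use from a CI step (assumes horovodrun is on PATH):
# run_with_clean_env(["horovodrun", "-np", "2", "--verbose", "pytest",
#                     "test_torch.py::TorchTests::test_horovod_allreduce"])
```

Calling split_env(dict(os.environ)) on its own, inside the Jenkins job, is also a quick way to list which variables would trip up /bin/sh before changing anything in the pipeline.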