ConnectionRefusedError: [Errno 111] Connection refused and wandb.sdk.wandb_manager.ManagerConnectionRefusedError: Connection to wandb service failed since the process is not available.
#7329
Open
DavidoF3 opened this issue on Apr 8, 2024 · 0 comments
I am trying to resume a wandb run on a SLURM job, by running:
import wandb

run = wandb.init(
    entity=entity,
    project=project,
    id=<id_of_run>,  # placeholder for the ID of the run to resume
    resume="must",
)
I pass my API key with export WANDB_API_KEY= in the sbatch script (as suggested in other issues).
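One thing worth ruling out is that the exported key never reaches the Python process inside the job. Below is a minimal sketch of such a check, assuming the value after WANDB_API_KEY= was only redacted in this report. The helper name missing_env_vars is hypothetical and not part of wandb; it uses only the standard library.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the variable names that are unset or blank.

    `environ` defaults to the current process environment; passing a
    plain dict makes the check easy to exercise without touching it.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Run this inside the sbatch job, before calling wandb.init().
    missing = missing_env_vars(["WANDB_API_KEY"])
    if missing:
        print("Not visible to this process: " + ", ".join(missing))
    else:
        print("WANDB_API_KEY is visible to this process")
```

A blank value counts as missing here, since a bare export WANDB_API_KEY= sets the variable to an empty string.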
I get the errors below:
Traceback (most recent call last):
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_manager.py", line 116, in _service_connect
svc_iface._svc_connect(port=port)
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/service/service_sock.py", line 30, in _svc_connect
self._sock_client.connect(port=port)
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 102, in connect
s.connect(("localhost", port))
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/versiontools/wandb_utils.py", line 65, in resume_wandb_run
run = instanciate_resume_run(
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/versiontools/wandb_utils.py", line 48, in instanciate_resume_run
wandb.setup(settings=dict(_executable=sys.executable))
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 327, in setup
ret = _setup(settings=settings)
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 320, in _setup
wl = _WandbSetup(settings=settings)
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 303, in __init__
_WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 114, in __init__
self._setup()
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 250, in _setup
self._setup_manager()
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 277, in _setup_manager
self._manager = wandb_manager._Manager(settings=self._settings)
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_manager.py", line 153, in __init__
wandb._sentry.reraise(e)
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/analytics/sentry.py", line 154, in reraise
raise exc.with_traceback(sys.exc_info()[2])
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_manager.py", line 151, in __init__
self._service_connect()
File "/rds/project/rds-pRsZozWgfxI/user_files/doc_files/repos/octiocor-train-segmenter/venv_noapp/lib/python3.9/site-packages/wandb/sdk/wandb_manager.py", line 125, in _service_connect
raise ManagerConnectionRefusedError(message)
wandb.sdk.wandb_manager.ManagerConnectionRefusedError: Connection to wandb service failed since the process is not available.
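The first exception in the traceback comes from a plain TCP connect, s.connect(("localhost", port)), which means nothing was accepting connections on that port when the client tried. The sketch below reproduces that check with the standard library only, so it can be run on the compute node independently of wandb. The function name port_accepts_connections is hypothetical, and the port to probe has to be supplied by the caller; the snippet does not know which port the wandb service uses.

```python
import socket


def port_accepts_connections(port, host="localhost", timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    A refused connection (Errno 111 on Linux), a timeout, or any other
    socket-level failure returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket timeouts are OSError subclasses.
        return False


if __name__ == "__main__":
    # Sanity check that loopback works at all on this node: open a
    # listener on an ephemeral port, probe it, then close it and probe again.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    probe_port = server.getsockname()[1]
    print("listener open:", port_accepts_connections(probe_port, host="127.0.0.1"))
    server.close()
    print("listener closed:", port_accepts_connections(probe_port, host="127.0.0.1"))
```

If the probe succeeds against the open listener, loopback networking on the node is working, which would point towards the wandb service process not running or having exited, rather than a node-level networking restriction.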
Additional Files
No response
Environment
WandB version: 0.16.3
OS: Linux Rocky
Python version: 3.9.17
Versions of relevant libraries:
Additional Context
No response