Address family not supported by protocol Error #215

Open
mehulparmariitr opened this issue Mar 16, 2024 · 0 comments


When I run the samples, I get the error below. I want to generate plain-language context/documentation for Java code that I provide. Is Code Llama or Llama better suited for that?

(myenv) [10:52]:[mehparmar@py029:codellama-main]$ torchrun --nproc_per_node 1 example_infilling.py \
>     --ckpt_dir CodeLlama-7b/ \
>     --tokenizer_path CodeLlama-7b/tokenizer.model \
>     --max_seq_len 192 --max_batch_size 4
[W socket.cpp:464] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Traceback (most recent call last):
  File "example_infilling.py", line 79, in <module>
    fire.Fire(main)
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example_infilling.py", line 18, in main
    generator = Llama.build(
  File "/vol/etl_jupyterdata1/home/github/public/Sreeramm/codellama-main/llama/generation.py", line 97, in build
    assert len(checkpoints) > 0, f"no checkpoint files found in {ckpt_dir}"
AssertionError: no checkpoint files found in CodeLlama-7b/
[2024-03-16 10:54:20,433] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 75378) of binary: /home/mehparmar/.conda/envs/myenv/bin/python
Traceback (most recent call last):
  File "/home/mehparmar/.conda/envs/myenv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
example_infilling.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-16_10:54:20
  host      : py029.lvs.abc.com
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 75378)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
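
Reading the log above, the `errno: 97` socket lines are warnings and initialization continues after them; the run actually stops at the `AssertionError: no checkpoint files found in CodeLlama-7b/`. A minimal diagnostic sketch for checking both conditions follows. It is not part of the repository: the helper names `find_checkpoints` and `ipv6_available` are hypothetical, and it assumes the loader looks for `*.pth` files directly inside `--ckpt_dir`, which is how `llama/generation.py` appears to collect checkpoints.

```python
import errno
import socket
from pathlib import Path


def find_checkpoints(ckpt_dir):
    """Return the sorted *.pth files directly inside ckpt_dir (may be empty).

    Assumption: this mirrors how the loader gathers checkpoint files.
    """
    return sorted(Path(ckpt_dir).glob("*.pth"))


def ipv6_available():
    """Return True if this host can create an IPv6 TCP socket."""
    if not socket.has_ipv6:
        return False
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError as exc:
        # errno 97 on Linux: "Address family not supported by protocol"
        if exc.errno == errno.EAFNOSUPPORT:
            return False
        raise
    sock.close()
    return True


if __name__ == "__main__":
    ckpt_dir = "CodeLlama-7b/"  # same relative path as in the command above
    found = find_checkpoints(ckpt_dir)
    print(f"{len(found)} checkpoint file(s) in {ckpt_dir}")
    for path in found:
        print("  ", path.name)
    print("IPv6 sockets available:", ipv6_available())
```

If the first line reports 0 files, the directory passed as `--ckpt_dir` is probably missing, empty, or relative to a different working directory. If the IPv6 check prints `False`, that would be consistent with the `[::]:29500` warnings, which on their own did not stop this run.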