Mamba Installation Failed; PyTorch+ROCm version 6.0 & 6.1 not working #412

Open
eliranwong opened this issue Jun 20, 2024 · 9 comments

@eliranwong

Mamba Installation Failed; PyTorch+ROCm version 6.0 & 6.1 not working

I tried to install mamba with two containers on Ubuntu 22.04 LTS, one with ROCm 6.0.2 & PyTorch+rocm6.0 installed, another with ROCm 6.1.2 & PyTorch+rocm6.1 installed.

Notes on my ROCm 6.1.2 setup: https://github.com/eliranwong/incus_container_gui_setup/blob/main/ubuntu_22.04_LTS_latest_rocm_tested.md

Notes on my ROCm 6.0.2 setup: https://github.com/eliranwong/incus_container_gui_setup/blob/main/ubuntu_22.04_LTS_tested.md

I already applied the patch https://github.com/state-spaces/mamba/blob/main/rocm_patch/rocm6_0.patch in the container running ROCm 6.0.2.

When I ran pip install mamba-ssm, I encountered these errors:

With PyTorch + rocm 6.0

          with open(fin_path, encoding='utf-8') as fin:
      FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-iy9acbtt/mamba-ssm_aae2c1df8bb54f62a59b41fd74fafbe0/csrc/selective_scan/selective_scan.cpp'
      
      
      torch.__version__  = 2.3.1+rocm6.0

With PyTorch + rocm 6.1

          with open(fin_path, encoding='utf-8') as fin:
      FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-onfy5yn9/mamba-ssm_70828647adec4a73aa94f62ff7a0c1d1/csrc/selective_scan/selective_scan.cpp'
      
      
      torch.__version__  = 2.5.0.dev20240618+rocm6.1
@eliranwong
Author

I also tried installing directly on the host, but got the same errors:

...
        File "/home/eliran/apps/mamba/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py", line 826, in preprocessor
          with open(fin_path, encoding='utf-8') as fin:
      FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-2d_pvv7e/mamba-ssm_19b07ba0f4b54a60b6feb761a9d6d942/csrc/selective_scan/selective_scan.cpp'
      
      
      torch.__version__  = 2.3.1+rocm6.0
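
Each failing build above prints `torch.__version__` with a `+rocm…` suffix, which is how the PyTorch wheel records the ROCm release it was built against. As a rough sketch (the function name `rocm_version_from_torch` is made up for illustration, and the `+cu121` string is only an example of a non-ROCm tag, not something from this thread), that suffix can be pulled out of the version string like this:

```python
import re
from typing import Optional


def rocm_version_from_torch(version: str) -> Optional[str]:
    """Return the ROCm version encoded in a torch version string, or None.

    Hypothetical helper: it only inspects the local version label after "+",
    e.g. "2.3.1+rocm6.0" -> "6.0". It does not query the installed ROCm stack.
    """
    match = re.search(r"\+rocm(\d+(?:\.\d+)*)", version)
    return match.group(1) if match else None


if __name__ == "__main__":
    # The first two strings are the ones reported in this thread.
    for v in ("2.3.1+rocm6.0", "2.5.0.dev20240618+rocm6.1", "2.3.1+cu121"):
        print(v, "->", rocm_version_from_torch(v))
```

In a live environment the same function could be fed `torch.__version__` directly. Note that it reports what the wheel was built for, which may differ from the ROCm release installed under `/opt`.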

@lfb-julien

lfb-julien commented Jun 20, 2024

Same problem on a Docker host, both with ROCm 6.0 (with your patch applied) and with ROCm 6.1.

pip install mamba-ssm:

  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [19 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-y3jhawqw/mamba-ssm_1617e4dcea5044fabfc486e5325fed98/setup.py", line 239, in <module>
          CUDAExtension(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1098, in CUDAExtension
          hipify_result = hipify_python.hipify(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/hipify/hipify_python.py", line 1150, in hipify
          preprocess_file_and_save_result(output_directory, filepath, all_files, header_include_dirs,
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/hipify/hipify_python.py", line 206, in preprocess_file_and_save_result
          result = preprocessor(output_directory, filepath, all_files, header_include_dirs, stats,
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/hipify/hipify_python.py", line 826, in preprocessor
          with open(fin_path, encoding='utf-8') as fin:
      FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-y3jhawqw/mamba-ssm_1617e4dcea5044fabfc486e5325fed98/csrc/selective_scan/selective_scan.cpp'
      
      
      torch.__version__  = 2.5.0.dev20240620+rocm6.1
      
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

When compiling from source, I also get an error.
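
The traceback above shows hipify failing to open `csrc/selective_scan/selective_scan.cpp` inside the directory pip unpacked. One way to check whether a given source archive actually ships that file is to list the archive's members. The sketch below assumes a `.tar.gz` sdist that has already been downloaded; the helper name `sdist_has_file` and the archive file name in the usage comment are hypothetical.

```python
import tarfile


def sdist_has_file(sdist_path: str, relative_path: str) -> bool:
    """Return True if the .tar.gz archive contains relative_path.

    Members of an sdist are normally stored under a top-level
    "<name>-<version>/" directory, so the match is on the path suffix.
    """
    with tarfile.open(sdist_path, "r:gz") as tar:
        return any(
            name.endswith("/" + relative_path) for name in tar.getnames()
        )


# Example usage (the archive name is a placeholder, not a real release):
# sdist_has_file("mamba_ssm-X.Y.Z.tar.gz",
#                "csrc/selective_scan/selective_scan.cpp")
```

If this returns `False` for the archive pip fetched, the failure comes from the packaged sources rather than from the local ROCm or PyTorch installation.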

@ajassani
Contributor

ajassani commented Jun 21, 2024

We don't have support for direct pip installation yet. Can you try building from source:
git clone https://github.com/state-spaces/mamba.git
cd mamba
pip install .
Let me know if that works.
Thanks!

@gabeweisz
Contributor

Instead of cloning the repository first, you can also run:
pip install git+https://github.com/state-spaces/mamba.git

This checks out, builds, and installs in one step.
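
For anyone scripting this, here is a minimal sketch of the same one-step install driven from Python. The helper name `pip_git_install_cmd` and its optional `ref` parameter are illustrative assumptions, not part of mamba or pip. It relies only on pip's standard `git+<url>[@<ref>]` requirement syntax.

```python
import sys
from typing import List, Optional

MAMBA_REPO = "https://github.com/state-spaces/mamba.git"


def pip_git_install_cmd(repo_url: str, ref: Optional[str] = None) -> List[str]:
    """Build a pip command that installs straight from a git repository.

    `ref` (branch, tag, or commit) is optional; when omitted, pip uses the
    repository's default branch.
    """
    spec = f"git+{repo_url}"
    if ref:
        spec += f"@{ref}"
    # Using sys.executable targets the interpreter (and virtualenv) that is
    # running this script, i.e. the one where PyTorch is installed.
    return [sys.executable, "-m", "pip", "install", spec]


if __name__ == "__main__":
    print(" ".join(pip_git_install_cmd(MAMBA_REPO)))
    # To actually run the build:
    # import subprocess
    # subprocess.run(pip_git_install_cmd(MAMBA_REPO), check=True)
```

Calling pip through `sys.executable -m pip` rather than a bare `pip` avoids installing into a different Python than the one that holds the ROCm build of PyTorch.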

@eliranwong
Author

pip install git+https://github.com/state-spaces/mamba.git

I tried this, but the build failed with these errors:

      In file included from /opt/rocm-6.0.2/include/hipcub/backend/rocprim/hipcub.hpp:40:
      /opt/rocm-6.0.2/include/hipcub/backend/rocprim/block/block_load.hpp:134:20: error: no member named 'load' in 'rocprim::block_load<unsigned long, 32, 1, rocprim::block_load_method::block_load_warp_transpose>'
              base_type::load(block_iter, items, valid_items, temp_storage_);
                         ^
      /home/ubuntu/mamba/mamba/csrc/selective_scan/selective_scan_common_hip.h:187:56: note: in instantiation of function template specialization 'hipcub::BlockLoad<unsigned long, 32, 1, hipcub::BLOCK_LOAD_WARP_TRANSPOSE>::Load<unsigned long *>' requested here
              typename Ktraits::BlockLoadVecT(smem_load_vec).Load(
                                                             ^
      /home/ubuntu/mamba/mamba/csrc/selective_scan/selective_scan_bwd_kernel_hip.cuh:159:9: note: in instantiation of function template specialization 'load_input<Selective_Scan_bwd_kernel_traits<32, 4, true, true, true, true, true, c10::BFloat16, c10::complex<float>>>' requested here
              load_input<Ktraits>(u, u_vals, smem_load, params.seqlen - chunk * kChunkSize);
              ^
      /home/ubuntu/mamba/mamba/csrc/selective_scan/selective_scan_bwd_kernel_hip.cuh:513:40: note: in instantiation of function template specialization 'selective_scan_bwd_kernel<Selective_Scan_bwd_kernel_traits<32, 4, true, true, true, true, true, c10::BFloat16, c10::complex<float>>>' requested here
                              auto kernel = &selective_scan_bwd_kernel<Ktraits>;
                                             ^
      /home/ubuntu/mamba/mamba/csrc/selective_scan/selective_scan_bwd_kernel_hip.cuh:548:13: note: in instantiation of function template specialization 'selective_scan_bwd_launch<32, 4, c10::BFloat16, c10::complex<float>>' requested here
                  selective_scan_bwd_launch<32, 4, input_t, weight_t>(params, stream);
                  ^
      fatal error: too many errors emitted, stopping now [-ferror-limit=]
      1 warning and 20 errors generated when compiling for host.
      error: command '/opt/rocm-6.0.2/bin/hipcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for mamba-ssm
  Running setup.py clean for mamba-ssm
Failed to build mamba-ssm
Installing collected packages: ninja, urllib3, triton, tqdm, safetensors, regex, pyyaml, idna, einops, charset-normalizer, certifi, requests, huggingface-hub, tokenizers, transformers, mamba-ssm
  Running setup.py install for mamba-ssm ... \

@gabeweisz
Contributor

I ran this successfully on the rocm/pytorch:latest Docker image. Can you try it there?

@ajassani
Contributor

@eliranwong
We have reproduced this issue and are working to fix it.
Thanks for reporting it!

@ajassani
Contributor

@eliranwong
There is a bug related to the warp size on Radeon GPUs in one of the ROCm libraries. We are working on a fix.
For now, we have a temporary fix that compiles the same kernel launch parameters for both Instinct and Radeon. The performance hit is negligible.
Here's the branch for the fix on our repo: https://github.com/rocm-port/mamba-rocm/tree/radeon_tempfix
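
For context, AMD calls a warp a wavefront. Radeon (RDNA) GPUs typically run 32-wide wavefronts, while Instinct (CDNA) GPUs use 64-wide ones. A rough way to see which applies to a machine is to read the sizes out of `rocminfo` output. The sketch below assumes `rocminfo` prints lines of the form `Wavefront Size: 32(0x20)`. The helper name `wavefront_sizes` and the sample text are illustrative, not taken from this thread.

```python
import re
from typing import List


def wavefront_sizes(rocminfo_text: str) -> List[int]:
    """Extract every reported wavefront size from rocminfo-style text.

    Assumes lines shaped like "Wavefront Size:   32(0x20)". One value is
    returned per agent that reports the field, in order of appearance.
    """
    return [
        int(size)
        for size in re.findall(r"Wavefront Size:\s*(\d+)", rocminfo_text)
    ]


# Illustrative sample only; real rocminfo output contains many more fields.
SAMPLE = """\
  Name:                    gfx1100
  Wavefront Size:          32(0x20)
"""

if __name__ == "__main__":
    print(wavefront_sizes(SAMPLE))
    # On a real system, something like:
    # import subprocess
    # text = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
    # print(wavefront_sizes(text))
```

A reported size of 32 would place the GPU in the Radeon case that the temporary fix targets.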

@eliranwong
Author

Your work and updates are much appreciated. Thanks a lot.
