Horovod 0.28.1 incompatibility with PyTorch 2.1.0 #3996

Open
rithwik-db opened this issue Oct 25, 2023 · 2 comments · May be fixed by #3998

@rithwik-db

Environment:

  1. Framework: PyTorch
  2. Framework version: 2.1.0
  3. Horovod version: 0.28.1

Checklist:

  1. Did you search issues to find if somebody asked this question before? Yes
  2. If your question is about docker, did you read this doc? Yes
  3. Did you check if your question is answered in the troubleshooting guide? Yes

Bug report:

I created the following Dockerfile to test out whether the latest Horovod version (0.28.1) is compatible with the latest PyTorch version (2.1.0):

# Use the official Python image as the base image
FROM python:3.10-slim-buster

# Update the system and install necessary libraries
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    git \
    curl \
    ca-certificates \
    libjpeg-dev \
    libpng-dev && \
    rm -rf /var/lib/apt/lists/*

# Install PyTorch
RUN pip install torch==2.1.0

# Install Horovod
RUN HOROVOD_WITH_PYTORCH=1 pip install horovod==0.28.1

The build failed with various errors that I am unable to decipher. Here is the full build trace: error.txt

@stricklandye

stricklandye commented Apr 25, 2024

Hi there. I also encountered a similar issue. It seems there are compatibility issues between torch and Horovod. I tried recent torch versions (2.3.0, 2.2.x, and 2.1.x) and got the same error each time:

      subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'RelWithDebInfo', '--', '-j8', 'VERBOSE=1']' returned non-zero exit status 2.

The exception is torch==2.0.1: I can install Horovod with torch 2.0.1. Any suggestions?

@matbun

matbun commented May 7, 2024

The issue persists, as #3998 is still unmerged. However, as a temporary quick fix, you can run:

pip install --no-cache-dir git+https://github.com/thomas-bouvier/horovod.git@compile-cpp17
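
For anyone scripting this, here is a minimal sketch of choosing the install target from the torch version, based only on what was reported in this thread (torch 2.0.1 builds against the 0.28.1 release; 2.1.x through 2.3.0 fail). The helper names `torch_major_minor` and `horovod_install_spec` are hypothetical, and the 2.1 cutoff is an assumption drawn from these reports, not something confirmed by the Horovod maintainers. Versions below 2.0 simply fall through to the release and were not tested here.

```python
import re
from typing import Tuple

# Assumption from this thread only: torch 2.0.1 builds against the
# horovod==0.28.1 release, while 2.1.x through 2.3.0 were reported to fail.
RELEASE_SPEC = "horovod==0.28.1"
# Workaround branch posted above; unmerged and not an official release.
FORK_SPEC = "git+https://github.com/thomas-bouvier/horovod.git@compile-cpp17"


def torch_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.1.0' or '2.1.0+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def horovod_install_spec(torch_version: str) -> str:
    """Hypothetical helper: pick the pip target suggested by the reports here."""
    if torch_major_minor(torch_version) >= (2, 1):
        return FORK_SPEC
    return RELEASE_SPEC
```

In an environment where torch is already installed, passing `torch.__version__` to `horovod_install_spec` gives the string to hand to `pip install --no-cache-dir`.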
