Add PEP517 compatible build backend #3991

Open: wants to merge 9 commits into base: master

5 changes: 2 additions & 3 deletions .github/gen-workflow-ci.py
@@ -191,9 +191,8 @@ def jobs(*jobs: str) -> str:
' HOROVOD_WITHOUT_MPI: 1\n' \
' run: |\n' \
' python -m pip install --upgrade pip\n' \
' python -m pip install setuptools wheel\n' \
' python setup.py sdist\n' \
' pip -v install dist/horovod-*.tar.gz\n' \
' pip -v install --use-pep517 dist/horovod-*.tar.gz\n' \
'\n' + \
'\n'.join(jobs)

@@ -480,7 +479,7 @@ def build_and_test_macos(id: str, name: str, needs: List[str], attempts: int = 3
f' if [[ ${{TENSORFLOW}} == 1.* ]] || [[ ${{TENSORFLOW}} == 2.[012345].* ]]; then pip install "h5py<3" "protobuf~=3.20"; fi\n'
f' pip install torch==${{PYTORCH}} pytorch_lightning==${{PYTORCH_LIGHTNING}} torchvision==${{TORCHVISION}}\n'
f' pip install mxnet==${{MXNET}}\n'
f' HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir .[test]\n'
f' HOROVOD_WITH_TENSORFLOW=${{TENSORFLOW}} HOROVOD_WITH_PYTORCH=${{PYTORCH}} HOROVOD_WITH_MXNET=${{MXNET}} pip install --no-cache-dir --use-pep517 .[test]\n'
f' horovodrun --check-build\n'
f'\n' +
'\n'.join([f' - name: Test [attempt {attempt} of {attempts}]\n'
5 changes: 2 additions & 3 deletions .github/workflows/ci.yaml
@@ -69,9 +69,8 @@ jobs:
HOROVOD_WITHOUT_MPI: 1
run: |
python -m pip install --upgrade pip
python -m pip install setuptools wheel
python setup.py sdist
pip -v install dist/horovod-*.tar.gz
pip -v install --use-pep517 dist/horovod-*.tar.gz

init-workflow:
name: "Init Workflow"
@@ -4499,7 +4498,7 @@ jobs:
if [[ ${TENSORFLOW} == 1.* ]] || [[ ${TENSORFLOW} == 2.[012345].* ]]; then pip install "h5py<3" "protobuf~=3.20"; fi
pip install torch==${PYTORCH} pytorch_lightning==${PYTORCH_LIGHTNING} torchvision==${TORCHVISION}
pip install mxnet==${MXNET}
HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir .[test]
HOROVOD_WITH_TENSORFLOW=${TENSORFLOW} HOROVOD_WITH_PYTORCH=${PYTORCH} HOROVOD_WITH_MXNET=${MXNET} pip install --no-cache-dir --use-pep517 .[test]
horovodrun --check-build

- name: Test [attempt 1 of 3]
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

### Fixed

## [0.29.0] - 2022-10-05

### Changed
- Installation environment variables to enable a PEP517 compliant build process. ([#3991](https://github.com/horovod/horovod/pull/3991))

## [v0.28.1] - 2023-06-12

2 changes: 1 addition & 1 deletion Dockerfile.test.cpu
@@ -236,7 +236,7 @@ RUN if [[ ${MPI_KIND} == "ONECCL" ]]; then \
fi; \
cd /horovod && \
python setup.py sdist && \
bash -c "${HOROVOD_BUILD_FLAGS} HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]"
Review comment by EnricoMi (Collaborator), Dec 30, 2023:

This is not a breaking change, as Horovod can still be installed via the old HOROVOD_WITH_*=1 vars using --no-build-isolation, right?

HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-build-isolation ...

Can we somehow imply the --no-build-isolation when those HOROVOD_WITH_* vars are 1? Otherwise this may be considered a breaking change...
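
One way to address the concern in the comment above is for the backend to treat the legacy value `1` as a plain on/off flag and add no pinned requirement for it. This is only a sketch under stated assumptions: `requirement_for` and `KEY_PKG_MAP` are hypothetical names that are not part of this PR, and the caller would still have to pass `--no-build-isolation` to pip, because build isolation is decided by the frontend before any backend hook runs.

```python
from typing import Optional

# Hypothetical table mirroring the env-var-to-package mapping in _custom_build/backend.py.
KEY_PKG_MAP = {
    "HOROVOD_WITH_MXNET": "mxnet",
    "HOROVOD_WITH_PYTORCH": "torch",
    "HOROVOD_WITH_TENSORFLOW": "tensorflow",
}


def requirement_for(key: str, value: str) -> Optional[str]:
    """Return a pinned build requirement, or None for the legacy flag value "1"."""
    if value == "1":
        # Legacy form: only asserts that the plugin must build, so pin nothing.
        return None
    return f"{KEY_PKG_MAP[key]}=={value}"


print(requirement_for("HOROVOD_WITH_PYTORCH", "1"))      # None
print(requirement_for("HOROVOD_WITH_PYTORCH", "1.9.1"))  # torch==1.9.1
```

The trade-off is that a framework release literally numbered `1` could no longer be requested through a bare version; it would need a full requirement string instead.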

bash -c "${HOROVOD_BUILD_FLAGS} HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir --use-pep517 --no-build-isolation -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]"

# Show the effective python package version to easily spot version differences
RUN pip freeze | sort
2 changes: 1 addition & 1 deletion Dockerfile.test.gpu
@@ -214,7 +214,7 @@ RUN if [[ ${MXNET_PACKAGE} == "mxnet-nightly-cu"* ]]; then \
RUN cd /horovod && \
python setup.py sdist && \
ldconfig /usr/local/cuda/targets/x86_64-linux/lib/stubs && \
bash -c "${HOROVOD_BUILD_FLAGS} HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]" && \
bash -c "${HOROVOD_BUILD_FLAGS} HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir --use-pep517 --no-build-isolation -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]" && \
ldconfig

# Show the effective python package version to easily spot version differences
2 changes: 1 addition & 1 deletion Jenkinsfile.ppc64le
@@ -26,7 +26,7 @@ pipeline {
. ${CONDA_INIT}
conda activate ${CONDA_ENV}
set -xe
HOROVOD_WITHOUT_MXNET=1 HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_TENSORFLOW=1 \
HOROVOD_WITHOUT_MXNET=1 HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_PYTORCH=1.9.1 HOROVOD_WITH_TENSORFLOW=2.6.0 \
HOROVOD_CUDA_HOME="/usr/local/cuda" HOROVOD_GPU_OPERATIONS=NCCL \
pip install -v . --no-cache-dir --no-deps
'''
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -2,6 +2,8 @@ recursive-include * *.h *.hpp *.cc *.cu *.md *.cmake CMakeLists.txt

include LICENSE horovod.lds horovod.exp CMakeLists.txt
include cmake/build_utils.py
include _custom_build/backend.py

prune .eggs

# prune eigen LGPL2
43 changes: 43 additions & 0 deletions _custom_build/backend.py
@@ -0,0 +1,43 @@
import os
import sys
import sysconfig
from packaging import version
from importlib import metadata
from setuptools import build_meta as _orig

prepare_metadata_for_build_wheel = _orig.__legacy__.prepare_metadata_for_build_wheel
build_wheel = _orig.__legacy__.build_wheel
build_sdist = _orig.__legacy__.build_sdist
get_requires_for_build_sdist = _orig.__legacy__.get_requires_for_build_sdist


def get_requires_for_build_wheel(config_settings=None):
    """
    Custom backend hook to enable PEP 517. Uses environment variables to decide which extra
    build packages to install into the isolated build environment.
    These should match the versions the user has installed outside the isolated environment,
    otherwise the build will cause library mismatch failures.
    """
    new_pkgs = []
    MXNET = "mxnet"
    key_pkg_map = {'HOROVOD_WITH_MXNET': MXNET,
                   'HOROVOD_WITH_PYTORCH': 'torch',
                   'HOROVOD_WITH_TENSORFLOW': 'tensorflow'}
    for key in key_pkg_map.keys():
        try:
            version_string = os.environ[key]
            try:
                # A bare version such as "2.6.0" becomes a pinned requirement.
                version.Version(version_string)
                new_pkgs.append(f"{key_pkg_map[key]}=={version_string}")
            except version.InvalidVersion:
                # Anything else is passed through as a full requirement string.
                new_pkgs.append(f"{version_string}")
            if key_pkg_map[key] == MXNET:
                # MXNet uses np.bool throughout, which newer NumPy
                # versions have removed, so pin NumPy.
                new_pkgs.append("numpy==1.20.3")
        except KeyError:
            # Variable not set: add nothing here. Other parts of the build
            # alert the user if the build is misconfigured.
            pass

    return _orig.__legacy__.get_requires_for_build_wheel(
        config_settings) + new_pkgs
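
To make the effect of the hook concrete, the sketch below reproduces only its env-var mapping, without setuptools, so the resulting extra requirements can be inspected directly. The names `extra_build_requirements` and `KEY_PKG_MAP` are hypothetical, and the regular expression is a simplified stand-in (an assumption) for `packaging.version.Version`, which accepts more version forms than plain dotted numbers.

```python
import re
from typing import List, Mapping

# Simplified stand-in (assumption) for packaging.version.Version validation:
# accepts plain dotted release versions such as "2.6.0" only.
_RELEASE_RE = re.compile(r"^\d+(\.\d+)*$")

# Same env-var-to-package table as the backend in this PR.
KEY_PKG_MAP = {
    "HOROVOD_WITH_MXNET": "mxnet",
    "HOROVOD_WITH_PYTORCH": "torch",
    "HOROVOD_WITH_TENSORFLOW": "tensorflow",
}


def extra_build_requirements(env: Mapping[str, str]) -> List[str]:
    """Return the extra requirements the hook would add for the given environment."""
    reqs: List[str] = []
    for key, pkg in KEY_PKG_MAP.items():
        value = env.get(key)
        if value is None:
            continue  # variable not set: nothing to add
        if _RELEASE_RE.match(value):
            reqs.append(f"{pkg}=={value}")  # bare version: pin it
        else:
            reqs.append(value)  # otherwise pass the string through unchanged
        if pkg == "mxnet":
            reqs.append("numpy==1.20.3")  # NumPy pin that accompanies MXNet
    return reqs


print(extra_build_requirements(
    {"HOROVOD_WITH_PYTORCH": "1.9.1", "HOROVOD_WITH_TENSORFLOW": "2.6.0"}))
# ['torch==1.9.1', 'tensorflow==2.6.0']
print(extra_build_requirements({"HOROVOD_WITH_MXNET": "1.9.1"}))
# ['mxnet==1.9.1', 'numpy==1.20.3']
```

Note that the legacy value `1` is itself a valid version, so `HOROVOD_WITH_PYTORCH=1` maps to the requirement `torch==1`.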
2 changes: 1 addition & 1 deletion docker/horovod-cpu/Dockerfile
@@ -82,7 +82,7 @@ RUN pip install --no-cache-dir ${PYSPARK_PACKAGE}
WORKDIR /horovod
COPY . .
RUN python setup.py sdist && \
bash -c "HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]" && \
bash -c "HOROVOD_WITH_TENSORFLOW=${TENSORFLOW_VERSION} HOROVOD_WITH_PYTORCH=${PYTORCH_VERSION} HOROVOD_WITH_MXNET=${MXNET_VERSION} pip install --no-cache-dir --use-pep517 -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]" && \
horovodrun --check-build

# Check all frameworks are working correctly
2 changes: 1 addition & 1 deletion docker/horovod-nvtabular/Dockerfile
@@ -199,7 +199,7 @@ RUN if [[ ${MXNET_PACKAGE} == "mxnet-nightly-cu"* ]]; then \
RUN cd /horovod && \
python setup.py sdist && \
ldconfig /usr/local/cuda/targets/x86_64-linux/lib/stubs && \
bash -c "${HOROVOD_BUILD_FLAGS} HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]" && \
bash -c "${HOROVOD_BUILD_FLAGS} HOROVOD_WITH_TENSORFLOW=${TENSORFLOW_VERSION} HOROVOD_WITH_PYTORCH=${PYTORCH_VERSION} HOROVOD_WITH_MXNET=${MXNET_VERSION} pip install --no-cache-dir --use-pep517 -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]" && \
ldconfig

# Show the effective python package version to easily spot version differences
2 changes: 1 addition & 1 deletion docker/horovod-ray/Dockerfile
@@ -56,7 +56,7 @@ WORKDIR /horovod
COPY --chown=ray:users . .
RUN python setup.py sdist && \
sudo ldconfig /usr/local/cuda/targets/x86_64-linux/lib/stubs && \
HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir -v $(ls /horovod/dist/horovod-*.tar.gz)[ray] && \
HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_WITH_TENSORFLOW=${TENSORFLOW_VERSION} HOROVOD_WITH_PYTORCH=${PYTORCH_VERSION} pip install --no-cache-dir --use-pep517 -v $(ls /horovod/dist/horovod-*.tar.gz)[ray] && \
horovodrun --check-build && \
sudo ldconfig

2 changes: 1 addition & 1 deletion docker/horovod/Dockerfile
@@ -102,7 +102,7 @@ WORKDIR /horovod
COPY . .
RUN python setup.py sdist && \
ldconfig /usr/local/cuda/targets/x86_64-linux/lib/stubs && \
bash -c "HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]" && \
bash -c "HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_WITH_TENSORFLOW=${TENSORFLOW_VERSION} HOROVOD_WITH_PYTORCH=${PYTORCH_VERSION} HOROVOD_WITH_MXNET=${MXNET_VERSION} pip install --no-cache-dir --use-pep517 -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]" && \
horovodrun --check-build && \
ldconfig

4 changes: 2 additions & 2 deletions docs/contributors.rst
@@ -41,12 +41,12 @@ From *inside* the Horovod root directory, install Horovod in develop/editable mo

.. code-block:: bash

$ HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_TENSORFLOW=1 pip install -v -e .
$ HOROVOD_WITH_PYTORCH={YOUR_PYTORCH_VERSION} HOROVOD_WITH_TENSORFLOW={YOUR_TF_VERSION} pip install -v -e .

Set ``HOROVOD_WITHOUT_[FRAMEWORK]=1`` to disable building Horovod plugins for that framework.
This is useful when you’re testing a feature of one framework in particular and wish to save time.

Set ``HOROVOD_WITH_[FRAMEWORK]=1`` to generate an error if the Horovod plugin for that framework failed to build.
Set ``HOROVOD_WITH_[FRAMEWORK]={FRAMEWORK_VERSION}`` to generate an error if the Horovod plugin for that framework failed to build.

Set ``HOROVOD_DEBUG=1`` for a debug build with checked assertions, disabled compiler optimizations etc.

18 changes: 11 additions & 7 deletions docs/install.rst
@@ -54,7 +54,7 @@ To ensure that Horovod is built with TensorFlow support enabled:

.. code-block:: bash

$ HOROVOD_WITH_TENSORFLOW=1 pip install horovod[tensorflow]
$ HOROVOD_WITH_TENSORFLOW={YOUR_TF_VERSION} pip install horovod[tensorflow]

To skip TensorFlow, set ``HOROVOD_WITHOUT_TENSORFLOW=1`` in your environment.

@@ -65,7 +65,7 @@ To ensure that Horovod is built with PyTorch support enabled:

.. code-block:: bash

$ HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]
$ HOROVOD_WITH_PYTORCH={YOUR_PYTORCH_VERSION} pip install horovod[pytorch]

To skip PyTorch, set ``HOROVOD_WITHOUT_PYTORCH=1`` in your environment.

@@ -76,7 +76,7 @@ To ensure that Horovod is built with MXNet CPU support enabled:

.. code-block:: bash

$ HOROVOD_WITH_MXNET=1 pip install horovod[mxnet]
$ HOROVOD_WITH_MXNET={YOUR_MXNET_VERSION} pip install horovod[mxnet]

Some MXNet versions do not work with Horovod:

@@ -95,7 +95,7 @@ To ensure that Horovod is built with Keras support available:

.. code-block:: bash

$ HOROVOD_WITH_TENSORFLOW=1 pip install horovod[tensorflow,keras]
$ HOROVOD_WITH_TENSORFLOW={YOUR_TF_VERSION} pip install horovod[tensorflow,keras]

There are no plugins built for Keras, but the TensorFlow plugin must be enabled in order to use Horovod with Keras.

@@ -227,6 +227,10 @@ Environment Variables

Optional environment variables that can be set to configure the installation process for Horovod.

Because `PEP 517 <https://peps.python.org/pep-0517/>`_ builds run in an isolated environment by default, we can't
rely on any DL library being installed in the build env. The build env must therefore be told which specific DL
library versions are required. This isn't the prettiest solution, but it is the most pragmatic.

Possible values are given in curly brackets: {}.

* ``HOROVOD_DEBUG`` - {1}. Install a debug build of Horovod with checked assertions, disabled compiler optimizations etc.
@@ -252,11 +256,11 @@ Possible values are given in curly brackets: {}.
* ``HOROVOD_ALLOW_MIXED_GPU_IMPL`` - {1}. Allow Horovod to install with NCCL allreduce and MPI GPU allgather / broadcast / alltoall / reducescatter. Not recommended due to a possible deadlock.
* ``HOROVOD_CPU_OPERATIONS`` - {MPI, GLOO, CCL}. Framework to use for CPU tensor allreduce, allgather, and broadcast.
* ``HOROVOD_CMAKE`` - path to the CMake binary used to build Horovod.
* ``HOROVOD_WITH_TENSORFLOW`` - {1}. Require Horovod to install with TensorFlow support enabled.
* ``HOROVOD_WITH_TENSORFLOW`` - {TensorFlow PyPI version}. If set, require Horovod to be installed with support for that specific TensorFlow version.
* ``HOROVOD_WITHOUT_TENSORFLOW`` - {1}. Skip installing TensorFlow support.
* ``HOROVOD_WITH_PYTORCH`` - {1}. Require Horovod to install with PyTorch support enabled.
* ``HOROVOD_WITH_PYTORCH`` - {PyTorch PyPI version}. If set, require Horovod to be installed with support for that specific PyTorch version.
* ``HOROVOD_WITHOUT_PYTORCH`` - {1}. Skip installing PyTorch support.
* ``HOROVOD_WITH_MXNET`` - {1}. Require Horovod to install with MXNet support enabled.
* ``HOROVOD_WITH_MXNET`` - {MXNet PyPI version}. If set, require Horovod to be installed with support for that specific MXNet version.
* ``HOROVOD_WITHOUT_MXNET`` - {1}. Skip installing MXNet support.

.. inclusion-marker-end-do-not-remove
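
Since the updated docs ask for the framework versions that are already installed, a small helper can read them from the current environment instead of typing them by hand. This is a sketch only: `detect_horovod_env` and `KEY_PKG_MAP` are hypothetical names, and it assumes each framework is installed under the distribution name the PR's backend uses (`tensorflow`, `torch`, `mxnet`); variants published under other names, such as `tensorflow-cpu`, would not be found.

```python
from importlib import metadata
from typing import Dict

# Env var -> distribution name, matching the table in _custom_build/backend.py.
KEY_PKG_MAP = {
    "HOROVOD_WITH_MXNET": "mxnet",
    "HOROVOD_WITH_PYTORCH": "torch",
    "HOROVOD_WITH_TENSORFLOW": "tensorflow",
}


def detect_horovod_env() -> Dict[str, str]:
    """Map each HOROVOD_WITH_* variable to the locally installed framework version."""
    env: Dict[str, str] = {}
    for key, dist in KEY_PKG_MAP.items():
        try:
            env[key] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            # Framework not installed here: leave the variable unset.
            continue
    return env


if __name__ == "__main__":
    # Prints assignments that can be placed in front of `pip install`.
    print(" ".join(f"{key}={value}" for key, value in detect_horovod_env().items()))
```

The printed assignments can then be prefixed to the `pip install` command shown in the install docs.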
2 changes: 1 addition & 1 deletion horovod/__init__.py
@@ -1,3 +1,3 @@
from horovod.runner import run

__version__ = '0.28.1'
__version__ = '0.29.0'
Review comment (Collaborator):

Don't bump the version; this is done during the next release.

17 changes: 12 additions & 5 deletions horovod/common/exceptions.py
@@ -14,7 +14,6 @@
# limitations under the License.
# ==============================================================================


class HorovodInternalError(RuntimeError):
"""Internal error raised when a Horovod collective operation (e.g., allreduce) fails.

@@ -28,22 +27,30 @@ class HostsUpdatedInterrupt(RuntimeError):

In elastic mode, this will result in a reset event without a restore to committed state.
"""

def __init__(self, skip_sync):
self.skip_sync = skip_sync


def get_version_mismatch_message(name, version, installed_version):
def get_version_mismatch_message(name, version, installed_version, build_flag):
return f'Framework {name} installed with version {installed_version} but found version {version}.\n\
This can result in unexpected behavior including runtime errors.\n\
Reinstall Horovod using `pip install --no-cache-dir` to build with the new version.'
Reinstall Horovod using `{build_flag} pip install --no-cache-dir` to build with the new version.'


class HorovodVersionMismatchError(ImportError):
"""Internal error raised when the runtime version of a framework mismatches its version at
Horovod installation time.
"""
def __init__(self, name, version, installed_version):
super().__init__(get_version_mismatch_message(name, version, installed_version))

def __init__(self, name, version, installed_version, build_flag):
super().__init__(
get_version_mismatch_message(
name,
version,
installed_version,
build_flag))
self.name = name
self.version = version
self.installed_version = installed_version
self.build_flag = build_flag
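
The following standalone sketch shows how the new `build_flag` argument surfaces in the error message. It restates the two definitions from this diff so it runs without Horovod installed; the framework name, version numbers, and flag value are illustrative assumptions, and how callers compute `build_flag` is not shown in this part of the diff.

```python
def get_version_mismatch_message(name, version, installed_version, build_flag):
    # Same wording as the message in horovod/common/exceptions.py after this change.
    return (
        f"Framework {name} installed with version {installed_version} "
        f"but found version {version}.\n"
        "This can result in unexpected behavior including runtime errors.\n"
        f"Reinstall Horovod using `{build_flag} pip install --no-cache-dir` "
        "to build with the new version."
    )


class HorovodVersionMismatchError(ImportError):
    """Raised when a framework's runtime version differs from its version at install time."""

    def __init__(self, name, version, installed_version, build_flag):
        super().__init__(
            get_version_mismatch_message(name, version, installed_version, build_flag))
        self.name = name
        self.version = version
        self.installed_version = installed_version
        self.build_flag = build_flag


# Illustrative values only: Horovod was built against torch 1.9.1, runtime has 2.0.1.
error = HorovodVersionMismatchError(
    "torch", "2.0.1", "1.9.1", "HOROVOD_WITH_PYTORCH=2.0.1")
print(error)
```

With these values, the last line of the message tells the user to reinstall with `HOROVOD_WITH_PYTORCH=2.0.1 pip install --no-cache-dir`.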