New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PEP517 compatible build backend #3991
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Joe Yearsley <[email protected]>
Signed-off-by: Joe Yearsley <[email protected]>
Signed-off-by: Joe Yearsley <[email protected]>
Update version Signed-off-by: Joe Yearsley <[email protected]>
Signed-off-by: Joe Yearsley <[email protected]>
Signed-off-by: Joe Yearsley <[email protected]>
Note the outstanding error seems to be a CI problem:
|
Is horovod maintained anymore? |
It's sad to see no reaction from horovod developers to this crucial PR. I just tried to install horovod via poetry directly from this PR:
but it fails with the following error:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! Few comments.
@@ -1,3 +1,3 @@ | |||
from horovod.runner import run | |||
|
|||
__version__ = '0.28.1' | |||
__version__ = '0.29.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't bump the version, this is done during next release.
@@ -236,7 +236,7 @@ RUN if [[ ${MPI_KIND} == "ONECCL" ]]; then \ | |||
fi; \ | |||
cd /horovod && \ | |||
python setup.py sdist && \ | |||
bash -c "${HOROVOD_BUILD_FLAGS} HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-cache-dir -v $(ls /horovod/dist/horovod-*.tar.gz)[spark,ray]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a breaking change, as Horovod can still be installed via the old HOROVOD_WITH_*=1 vars using --no-build-isolation
, right?
HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_MXNET=1 pip install --no-build-isolation ...
Can we somehow imply the --no-build-isolation
when those HOROVOD_WITH_* vars are 1
? Otherwise this may be considered a breaking change...
@@ -221,11 +234,13 @@ def get_average_backwards_compatibility_fun(reduce_ops): | |||
def impl(op, average): | |||
if op is not None: | |||
if average is not None: | |||
raise ValueError('The op parameter supersedes average. Please provide only one of them.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move these reformatting changes into a separate PR?
@_cache | ||
def ccl_built(verbose=False): | ||
for ext_base_name in EXTENSIONS: | ||
built_fn = lambda ext: ext.ccl_built() | ||
def built_fn(ext): return ext.ccl_built() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no functional change, just syntax, right?
@@ -26,12 +26,14 @@ | |||
import textwrap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't setup.py
be removed, or is this for backward-compatibility?
Unit Test Results 223 files - 501 223 suites - 501 1h 50m 57s ⏱️ - 6h 15m 10s Results for commit 7c6a96a. ± Comparison against base commit 9f88e1d. This pull request removes 82 tests.
This pull request skips 95 tests.
♻️ This comment has been updated with latest results. |
Unit Test Results (with flaky tests) 223 files - 665 223 suites - 665 1h 50m 57s ⏱️ - 7h 7m 44s Results for commit 7c6a96a. ± Comparison against base commit 9f88e1d. This pull request removes 82 tests.
This pull request skips 95 tests.
♻️ This comment has been updated with latest results. |
d631681
to
4b2c05a
Compare
Signed-off-by: Enrico Minack <[email protected]>
Signed-off-by: Enrico Minack <[email protected]>
Signed-off-by: Enrico Minack <[email protected]>
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Checklist before submitting
Description
Fixes #3697
The build process has yet to comply with PEP517, causing issues for users utilising Poetry and soon more build systems.
This MR alters some pre-existing environment variables and necessary CMake instructions when building Horovod to make a complaint build system.
An example local install now, the tar can be replaced with pypi post PR:
Since PEP517 build every requirement in isolation Horovod now doesn't know which versions of TF/PyTorch are already installed. It can't see which libs to build against.
We can set any version in
build_requires
as many users have many versions. For example, using TF 2.13.1 in thebuild_requires
would cause linking issues to be reported when using in a venv with TF 2.8.4.To enable an isolated build whilst also respecting the users' environment, I've utilised
HOROVOD_WITH_{MXNET|PYTORCH|TENSORFLOW}
to specify versions for which the isolated build environment should install such that Horovod can now be built against the correct library specification in isolation.Whilst also working when it is moved out of the isolated build env to its final non-isolated location.
I've updated key error messages to alert users when they might not be using the correct env var flags.
Review process to land