
Support newer versions of Pyarrow in Beam. #31305

Merged (3 commits) on May 16, 2024



@tvalentyn tvalentyn commented May 15, 2024

We need to upgrade pyarrow for some unit tests to pass on Python 3.12 (#29149).

…e compat suites for pyarrow to reduce test suite runtime.

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label python.
R: @Abacn for label build.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@tvalentyn
Contributor Author

The postcommit dependency suite might time out on this PR, but I will watch the signal on https://github.com/apache/beam/actions/runs/9101961896, which runs against a branch on the main repo and should reflect the yml change. I inspected the logs manually on https://github.com/apache/beam/actions/runs/9100383893/job/25015301762?pr=31305: the pyarrow portion of the tests succeeded, and the suite timed out after 120 min.

@tvalentyn
Contributor Author

toxTask "testPy38pyarrow-4", "py38-pyarrow-4", "${posargs}"
test.dependsOn "testPy38pyarrow-4"
postCommitPyDep.dependsOn "testPy38pyarrow-4"
toxTask "testPy38pyarrow-9", "py38-pyarrow-9", "${posargs}"
Contributor


Does it make sense to test against every minor version? Could we get away with something like testing against the oldest and newest versions we support? I'm unfamiliar with how much pyarrow changes between minor releases.

Contributor Author


If we look at compatibility between Beam and pyarrow, I think the value of testing each individual pyarrow version diminishes over time. It might be useful to do a one-time test when doing a large upgrade like this one, then test newer versions as they are released. Note that the "newest supported" version is also tested in regular precommit suites.

A more thorough test combination might be warranted if we are worried about interoperability of some dependencies, like pandas and pyarrow. This might be why Beam has added special compat testing for these two dependencies.
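To make the "oldest and newest" idea from this thread concrete, here is a minimal Python sketch of how a reduced compat matrix could be derived. It is not part of Beam's build: the helper names `tox_env` and `reduced_matrix` are hypothetical, and the version list `[4, 6, 9]` is illustrative only (4 and 9 appear in the diff above; 6 is an assumed intermediate version).

```python
# Hypothetical helpers for illustration; Beam's real suites are declared
# by hand in Gradle and tox configuration, not generated like this.

def tox_env(py_tag, pyarrow_version):
    """Build a tox env name in the style of the diff, e.g. 'py38-pyarrow-4'."""
    return f"{py_tag}-pyarrow-{pyarrow_version}"


def reduced_matrix(py_tag, supported_versions):
    """Keep only the oldest and newest versions from the supported list."""
    if not supported_versions:
        return []
    # A set collapses the case where oldest and newest are the same version.
    picks = sorted({min(supported_versions), max(supported_versions)})
    return [tox_env(py_tag, version) for version in picks]


# Illustrative version list, not Beam's actual supported range.
print(reduced_matrix("py38", [4, 6, 9]))
```

Under these assumptions the intermediate version is dropped from the matrix. Since the newest supported version is already covered by the regular precommit suites, a variant that keeps only the oldest version in the compat suite would also be consistent with the reply above.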

Contributor

@jrmccluskey jrmccluskey left a comment


LGTM mod the clarifying question

@tvalentyn tvalentyn merged commit 126d922 into apache:master May 16, 2024
166 of 170 checks passed
@tvalentyn tvalentyn deleted the pyarrow branch May 16, 2024 15:50

2 participants