Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEST-#5903: Add third party unit tests for interoperability #5906

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

labanyamukhopadhyay
Copy link
Contributor

@labanyamukhopadhyay labanyamukhopadhyay commented Mar 30, 2023

What do these changes do?

Adds unit tests from third party libraries - Matplotlib, Plotly, Scikit-learn, and XGBoost to test interoperability with Modin dataframe.

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves Add interoperability tests for third party libraries #5903?
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

@labanyamukhopadhyay labanyamukhopadhyay requested a review from a team as a code owner March 30, 2023 17:13
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

Signed-off-by: Labanya Mukhopadhyay <[email protected]>
@anmyachev
Copy link
Collaborator

@labanyamukhopadhyay thanks for the contribution!

Could you tell a little more about these tests. Do I understand correctly that these are adapted tests from the libraries themselves, originally written for pandas? Are there any plans to add these tests to the appropriate libraries in the future?

In general, what is the support plan? :)

@labanyamukhopadhyay
Copy link
Contributor Author

@anmyachev Yes these are unit tests from the third party libraries that were adapted for Modin dataframes instead of pandas! So these are already in the respective libraries (for ex https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/tests/test_axes.py).

As Modin becomes more interoperable, I believe we can add to the tests/ remove the pytest skip decorators accordingly. But the currently passing tests should all continue to pass for future Modin versions as a sanity check.

@anmyachev
Copy link
Collaborator

@labanyamukhopadhyay thanks for the answer.

@modin-project/modin-core any objection to do this?

@dchigarev
Copy link
Collaborator

So these are already in the respective libraries (for ex https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/tests/test_axes.py).

@labanyamukhopadhyay, do I understand correctly that the tests were just copied from those libraries? Or there were some changes being made in order to adapt them to modin?

I they were no changes made to the tests, then maybe there's no much sense to duplicate the tests in the modin's repo? We could have run those tests directly from the third-party test suites if we would need to.

@labanyamukhopadhyay
Copy link
Contributor Author

@dchigarev The tests were changed slightly to be used with Modin dataframes instead of pandas so they are not exactly the same as the ones in the libraries' test suites.

Signed-off-by: Labanya Mukhopadhyay <[email protected]>
Signed-off-by: Labanya Mukhopadhyay <[email protected]>
@dchigarev
Copy link
Collaborator

So I'm wondering whether we should make this a separate repository (maybe a sub-repo of the modin itself) as the tests seem not to be related to the modin internals but rather to the libraries that take modin dataframes.

Also, what's the estimated time to run them all? I'm wondering whether our current CI is capable to do it.

cc @modin-project/modin-core , need your opinions

@dchigarev
Copy link
Collaborator

I'm also wondering who is going to be responsible for maintaining the tests (if they have to be updated due to dependencies or functionality changes or if they suddenly started to fail for unknown reasons, who will be in charge to investigate this?).

If this will be a separate repo it could have its own list of maintainers that may include some devs from modin side as well as other interested parties (maybe developers of those third-party libraries).

@anmyachev
Copy link
Collaborator

So I'm wondering whether we should make this a separate repository (maybe a sub-repo of the modin itself) as the tests seem not to be related to the modin internals but rather to the libraries that take modin dataframes.

Also, what's the estimated time to run them all? I'm wondering whether our current CI is capable to do it.

cc @modin-project/modin-core , need your opinions

+1 for creating sub-repo

@anmyachev
Copy link
Collaborator

@modin-project/modin-core Is there any objection if I start creating a separate repository for such tests? (under modin-project)

@vnlitvinov
Copy link
Collaborator

I like the idea of a separate repo. We can take https://github.com/data-apis/dataframe-interchange-tests as a reference BTW.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add interoperability tests for third party libraries
4 participants