
[FEATURE] Not using hardcoded test function names in the test groups #2010

Open
loomlike opened this issue Oct 6, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@loomlike
Collaborator

loomlike commented Oct 6, 2023

Description

tests/ci/azureml_tests/test_groups.py uses hardcoded test function names, which is not scalable and introduces multiple points of maintenance whenever tests are added or changed.

Can someone help me understand the purpose of the test groups, and the reason for hardcoding the function names
instead of using a custom marker or sub-directories for the grouping?

Expected behavior with the suggested feature

Avoid hard-coded test function names in the code.
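
For context, the custom-marker alternative mentioned above could look roughly like this (the marker name "group_heavy" is hypothetical, just to illustrate the idea):

    # a test module tags itself with a group marker instead of being
    # listed by name in test_groups.py
    import pytest

    # the marker would be registered in pytest.ini / pyproject.toml, e.g.
    # markers =
    #     group_heavy: long-running tests
    @pytest.mark.group_heavy
    def test_example_heavy_model():
        ...

Each VM would then select its group with: pytest -m group_heavy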

@loomlike loomlike added the enhancement New feature or request label Oct 6, 2023
@miguelgfierro
Collaborator

It is a way to distribute the tests. We want to make N groups that last less than 1h, so we can start N VMs and execute the tests in parallel.
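
For illustration, the hardcoded grouping amounts to something like the following (group and test names here are made up, not the actual contents of test_groups.py):

    # each group is a hand-maintained list of test names, sized so that
    # the whole group finishes in under an hour on one VM
    global_test_groups = {
        "group_cpu_001": [
            "tests/unit/recommenders/models/test_sar_singlenode.py::test_user_affinity",
            # ... more hardcoded test names ...
        ],
        "group_gpu_001": [
            # ...
        ],
    }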

@loomlike
Collaborator Author

loomlike commented Oct 8, 2023

@miguelgfierro how different would it be if we "round-robin" split the tests into N groups, assuming the tests are already grouped into folders by heavy-vs-light workloads?
E.g. tests in "examples/deepdive" should be heavy, while tests in "utils/..." are generally light.

If round-robin load balancing is not very different from manual grouping,
we can get all the test names (ordered by test folder) using the following code, then round-robin them into N groups (see the split sketch after the snippet) and execute them on N VMs (AzureML, I assume?).

    import subprocess

    # collect pytest test functions
    result = subprocess.run(
        [
            "pytest",
            "--collect-only",
            "--quiet",
            "--disable-warnings",
        ],
        stdout=subprocess.PIPE,
    )

    # split the collected output into lines; each line is one collected test
    lines = result.stdout.decode("utf-8").split("\n")

    tests = []
    for line in lines:
        # some output lines are not tests, so only keep the lines that start with "tests"
        # e.g. in the following example, only the first three lines are actual tests
        # """
        # tests/unit/recommenders/models/test_sar_singlenode.py::test_sar_item_similarity[3-lift-lift]
        # tests/unit/recommenders/models/test_sar_singlenode.py::test_sar_item_similarity[3-mutual information-mi]
        # tests/unit/recommenders/models/test_sar_singlenode.py::test_user_affinity
        #
        # 3 tests collected in 1.51s
        # """
        if line.startswith("tests"):
            # parametrized arguments appear as a "[...]" suffix on the test name; strip them
            tests.append(line.split("[")[0])

    # remove duplicate test names left over from parametrized tests
    unique_tests = list(set(tests))

    for t in unique_tests:
        print(t)  # e.g. "tests/unit/recommenders/utils/test_timer.py::test_timer_format"
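
Continuing the sketch, the round-robin split itself would be a one-liner (N = 4 is an arbitrary choice here):

    # deal the unique tests into N groups, round-robin style
    N = 4
    groups = [unique_tests[i::N] for i in range(N)]

    # each group's names could then be passed to a pytest invocation on its own VM
    for i, group in enumerate(groups):
        print(f"group {i}: {len(group)} tests")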

If we literally spin up N VMs, rather than using AzureML run execution, there's a package called pytest-xdist that we may consider for parallel execution. It does have remote-submit functionality, but I haven't used it, and the docs say the ssh-based remote submission has been deprecated...
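
For single-machine parallelism at least, pytest-xdist usage is minimal; a sketch in the same style as the collection snippet above (the worker count 4 and the "tests/unit" path are arbitrary):

    import subprocess

    # "-n 4" runs tests in 4 local worker processes;
    # "--dist loadfile" keeps tests from the same file on the same worker
    subprocess.run(["pytest", "-n", "4", "--dist", "loadfile", "tests/unit"])

This only parallelizes within one machine, so it would complement, not replace, the N-VM split above.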

@miguelgfierro
Collaborator

pfff, I tried pytest-xdist and it didn't work very well.

What you are proposing seems OK, but it is a huge amount of work that would only change a manual list into a programmatic solution. In addition, we recently discovered a way to get very cheap AzureML VMs: #1996

Up to you, man, if you want to try this and compare it with the current solution. However, there are several things that are not working in the repo that might be a better use of your effort: #2011

@loomlike
Collaborator Author

loomlike commented Oct 9, 2023

Agreed that we want to prioritize, and I'm okay with deferring this work until we find an easier alternative.
Then, what's the suggestion when I add new tests? Should I add their names to one of the test group lists?

@miguelgfierro
Collaborator

Yes, just add them to one of the emptier lists.
