🐌 Slow tests #603

blaginin · 2023-05-08T16:38:32Z

TIA Toolbox version: 1.4.0
Python version: 3.10.8
Operating System: Mac OS 13.3.1 (arm)

Description

Tests in the toolbox are extremely slow. Even with CI=True, all the checks take almost half an hour. Non-CI mode takes even more time. This makes an interactive development approach, with running tests after small changes, impossible. This results in writing bad test-related code.

The main reasons why tests are slow are static files. Each test run is related to downloading and then deleting gigabytes of models' weights and test images. However, almost always, those files remain unchanged after the tests' execution.

Not only is this slow, but it also triggers a lot of false positive results. For example, that's the typical result of my local run:

Todo

The text was updated successfully, but these errors were encountered:

blaginin · 2023-05-08T16:40:29Z

Of course, we want tests' reproducibility and don't want our code to fail because some code before has modified our assets. But maybe we can consider more user-friendly approaches? I have those options in mind:

Compare a file's local last change time with the remote one and download only if it's greater. This is done by If-Range / Last-Modified headers, which are already supported by tiatoolbox.dcs.warwick.ac.uk server.
Store files checksum on the server side and download a file only if its checksum is different
Copy files from the local cache folder into the tests' sandbox each time tests are run

blaginin · 2023-05-08T16:45:14Z

@John-P @shaneahmed, maybe you have other ideas?

John-P · 2023-05-12T09:28:25Z

It looks like there are a few issues to address here:

The remote samples fixture is set to be scoped for the session, but it appears to be using a separate tmp path per test.
Large files could avoid being downloaded between runs if the HTTP modified header (assuming the server is sending it correctly) is < date modified for the local copy.
Downloads can be verified with a checksum which is not currently done.

shaneahmed · 2023-05-12T11:59:11Z

@blaginin As John mentioned downloads should be for a session. Please let us know if this is not the case. Feel free to create a PR to fix this.

This addresses part of issue #603 by improving model weights and datasets in tests. Previously, weights for each model were downloaded to `f"{tmp_path}/weights.pth"` and replaced each other. The only available dataset, Kather, was downloaded into `TIATOOLBOX_HOME`, but the entire home directory was deleted during test execution. This approach has several drawbacks: - Tests are extremely slow. On every test run, it downloads ~8 GB of weight files which never change. - Tests cannot be run in parallel as they depend on the same `f"{tmp_path}/weights.pth"` file. - Tests heavily load TIA's infrastructure. For each test run, the server has to transfer almost 10 GB of static assets. This might work while the number of contributors is small, but what happens when it increases? - Tests were inconsistent with their models caching. Why, for example, PatchPredictor cached their pretrained_models, but UNetModel did not? Now, all weights are downloaded to TIATOOLBOX_HOME and cached between runs. Storing more data in memory (+8 GB) makes tests much faster, especially if you're not in Coventry or have slow internet. In rare cases where you specifically need to test behaviour without cache, you can set TIATOOLBOX_HOME to a temporary directory.

Addresses part of issue #603 and improves fixtures in tests: - Some tests were using `_fetch_remote_sample` even though tests have a special `remote_sample` wrapper. - `remote_sample` previously created a new folder for each run (data-0, data-1, data-2, and so on). Now it uses just one folder for all remote fixtures during a test session. - To prevent duplicated downloads, I added download locks to the fixtures. - Combined `_fetch_remote_sample` and `download_data` since they had the same logic. --------- Co-authored-by: Shan E Ahmed Raza <[email protected]>

shaneahmed assigned John-P and shaneahmed May 12, 2023

John-P added enhancement New feature or request dev tools Changes/Updates in Development tools labels May 12, 2023

shaneahmed added this to the Release v1.5.0 milestone May 12, 2023

blaginin self-assigned this May 26, 2023

This was referenced Jul 16, 2023

✅ Upgrade remote sample fixtures #640

Merged

✅ Reuse models and datasets in tests #641

Merged

shaneahmed modified the milestones: Release v1.5.0, Release v1.6.0 Dec 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🐌 Slow tests #603

🐌 Slow tests #603

blaginin commented May 8, 2023 •

edited by shaneahmed

blaginin commented May 8, 2023 •

edited

blaginin commented May 8, 2023

John-P commented May 12, 2023

shaneahmed commented May 12, 2023

🐌 Slow tests #603

🐌 Slow tests #603

Comments

blaginin commented May 8, 2023 • edited by shaneahmed

Description

Todo

blaginin commented May 8, 2023 • edited

blaginin commented May 8, 2023

John-P commented May 12, 2023

shaneahmed commented May 12, 2023

blaginin commented May 8, 2023 •

edited by shaneahmed

blaginin commented May 8, 2023 •

edited