compare-fail with different criteria per test or group #219
Right now you can only do it via subsequent CLI compare commands (e.g. you compare the exact set of benchmarks with the right fail check). You'd probably end up with a somewhat ugly shell script... With that in mind: the comparison is implemented in dedicated classes, so there could be a class that implements exactly what you want, see: pytest-benchmark/src/pytest_benchmark/utils.py Lines 270 to 288 in 4a99afe
But the argument parser doesn't know about any of your subclasses; there is no clear support for hooking in your custom regression check. If you'd like to go the subclassing route, though, you could inject it via conftest, either through one of the hooks (https://pytest-benchmark.readthedocs.io/en/latest/hooks.html) that pass the benchmarksession, or through a builtin pytest hook. Happy to elaborate more if this looks useful.
I could not figure out a way to use the CLI compare commands and make them report failure. I tried something like this:
I manually edited one of the means to make sure it would fail, but it only prints the stats and exits. How can I make it fail? I think this would be a fine approach if it worked. Edit: Also, how do I specify a group or test name to compare?
py.test-benchmark: error: unrecognized arguments: --fail=mean:1%
Oh damn, looks like I haven't implemented that in the standalone CLI yet...
I guess patching stuff is the only route right now... or I could implement something for your use case. Give me more details, I want to understand it better. What do you mean by some tests being "more stable"?
There are a few tests that can vary quite a bit in how long they take to execute. For those tests, I want e.g. a 50% variance in the mean to cause a failure. Other tests typically don't vary more than 15%, so I want to report a failure if they go above that.

Another case is that I would like to be able to update the baseline for these tests independently, so that I do not have to update everything when only some tests are failing (or after having improved one piece of code).

I have a plan to make it work by separating the runs of the different tests and using a separate mechanism to share my generated data, then specifying a different storage name for each set of files, hoping pytest-benchmark will compare against the correct filename based on the supplied --benchmark-save parameter. The tests are running, so we'll see if it works.

If you were to implement something for my use case directly, I guess it would be the ability to specify the fail criteria in a decorator, per test. Or alternatively, to accept several fail criteria, where each could specify a group and/or test name. If my current approach works, I think it might be the preferred one for me specifically, although I can see it becoming too cumbersome for some setups. Another good approach would be for the CLI compare to support both filtering tests by name/group and running fail checks.
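In the absence of built-in per-group fail criteria, the separate-baseline idea above can also be approximated with a small post-processing script: save each run to JSON (e.g. with --benchmark-json) and compare the files yourself with per-group thresholds. A minimal sketch, assuming the saved files follow pytest-benchmark's JSON layout (a "benchmarks" list whose entries carry "name", "group" and a "stats" dict with a "mean"); the threshold table, group names and file names here are hypothetical:

```python
import json
import sys

# Hypothetical per-group thresholds: maximum allowed relative increase in mean.
# The None key is the default for benchmarks without a group.
THRESHOLDS = {"unstable": 0.50, None: 0.15}

def find_regressions(baseline, current, thresholds):
    """Return (name, old_mean, new_mean, limit) for every benchmark whose
    mean grew more than its group's threshold relative to the baseline."""
    old = {b["name"]: b for b in baseline["benchmarks"]}
    failures = []
    for bench in current["benchmarks"]:
        ref = old.get(bench["name"])
        if ref is None:
            continue  # new benchmark, nothing to compare against
        limit = thresholds.get(bench.get("group"), thresholds[None])
        old_mean = ref["stats"]["mean"]
        new_mean = bench["stats"]["mean"]
        if new_mean > old_mean * (1 + limit):
            failures.append((bench["name"], old_mean, new_mean, limit))
    return failures

if __name__ == "__main__" and len(sys.argv) >= 3:
    with open(sys.argv[1]) as f:
        baseline = json.load(f)
    with open(sys.argv[2]) as f:
        current = json.load(f)
    failures = find_regressions(baseline, current, THRESHOLDS)
    for name, old_mean, new_mean, limit in failures:
        print(f"{name}: mean {old_mean:.6f}s -> {new_mean:.6f}s (limit +{limit:.0%})")
    sys.exit(1 if failures else 0)
```

Run the suite once with something like `pytest --benchmark-json=current.json`, then `python check_regressions.py baseline.json current.json` in CI; the nonzero exit mimics what `--benchmark-compare-fail` does, but with a different limit per group.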
I would like to give this a bump. The performance of my tests varies so greatly that without this feature I will only be able to use the standard deviation as a comparison target, and won't be able to set hard limits per application feature. See gh/cobbler/cobbler for the project if more context is needed.
I have a setup where I am using --benchmark-compare-fail=mean:15%.
I would like to specify a different percentage for each group of tests, because some tests have varying performance while others are more stable. Is it possible to specify this in code, per test?
I want to run all tests in one go, as all tests share a large run-time generated data-set, and generating it takes most of the time of running the tests. Otherwise I could have just separated the tests into different runs with different arguments.