Ideas for some strategy optimizations #3932
Comments
Those discards are actually mostly strategy-independent, and come from our logic of capping the size of early inputs. In roughly our first 10% of inputs, if we generate something too large, we'll throw it away. This shows up as overruns because we stop generation early by setting a low max_length on the ConjectureData.
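A rough sketch of that capping behavior (the function name and thresholds here are illustrative, not Hypothesis internals):

```python
# Illustrative sketch only: the real cap lives inside Hypothesis's engine.
# For roughly the first 10% of the example budget, enforce a much smaller
# max_length; inputs that exceed it are discarded as overruns.
def max_length_for_input(input_index, n_examples, full_cap=8192, early_cap=1024):
    early_phase = input_index < 0.10 * n_examples
    return early_cap if early_phase else full_cap

print(max_length_for_input(3, 100))   # early input: small cap -> 1024
print(max_length_for_input(60, 100))  # later input: full cap -> 8192
```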
I don't know whether this behavior is ideal, but thankfully it at least won't compound in the way you were afraid of 😅. Some reasoning behind the change here: #2219.
We actually already try to enumerate all the time! More specifically, we track what inputs we previously generated, and avoid generating them again; if we exhaust this choice tree, generation stops early. This deduplication used to happen at the bitstream level, but now happens at the ir level, which is one of the big selling points of the ir (#3921). For example, the ir sequence [True, 2, True, 5, False] represents the list [2, 5]. This deduplication is better than it used to be: while neither bitstream ↦ input nor ir ↦ input is injective, the latter is much closer to injective than the former. But we're still in the midst of the ir migration and haven't seen the full benefits yet, so it's great to have a concrete example of a strategy that over-duplicates.

Here's a neat illustration of our duplication tracking:

```python
from collections import Counter

from hypothesis import given, settings, strategies as st

seen = []

@given(st.integers(0, 50))
@settings(database=None)
def f(n):
    seen.append(n)

f()
print(len(set(seen)))
print(Counter(seen))
```
More generally, thanks for the report, and thanks Liam for covering what I'd say! I suspect that a bunch of this will go away naturally by the time we finish migrating to the IR, but we should definitely track such issues to make sure we fix them.
Thanks for all of that context, both of you! One thing I want to clarify: when I say "enumeration" I mean something slightly different from "remembering what we've generated and not generating that again". If you know ahead of time that you'll want every possible value from a strategy, you should be able to (modulo some implementation details) just list them off, one by one, without making random choices at all. This is much more efficient and effective than generating randomly and remembering previously generated values: if you had 1000 possible values, it'd take 1000 samples to enumerate all of them, but ~7500 in expectation if you were generating randomly (assuming a uniform distribution, which Hypothesis doesn't provide). Does that make sense? Or did I misunderstand what you were getting at?
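That back-of-envelope expectation is the classic coupon-collector number: drawing uniformly from n values, seeing every value at least once takes n·H_n draws in expectation, where H_n is the nth harmonic number. A quick check for n = 1000:

```python
# Coupon-collector expectation: n * H_n uniform draws to see all n values.
n = 1000
harmonic = sum(1 / k for k in range(1, n + 1))
expected_draws = n * harmonic
print(round(expected_draws))  # ~7485, versus exactly 1000 for enumeration
```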
This makes sense! Viewed through this lens, Hypothesis only sort of enumerates: when generating a new value, we still make random choices and merely avoid repeating ones we've already tried, rather than listing values off directly. We could potentially take a more genuinely enumerative approach for strategies with small support.
That would be my analysis 👍. And please do keep submitting reports of strategy inefficiencies, whether in discarding/overruns or in duplicates! Some of them probably would have improved in time anyway, but I suspect there are a fair number of strategies which will require manual tweaking even after the ir: either because they were always inefficient and nobody noticed, or because we intentionally chose inefficient generation for, e.g., good shrinking behavior, a concern the ir has since alleviated.
Yeah, I think I would argue that if we can conservatively conclude that exhaustive enumeration would require fewer inputs than our limit, we should just fall back to that — I don't see a downside as long as that estimate is accurate.
An update: I expect lots of other strategies are similarly over-duplicated due to this.
I've been collecting feedback using Tyche lately, and my users have pointed out a few built-in strategies that seem like they may present opportunities for optimization.
Lists

The `lists` strategy tends to give up fairly frequently, at least relative to what seems normal for a totally vanilla strategy. Running it reliably results in around 5% discards, which isn't a ton, but that inefficiency propagates to everything else that uses `lists` (which is a huge proportion of strategies used in practice, I'd wager). If it's possible to fix, I think it's probably worth it.

Timezone Keys
The timezone keys strategy produces a lot of duplicates, especially when `allow_prefix=False`: running it produces around 30% duplicates. Not sure if this one is super fixable, since I think there are only 1024 keys, but it still seems like we could do better.
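As a sanity check on whether the small key space alone explains that figure: under a uniform distribution, k draws from n values yield k − n·(1 − (1 − 1/n)^k) duplicates in expectation. With n = 1024 keys and k = 100 examples per run (my assumption about run length), that's only about 5%, so a 30% duplicate rate suggests the distribution over keys is heavily skewed, not merely that the key space is small:

```python
def expected_duplicates(n_values, k_draws):
    # Expected duplicates among k uniform draws from n values:
    # k minus the expected number of distinct values seen.
    expected_distinct = n_values * (1 - (1 - 1 / n_values) ** k_draws)
    return k_draws - expected_distinct

# 100 draws from 1024 equally likely timezone keys:
print(round(expected_duplicates(1024, 100), 1))  # ~4.7 duplicates, i.e. ~5%
```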
This suggests a potentially larger design conversation: if someone runs a small finite strategy with a generous budget, we should theoretically be able to switch to enumeration 100% of the time. This can be a huge optimization in some cases and catch big issues in others. It's easy to estimate the support of a strategy (or at least, I think it should be in many cases), so it may be cool to have a rule that says strategies should exhaustively enumerate if we estimate that the support is smaller than `max_examples`.
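A minimal sketch of that proposed rule (the support-size estimate is hypothetical; Hypothesis does not currently expose such an API):

```python
def choose_generation_mode(support_size, max_examples):
    """Hypothetical decision rule: exhaustively enumerate whenever a
    conservative estimate shows the strategy's support fits within the
    example budget. support_size is None when no estimate is available."""
    if support_size is not None and support_size <= max_examples:
        return "enumerate"  # list every value exactly once, no randomness
    return "random"  # fall back to normal random generation

print(choose_generation_mode(51, 100))    # e.g. st.integers(0, 50) -> enumerate
print(choose_generation_mode(1024, 100))  # e.g. timezone keys -> random
```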