Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convention-based Arbitrary registration #334

Open
ploeh opened this issue Dec 17, 2016 · 8 comments
Open

Convention-based Arbitrary registration #334

ploeh opened this issue Dec 17, 2016 · 8 comments

Comments

@ploeh
Copy link
Member

ploeh commented Dec 17, 2016

FsCheck's current architecture gives preferential treatment of the built-in Arbitraries in Arb.Default. This extends to the built-in 'signal' types, like NonNegativeInt, PositiveInt, NonNull, NonEmptySet, etc.

Such 'signal' types are invaluable because they enable one to declaratively state that a value should be constrained in some way.

In a gen expression, one can write:

gen {
    let! PositiveInt i = Arb.generate
    // other stuff... 
    }

With FsCheck.Xunit (and, I suppose, FsCheck.NUnit) one can likewise pattern match in the test function itself:

let ``my test function`` (PositiveInt i) =
    // test body goes here...

Unfortunately, as useful as that is, one is constrained to those 'signal' types that ship with FsCheck itself. Imagine, for example, that I have a need for generating a string between 1 and 32 characters.

It'd be trivial for me to create a single-case discriminated 'signal' type in my test, but even if I define a generator (or Arbitrary) for it, I can't use it in the same way as above. (Well, perhaps I could if I'd be willing to entertain the idea of using Arb.register, but I don't think that's a proper alternative.)

I'd like to suggest a convention-based approach instead. If a requested type has a static member that returns an Arbitrary of that type, then use that Arbitrary.

For the above example, it could look like this:

type ShortString = ShortString of string with
    static member Arb () =
        gen {
            let! length = Gen.choose(1, 32)
            return! Arb.generate |> Gen.arrayOfLength length |> Gen.map (String >> ShortString) }
        |> Arb.fromGen

Whenever asked for an instance of the ShortString type, FsCheck would look for, and find, the Arb method, and use the Arbitrary returned by the method.

This could also replace much of the internal implementation that currently relies on Arb.Default...

@kurtschelfthout
Copy link
Member

I empathize with the suggestion.

I'd like to understand how this would improve things over using Arb.register (You say it's not a proper alternative, I'm not sure why); or using the Config options to set Arbitrary instances; or the attribute parameters if you use Nunit or xUnit.

Especially interested in how you estimate the cost of adding another way to do the same thing compared to its convenience and applications.

One seeming restriction of the above approach is that it would not work if you want to define an Arbitrary instance for a type you have defined yourself in your actual application (I'm assuming you don't want the system depend on FsCheck, at least I wouldn't want that...) because you can't or don't want to add a static method returning an Arbitrary to such a type.

Also consider we'd need to define how the different ways of associating an Arbitrary with a type interact. That is, if a type is in the Arb.Default list, and it has a static member returning an Arb, and it is in the list of Arbitrary instances via Config or Property attribute, which one takes precedence?

@ploeh
Copy link
Member Author

ploeh commented Dec 20, 2016

You always do a good job making me question my own ideas 😳 It's always good with some feedback, because one can quickly get into a rut if only thinking about things for oneself.

I do have a bit of a roundabout motive for this suggestion, but perhaps there are other ways to address them as well. I'll get back to that later...

I no longer use the part of the FsCheck API that relies on Type instances: the part of the API that attempts to emulate typeclasses. It's too awkward to have add types for the smallest tweak, and it's not type-safe either. Instead, I write all tests using either normal pipelining, gen computation expressions, or the applicative combinators. This API is both succinct, flexible, and type-safe, so I'm quite happy with that.

That also partly explains my dislike for Arb.register; the other part is that Arb.register involves mutation and TLS, and I frankly don't feel 'safe' using it. Particularly as everything becomes more and more asynchronous and parallel. I wonder how it's going to work with xUnit.net 2.2, which will get native support for F# async workflows...

In any case, I thought you wanted to get rid of that as well, but perhaps I misunderstood your paragraph about that in #198?

In any case, coming back to my actual motivation for this: It'd be nice to be able to design a mechanism for Arbitrary discovery that's a bit more flexible than the current. My problem today is actually mostly with the FsCheck source code itself. As you know. I'm attempting to add support for Uri values, but that means editing Arbitrary.fs itself. That file is now some 1200 lines long, and only keeps growing. So far, it's resisted all my attempts at splitting it up, mainly because the Arb.Default class bookends the file.

The above suggestion is an attempt at addressing this problem, because it might enable us to move many of the above-mentioned 'signal' types to later in the code base. Perhaps there's another way; if there is, I'd be happy to hear about it.

@kurtschelfthout
Copy link
Member

kurtschelfthout commented Dec 20, 2016

You always do a good job making me question my own ideas 😳 It's always good with some feedback, because one can quickly get into a rut if only thinking about things for oneself.

Thanks very much for that. I'm trying to not be so vocal about my a priori opinions on things. I have been playing that game for so long, and it mostly ends up with everyone in their corner refusing to come out...

Anyway.

I no longer use the part of the FsCheck API that relies on Type instances: the part of the API that attempts to emulate typeclasses.

Agree - this is a worrying part of the code.

Particularly as everything becomes more and more asynchronous and parallel.

Exactly. This already bit us: I made the arbitrary instances thread local. I also didn't feel "safe" accepting a PR to add some support for async, because it was unpredictable how the Arbitrary stuff would interact with it.

In any case, I thought you wanted to get rid of that as well, but perhaps I misunderstood your paragraph about that in #198?

You didn't, although for that I was more focused on getting rid of the global mutable Arbitrary state. The end-game there would indeed be to get rid of Arb.register and only allow to use custom Arbitrary instances via a new Prop combinator like forAll, with the default set being given by Config, and the "current" map of Type -> Arbitrary would have to be passed around implicitly via the Prop combinators (or maybe even in Gen ultimately). The idea being that each property would have its own Type -> Arbitrary map that is passed around functionally; and so parallel execution should not pose as many problems. I am however not sure how much of that is wistful thinking....and how easy it is to put the Type -> Arbitrary map sufficiently out of the way and sufficiently in sight so it's easily accessible.

The above suggestion is an attempt at addressing this problem, because it might enable us to move many of the above-mentioned 'signal' types to later in the code base.

That is true, though I must say that the Arbitrary.fs file is long doesn't bother me that much (and isn't a large part of it the definition of a cultures for PCL purposes? And another large part types defined in System for which this suggestion would not work?). It's very much possible I've been starting too long at F# compiler code files of 40K lines though :/

But I do still like the suggestion. I think actually if we can make the end-game I described above for fscheck 3 work, this kind of feature would be a no-brainer.

@ploeh
Copy link
Member Author

ploeh commented Dec 21, 2016

The idea being that each property would have its own Type -> Arbitrary map that is passed around functionally; and so parallel execution should not pose as many problems. I am however not sure how much of that is wistful thinking....and how easy it is to put the Type -> Arbitrary map sufficiently out of the way and sufficiently in sight so it's easily accessible.

Couldn't this be done using the State monad?

It's very much possible I've been starting too long at F# compiler code files of 40K lines though :/

I think you have 😲

The list of cultures, as well as the list of top-level domain names, is not it Arbitrary.fs, but rather in Data.fs (which I think is a reasonable separation).

What bothers me about the size of Arbitrary.fs is the lack of cohesion. In order to work with a 'signal' type like e.g. UriScheme, I have to define it towards the top of the file, then I have to jump down towards the bottom to write the Arbitrary for it, and back and forth if I need to change something.

The fact that I'm working on a four-year old laptop with F# Power Tools in Visual Studio means that I can actually feel that the computer is struggling to keep up with a largish F# code base. I don't think my computer would be able to deal with a 40 kLoC code base at all...

There's also the concern that larger code files could scare away potential new contributors. On the other hand, if the F# compiler has 40 kLoC files, that concerns is most likely unfounded - AFAICT, there are plenty of contributors to that.

@StefanBertels
Copy link

StefanBertels commented Dec 17, 2020

I know this is an old issue entry but I'm looking for a solution to pre-define arbitraries for my own types (and could not find anything for that within docs / issues so far).

Here's my use case and some ideas to solve this (and I hope that this is a valid issue to add this):

I have a classlib A with one or more custom types, packaged via nuget. Could be something like Month or FibonacciNumber.
Obviously I can write some useful Arbitrary default implementation and add that somehow to this package. (I use C#.)

I have another project B where I use those types. My favorite solution would be that the pre-defined arbitrary from that other packages is automatically used (when not overridden) for any property based test using this type.

From what I know currently I need to do two things:

  1. Add some Arbitrary implementation to the package. Problem with this is I created some dependency to FsCheck in my production code (package A).
  2. Register Arbitraries in the test project of B.

Ideas to solve the first issue:

a) Fiddle with something like PrivateAssets attribute if the PackageReference to FsCheck in A (this is messy because it can result in using incompatible calls) to at least avoid packaging FsCheck in production code of B. There would be an interface or a custom attribute in the type defining the arbitrary.

b) Use something like duck typing
If my custom type T implements two static methods ArbGenerate() returning an IEnumerable<T and optionally ArbShrink(T) returning IEnumerable<T> and fscheck would use reflection to find them and convert them to Arbitrary<T> this would avoid the dependency. Probably would need a really good (generally unused) name.

If this is interesting then one would need to define the allowed signature(s). Maybe we should use instance methods here to allow creating generators elsewhere using extensions methods (not sure, whether they can be found by reflection).
A better alternative could be a separate class ArbDefault<T> (ArbDefault<Month>) that could be defined in package A (avoiding any conflicts within the type) or anywhere else (e.g. in testproject of B). Something like this is done by LanguageExt to define / override custom comparers for types (look for Ord<T> / OrdDefault<T> there). FsCheck probably should avoid using some base interface but fully rely on duck typing (signature convention / class name convention).

Second issue (avoid registering / using default Arbitrary implementation) can be solved using reflection. I think this class (OrdClass) is the entry point where this is done for Ord*<T> in LanguageExt.

Edit: (if realized this way) I would expect that I could even have generic types in my package (classlib A) like MyList<T> for which I could create a generic default generator/shrinker (in classlib A, too) that uses ArbDefault<T> internally to produces random values for my list. The used ArbDefault<T> would (on runtime) be resolved so could be a default generator in some other package (A2) if T is a type in A2 .... or overridden in test project (of B).
Finally this means: classlibs could supply default generators for their types and use any other generator (e.g. generic). A simple "conflict" resolution for runtime resolver would be: prefer any generator explicitly defined, if not found search within test assembly (current assembly), if not found search in all assemblies (fail if more than one), fallback to FsCheck defaults.

PS: Thanks to all contributors / creators of this great library.

@kurtschelfthout
Copy link
Member

For problem 1

Add some Arbitrary implementation to the package. Problem with this is I created some dependency to FsCheck in my production code (package A).

How about creating a separate "test support" NuGet package which depends on package A and FsCheck?

No matter the structure, the question I guess is always how do you make it so that the Arbitrary instances are loaded by default. This is a rather tricky problem to solve, because if it is done "automagically" then how do you deal with potential conflicts and order dependence e.g. two packages providing a default Arbitrary for the same type. Also note that typically an Arbitrary instance uses other Arbitrary instances in its implementation, so there is a question of what happens when those are overridden as well.

For these reasons for a long time I've wanted to move to a more explicit mechanism. This is slightly more work on the test side, i.e. for the Arbitrary instances that don't ship with FsCheck you'll have to add an attribute or otherwise declare you want to use them in the tests, but that's a pretty low cost for the benefits of clarity, safety and robustness.

@StefanBertels
Copy link

StefanBertels commented Dec 22, 2020

Thanks for your answer. Ok, I see the point of the extra package (which probably is worth going this way).

For problem 2 (resolving): Generally I don't like "automagic" and IMO being type-safe, explicit, dependency-free etc. is a good thing. I'm not sure whether this approach "scales". What if all built-in types would need this explicit declarative approach, too?
I'm not sure if this would be a (relevant) usability issue / obstacle for users. If yes that might be an indicator to reduce obstacles for used-defined types, too.

Which are the problems of automagic:

Values get created and you don't know why / maybe is inconsistent
=> I think this is a bigger problem with e.g. reflection based generation (use public properties etc.). If there is a type in your soluution that implements Arbitrary<T> then at least its author did the explicit work (type is constructed in a valid way).

You don't know which generator is used
=> We could just allow a single Arbitrary<T> class in solution and throw if there are multiple / force explicit declaration which one to use

More work to create companion package
=> This is just some optional helper, explicit registering of generators is always possible (and always first priority).

Having some extra generator package A.Arbitrary would add expliciteness, too. If you add this package to your test project, you probably want to use its generators. In theory one could label that assembly or the Arbitrary<T> classes with some extra attribute e.g. [DefaultArbitrary(typeof(List<>))].
Edit: this might be similar to related things like JsonConverterAttribute or TypeConverterAttribute. I don't like this kind of attribute that much but finally they help solve avoiding some boiler plate.

(I don't see a big problem that mechanism within the A.Arbitrary companion package might be a bit more complex. At least does not harm consumers.)

Just my two cents as an FsCheck user. Please don't see this as request, just want to add some thoughts from a consumer / type creator perspective. Happy holidays, everyone.

@kurtschelfthout
Copy link
Member

kurtschelfthout commented Dec 25, 2020

Since this is really constructing a type to function mapping, it's instructive to look at what other languages have done in this space - the "orphan rules". For example, in Rust, traits can only be implemented if either the trait or the type is local to the crate.

A mechanism where default Arbitrary instances can come only either from FsCheck itself (because that's where Arbitrary is defined) or from the assembly where the type is defined, would be free of potential conflicts because there'd be only one possible source of a default Arbitrary for each type.

An equivalent mechanism for FsCheck, that also takes into account the desire to not have a dependency on FsCheck in the "real" assembly, could look something like this:

  • Add an assembly level attribute with a well known name to the real assembly, which gives the name of the Arbitrary provider assembly for types in this assembly. E.g. in MyGreatLib assembly, add [<assembly: FsCheckArbitraryProvider("MyGreatLib.FsCheck">]
  • When FsCheck encounters a type, it checks the assembly of this type for the FsCheckArbitraryProvider attribute. Note that while FsCheck may include a definition of such an attribute, it just has to be any attribute with the same name and signature - in other words, to include this attribute no dependency on FsCheck itself is necessary.
  • FsCheck tries to load the referenced provider assembly and tries to find an Arbitrary implementation in it that fits the type.

Note however that this includes a pretty significant amount of magic. For example, how does FsCheck let the user know that it looked for an assembly with Arbitrary but failed to load it somehow? Might just be that the user didn't want to install the provider in the first place. Anecdotally, Arbitrary registration issues are pretty common already so I'm relatively wary of adding more functionality around them. On the other hand I definitely see the the appeal of allowing library authors to plug in Arbitrary instances of their types into FsCheck somehow - if that takes off it would be a great virtuous circle dynamic for the FsCheck ecosystem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants