Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make extensible variants more efficient and mashalable. #13180

Open
toots opened this issue May 19, 2024 · 4 comments
Open

Make extensible variants more efficient and mashalable. #13180

toots opened this issue May 19, 2024 · 4 comments
Assignees

Comments

@toots
Copy link
Contributor

toots commented May 19, 2024

Hi!

We've been working with extensible variants in liquidsoap for a while. From a large project standpoint, they are a killer feature of the compiler. They allow modular and extensible designs without having to put all the application logic in one place and also make it possible to have optional extensions.

However, they are much less efficient than regular variants and cannot be marshaled. These are two serious issues, especially the first one, we're seeing 10 to 15% performance increase by removing them where we can.

Now, I apologize if this is uneducated guess but this seems like an implementation restriction. At first, I thought that extensible variants would just pick up individual block tags at compile time but the choice seems to be at runtime, which makes them non optimizable and also non mashalable.

However, playing with mashaling, it seems that there is a workaround to make them marsalable:

type t = ..
type t += Foo of int

let print = function
  | Foo d -> Printf.printf "foo: %d\n" d
  | _ -> Printf.printf "unknown\n"

let value x () = Foo x

let () =
  let foo = value 124 in
  let foo : unit -> t = Marshal.from_string (Marshal.to_string foo [Marshal.Closures]) 0 in
  print (foo ())

This works because:

  1. The value function is defined at compile time
  2. Marshal doc says:

If flags contains Marshal.Closures, functional values will be marshaled as a the position in the code of the program together with the values corresponding to the free variables captured in the closure.

Now, this really seems like an implementation limitation because, if it is in fact possible to marshal extensible variants like this, why wouldn't it be possible for the compiler to use the same trick and pick a position in the code of the program to be the extensible variants tag?

@toots toots changed the title Make extensible variants more efficient and mashallable. Make extensible variants more efficient and mashalable. May 19, 2024
@lthls
Copy link
Contributor

lthls commented May 19, 2024

Extensible types can be extended an arbitrary number of times:

type t = ..
let r : t list ref = ref []

let add_new_value () =
  let open struct type t += T end in
  r := T :: !r

let () =
  for i = 1 to 100 do
    add_new_value ()
  done

The list will contain 100 different constructors, that happen to share the same name. For this reason, it's impossible to assign block tags at compile time (also because of separate compilation: the block tags must not conflict with constructors defined in other units, but the compiler only sees all the units at link time).
When un-marshaling, something similar occurs: since we don't know if the data comes from the same run of the program, we cannot assume that they correspond to a known constructor and we create a new, fresh one. Note that even if we kept the same identifier, pattern-matching of extensible types relies on physical equality of the extension constructors, and a marshaling round-trip always loses physical equality between marshaled values and their in-memory version.
Finally, your trick with a closure only works because your extensible type is defined at toplevel, so it is allocated statically by the native compiler and the closure does not need to store it in its environment. Put the same code in a (non-inlined) functor and it won't work anymore: the function will fetch the extension constructor from its closure, which will have been re-allocated when un-marshaling.

@toots
Copy link
Contributor Author

toots commented May 19, 2024

Yes, that makes sense. I also have this example not working, as you explained:

type t = ..
type t += Foo of int

let print = function
  | Foo d -> Printf.printf "foo: %d\n" d
  | _ -> Printf.printf "unknown\n"

let value x
  let v = Foo x in
  fun () -> v

let () =
  let foo = value 124 in
  let foo : unit -> t = Marshal.from_string (Marshal.to_string foo [Marshal.Closures]) 0 in
  print (foo ())

Could it still be possible to optimize the compilation in cases where the extensible types are defined top-level? I imagine that this would cover a lot of standard cases and/or give the programmer more flexibility.

@lthls
Copy link
Contributor

lthls commented May 19, 2024

Could it still be possible to optimize the compilation in cases where the extensible types are defined top-level? I imagine that this would cover a lot of standard cases and/or give the programmer more flexibility.

I can imagine a compilation scheme where the extensible type comes with normal variant constructors, which get compiled into normal variants, and only the constructors defined later get compiled as extensible variants:

(* A and B are normal variants *)
type t = A | B of int | ..
(* C and D get compiled as extension constructors *)
type t += C | D of string

This is already going to be a bit of work to implement, and even matching only on the normal cases will occur the cost of checking for extension cases first so it won't be as fast as regular variants. But I think it's doable. (I don't know if anybody has both the motivation and the required knowledge to actually do it though.)

Automatically transforming the extension constructors at toplevel in the same module into normal variants is going to be much harder. You need to be able to know how a constructor is represented just by looking at the signature, and since signature are compiled before implementations you need to apply the optimisation to the signatures and propagate the information to the implementation. But you can't know in the signature if the optimisation is even possible:

module M : sig
  type t = ..
  type t += A
  type t += B
end = struct
  type t = ..
  type t += A
  type t += B = A
end

If you had decided, looking at the signature, to use regular variants for A and B you would get an error in the implementation, since it defines A and B to be the same.

It might be possible to do it for modules without a signature, but I don't think we would merge a feature that requires not writing signatures to work.

@toots
Copy link
Contributor Author

toots commented May 20, 2024

Thank you for these clarifications. This example with type equality inside the implementation is interesting, that's clearly something that you cannot do with regular variants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants