Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disjunctive normal form attempt #558

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

erszcz
Copy link
Collaborator

@erszcz erszcz commented Sep 21, 2023

I'm creating this draft PR with an intention of closing it immediately afterwards. It's not a working solution, I just want to describe the problem and show an experiment aimed at solving the problem.

In #524 (comment) I said I'd experiment with disjunctive normal form to address the following problem from Gradualizer's own code. The problem is quite prevalent - it occurs everywhere where we have a structure with an inner union. For example, pretty much every time we handle type() instances:

 compat_ty({type, _, tuple, any}, {type, _, tuple, _Args}, Seen, _Env) -> 
     ret(Seen); 
 compat_ty({type, _, tuple, _Args}, {type, _, tuple, any}, Seen, _Env) -> 
     ret(Seen); 
 compat_ty({type, _, tuple, Args1}, {type, _, tuple, Args2}, Seen, Env) -> 
     %% We can assert because we match out `any' in previous clauses. 
     %% TODO: it would be perfect if Gradualizer could refine this type automatically in such a case 
     compat_tys(?assert_type(Args1, [type()]), 
                ?assert_type(Args2, [type()]), Seen, Env); 

In the above example we see that even though there's a dedicated clause matching the case of Args1 = any (and symmetrically Args2 = any), we still have to assert that Args1 :: [type()] (respectively Args2 :: [type()]) in the clause which handles exactly that - Args1 and Args2 already refined to [type()]. We have to use a superfluous assertion, even though it's obvious to us and should be obvious to the typechecker.

I think it's caused by the following:

> typechecker:subtype(gradualizer:type("{t, a} | {t, b}"), gradualizer:type("{t, a | b}"), gradualizer:env()).
{true,{constraints,#{},#{},#{}}}
> typechecker:subtype(gradualizer:type("{t, a | b}"), gradualizer:type("{t, a} | {t, b}"), gradualizer:env()).
false

In other words, while Gradualizer understands that {t, a} | {t, b} :: {t, a | b}, it does not understand that {t, a | b} :: {t, a} | {t, b}. It seems to me that this property holding in both directions is crucial for our needs.

As you can see in notes.1.md in this PR, I contrasted it with the Elixir prototype based on the CDuce set-theoretic type checker. For Elixir/CDuce, the property holds in both directions.

To some extent it can be achieved by "expanding" unions as done in this PR - and it fixes some particular cases, especially the new tests in typechecker_tests - however, the general effect on tests is disastrous :(

@erszcz
Copy link
Collaborator Author

erszcz commented Sep 21, 2023

Hmm, now that I look at the results again, they don't look THAT bad:

Failed: 20. Skipped: 0. Passed: 708.

Maybe it's not such a bad idea after all... I remembered the result from June as much worse 🤔

@zuiderkwast
Copy link
Collaborator

Nice! I believe this is a good approach, especially since CDuce is the language where these theories by Castagna et.al. are tested. The two papers mentioned below are listed on the web site of CDuce. (I read them some years ago.)

In Gradual Typing with Union and Intersection Types (2017) they talk about DNF and Gradual-DNF and some other concepts and steps how to convert types into this form, compute subtyping, etc.

In Covariance and Contravariance: a fresh look at an old issue (a primer in advanced type systems for learning functional programmers) (2018), there's an efficient datastructure "binary decision diagram with lazy unions" to represent types in DNF and an algorithm how to transform types.

@erszcz
Copy link
Collaborator Author

erszcz commented Sep 22, 2023

Indeed, especially "Covariance and Contravariance" is very informative. It contains one more brilliant idea which I tested in Gradualizer, but with quite a dramatic impact on test results. The idea is defining subtyping by type difference.

Currently, in Gradualizer type_diff and subtype are defined separately. The paper proposes to define subtyping like this:

subtype(T1, T2) -> is_empty(type_diff(T1, T2)).

While this PR's approach to DNF doesn't seem to be as dramatic, I think it's not complete either. Together with subtyping redefinition, we'd be getting very close to the theory described there. The question, though, is whether it's worth to change fundamental parts of Gradualizer in the light of etylizer being developed from scratch according to that theory? As we speak it still has small coverage of the Erlang language features, but it seems to have more sound and better understood theoretical underpinning 🤔

@zuiderkwast
Copy link
Collaborator

Etilyzer is not gradual typing, is it?

Anyway, I don't have time and energy to do such major work in Gradualizer at this time.

@erszcz
Copy link
Collaborator Author

erszcz commented Sep 22, 2023

Etilyzer is not gradual typing, is it?

At the very least they accept the dynamic type on the syntax level, so I imagine they would like to expand in that direction.

time and energy

I'm afraid that's the main problem. There's a chance for some EEF (Erlang Ecosystem Foundation) funding (I spoke to one of the board members about it, nothing certain, but still), but apart from a budget, we would also need people with capacity. cc @xxdavid @josefs

@xxdavid
Copy link
Collaborator

xxdavid commented Oct 4, 2023

Good idea! Thanks for experimenting with it. :)

20 failed tests don’t sound like that much. It seems to me that most of them are related to maps and glb, so maybe it would be sufficient to patch these places. But it might also require some deep refactoring, who knows…

Also, have you thought about the exponential explosion? Do you think it’s safe to use this expansion on type expressions of “practical size”?

Regarding the redefinition of subtype, what do you think would be the advantages (and possibly disadvantages) of it compared to the current implementation?

Funding for Gradualizer sounds really cool! I think it could help the project a lot. Not having anyone who would work on it regularly and as part of their job is one of the things that distinguishes Gradualizer from eqWAlizer and etylizer and probably one of the reasons why it doesn’t move very fast (compared to the other projects). Sadly, I don’t have much time for it either. I’m currently immersed in trying to improve the polymorphism support, and do not have much capacity left.

@erszcz
Copy link
Collaborator Author

erszcz commented Oct 4, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants