
Remove eval, memoize methodwise, traits, callables, typed caches #70

Conversation


@willow-ahrens willow-ahrens commented Jan 5, 2021

This PR picks up where #59 left off (thanks @cstjean for all your work on this repo!). If we remove eval and use one cache per method, then we need to be able to look up caches by their methods. Types can't be hashed by equivalence class (otherwise I think Julia dispatch might be faster; consider hash(Val{A} where A) != hash(Val{B} where B)). Therefore, we need to use Julia's method lookup to find each method and, from it, its cache. We use a global dictionary in the Memoize.jl module to map Method objects to their caches. Since this was already a big change, this PR also makes several related changes.
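To illustrate the hashing point, here is a minimal sketch in plain Base Julia (no Memoize.jl required) showing that equivalent UnionAll types compare equal with ==, even though hashing has not historically respected that equivalence; exact hash behavior may vary across Julia versions.

```julia
# Two type objects that are equal as types but written with
# different type-variable names.
sig_a = Val{A} where A
sig_b = Val{B} where B

# `==` compares types up to renaming of type variables:
println(sig_a == sig_b)   # true

# Hashing, however, has not historically respected this equivalence for
# UnionAll types (behavior may differ across Julia versions), which is why
# types can't simply be used as Dict keys "by equivalence class".
println(hash(sig_a) == hash(sig_b))
```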

The key changes made by this PR

In summary, now you can do:

struct F{A}
	a::A
end
@memoize Dict{__Key__, __Value__}() function (::F{A})(b, ::C) where {A, C}
	println("Running")
	(A, b, C)
end
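For readers unfamiliar with memoized callable structs, here is a hand-rolled approximation in plain Julia of how the example above behaves. The cache name F_CACHE and the expansion shape are illustrative guesses, not the PR's actual generated code:

```julia
struct F{A}
    a::A
end

# Stand-in for the __Key__ => __Value__ cache the macro would create.
const F_CACHE = Dict{Tuple,Any}()

function (f::F{A})(b, c::C) where {A, C}
    # Run the body only on a cache miss; otherwise return the stored result.
    get!(F_CACHE, (A, b, C)) do
        println("Running")
        (A, b, C)
    end
end
```

Calling F(1)(2, "x") prints "Running" and returns the tuple (Int64, 2, String); a second identical call returns the cached tuple without printing.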

cstjean and others added 24 commits January 5, 2021 13:40
Fix JuliaCollections#48, at the cost of putting the variable in a poorly-performing global.

Not sure if this is acceptable. It's frustrating that Julia seemingly lacks the tools to deal with
this elegantly.

- If `const` worked in a local function, we'd just put `const` and be done with it.
- If `typeconst` existed for global variables, that would work too.

Memoization.jl uses generated functions, which causes other problems. And it feels like the
wrong solution too.
It makes the macro expansion much more palatable
@codecov-io

codecov-io commented Jan 5, 2021

Codecov Report

Merging #70 (6087295) into master (697ce88) will decrease coverage by 9.18%.
The diff coverage is 90.21%.


@@             Coverage Diff             @@
##            master      #70      +/-   ##
===========================================
- Coverage   100.00%   90.81%   -9.19%     
===========================================
  Files            1        1              
  Lines           43       98      +55     
===========================================
+ Hits            43       89      +46     
- Misses           0        9       +9     
Impacted Files Coverage Δ
src/Memoize.jl 90.81% <90.21%> (-9.19%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 697ce88...6087295. Read the comment docs.

src/Memoize.jl Outdated
end
end

const _brain = Dict()
Collaborator


A global like this is problematic because of precompilation. Eg. if you have a package with

module X
@memoize foo(x) = x
end

I believe you'll find that after using Memoize, X, Memoize._brain is empty.

@cstjean
Collaborator

cstjean commented Jan 5, 2021

Hi Peter, this is an impressive list of fixes! I love all the tests and README additions.

For cache invalidation, from an API perspective, I believe we only really need:

  1. A way to clear all caches for a function
  2. Automatically clear the specific method's cache when it's redefined

I like the direction you took with _memories(...). To make it precompilation-friendly, we might need to get rid of the global dict. I'm not sure how to do it, but consider expanding @memoize foo(x::Int, y) = ... into

...
global var"__memoize_cache_foo(x::Int,y)" = Dict()
...

then on redefinition, we can just empty!(var"__memoize_cache_foo(x::Int,y)"), achieving goal 2. For goal 1, we can take the method list as you did, find each method's definition module, and look at names(that_module) for all variables starting with __memoize_cache_foo. Then we empty all of these caches. This is not elegant, but I believe it would work.
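A minimal sketch of this proposal in plain Julia. The expansion shape and helper names here are guesses at what @memoize might generate under this scheme, not Memoize.jl's actual output:

```julia
module X
# What `@memoize foo(x::Int, y) = x + y` might expand into under this proposal:
global var"__memoize_cache_foo(x::Int,y)" = Dict()

function foo(x::Int, y)
    cache = var"__memoize_cache_foo(x::Int,y)"
    get!(() -> x + y, cache, (x, y))   # original body runs on a cache miss
end
end # module

# Goal 2: on redefinition, empty that one method's cache.
empty!(X.var"__memoize_cache_foo(x::Int,y)")

# Goal 1: find every such cache in the module by name prefix.
cache_names = filter(n -> startswith(String(n), "__memoize_cache_foo"),
                     names(X; all = true))
```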

Cache emptying is the first problem to solve. If you're interested in implementing that (or another precompilation-compatible solution), I would appreciate if that was a small, self-contained PR starting from #59 (you can choose the target branch on GitHub).

It's really best to split big PRs into chunks. It's so much easier to review and merge a small PR!

@willow-ahrens
Author

willow-ahrens commented Jan 5, 2021

Good catch regarding precompilation. I think my most recent commits fix this using something similar to your suggestion, but using only one variable holding the dictionary instead of one variable per method.

I'll split this up.

@willow-ahrens
Author

willow-ahrens commented Jan 24, 2021

Reflecting a little bit on the proposed solution, and other potential ways to simplify #71:

I like the direction you took with _memories(...). To make it precompilation-friendly, we might need to get rid of the global dict. I'm not sure how to do it, but consider expanding @memoize foo(x::Int, y) = ... into global var"__memoize_cache_foo(x::Int,y)" = Dict()

then on redefinition, we can just empty!(var"__memoize_cache_foo(x::Int,y)"), achieving goal 2. For goal 1, we can take the method list as you did, find each method's definition module, and look at names(that_module) for all variables starting with __memoize_cache_foo. Then we empty all of these caches. This is not elegant, but I believe it would work.

I'm not convinced this approach will work. It doesn't detect overwrites when variables are named differently. For example, the string "__memoize_cache_foo(x::Int,y)" will not equal the string "__memoize_cache_foo(x::Int,z)" even when the second method would overwrite the first. It also spuriously detects overwrites when the types themselves are variables that need resolution, such as T = Int; @memoize foo(::T) = 1; T = Bool; @memoize foo(::T) = 2.
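The mismatch is easy to demonstrate in plain Julia (variable names are illustrative):

```julia
# The per-method variable names differ even though the methods collide:
n1 = "__memoize_cache_foo(x::Int,y)"
n2 = "__memoize_cache_foo(x::Int,z)"
println(n1 == n2)   # false

# ...yet the second definition overwrites the first, because their type
# signatures are identical:
foo(x::Int, y) = 1
foo(x::Int, z) = 2             # overwrites the method above
println(length(methods(foo)))  # only one method survives
println(foo(0, 0))             # the second definition won
```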

Defining a method overwrites a previous method precisely when their type signatures match, so it seems best to use the type signatures (e.g. Tuple{typeof(foo), Int, Any}) of the methods as keys to associate with a cache. Unfortunately, while it is easy to check if two types are equivalent (one can use ==), it seems that there is no straightforward way to find equivalent types. I'm imagining a few options.

  1. We use a list to look up caches. When defining a method, we walk the whole list to find signatures of potential methods that might be overwritten, invalidate and remove that cache if found, then add the signature and cache of our new method to the list.
  2. We use a sentinel method to look up caches. When defining foo(x::Int,y), we also define find_the_cache(::Type{Tuple{typeof(foo), Int, Any}}) = Dict(). Of course, we also define find_the_cache(::Any) = nothing, so that we can call find_the_cache to determine if our method has been defined before. This could greatly simplify the code. Unfortunately, if find_the_cache is to be a global method, this would put us back in the world of needing to call eval.
  3. As in #70 and #71, use Julia introspection to find the method that would be overwritten, if any. Since the signature of that method matches the known signature exactly, we can hash types based on their structure with a dictionary that we store in each module.
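A minimal sketch of option 1, assuming a hypothetical helper register_cache! and a flat global list (none of this is Memoize.jl's actual API):

```julia
# Flat list of (signature => cache) pairs; every definition walks it.
const CACHES = Pair{Type,Dict}[]

function register_cache!(sig::Type)
    for i in eachindex(CACHES)
        old_sig, old_cache = CACHES[i]
        if old_sig == sig          # equal signatures: the method is overwritten
            empty!(old_cache)      # invalidate the stale cache...
            deleteat!(CACHES, i)   # ...and drop it from the list
            break
        end
    end
    cache = Dict()
    push!(CACHES, sig => cache)
    return cache
end

bar(x::Int, y) = x
c1 = register_cache!(Tuple{typeof(bar), Int, Any})
c1[(1, 2)] = 3
# Re-registering the same signature invalidates c1 and installs a fresh cache:
c2 = register_cache!(Tuple{typeof(bar), Int, Any})
```

After the second call, c1 has been emptied and CACHES holds a single fresh cache, which is exactly the overwrite behavior the option describes.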

I think that the first solution would likely be easier to read than #71, and I'm not sure anyone will notice the slowdown of linear search versus whatever Julia uses internally to solve the "find equivalent type" problem. We might also simplify #71 a bit by calling a more standard form of which.

@willow-ahrens
Author

willow-ahrens commented Jan 24, 2021

To make it precompilation-friendly, we might need to get rid of the global dict.

If this is a concern because there was only one global dict in the Memoize package, #71 has been amended to use a separate dictionary in each package that uses Memoize.

If, however, this was a concern because of the interactions between precompilation and rehashing dictionaries: whatever problems we have with the global dictionary that Memoize creates, we'll also have with the dictionaries/caches the global dict is attempting to store, so that might be a separate issue.

@cstjean
Collaborator

cstjean commented Jan 24, 2021

I'm not convinced this approach will work. It doesn't detect overwrites when variables are named differently. For example, the string "__memoize_cache_foo(x::Int,y)" will not equal the string "__memoize_cache_foo(x::Int,z)" even when the second method would overwrite the first.

That's true, but to me that's basically a negligible problem, affecting a vanishingly small number of real-world scenarios.

It's also interesting to think about what Revise will do with that. Does Revise delete global variables when they are deleted in the code? I don't know.

It also spuriously detects overwrites when the types themselves are variables that need resolution, such as T = Int; @memoize foo(::T) = 1; T = Bool; @memoize foo(::T) = 2.

I can't think of any code I've ever seen that would do something like that. Can you? People might write

for T in [:Int, :Vector]
     @eval @memoize f(::$T) = ...
end

but that would work fine, because the $ happens before the @memoize expansion.

@cstjean
Collaborator

cstjean commented Jan 24, 2021

We use a list to look up caches. When defining a method, we walk the whole list to find signatures of potential methods that might be overwritten, invalidate and remove that cache if found, then add the signature and cache of our new method to the list.

Since the list is global, wouldn't you be stuck with the same precompilation problems as before?

We use a sentinel method to look up caches. When defining foo(x::Int,y), we also define find_the_cache(::Type{Tuple{typeof(foo), Int, Any}}) = Dict(). Of course, we also define find_the_cache(::Any) = nothing, so that we can call find_the_cache to determine if our method has been defined before. This could greatly simplify the code.

I like that... if it works! 🙂 Memoize.jl is such a julia puzzle to figure out.

Unfortunately, if find_the_cache is to be a global method, this would put us back in the world of needing to call eval.

I don't understand. For local variables, I don't think we need to support explicit user-side cache invalidation, so this looks fine to me (as long as the cache can be GC'ed once the function returns)

If this is a concern because there was only one global dict in the Memoize package, #71 has been amended to use a separate dictionary in each package that uses memoize.

👍

If, however, this was a concern because of the interactions between precompilation and rehashing dictionaries;

That's new to me, what are those interactions?

@willow-ahrens
Author

After attempting to implement 2, I suspect it might be impossible to avoid introspection: we need to find the module in which the overwritten method was defined. I can't think of a way to do it without calling which, meaning that options 1 and 2 aren't any simpler than option 3. I think your variable-based approach might also require introspection to determine which module stores the variable. As long as we need to look up the method, we might as well use its type signature to correctly detect overwrites and avoid name-dependent behavior.
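A sketch of that introspection step in plain Julia: given the exact type signature of a would-be definition, Base.which finds the existing Method that signature dispatches to, and the Method carries the module where its cache would live. (The function qux is illustrative.)

```julia
qux(x::Int, y) = x

sig = Tuple{typeof(qux), Int, Any}
m = which(sig)        # the Method that this signature dispatches to
println(m.module)     # the module holding that method (and its cache)

# An exact signature match means a new definition would overwrite this method:
println(m.sig == sig)   # true
```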

@cstjean
Collaborator

cstjean commented Jan 24, 2021

Yes, with what I suggested we need introspection too, as I said:

For goal 1, we can take the method list as you did, find each method's definition module, and look at names(that_module) for all variables starting with __memoize_cache_foo. Then we empty all of these caches. This is not elegant, but I believe it would work

As long as we need to look up the method, we might as well use its type signature to correctly detect overwrites and avoid name-dependent behavior.

Like I said, I don't think the name-dependent behaviour is problematic, but if you can make it work with similar complexity, I will be happy to do the right thing and use types.
