
Remove eval, memoize methodwise, traits, callables, typed caches #70

Conversation


@willow-ahrens willow-ahrens commented Jan 5, 2021

This PR picks up where #59 left off (thanks @cstjean for all your work on this repo!). If we remove eval and use one cache per method, then we need to be able to look up caches by their methods. Types can't be hashed by equivalence class (otherwise I think Julia dispatch might be faster; consider hash(Val{A} where A) != hash(Val{B} where B)). Therefore, we need to use Julia's method lookup to find each method and, from it, its cache. We use a global dictionary in the Memoize.jl module to map Method objects to their caches. Since this was already a big change, this PR also makes several related changes.
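To illustrate the hashing point, here is a minimal sketch in plain Base Julia (no Memoize.jl required) showing that equivalent UnionAll types compare equal with ==, even though hashing has not historically respected that equivalence; exact hash behavior may vary across Julia versions.

```julia
# Two type objects that are equal as types but written with
# different type-variable names.
sig_a = Val{A} where A
sig_b = Val{B} where B

# `==` compares types up to renaming of type variables:
println(sig_a == sig_b)   # true

# Hashing, however, has not historically respected this equivalence for
# UnionAll types (behavior may differ across Julia versions), which is why
# types can't simply be used as Dict keys "by equivalence class".
println(hash(sig_a) == hash(sig_b))
```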

The key changes made by this PR

In summary, now you can do:

struct F{A}
	a::A
end
@memoize Dict{__Key__, __Value__}() function (::F{A})(b, ::C) where {A, C}
	println("Running")
	(A, b, C)
end
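For readers unfamiliar with memoized callable structs, here is a hand-rolled approximation in plain Julia of how the example above behaves. The cache name F_CACHE and the expansion shape are illustrative guesses, not the PR's actual generated code:

```julia
struct F{A}
    a::A
end

# Stand-in for the __Key__ => __Value__ cache the macro would create.
const F_CACHE = Dict{Tuple,Any}()

function (f::F{A})(b, c::C) where {A, C}
    # Run the body only on a cache miss; otherwise return the stored result.
    get!(F_CACHE, (A, b, C)) do
        println("Running")
        (A, b, C)
    end
end
```

Calling F(1)(2, "x") prints "Running" and returns the tuple (Int64, 2, String); a second identical call returns the cached tuple without printing.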

cstjean and others added 24 commits January 5, 2021 13:40
Fix JuliaCollections#48, at the cost of putting the variable in a poorly-performing global.

Not sure if this is acceptable. It's frustrating that Julia seemingly lacks the tools to deal with
this elegantly.

- If `const` worked in a local function, we'd just put `const` and be done with it.
- If `typeconst` existed for global variables, that would work too.

Memoization.jl uses generated functions, which causes other problems. And it feels like the
wrong solution too.
It makes the macro expansion much more palatable
@codecov-io

codecov-io commented Jan 5, 2021

Codecov Report

Merging #70 (6087295) into master (697ce88) will decrease coverage by 9.18%.
The diff coverage is 90.21%.


@@             Coverage Diff             @@
##            master      #70      +/-   ##
===========================================
- Coverage   100.00%   90.81%   -9.19%     
===========================================
  Files            1        1              
  Lines           43       98      +55     
===========================================
+ Hits            43       89      +46     
- Misses           0        9       +9     
Impacted Files Coverage Δ
src/Memoize.jl 90.81% <90.21%> (-9.19%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 697ce88...6087295. Read the comment docs.

src/Memoize.jl Outdated
end
end

const _brain = Dict()
Collaborator


A global like this is problematic because of precompilation. Eg. if you have a package with

module X
@memoize foo(x) = x
end

I believe you'll find that after using Memoize, X, Memoize._brain is empty.

@cstjean
Collaborator

cstjean commented Jan 5, 2021

Hi Peter, this is an impressive list of fixes! I love all the tests and README additions.

For cache invalidation, from an API perspective, I believe we only really need:

  1. A way to clear all caches for a function
  2. Automatically clear the specific method's cache when it's redefined

I like the direction you took with _memories(...). To make it precompilation-friendly, we might need to get rid of the global dict. I'm not sure how to do it, but consider expanding @memoize foo(x::Int, y) = ... into

...
global var"__memoize_cache_foo(x::Int,y)" = Dict()
...

then on redefinition, we can just empty!(var"__memoize_cache_foo(x::Int,y)"), achieving goal 2. For goal 1, we can take the method list as you did, find each method's definition module, and look at names(that_module) for all variables starting with __memoize_cache_foo. Then we empty all of these caches. This is not elegant, but I believe it would work.
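A minimal sketch of this proposal in plain Julia. The expansion shape and helper names here are guesses at what @memoize might generate under this scheme, not Memoize.jl's actual output:

```julia
module X
# What `@memoize foo(x::Int, y) = x + y` might expand into under this proposal:
global var"__memoize_cache_foo(x::Int,y)" = Dict()

function foo(x::Int, y)
    cache = var"__memoize_cache_foo(x::Int,y)"
    get!(() -> x + y, cache, (x, y))   # original body runs on a cache miss
end
end # module

# Goal 2: on redefinition, empty that one method's cache.
empty!(X.var"__memoize_cache_foo(x::Int,y)")

# Goal 1: find every such cache in the module by name prefix.
cache_names = filter(n -> startswith(String(n), "__memoize_cache_foo"),
                     names(X; all = true))
```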

Cache emptying is the first problem to solve. If you're interested in implementing that (or another precompilation-compatible solution), I would appreciate if that was a small, self-contained PR starting from #59 (you can choose the target branch on GitHub).

It's really best to split big PRs into chunks. It's so much easier to review and merge a small PR!

@willow-ahrens
Author

willow-ahrens commented Jan 5, 2021

Good catch regarding precompilation. I think my most recent commits fix this using something similar to your suggestion, but using only one variable holding the dictionary instead of one variable per method.

I'll split this up.

@willow-ahrens
Author

willow-ahrens commented Jan 24, 2021

Reflecting a little bit on the proposed solution, and other potential ways to simplify #71:

I like the direction you took with _memories(...). To make it precompilation-friendly, we might need to get rid of the global dict. I'm not sure how to do it, but consider expanding @memoize foo(x::Int, y) = ... into global var"__memoize_cache_foo(x::Int,y)" = Dict()

then on redefinition, we can just empty!(var"__memoize_cache_foo(x::Int,y)"), achieving goal 2. For goal 1, we can take the method list as you did, find each method's definition module, and look at names(that_module) for all variables starting with __memoize_cache_foo. Then we empty all of these caches. This is not elegant, but I believe it would work.

I'm not convinced this approach will work. It doesn't detect overwrites when variables are named differently. For example, the string "__memoize_cache_foo(x::Int,y)" will not equal the string "__memoize_cache_foo(x::Int,z)" even when the second method would overwrite the first. It also spuriously detects overwrites when the types themselves are variables that need resolution, such as T = Int; @memoize foo(::T) = 1; T = Bool; @memoize foo(::T) = 2.
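The mismatch is easy to demonstrate in plain Julia (variable names are illustrative):

```julia
# The per-method variable names differ even though the methods collide:
n1 = "__memoize_cache_foo(x::Int,y)"
n2 = "__memoize_cache_foo(x::Int,z)"
println(n1 == n2)   # false

# ...yet the second definition overwrites the first, because their type
# signatures are identical:
foo(x::Int, y) = 1
foo(x::Int, z) = 2             # overwrites the method above
println(length(methods(foo)))  # only one method survives
println(foo(0, 0))             # the second definition won
```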

Defining a method overwrites a previous method precisely when their type signatures match, so it seems best to use the type signatures (e.g. Tuple{typeof(foo), Int, Any}) of the methods as keys to associate with a cache. Unfortunately, while it is easy to check if two types are equivalent (one can use ==), it seems that there is no straightforward way to find equivalent types. I'm imagining a few options.

  1. We use a list to look up caches. When defining a method, we walk the whole list to find signatures of potential methods that might be overwritten, invalidate and remove that cache if found, then add the signature and cache of our new method to the list.
  2. We use a sentinel method to look up caches. When defining foo(x::Int,y), we also define find_the_cache(::Type{Tuple{typeof(foo), Int, Any}}) = Dict(). Of course, we also define find_the_cache(::Any) = nothing, so that we can call find_the_cache to determine if our method has been defined before. This could greatly simplify the code. Unfortunately, if find_the_cache is to be a global method, this would put us back in the world of needing to call eval.
  3. As in #70 and #71, use Julia introspection to find the method that would be overwritten, if any. Since the signature of that method matches the known signature exactly, we can hash types based on their structure with a dictionary that we store in each module.
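A minimal sketch of option 1, assuming a hypothetical helper register_cache! and a flat global list (none of this is Memoize.jl's actual API):

```julia
# Flat list of (signature => cache) pairs; every definition walks it.
const CACHES = Pair{Type,Dict}[]

function register_cache!(sig::Type)
    for i in eachindex(CACHES)
        old_sig, old_cache = CACHES[i]
        if old_sig == sig          # equal signatures: the method is overwritten
            empty!(old_cache)      # invalidate the stale cache...
            deleteat!(CACHES, i)   # ...and drop it from the list
            break
        end
    end
    cache = Dict()
    push!(CACHES, sig => cache)
    return cache
end

bar(x::Int, y) = x
c1 = register_cache!(Tuple{typeof(bar), Int, Any})
c1[(1, 2)] = 3
# Re-registering the same signature invalidates c1 and installs a fresh cache:
c2 = register_cache!(Tuple{typeof(bar), Int, Any})
```

After the second call, c1 has been emptied and CACHES holds a single fresh cache, which is exactly the overwrite behavior the option describes.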

I think that the first solution would likely be easier to read than #71, and I'm not sure anyone will notice the slowdown of linear search versus whatever Julia uses internally to solve the "find equivalent type" problem. We might also simplify #71 a bit by calling a more standard form of which.

@willow-ahrens
Author

willow-ahrens commented Jan 24, 2021

To make it precompilation-friendly, we might need to get rid of the global dict.

If this is a concern because there was only one global dict in the Memoize package, #71 has been amended to use a separate dictionary in each package that uses Memoize.

If, however, this was a concern because of the interactions between precompilation and rehashing dictionaries: whatever problems we have with the global dictionary that Memoize creates, we'll also have with the dictionaries/caches the global dict is attempting to store, so that might be a separate issue.

@cstjean
Collaborator

cstjean commented Jan 24, 2021

I'm not convinced this approach will work. It doesn't detect overwrites when variables are named differently. For example, the string "__memoize_cache_foo(x::Int,y)" will not equal the string "__memoize_cache_foo(x::Int,z)" even when the second method would overwrite the first.

That's true, but to me that's basically a negligible problem, affecting a vanishingly small number of real-world scenarios.

It's also interesting to think about what Revise will do with that. Does Revise delete global variables when they are deleted in the code? I don't know.

It also spuriously detects overwrites when the types themselves are variables that need resolution, such as T = Int; @memoize foo(::T) = 1; T = Bool; @memoize foo(::T) = 2.

I can't think of any code I've ever seen that would do something like that. Can you? People might write

for T in [:Int, :Vector]
     @eval @memoize f(::$T) = ...
end

but that would work fine, because the $ happens before the @memoize expansion.

@cstjean
Collaborator

cstjean commented Jan 24, 2021

We use a list to look up caches. When defining a method, we walk the whole list to find signatures of potential methods that might be overwritten, invalidate and remove that cache if found, then add the signature and cache of our new method to the list.

Since the list is global, wouldn't you be stuck with the same precompilation problems as before?

We use a sentinel method to look up caches. When defining foo(x::Int,y), we also define find_the_cache(::Type{Tuple{typeof(foo), Int, Any}}) = Dict(). Of course, we also define find_the_cache(::Any) = nothing, so that we can call find_the_cache to determine if our method has been defined before. This could greatly simplify the code.

I like that... if it works! 🙂 Memoize.jl is such a julia puzzle to figure out.

Unfortunately, if find_the_cache is to be a global method, this would put us back in the world of needing to call eval.

I don't understand. For local variables, I don't think we need to support explicit user-side cache invalidation, so this looks fine to me (as long as the cache can be GC'ed once the function returns)

If this is a concern because there was only one global dict in the Memoize package, #71 has been amended to use a separate dictionary in each package that uses memoize.

👍

If, however, this was a concern because of the interactions between precompilation and rehashing dictionaries;

That's new to me, what are those interactions?

@willow-ahrens
Author

After attempting to implement 2, I suspect it might be impossible to avoid introspection: we need to find the module in which the overwritten method was defined. I can't think of a way to do it without calling which, meaning that options 1 and 2 aren't any simpler than option 3. I think your variable-based approach might also require introspection to determine which module stores the variable. As long as we need to look up the method, we might as well use its type signature to correctly detect overwrites and avoid name-dependent behavior.
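A sketch of that introspection step in plain Julia: given the exact type signature of a would-be definition, Base.which finds the existing Method that signature dispatches to, and the Method carries the module where its cache would live. (The function qux is illustrative.)

```julia
qux(x::Int, y) = x

sig = Tuple{typeof(qux), Int, Any}
m = which(sig)        # the Method that this signature dispatches to
println(m.module)     # the module holding that method (and its cache)

# An exact signature match means a new definition would overwrite this method:
println(m.sig == sig)   # true
```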

@cstjean
Collaborator

cstjean commented Jan 24, 2021

Yes, with what I suggested we need introspection too, as I said:

For goal 1, we can take the method list as you did, find each method's definition module, and look at names(that_module) for all variables starting with __memoize_cache_foo. Then we empty all of these caches. This is not elegant, but I believe it would work

As long as we need to look up the method, we might as well use its type signature to correctly detect overwrites and avoid name-dependent behavior.

Like I said, I don't think the name-dependent behaviour is problematic, but if you can make it work with similar complexity, I will be happy to do the right thing and use types.
