Generic einops function which infers which operation to call #84

Open
MilesCranmer opened this issue Oct 27, 2020 · 18 comments

Comments

@MilesCranmer
Contributor

I noticed that rearrange, repeat, and reduce can all be inferred based on the pattern alone. Therefore a single generic function which makes calls to each is possible. I think this would be very nice for power users! For example,

einops.einop(tensor, pattern, reduction=None, **axes_lengths)

Here is how each operation can be inferred:

  • Same index names on each side? Then call rearrange.
    • e.g., i j k -> k j i, or time (i j) -> i j time are both recognized as patterns for rearrange.
  • An index given on the left side is missing on the right side? Then call reduce.
    • e.g., i j -> i, so obviously a reduction. Similarly, (h1 h2) (i j) -> h1 i.
  • An index was introduced on the right side, but was not given on the left side? Then call repeat.
    • e.g., i -> i j, so obviously a repeat. Or time i k -> time i h k
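
A minimal sketch of the inference rules above, using only the standard library. The names `axis_names` and `infer_operation` are hypothetical and not part of einops. How to treat a pattern that both drops and adds axes is not specified in the proposal, so raising an error in that case is an assumption.

```python
import re

# Matches named axes such as "i", "time" or "h1"; "()", "1" and "..." are ignored.
_AXIS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def axis_names(side):
    """Return the set of named axes on one side of a pattern."""
    return set(_AXIS_NAME.findall(side))


def infer_operation(pattern):
    """Guess which einops function a pattern calls for (hypothetical helper)."""
    left, right = pattern.split("->")
    lhs, rhs = axis_names(left), axis_names(right)
    removed = lhs - rhs  # axes that appear on the left only
    added = rhs - lhs    # axes that appear on the right only
    if removed and added:
        # Assumption: the proposal does not cover this case, so refuse it.
        raise ValueError(
            f"pattern both removes {sorted(removed)} and adds {sorted(added)}"
        )
    if removed:
        return "reduce"
    if added:
        return "repeat"
    return "rearrange"


for p in ["i j k -> k j i", "time (i j) -> i j time", "i j -> i", "i -> i j"]:
    print(p, "=>", infer_operation(p))
```

The sketch compares only axis names, so grouping with parentheses does not affect the result.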

An error is raised for any of the following situations:

  • A rearrange or repeat is inferred, but a reduction operation is given
  • A reduction is inferred, but a reduction operation is missing
  • A repeat is inferred, but no values are given for axes_lengths

These would prevent unintended behavior. Errors could also be raised when a user gives a value in axes_lengths for an index that does not appear in the pattern.
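
The error conditions listed above could be checked before any dispatch happens. The following is a sketch under stated assumptions: `validate_einop_call` is a hypothetical name, the checks operate on axis names only, and rejecting patterns that both drop and add axes is an extra assumption not taken from the proposal.

```python
import re

_AXIS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_einop_call(pattern, reduction=None, **axes_lengths):
    """Return the inferred operation name, or raise ValueError (hypothetical helper)."""
    left, right = pattern.split("->")
    lhs = set(_AXIS_NAME.findall(left))
    rhs = set(_AXIS_NAME.findall(right))
    removed, added = lhs - rhs, rhs - lhs

    # A length was given for an axis that is not in the pattern at all.
    unknown = set(axes_lengths) - (lhs | rhs)
    if unknown:
        raise ValueError(f"axes_lengths not used in the pattern: {sorted(unknown)}")

    # Assumption: mixing dropped and new axes is rejected outright.
    if removed and added:
        raise ValueError("pattern both removes and adds axes")

    if removed:
        if reduction is None:
            raise ValueError("a reduction is inferred, but no reduction was given")
        return "reduce"

    # From here on, no axis is dropped, so a reduction argument makes no sense.
    if reduction is not None:
        raise ValueError("no axis is reduced, but a reduction was given")

    if added:
        missing = added - set(axes_lengths)
        if missing:
            raise ValueError(f"no length given for new axes: {sorted(missing)}")
        return "repeat"
    return "rearrange"


print(validate_einop_call("i j -> i", reduction="mean"))
print(validate_einop_call("i -> i j", j=3))
```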

I would be happy to implement this (and also #83). Let me know if interested.

It would also be nice if #73 were included. Then this operation could be einop(x, [y, ], pattern, ...), with the einsum function inferred by the presence of two input tensors.

@shoyer

shoyer commented Nov 21, 2020

Yes, this is possible -- but should it be done? What are the actual use cases for this "all in one" function?

I really like descriptive function names like rearrange, repeat and reduce. Adding einop would encourage people to write less readable code.

@arogozhnikov
Owner

@MilesCranmer interesting suggestion, but I think that in the balance between verbosity and brevity of code, einops should be on the verbose side.

Thinking of the user base: right now, most people who see einops code are seeing it for the first time. Let's try to make that experience smooth.

Explicit, wordy clues about what an operation actually does are very helpful (e.g. repeat tells what's happening, so if I had no idea about einops - that's the minimal thing I will certainly understand in code).

Until einops is widespread enough, I'd hesitate to introduce shortcuts. Open to hearing other opinions

(PR is well-written BTW, didn't mean to discourage you from contributing)

@MilesCranmer
Contributor Author

Thanks for the comments!

@shoyer I first note that this function division isn't as clear as you might think. For example, I can do reduce(x, 'i j k -> k i', reduction='mean'), which is a rearrange and reduce operation. But this rearrange operation isn't made explicit from the function call - one needs to pay attention to the pattern. So in actuality I think one should pay attention to the pattern rather than the function call when reading einops snippets. Because of this mixing of functionality, I would actually argue that a single einop operation is more explicit than a different function for each, because it encourages one to pay special attention to the pattern.

However, the above argument is more subjective, and more importantly I note that this added function doesn't replace anything - all those functions are still there and their use can be encouraged for first-time users. The generic operation is just there for power users who would rather read the pattern than the function name, such as myself. And if one does have the opinion that three separate functions are more readable to external readers, then they are free to continue using those functions in open-sourced code.

Third, I note that einsum can do many different things: reduction, products, transposition. Yet there are not three separate functions for einsum.

Interested to hear thoughts!
Thanks again,
Miles

@MilesCranmer
Contributor Author

MilesCranmer commented Nov 21, 2020

Also, one tiny additional comment: my brain (and vim's syntax highlighter) confuse reduce and functools.reduce (reduce in python 2.7). Yet these have different function prototypes: einops.reduce is (data, function*) and functools.reduce is (function, data), as well as different purposes. I suppose it is too late now, but I would argue einops.einsum might be a better choice here, and it would allow for easier adoption of #73 if eventually integrated, and helps differentiate the two functions.
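
To make the prototype mismatch concrete, here is a small standard-library illustration. Only `functools.reduce` is actually called; the einops-style argument order (data, pattern, reduction) is taken from the calls shown elsewhere in this thread, and the literal values are made up for the example.

```python
import functools
import operator

# functools.reduce takes (function, iterable[, initial]).
total = functools.reduce(operator.add, [1, 2, 3, 4])
print(total)

# einops.reduce takes the data first: reduce(tensor, pattern, reduction).
# functools.reduce also accepts three positional arguments, so calling it in
# the einops order is not rejected up front. It fails only once it tries to
# call the "function", which here is the data.
error_name = None
try:
    functools.reduce([1, 2, 3, 4], "i -> ", "sum")
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```

Because the call shape is accepted, the failure surfaces as a complaint about a non-callable object rather than about the wrong `reduce` being imported.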

However, seeing einops.einop when doing a reduction operation seems to prevent this segfault in my brain :)

But in general I am fairly new to API crafting, and I will trust your final opinions on this!

*The einops pattern could be considered as the function in this case.

@YannDubs

YannDubs commented Nov 24, 2020

I think having a single operation would be a nice addition actually because (as Miles said):

  • the separation between the three functions is not always clear
  • reduce and repeat are actually in the standard library (under functools and itertools respectively), which can confuse people. I actually had a few cryptic errors because I was importing both reduce functions in some files and everything was breaking (the errors were cryptic because both reduce functions can actually be used with similar prototypes).

@MilesCranmer
Contributor Author

^Completely agree! Seeing einop(...) makes it immediately obvious when skimming code, but repeat and reduce less so.

Re: functional division, here are some examples of where the function choice is not completely clear, but the pattern is very obvious:

  • I want to add an extra dimension at the start. Is this rearrange or repeat?
    • Okay, after thinking for a second, I decide it's a rearrange. But there's that "extra cycle" it takes me to figure it out when coding (maybe just because I'm still relatively new to this package).
    • However, the pattern is very obvious and intuitive! It's '... -> () ...', which I don't even have to think about.
  • I want to transpose and reduce a 3D tensor. Is this a rearrange, reduce, or both?
    • Okay, it turns out I can do this with a single reduce.
    • However, I could have just written out 'time sample batch -> batch time', which is IMO much more explicit than having a reader focus on reduce.

I also think that to a beginning user, "reduce" is a bit of an ambiguous term: some might say that the operation 'width height -> (width height)' is a "reduction". However, looking at the index pattern itself makes it very obvious what is going on, and one doesn't need to worry about English ambiguities.

@arogozhnikov
Owner

My comments:

  • einsum can do many different things

... which does not help much with its adoption. It has been around since (sic!) 2011, but it is still not as widely used as it should be.
You may note that it is immediately picked up by physics-connected people (note that at least 3 participants in this thread have a degree in physics, and it is similar in others).
With all its simplicity and friendliness, einsum still needs a preliminary introduction.
Einops is einsum squared - so that's probably a good argument not to go the same way with a steep learning curve.

It is wrong to think about 'API for new users' and 'API for professionals'.

  • Adding a dummy dimension - should I use repeat vs rearrange? This is a good case when unifying is beneficial.
    Similarly reducing dummy dimension can be rearrange or reduce.

  • Reduce demands the name of the operation, so you won't be confused by its name alone.
    Speaking of the name, a perfect implementation would have no reduce in the name: x.mean('time token -> token'); This requires deep integration with frameworks.

Name conflicts:

  • reduce and repeat are not python reserved words - thus shouldn't be highlighted.
    Also those are rarely used with einops together, so not a big issue. import einops as E if still in trouble

  • The einsum name conflicts with einsum from tensor frameworks (that's where einops should minimize conflicts).
    Users may expect exactly the same interface (and it is the main reason why it's still not introduced in einops).

Some other considerations:

  • The current split into three operations allows axes that appear only on the lhs (reduce) or only on the rhs (repeat). These restrictions are natural to each operation.
    One common function would lead to a predictable desire to mix everything together in a single op (which would allow writing hardly readable code if supported).

@MilesCranmer
Contributor Author

Sorry for the late reply.

No problem at all @arogozhnikov, those are good arguments and I agree there are pros and cons to introducing this feature. Happy to submit an updated PR in the future should this come up again, as I'm still very interested in such a function. Feel free to close until then.

Cheers,
Miles

@cgarciae
Contributor

My 2 cents:

In my initial experience with einops, the existence of multiple functions was actually confusing me instead of being useful. As with einsum, I expected the einops "language" to just let me express how the data looks now and how I want it to look afterwards, and it should just do it. I believe the language is fairly intuitive.

As @MilesCranmer pointed out, the current division can sometimes be confusing. For example, say you started with code like this:

x = repeat(x, "h w c -> batch c h w", batch=32)

Here you are tiling in the batch dimension but also transposing the channel dimension. Now imagine that for some reason you don't want to do the tiling in the batch dimension anymore so you just delete it:

x = repeat(x, "h w c -> c h w")

This looks good to the eye but it's wrong, because repeat doesn't support (or rather defends against) the base case of having zero repeat dimensions. Since you still want the transposition of the channel dimension c previously provided by repeat, you are forced to switch to rearrange:

x = rearrange(x, "h w c -> c h w")

@MilesCranmer
Contributor Author

Hi @arogozhnikov et al,

Just wondering if any of you have any updated thoughts on this? I'm still eager to have this functionality available. I've introduced 10ish people to einops over the past year and every time I do, it feels like the missing piece for having einops be a 100% intuitive package is a single function like einsum. Each time I demo the einops API, I have to explain that all functions pretty much work the same; the different function names are effectively just a way of documenting the code. Being explicit is great, but when so much functionality overlaps between the functions, to me it sticks out like a code smell to have different function names altogether. But of course, this is all subjective.

Anyways, in recent days I've been reading through the jax source for einsum and then remembered this issue and PR; it seems that whether to do reductions is inferred there in a similar way, by simply checking uniqueness of indices; e.g., https://github.com/google/jax/blob/62230f65256728f580c5ecfa8867cac69a681cb1/jax/_src/lax/parallel.py#L408.
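
For comparison, the general einsum convention can be sketched in a few lines: in an explicit `inputs->output` specification, any index that appears in the inputs but not in the output is summed over. This is not a reproduction of the linked JAX code; `summed_indices` is a hypothetical helper that handles only single-letter indices and an explicit `->`.

```python
def summed_indices(spec):
    """List the indices an explicit einsum spec sums over (hypothetical helper)."""
    inputs, output = spec.split("->")
    seen = []
    for operand in inputs.split(","):
        for idx in operand.strip():
            # Keep first-seen order and skip anything that is not a letter.
            if idx.isalpha() and idx not in seen:
                seen.append(idx)
    return [idx for idx in seen if idx not in output]


print(summed_indices("ij,jk->ik"))  # matrix product: j is contracted
print(summed_indices("ij->ji"))     # transpose: nothing is summed
print(summed_indices("ij->"))       # full sum over both indices
```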

Cheers,
Miles

@cgarciae
Contributor

I am still very interested in this! It comes to mind every time I use the library.

@arogozhnikov
Owner

At the risk of looking stubborn:
I am opposed to unification. We can continue discussion, no problems.

My previous points are still there (verbosity), and the main target audience still consists of people with numpy/matlab experience and style of thinking. I see tremendous progress in this direction over the past year, but we're not yet at the point where we can say 'einops is default knowledge'. When thinking in the previous paradigm, you mentally first decide on the operation, not the result.

Unwrapping my previous argument about mixing reduce/repeat:

einop('i j toekn -> j token', 'min')

Fragile and hard to parse. But if you forbid this - consistency is lost and there are 'right' and 'wrong' patterns with no trivial rule to tell one from the other. Complaining meaningfully about patterns also becomes hard.
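
What a unified function would see in the pattern above can be spelled out with a small standard-library sketch. `axis_changes` is a hypothetical helper, not an einops function; it only compares axis names on the two sides.

```python
import re

_AXIS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def axis_changes(pattern):
    """Return (axes only on the left, axes only on the right), both sorted."""
    left, right = pattern.split("->")
    lhs = set(_AXIS_NAME.findall(left))
    rhs = set(_AXIS_NAME.findall(right))
    return sorted(lhs - rhs), sorted(rhs - lhs)


removed, added = axis_changes("i j toekn -> j token")
print("reduced away:", removed)
print("newly created:", added)
```

By names alone, `toekn` is an axis to reduce and `token` is a new axis to create, so the misspelling reads as a legitimate mixed reduce-and-repeat request rather than as an obvious error.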

At the same time rearrange is the most used one, which is super-safe even for newcomers.

Other related thoughts:

  • It's probably a missed opportunity by now, but I've recently realized that visual marking of disappearing and new axes can be helpful. E.g. if one knew that italics appear only on one side, parsing rules on the fly would be very simple (it would be a no-brainer whether a pattern is adding new axes, removing axes, or just rearranging elements)

    einop('b h w -> b 𝑐 h w', c=2)
    

    Another caveat: there is no reliable and simple visual modifier but capitalization

  • In the long run, a string name for the reduction (as right now in reduce) shouldn't be used. The proper path is x.min('i j -> j').
    I believe we'll get there sooner or later

"I've introduced 10ish people to einops"

👍 thank you @MilesCranmer !

@MilesCranmer
Contributor Author

Okay, no problem, just wanted to ping the thread! I should mention I'd even be happy with an einops._einop, which would discourage most people from using it as a default option.

"But if you forbid this - consistency is lost and there are 'right' and 'wrong' patterns with no trivial rule to tell one from the other."

I think these sorts of requirements should be performed by a linter, rather than at the API level. If I want, I can define all my variables with single character names - python will still run (as I think it should!) - but my linter would complain. I think the same goes for good patterns and good function names - personal choices shouldn't be enforced by the API, but rather by a python linter, or by someone reviewing a pull request.

I do like the sound of using capital names for new axes. I think I'll do that from now on actually! (But I agree, it shouldn't be enforced by any API.)
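
As an illustration of the linter idea, a check for the capitalization convention could look roughly like this. It is a sketch only: `uncapitalized_new_axes` is a hypothetical name, "capitalized" is taken to mean that the first character is uppercase, and no existing linter is assumed to offer such a rule.

```python
import re

_AXIS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def uncapitalized_new_axes(pattern):
    """Return new right-hand axes whose names do not start with a capital letter."""
    left, right = pattern.split("->")
    lhs = set(_AXIS_NAME.findall(left))
    # New axes are those named on the right but not on the left.
    new_axes = [name for name in _AXIS_NAME.findall(right) if name not in lhs]
    return [name for name in new_axes if not name[0].isupper()]


print(uncapitalized_new_axes("b h w -> b C h w"))  # follows the convention
print(uncapitalized_new_axes("b h w -> b c h w"))  # would be flagged
```

A check like this stays outside the API, so patterns that ignore the convention still run.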

@MilesCranmer
Contributor Author

Pinging this thread again @arogozhnikov to see if your views have changed with increasing use of einops. It sounds like a lot of people would be interested in this.

Also - resolved merge conflicts in the PR.

Cheers,
Miles

@arogozhnikov
Owner

Nobody mentioned it, but just in case: einop is available in a separate repo:
https://github.com/cgarciae/einop

@MilesCranmer
Contributor Author

It’s definitely nice to have @cgarciae’s package but fragmentation of packages into smaller libraries is perhaps not great, esp. for downstream maintainability.

A single einop seems like a much-desired feature though; apparently this is the most ❤️ issue on the repo (for whatever you think that’s worth): https://github.com/arogozhnikov/einops/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-heart-desc. Would be fantastic to have it merged at some point!

@arogozhnikov
Owner

  1. come on, einop is minimal and piggybacks on einops
  2. I'd happily defer dealing with the community and explaining the issues that I've raised above to someone interested in pushing this feature. I expect that the wider community (think people who just started DL) will have a number of confusions you'll have to explain.

To both points, a separate package is the right solution.

@thomasahle

Does this issue also include adding this functionality into einsum? I would love to be able to do stuff like einsum(x, y, '(i j), i j -> i', i=5) without having to run rearrange on x first. In general I now have to use three rearrange calls for every einsum, which leads to a lot of copy and paste.

6 participants