[Feature suggestion] support for ellipsis in unpack #253

Open
alisterburt opened this issue Apr 21, 2023 · 3 comments

Comments

@alisterburt

Hi @arogozhnikov - thank you so much for einops, it has been really transformative.

I am trying to write some rank-polymorphic library code for working with nD image data. I sometimes don't know the dimensionality of the data up front, which can make einops.unpack a little tricky to use.

  1. Try to collect use-cases

Below is an illustrative example: I would like to shift grids of arbitrary dimensionality by nD arrays of 'center points'.

import numpy as np
import einops

grid = np.random.random((4, 4, 4, 3))  # (d, h, w) grid of 3D coordinates
centers = np.random.random((2, 2, 3))  # (2, 2) grid of 3D coordinates

grid, ps_grid = einops.pack([grid], pattern='* coords')
centers, ps_centers = einops.pack([centers], pattern='* coords')
centers = einops.rearrange(centers, "b coords -> b 1 coords")
grid = grid - centers  # broadcasts to (n_centers, n_grid_points, coords)
[grid] = einops.unpack(grid, packed_shapes=ps_centers, pattern='* b coords')
# proposed syntax (not currently supported):
# [grid] = einops.unpack(grid, packed_shapes=ps_grid, pattern='... * coords')
# current workaround, which requires knowing the number of leading dims:
[grid] = einops.unpack(grid, packed_shapes=ps_grid, pattern='center_h center_w * coords')
print(grid.shape)  # (2, 2, 4, 4, 4, 3)

I can dynamically generate labels for the new dimensions here easily but feel that the ellipsis would be more natural.

  2. Integrity - does it interplay well with existing operations and notation in einops?
    The ellipsis is supported in einops.rearrange() but I know it is not encouraged.

  3. Readability
    I find the solution with dynamically generated labels below less readable than the proposed ellipsis.

import numpy as np
import einops

grid = np.random.random((4, 4, 4, 3))
centers = np.random.random((2, 2, 3))

grid, ps_grid = einops.pack([grid], pattern='* coords')
centers, ps_centers = einops.pack([centers], pattern='* coords')
centers = einops.rearrange(centers, "b coords -> b 1 coords")
grid = grid - centers
[grid] = einops.unpack(grid, packed_shapes=ps_centers, pattern='* b coords')
# generate one throwaway axis label per leading (centers) dimension
centers_ndim = len(ps_centers[0])
unique_characters = 'abcdefghijk'
axis_labels = ' '.join(unique_characters[:centers_ndim])
[grid] = einops.unpack(grid, packed_shapes=ps_grid, pattern=f'{axis_labels} * coords')
print(grid.shape)  # (2, 2, 4, 4, 4, 3)
@arogozhnikov
Owner

arogozhnikov commented Apr 23, 2023

Hi @alisterburt, happy to hear that einops is helpful.

Thanks for the detailed analysis of your proposal.

There are a couple of reasons why ... was not included in unpack:

  • keeping pack and unpack consistent: in pack, * is already rank-polymorphic, so an ellipsis can't be added there
  • the pattern '... * coords' is not that obvious (unless you have some practice with unpacking).

Neither is a strong blocker, but both are worth considering.

For your particular case, I find the following snippet more readable (though perhaps it isn't if you need to handle data of truly arbitrary dimensionality).

pattern = {
  3: 't * c',
  4: 'h w * c',
  5: 'h w d * c',
}[grid.ndim]

[grid] = einops.unpack(grid, packed_shapes=ps_grid, pattern=pattern)

In any case, this is implementable if more people run into this (and if you do, please describe your use case in this issue!).

@alisterburt
Author

Thanks for considering this and providing a more readable substitute. Let's see if anyone else would like this.

@RuiWang1998

Hi,

I'd like to say this would be convenient for us when refactoring projects that look like OpenFold, where we may have different numbers of batch dimensions. One example is attention on the pair representation: instead of attention over a sequence of shape (i, d), where i indexes the nodes and d is the feature dimension, we would have (i, j, d), where (i, j) indexes an edge. These cases (and others beyond edge-based attention) come up in many places in protein structure code, and we would love it if you could support this!
