Add simple broadcastability shape constraints to TensorType #1122

Open
brandonwillard opened this issue Aug 17, 2022 · Discussed in #1104 · 3 comments · May be fixed by #1280
Labels
bug, enhancement, graph rewriting, help wanted, important, refactor, request discussion, shape inference

Comments

@brandonwillard
Member

brandonwillard commented Aug 17, 2022

We need a way to express broadcastability constraints on the static shape values in TensorType.shape so that we can distinguish between the "no information" and "dimension is not broadcastable" cases, or anything similar.

We could always consider allowing -1 to encode a "not equal to 1" (i.e. not broadcastable) constraint on TensorType.shape values. I believe this has come up before, and it's perhaps not the most desirable approach, but it should serve to adequately express the old TensorType.broadcastable constraints while also allowing TensorType.shape[d] == None to represent a complete lack of shape information, as it intuitively does.
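
To make the proposed encoding concrete, here's a minimal sketch of how the three per-dimension states could be combined when broadcasting two static shapes. The helpers `combine_static_dims` and `broadcast_static_shapes` are hypothetical, not existing Aesara functions:

```python
def combine_static_dims(a, b):
    """Combine two static dimension entries under broadcasting.

    Proposed encoding: None -> no information, -1 -> known != 1,
    n >= 0 -> exact static length.
    """
    if a == 1:
        return b
    if b == 1:
        return a
    # Neither entry is statically known to be 1.
    if a is None and b is None:
        return None
    if a is None:
        return b  # a must either be 1 or match b at run-time
    if b is None:
        return a
    if a == -1:
        return b  # b is -1 or a concrete non-1 length, so it pins down the result
    if b == -1:
        return a
    if a != b:
        raise ValueError(f"Incompatible static dimensions {a} and {b}")
    return a


def broadcast_static_shapes(s1, s2):
    """Elementwise combination of two equal-rank static shapes."""
    return tuple(combine_static_dims(a, b) for a, b in zip(s1, s2))


# e.g. broadcast_static_shapes((None, 1, 5), (-1, 3, None)) -> (-1, 3, 5)
```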

We would need to refactor our current usage of TensorType.shape to account for this case, but, aside from being a less than exciting task, it shouldn't be particularly difficult.

Making this change would finally allow us to reason more definitively about some ambiguous instances of compile-time broadcasting, and help us remove some of the strong and unreliable assumptions we are currently required to make in our constructors and rewrites.

More specifically, it would help us raise errors when, for example, gradients are constructed, as in #1089. However, for that case specifically, some things still need to be worked out. For instance, it's not entirely clear which default non-broadcastability assumptions/constraints we should make in certain places, or whether acceptable defaults are even possible.

@brandonwillard added the bug, enhancement, help wanted, important, graph rewriting, refactor, request discussion, and shape inference labels on Aug 17, 2022
@brandonwillard changed the title from "Allow specification of dimensions with shape greater than one at the graph and/or Type levels" to "Add a broadcastability shape constraints to TensorType" on Oct 19, 2022
@brandonwillard changed the title from "Add a broadcastability shape constraints to TensorType" to "Add simple broadcastability shape constraints to TensorType" on Oct 19, 2022
@rlouf
Member

rlouf commented Oct 19, 2022

Discussions in #1089 are hard to follow, and I have a naive question: why do we need to add broadcastability constraints if we can do shape inference? Wouldn't it be enough to have something like np.broadcast_shapes as part of shape inference, in addition to a convention to represent unknown ranks/shapes?
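
For illustration, np.broadcast_shapes handles fully known shapes, but it has no notion of an unknown dimension, so some convention would still be needed on top of it:

```python
import numpy as np

np.broadcast_shapes((1, 3), (4, 1))  # -> (4, 3)

# There's no way to pass "unknown" here; something like None-valued entries
# (plus a rule for combining them) would have to be layered on top.
```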

@brandonwillard
Member Author

brandonwillard commented Oct 21, 2022

Discussions in #1089 are hard to follow, and I have a naive question: why do we need to add broadcastability constraints if we can do shape inference? Wouldn't it be enough to have something like np.broadcast_shapes as part of shape inference, in addition to a convention to represent unknown ranks/shapes?

Part of the issue is that shape inference (i.e. ShapeFeature and the associated rewrites) only really exists at compile-time (i.e. within our FunctionGraph and Function framework), yet issues like #1089 involve graphs that are constructed in places like Op.make_node, Op.grad, etc., which is something we should probably call "construction-time" (e.g. when a user constructs a graph). In those cases, we have no access to the shape inference framework. The only (shape/broadcast) information we have to work with is at the Type-level (i.e. in the TensorType objects) and that information is almost exclusively provided by and maintained by the Op.make_node implementations.
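
As a rough illustration of what "construction-time" information looks like (assuming the current TensorType(dtype, shape=...) constructor), only the inputs' Types are available when an Op builds its outputs:

```python
import aesara.tensor as at

# At construction time, the only shape information is what the Types carry.
x = at.TensorType("float64", shape=(None, 1))("x")
y = at.TensorType("float64", shape=(None, 5))("y")

# Elemwise.make_node combines the inputs' Type-level information directly;
# no ShapeFeature or rewrites are involved at this point.
z = x + y
print(z.type.shape)  # e.g. (None, 5)
```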

If we add a constraint to represent partial shape information, then we can continue to propagate and distinguish between the same information provided by TensorType.broadcastable and the "no shape information/None" case in TensorType.shape. Shape inference can also use that same information to its advantage, so it's a way to move forward in more than one direction. I would much rather keep all of this in one place (i.e. compile-time), but the design of aesara.grad and its use of Op.grad doesn't allow that at the moment. As has been mentioned before, if we turned gradients into purely symbolic Ops and expanded them at compile-time, then the problems in #1089 would fall entirely within compile-time scope.

@brandonwillard
Member Author

brandonwillard commented Oct 21, 2022

The only (shape/broadcast) information we have to work with is at the Type-level (i.e. in the TensorType objects) and that information is almost exclusively provided by and maintained by the Op.make_node implementations.

By the way, this is also one of the primary reasons for actually supporting the "dynamic" broadcasting (i.e. broadcasting without the "required" static shape information) mentioned in #1089: we can't rely on every Op.make_node to correctly propagate that static shape information. Sometimes an Op won't do that correctly due to it being only partially implemented, sometimes it's due to a lack of information or information-gathering capabilities in the Op.make_node context (e.g. symbolic manipulation—i.e. "complete" shape inference—is needed), and other times it could be due to a definitional inability to make such shape inferences for a specific Op (e.g. we simply don't know how the inputs determine the output shapes).
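
As a sketch of the last case (a hypothetical Op, purely for illustration): when an output's length is genuinely data-dependent, make_node has nothing better than None to put in the output Type, and any later broadcasting against that output has to happen without static information:

```python
import aesara.tensor as at
from aesara.graph.basic import Apply
from aesara.graph.op import Op
from aesara.tensor.type import TensorType


class SelectPositive(Op):
    """Hypothetical Op returning only the positive entries of a vector."""

    def make_node(self, x):
        x = at.as_tensor_variable(x)
        # The output length depends on the data, so the best the Type
        # can say is a single unknown dimension.
        out = TensorType(x.dtype, shape=(None,))()
        return Apply(self, [x], [out])

    def perform(self, node, inputs, output_storage):
        (x,) = inputs
        output_storage[0][0] = x[x > 0]
```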

If we simply say "no, we don't support this" across the board, then broadcasting will break in graphs containing such Ops, and I think we would rather have those graphs work whenever possible. Perhaps we'll need to say "no, we don't support gradients" in these cases, but we definitely shouldn't say that we won't support anything when we absolutely can. This is the basic rationale behind #928 and similar changes.
