Add simple broadcastability shape constraints to TensorType #1122

Open
brandonwillard opened this issue Aug 17, 2022 · Discussed in #1104 · 3 comments · May be fixed by #1280
Labels
bug, enhancement, graph rewriting, help wanted, important, refactor, request discussion, shape inference

Comments

@brandonwillard
Member

brandonwillard commented Aug 17, 2022

We need a way to express broadcastability constraints on the static shape values in TensorType.shape so that we can distinguish between the "no information" and "dimension is not broadcastable" cases, or anything similar.

We could always consider allowing -1 to encode a "not equal to 1" (i.e. not broadcastable) constraint on TensorType.shape values. I believe this has come up before, and it's perhaps not the most desirable approach, but it should serve to adequately express the old TensorType.broadcastable constraints while also allowing TensorType.shape[d] == None to represent a complete lack of shape information, as it intuitively does.
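
To make the proposed encoding concrete, here's a minimal sketch of how the three per-dimension states could be combined when broadcasting two static shapes. The helpers `combine_static_dims` and `broadcast_static_shapes` are hypothetical, not existing Aesara functions:

```python
def combine_static_dims(a, b):
    """Combine two static dimension entries under broadcasting.

    Proposed encoding: None -> no information, -1 -> known != 1,
    n >= 0 -> exact static length.
    """
    if a == 1:
        return b
    if b == 1:
        return a
    # Neither entry is statically known to be 1.
    if a is None and b is None:
        return None
    if a is None:
        return b  # a must either be 1 or match b at run-time
    if b is None:
        return a
    if a == -1:
        return b  # b is -1 or a concrete non-1 length, so it pins down the result
    if b == -1:
        return a
    if a != b:
        raise ValueError(f"Incompatible static dimensions {a} and {b}")
    return a


def broadcast_static_shapes(s1, s2):
    """Elementwise combination of two equal-rank static shapes."""
    return tuple(combine_static_dims(a, b) for a, b in zip(s1, s2))


# e.g. broadcast_static_shapes((None, 1, 5), (-1, 3, None)) -> (-1, 3, 5)
```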

We would need to refactor our current usage of TensorType.shape to account for this case, but, aside from being a less than exciting task, it shouldn't be particularly difficult.

Making this change would finally allow us to reason more definitively about some ambiguous instances of compile-time broadcasting, and help us remove some of the strong and unreliable assumptions we are currently required to make in our constructors and rewrites.

More specifically, it would help us raise errors when, for example, gradients are constructed, as in #1089. However, for that case specifically, some things still need to be worked out. For instance, it's not entirely clear which default non-broadcastability assumptions/constraints we should make in certain places, or whether acceptable defaults are even possible.

@brandonwillard added the bug, enhancement, help wanted, important, graph rewriting, refactor, request discussion, and shape inference labels on Aug 17, 2022
@brandonwillard changed the title from "Allow specification of dimensions with shape greater than one at the graph and/or Type levels" to "Add a broadcastability shape constraints to TensorType" on Oct 19, 2022
@brandonwillard changed the title from "Add a broadcastability shape constraints to TensorType" to "Add simple broadcastability shape constraints to TensorType" on Oct 19, 2022
@rlouf
Member

rlouf commented Oct 19, 2022

Discussions in #1089 are hard to follow, and I have a naive question: why do we need to add broadcastability constraints if we can do shape inference? Wouldn't it be enough to have something like np.broadcast_shapes as part of shape inference, in addition to a convention to represent unknown ranks/shapes?
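
For illustration, np.broadcast_shapes handles fully known shapes, but it has no notion of an unknown dimension, so some convention would still be needed on top of it:

```python
import numpy as np

np.broadcast_shapes((1, 3), (4, 1))  # -> (4, 3)

# There's no way to pass "unknown" here; something like None-valued entries
# (plus a rule for combining them) would have to be layered on top.
```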

@brandonwillard
Member Author

brandonwillard commented Oct 21, 2022

Discussions in #1089 are hard to follow, and I have a naive question: why do we need to add broadcastability constraints if we can do shape inference? Wouldn't it be enough to have something like np.broadcast_shapes as part of shape inference, in addition to a convention to represent unknown ranks/shapes?

Part of the issue is that shape inference (i.e. ShapeFeature and the associated rewrites) only really exists at compile-time (i.e. within our FunctionGraph and Function framework), yet issues like #1089 involve graphs that are constructed in places like Op.make_node, Op.grad, etc., which is something we should probably call "construction-time" (e.g. when a user constructs a graph). In those cases, we have no access to the shape inference framework. The only (shape/broadcast) information we have to work with is at the Type-level (i.e. in the TensorType objects) and that information is almost exclusively provided by and maintained by the Op.make_node implementations.
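
As a rough illustration of what "construction-time" information looks like (assuming the current TensorType(dtype, shape=...) constructor), only the inputs' Types are available when an Op builds its outputs:

```python
import aesara.tensor as at

# At construction time, the only shape information is what the Types carry.
x = at.TensorType("float64", shape=(None, 1))("x")
y = at.TensorType("float64", shape=(None, 5))("y")

# Elemwise.make_node combines the inputs' Type-level information directly;
# no ShapeFeature or rewrites are involved at this point.
z = x + y
print(z.type.shape)  # e.g. (None, 5)
```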

If we add a constraint to represent partial shape information, then we can continue to propagate and distinguish between the same information provided by TensorType.broadcastable and the "no shape information/None" case in TensorType.shape. Shape inference can also use that same information to its advantage, so it's a way to move forward in more than one direction. I would much rather keep all of this in one place (i.e. compile-time), but the design of aesara.grad and its use of Op.grad doesn't allow that at the moment. As has been mentioned before, if we turned gradients into purely symbolic Ops and expanded them at compile-time, then the problems in #1089 would fall entirely within compile-time scope.

@brandonwillard
Member Author

brandonwillard commented Oct 21, 2022

The only (shape/broadcast) information we have to work with is at the Type-level (i.e. in the TensorType objects) and that information is almost exclusively provided by and maintained by the Op.make_node implementations.

By the way, this is also one of the primary reasons for actually supporting the "dynamic" broadcasting (i.e. broadcasting without the "required" static shape information) mentioned in #1089: we can't rely on every Op.make_node to correctly propagate that static shape information. Sometimes an Op won't do that correctly due to it being only partially implemented, sometimes it's due to a lack of information or information-gathering capabilities in the Op.make_node context (e.g. symbolic manipulation—i.e. "complete" shape inference—is needed), and other times it could be due to a definitional inability to make such shape inferences for a specific Op (e.g. we simply don't know how the inputs determine the output shapes).
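
As a sketch of the last case (a hypothetical Op, purely for illustration): when an output's length is genuinely data-dependent, make_node has nothing better than None to put in the output Type, and any later broadcasting against that output has to happen without static information:

```python
import aesara.tensor as at
from aesara.graph.basic import Apply
from aesara.graph.op import Op
from aesara.tensor.type import TensorType


class SelectPositive(Op):
    """Hypothetical Op returning only the positive entries of a vector."""

    def make_node(self, x):
        x = at.as_tensor_variable(x)
        # The output length depends on the data, so the best the Type
        # can say is a single unknown dimension.
        out = TensorType(x.dtype, shape=(None,))()
        return Apply(self, [x], [out])

    def perform(self, node, inputs, output_storage):
        (x,) = inputs
        output_storage[0][0] = x[x > 0]
```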

If we simply say "no, we don't support this" across the board, then broadcasting will break in graphs containing such Ops, and I think we would rather have those graphs work whenever possible. Perhaps we'll need to say "no, we don't support gradients" in these cases, but we definitely shouldn't say that we won't support anything when we absolutely can. This is the basic rationale behind #928 and similar changes.
