[inductor] Avoid bool being upcast to int #109913

peterbell10 · 2023-09-22T21:12:49Z

Stack from ghstack (oldest at bottom):

-> [inductor] Avoid bool being upcast to int #109913

Currently the inductor code for `x.any(-1)` does this strange dance:

```python
tmp0 = tl.load(in_ptr0 + (r1 + (128*x0)), rmask & xmask)
tmp1 = tmp0.to(tl.int64)
tmp2 = (tmp1 != 0)
```

This happens because `register_lowering` includes the dimension argument in
type promotion, so the input is promoted to `int64`, which we then cast back
to bool. The proper fix would be in `register_lowering` itself, but for now I
just remove the unnecessary type promotion from `aten.any`.
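To make the first problem concrete, here is a toy model of category-based type promotion in the spirit of `torch.result_type`, where a Python scalar only changes the result dtype if its category (bool < integer < floating) is higher than the tensor's. The dtype table and the `result_dtype` helper are hypothetical simplifications, not inductor's `register_lowering` code; the sketch only shows why letting the integer `dim` argument of `x.any(-1)` take part in promotion turns a bool input into `int64`.

```python
# Toy model of category-based promotion, loosely following torch.result_type:
# a Python scalar only affects the result when its category is higher than the
# tensor's. Hypothetical simplification, not inductor's register_lowering code.

CATEGORY = {"bool": 0, "int32": 1, "int64": 1, "float32": 2}
# dtype a Python scalar contributes when it wins promotion (assumed defaults)
SCALAR_DEFAULT = {bool: "bool", int: "int64", float: "float32"}


def result_dtype(tensor_dtype, *scalars):
    """Promote one tensor dtype with any number of Python scalar arguments."""
    best = tensor_dtype
    for s in scalars:
        s_dtype = SCALAR_DEFAULT[type(s)]
        if CATEGORY[s_dtype] > CATEGORY[best]:
            best = s_dtype
    return best


# x.any(-1): if the dim argument -1 takes part in promotion, bool -> int64,
# which is why the kernel then needs `tmp1 != 0` to get back to bool.
print(result_dtype("bool", -1))  # int64
# With no type promotion for aten.any, the input stays bool.
print(result_dtype("bool"))      # bool
```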

In the current code we also see:

```python
tmp5 = tl.where(rmask & xmask, tmp3, 0)
```

which promotes the boolean value to int, since `0` is an int32 in Triton.
This PR changes it to generate a boolean constant instead.
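A similarly minimal sketch of the `tl.where` issue, assuming (as described above) that a bare Python int literal behaves like an `int32` value and that mixing `int1` with `int32` yields `int32`. The `RANK` table and the `operand_dtype`/`where_dtype` helpers are hypothetical and do not reproduce Triton's real promotion rules; they only show why replacing the literal `0` with a `tl.int1` constant keeps the result boolean.

```python
# Hypothetical toy model of the result dtype of tl.where(mask, a, b), based on
# the PR's description: a bare Python int literal acts like int32, and the
# wider of the two operand dtypes wins. Not Triton's actual promotion code.

RANK = {"int1": 0, "int8": 1, "int32": 2, "int64": 3}


def operand_dtype(x):
    if isinstance(x, str):
        # A string stands for a tensor of that dtype, e.g. tl.full(..., tl.int1).
        return x
    if isinstance(x, bool):
        # Python bool literals are treated as int1 in this model.
        return "int1"
    if isinstance(x, int):
        # Bare Python int literals, such as the 0 fill value, act like int32.
        return "int32"
    raise TypeError(f"unsupported operand: {x!r}")


def where_dtype(a, b):
    """Result dtype of where(mask, a, b) under this toy model."""
    da, db = operand_dtype(a), operand_dtype(b)
    return da if RANK[da] >= RANK[db] else db


print(where_dtype("int1", 0))       # int32: the literal upcasts the bool value
print(where_dtype("int1", "int1"))  # int1: a tl.int1 constant keeps it boolean
```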

Finally, there is also a Triton bug where the `tl.load` itself upcasts to
`tl.int8`. I work around this by adding an explicit cast to `tl.int1`. The
final kernel code looks like:

```python
tmp0 = tl.load(in_ptr0 + (r1 + (128*x0)), rmask & xmask).to(tl.int1)
tmp1 = tl.broadcast_to(tmp0, [XBLOCK, RBLOCK])
tmp3 = tl.full([1, 1], 0, tl.int1)
tmp4 = tl.where(rmask & xmask, tmp1, tmp3)
tmp5 = triton_helpers.any(tmp4, 1)[:, None]
```
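For reference, the following pure-Python sketch spells out what the final kernel computes for a bool input of shape `(xnumel, rnumel)`, assuming a single `RBLOCK` covers the whole row. `masked_row_any` is a hypothetical helper: it reads stored 0/1 bytes, casts them to bool (the `.to(tl.int1)` step), fills masked-out lanes with `False`, and reduces with `any`. Because `False` is the identity of `any`, the fill value cannot change the result, which is why a boolean constant is the natural fill for the `tl.where`.

```python
# Pure-Python reference (hypothetical helper, not inductor code) for what the
# final kernel computes for x.any(-1) with one RBLOCK per row.


def masked_row_any(storage, rnumel, rblock):
    """storage: flat row-major list of 0/1 bytes, rnumel elements per row."""
    assert rblock >= rnumel, "this sketch assumes one block covers a whole row"
    xnumel = len(storage) // rnumel
    out = []
    for x0 in range(xnumel):
        lanes = []
        for r1 in range(rblock):
            rmask = r1 < rnumel
            if rmask:
                # Load the stored byte and cast it to bool (like .to(tl.int1)).
                lanes.append(bool(storage[r1 + rnumel * x0]))
            else:
                # Masked lane: filled with a boolean False, the identity of any.
                lanes.append(False)
        out.append(any(lanes))
    return out


storage = [0, 0, 0, 1, 0, 0]  # two rows of three bools
print(masked_row_any(storage, rnumel=3, rblock=4))  # [False, True]
```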

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

pytorch-bot · 2023-09-22T21:12:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109913

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 6 Unrelated Failures

As of commit 93fec8e with merge base 87ea6fb:

NEW FAILURES - The following jobs have failed:

Check mergeability and dependencies for ghstack prs / pr-dependencies-check / check (gh)
Mergeability Error: PR #109913 is NOT mergeable into main / revertable (if it is already merged) due to 1197 conflicting commits which are:
linux-binary-manywheel / manywheel-py3_8-cuda12_1-build / build (gh)
../aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp:151:24: error: ‘CUSPARSE_COMPUTE_TF32’ was not declared in this scope; did you mean ‘CUSPARSE_COMPUTE_32F’?
trunk / linux-focal-rocm5.7-py3.8 / test (default, 1, 1, linux.rocm.gpu) (gh)
Process completed with exit code 1.

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / linux-focal-py3.11-clang10 / test (crossref, 1, 2, linux.2xlarge) (gh) (disabled by #113489 but the issue was closed recently and a rebase is needed to make it pass)
test_proxy_tensor.py::TestProxyTensorOpInfoCPU::test_make_fx_symbolic_exhaustive_round_decimals_3_cpu_float32
pull / linux-focal-py3.11-clang10 / test (dynamo, 2, 7, linux.2xlarge) (gh) (similar failure)
test_autocast.py::TestAutocastCPU::test_autocast_methods_expect_builtin_promote
pull / linux-focal-py3.8-clang10 / test (crossref, 1, 2, linux.2xlarge) (gh) (disabled by #113489 but the issue was closed recently and a rebase is needed to make it pass)
test_proxy_tensor.py::TestProxyTensorOpInfoCPU::test_make_fx_symbolic_exhaustive_round_decimals_3_cpu_float32
pull / linux-jammy-py3.10-clang15-asan / test (default, 5, 6, linux.4xlarge) (gh) (disabled by #113489 but the issue was closed recently and a rebase is needed to make it pass)
test_proxy_tensor.py::TestProxyTensorOpInfoCPU::test_make_fx_symbolic_exhaustive_round_decimals_3_cpu_float32
trunk / macos-12-py3-arm64 / build (gh) (detected as infra flaky with no runner)
trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral) (gh) (disabled by #113489 but the issue was closed recently and a rebase is needed to make it pass)
test_proxy_tensor.py::TestProxyTensorOpInfoCPU::test_make_fx_symbolic_exhaustive_round_decimals_3_cpu_float32

This comment was automatically generated by Dr. CI and updates every 15 minutes.

lezcano · 2023-09-26T07:11:15Z

torch/_inductor/codegen/triton.py

```diff
+ ttype = triton_compute_type(src_dtype)
+ other = self.cse.generate(
+     self.compute,
+     f"tl.full({[1] * self.triton_tensor_ndim()}, {default}, {ttype})",
```


Does `tl.where` not broadcast over the number of dims?

This generates the same shape as `ops.constant`, so there's a chance it gets CSE'd, which cleans up the generated code a bit. It has no performance or correctness impact.
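The CSE point can be illustrated with a toy cache keyed on the generated expression string. `ToyCSE` and `full_expr` are hypothetical stand-ins, not inductor's actual CSE machinery; `full_expr` only mimics the f-string in the diff above, emitting one size-1 dimension per tensor dimension, so two requests for the same constant resolve to the same temporary.

```python
# Hypothetical toy CSE cache (not inductor's actual CSE class): identical
# expression strings map to the same temporary variable.


class ToyCSE:
    def __init__(self):
        self.cache = {}  # expression string -> temporary name
        self.lines = []  # emitted kernel lines

    def generate(self, expr):
        if expr not in self.cache:
            name = f"tmp{len(self.cache)}"
            self.cache[expr] = name
            self.lines.append(f"{name} = {expr}")
        return self.cache[expr]


def full_expr(ndim, default, ttype):
    # Same f-string shape as the diff: one size-1 dim per tensor dimension.
    return f"tl.full({[1] * ndim}, {default}, {ttype})"


cse = ToyCSE()
a = cse.generate(full_expr(2, 0, "tl.int1"))  # e.g. emitted by ops.constant
b = cse.generate(full_expr(2, 0, "tl.int1"))  # the tl.where fill value
print(a, b, cse.lines)  # tmp0 tmp0 ['tmp0 = tl.full([1, 1], 0, tl.int1)']
```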

lezcano · 2023-09-26T07:13:49Z

@pytorchbot merge

pytorchmergebot · 2023-09-26T07:15:42Z

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

peterbell10 · 2023-12-19T14:11:06Z

@pytorchbot merge

pytorchmergebot · 2023-12-19T14:14:26Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorch-bot · 2023-12-20T12:31:38Z

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'causing performance regression in relevant metrics, @malfet I believe you are the correct person to help identify and fix the issues. More details check internal OPS count for ads metricsnin the internal related diff' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,close} ...

Try @pytorchbot --help for more info.

jeanschmidt · 2023-12-20T12:32:04Z

@pytorchbot revert -m "causing performance regression in relevant metrics, @malfet I believe you are the correct person to help identify and fix the issues. More details check internal OPS count for ads metricsnin the internal related diff" -c nosignal

pytorchmergebot · 2023-12-20T12:33:46Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot · 2023-12-20T12:33:55Z

@peterbell10 your PR has been successfully reverted.

@malfet

This reverts commit 9299869. Reverted #109913 on behalf of https://github.com/jeanschmidt due to causing performance regression in relevant metrics, @malfet I believe you are the correct person to help identify and fix the issues. More details check internal OPS count for ads metricsnin the internal related diff ([comment](#109913 (comment)))

malfet · 2023-12-20T14:55:19Z

@pytorchbot revert -m "causing performance regression in relevant metrics, @malfet I believe you are the correct person to help identify and fix the issues. More details check internal OPS count for ads metricsnin the internal related diff" -c nosignal

Please note that the revert category should be `ghfirst`.

peterbell10 · 2023-12-20T19:19:21Z

@malfet are you able to share any details of the test failure?

github-actions · 2024-02-18T19:33:28Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

peterbell10 · 2024-03-19T20:57:26Z

@malfet ping

github-actions · 2024-05-21T00:48:33Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

github-actions bot added module: inductor ciflow/inductor labels Sep 22, 2023

pytorchbot added the open source label Sep 22, 2023

peterbell10 requested a review from lezcano September 26, 2023 03:41

peterbell10 marked this pull request as ready for review September 26, 2023 03:41

lezcano approved these changes Sep 26, 2023

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 26, 2023

pytorchmergebot added the merging label Sep 26, 2023

pytorchmergebot removed the merging label Sep 26, 2023

peterbell10 added the topic: not user facing topic category label Sep 26, 2023

This was referenced Sep 29, 2023

[ATen] Support multi dim any and all reductions #110310

Closed

[inductor] Decompose boolean min/max into all/any #110311

Closed

pytorchmergebot added the merging label Dec 19, 2023

pytorchmergebot added the Merged label Dec 19, 2023

pytorchmergebot removed the merging label Dec 19, 2023

pytorchmergebot closed this in 9299869 Dec 19, 2023

pytorchmergebot added the Reverted label Dec 20, 2023

pytorchmergebot reopened this Dec 20, 2023

github-actions bot added the Stale label Feb 18, 2024

github-actions bot closed this Mar 19, 2024

peterbell10 reopened this Mar 19, 2024

peterbell10 removed the Stale label Mar 19, 2024

github-actions bot added the Stale label May 21, 2024
