
[dynamo] Turn on guard_nn_modules #125202

Closed
wants to merge 25 commits

Conversation

@anijain2305 (Contributor) commented Apr 29, 2024

Stack from ghstack (oldest at bottom):

Turning on guard_nn_modules adds a large number of guards, so we are bound to take a perf hit. But the perf hit is small. These are the numbers:

![image](https://github.com/pytorch/pytorch/assets/13822661/c8793906-c8c7-432b-9af4-4594713067be)

First, we observe that C++ guards give around a 6x speedup over Python guards, which reduces the total time spent in guards. This is shown in the last column (cpp_guards/inductor_optimized_latency): the worst model is around 1.61%, with most models below 1%. I think this is a good enough signal to turn the config on.

One might also wonder how much guard slowdown occurs with `guard_nn_modules=True`. This is the table:
![image](https://github.com/pytorch/pytorch/assets/13822661/932a885b-1c03-424b-8405-5bc8fd35dd39)

For most models, the guard overhead with nn module guards is under 2x. There are a few outliers where the slowdown is really high, and for those models we spend 1%-2% of the time in C++ guards, as shown in the first table.
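
For context, a minimal sketch of toggling this flag from user code, assuming the `torch._dynamo.config.guard_nn_modules` entry this PR flips (illustrative only, not part of this PR):

```python
import torch
import torch._dynamo

# guard_nn_modules controls whether Dynamo installs guards on nn.Module state
# (parameters, buffers, attributes) instead of treating modules as static.
# This PR flips the default on; it can still be toggled explicitly.
torch._dynamo.config.guard_nn_modules = True

class Tiny(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.lin(x))

compiled = torch.compile(Tiny())
out = compiled(torch.randn(4, 8))  # guarded module state now triggers recompiles when it changes
```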

cc @ezyang @msaroufim @bdhirsh @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng


pytorch-bot bot commented Apr 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125202

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit bca250a with merge base ae5e2ab:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@anijain2305 added the `keep-going` label (Don't stop on first failure, keep running tests until the end) on Apr 30, 2024
@anijain2305 changed the title from "[DONT MERGE][FOR CI][dynamo] Turn on guard_nn_modules" to "[dynamo] Turn on guard_nn_modules" on May 10, 2024
@anijain2305 requested review from jansel and ezyang on May 10, 2024
@ezyang (Contributor) left a comment:

You want a JK (JustKnob) for internal rollout so you can kill-switch it.

@ezyang (Contributor) commented May 10, 2024:

🤔 which we don't have precedent for in this module

@ezyang (Contributor) commented May 10, 2024:

You want to call torch._utils_internal.justknobs_check to read out the default, but you can't do the obvious thing of doing it in the config module, as that would cause the JK check to happen at module import time, and @oulgen and I know that you can't do that because it will poison the process for forks. So you want this to happen the first time the config is accessed. The low tech way is to default this to None, and then at the read site, if it is None, query the JK for the real value. The high tech way is to add some capability to the config module getter (I think we've got an accessor function where you can customize) so that it is able to lazily query JK.
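
A minimal sketch of the "low tech" approach described above; `torch._utils_internal.justknobs_check` is real, but the knob name and helper function are hypothetical:

```python
import torch._utils_internal

# In the config module: default to None so no JustKnobs check runs at import
# time (an import-time JK check would poison the process for forks).
guard_nn_modules = None

def _guard_nn_modules_enabled() -> bool:
    """At the read site, lazily resolve the default from JustKnobs on first use."""
    global guard_nn_modules
    if guard_nn_modules is None:
        # Hypothetical knob name, for illustration only.
        guard_nn_modules = torch._utils_internal.justknobs_check(
            "pytorch/dynamo:guard_nn_modules"
        )
    return guard_nn_modules
```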

@ezyang (Contributor) commented May 10, 2024:

Another deployment strategy that doesn't involve JKs is to turn it on in OSS only but not fbcode, then enable it on a per-PG basis by twiddling it, and then later switch the fbcode default to true.
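
A minimal sketch of that OSS-on / fbcode-off default, assuming an `is_fbcode()`-style check like the one used elsewhere in the codebase (not necessarily what this PR ends up doing):

```python
import torch

def is_fbcode() -> bool:
    # fbcode builds don't carry the OSS git_version attribute.
    return not hasattr(torch.version, "git_version")

# In torch/_dynamo/config.py: default on in OSS, off internally until the
# fbcode default is later flipped to True.
guard_nn_modules = not is_fbcode()
```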

@anijain2305 (Contributor, Author) commented:

Going with fbcode off and OSS on for now. Will figure out the internal rollout strategy.

@anijain2305 (Contributor, Author) commented:

@pytorchbot merge

@pytorch-bot added the `ciflow/trunk` label (Trigger trunk jobs on your pull request) on May 11, 2024
@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: This PR needs a `release notes:` label.
If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@anijain2305 (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

tinglvv pushed a commit to tinglvv/pytorch that referenced this pull request May 14, 2024
Pull Request resolved: pytorch#125202
Approved by: https://github.com/ezyang
Labels: ciflow/inductor, ciflow/trunk, keep-going, Merged, module: dynamo, oncall: pt2, topic: not user facing