[export] Fix for unflattening modules with duplicate tensors #125192
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125192

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit 73213d7 with merge base 00dd4d5. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 9254a77 to b2eabac.

@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from b2eabac to 73213d7.

@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[export] Fix for unflattening modules with duplicate tensors (#125192)

In the given test case, we have a ModuleList of 3 modules (`norm.0`, `norm.1`, `norm.2`) which share the same `weight` and `bias` tensors. However, when we trace, they all end up pointing to one state dict name (e.g. `norm.2`):

```
graph():
    %p_norms_0_weight : [num_users=0] = placeholder[target=p_norms_0_weight]
    %p_norms_0_bias : [num_users=0] = placeholder[target=p_norms_0_bias]
    %p_norms_1_weight : [num_users=0] = placeholder[target=p_norms_1_weight]
    %p_norms_1_bias : [num_users=0] = placeholder[target=p_norms_1_bias]
    %p_norms_2_weight : [num_users=3] = placeholder[target=p_norms_2_weight]
    %p_norms_2_bias : [num_users=3] = placeholder[target=p_norms_2_bias]
    %input_ : [num_users=1] = placeholder[target=input_]
    %native_layer_norm : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%input_, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
    %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm, 0), kwargs = {})
    %native_layer_norm_1 : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%getitem, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
    %getitem_3 : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm_1, 0), kwargs = {})
    %native_layer_norm_2 : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%getitem_3, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
    %getitem_6 : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm_2, 0), kwargs = {})
    return (getitem_6,)
```

This causes an error in the unflattener: after constructing the submodule for `norm.0`, its graph still points to `norm.2.weight` and `norm.2.bias`:

```
graph():
    %p_norms_2_bias : [num_users=1] = placeholder[target=p_norms_2_bias]
    %p_norms_2_weight : [num_users=1] = placeholder[target=p_norms_2_weight]
    %input_ : [num_users=1] = placeholder[target=input_]
    %native_layer_norm : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%input_, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
    %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm, 0), kwargs = {})
    return getitem
```

Since these attributes are not within the same scope as the graph (`norm.0` vs. `norm.2`), they will not be added to the subgraph, causing an error.

So this PR handles the duplicate state dict attributes by modifying the `inputs_to_state` dict to map from node names to a list of possible state dict target names.

Pull Request resolved: pytorch#125192
Approved by: https://github.com/zhxchen17
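For context, a minimal repro along the lines of the description might look like the sketch below. This is an assumption based on the graphs above (the class name, shapes, and the aliasing pattern are illustrative), not the exact test added in the PR:

```python
# Sketch of the duplicate-tensor setup described above (assumed repro, not the
# PR's actual test): three LayerNorms in a ModuleList share one weight/bias pair.
import torch
import torch.nn as nn

class Foo(nn.Module):
    def __init__(self):
        super().__init__()
        self.norms = nn.ModuleList(nn.LayerNorm([2, 2, 3]) for _ in range(3))
        # Alias every copy to the same underlying Parameters; these duplicate
        # state dict tensors are what used to break unflattening.
        for norm in self.norms[1:]:
            norm.weight = self.norms[0].weight
            norm.bias = self.norms[0].bias

    def forward(self, x):
        for norm in self.norms:
            x = norm(x)
        return x

ep = torch.export.export(Foo(), (torch.randn(2, 2, 3),))
# Per the PR description, this call raised an error before the fix.
unflat = torch.export.unflatten(ep)
print(unflat(torch.randn(2, 2, 3)).shape)
```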
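To illustrate the shape of the `inputs_to_state` change described above, here is a hypothetical sketch; the dictionary values mirror the graphs above, but the helper and variable names are assumptions for illustration, not the actual `torch.export.unflatten` internals:

```python
# Hypothetical illustration of the before/after mapping for inputs_to_state.
# Keys are placeholder node names from the graphs above; the scope-matching
# helper is an assumption, not code from the unflattener.
from typing import Dict, List, Optional

# Before: each placeholder node name mapped to a single state dict target,
# so the `norm.0` submodule could only ever see the `norms.2.*` names.
inputs_to_state_before: Dict[str, str] = {
    "p_norms_2_weight": "norms.2.weight",
    "p_norms_2_bias": "norms.2.bias",
}

# After: each node name maps to a list of possible targets, one per module
# sharing the tensor, so every submodule can find a name in its own scope.
inputs_to_state_after: Dict[str, List[str]] = {
    "p_norms_2_weight": ["norms.0.weight", "norms.1.weight", "norms.2.weight"],
    "p_norms_2_bias": ["norms.0.bias", "norms.1.bias", "norms.2.bias"],
}

def pick_target(candidates: List[str], scope: str) -> Optional[str]:
    """Return the candidate FQN that falls inside the submodule's scope, if any."""
    for fqn in candidates:
        if fqn.startswith(scope + "."):
            return fqn
    return None

# The `norm.0` submodule can now resolve its placeholder to its own parameter name.
assert pick_target(inputs_to_state_after["p_norms_2_weight"], "norms.0") == "norms.0.weight"
```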