Replacing ShardedTensor with DTensor for RW sharding #1991

zainhuda · 2024-05-13T14:48:55Z

Summary:
This is the first part of migrating TorchRec state dict checkpointing from ShardedTensor to DTensor. It sets up the infrastructure needed to support additional sharding schemes. The general approach is to keep the ShardedTensor paths for now and remove them once all sharding types are supported on DTensor. This includes ShardingPlan and ShardedTensor dataclasses such as ShardedTensorMetadata.

NOTE: This version of LocalShardsWrapper does not support empty shards; that support is added in the next diff, which enables CW (D57063512).
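For readers unfamiliar with how an empty shard can come up, here is a minimal sketch of one way it can happen under a row-wise split. It assumes a simple ceil-division split of rows across ranks; `row_wise_shards` is a hypothetical helper written for illustration and may not match how TorchRec actually plans shards.

```python
import math


def row_wise_shards(num_rows, num_cols, world_size):
    """Return one (row_offset, local_rows, num_cols) tuple per rank.

    Assumption: rows are split in blocks of ceil(num_rows / world_size).
    """
    block = math.ceil(num_rows / world_size)
    shards = []
    for rank in range(world_size):
        # Clamp to num_rows so trailing ranks can end up with zero rows.
        start = min(rank * block, num_rows)
        end = min(start + block, num_rows)
        shards.append((start, end - start, num_cols))
    return shards


shards = row_wise_shards(num_rows=5, num_cols=8, world_size=4)
for rank, (offset, rows, cols) in enumerate(shards):
    print(f"rank {rank}: row_offset={offset} local_shape=({rows}, {cols})")
```

With 5 rows over 4 ranks, the last rank in this sketch receives a shard with zero rows, which is the kind of case a shard wrapper has to handle explicitly.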

This diff includes:

+ LocalShardsWrapper, a torch.Tensor subclass to be used with DTensor
+ Changes in TorchRec state_dict load and creation to use DTensor for the row-wise (RW) path in both EmbeddingCollection and EmbeddingBagCollection
+ Changes to DCP to support LocalShardsWrapper for saving and reading (WriteItems and ReadItems)
+ DTensor paths added to callsites where ShardedTensors are expected

LocalShardsWrapper supports the following torch ops:

+ torch.ops._c10d_functional.all_gather_into_tensor.default
+ aten._to_copy.default
+ aten.view.default
+ aten.equal.default

More ops can be added as use cases require.
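The "supported ops plus room to add more" idea can be pictured as a handler table keyed by op. The sketch below is only an illustration of that pattern: it uses plain Python lists and string op names instead of tensors, and `ShardsWrapperSketch`, `register`, and `dispatch` are hypothetical names, not the actual LocalShardsWrapper API.

```python
from typing import Callable, Dict

# Hypothetical registry: op name -> handler that works on the wrapper's shards.
_HANDLERS: Dict[str, Callable] = {}


def register(op_name):
    """Decorator that adds a handler for op_name to the registry."""
    def decorator(fn):
        _HANDLERS[op_name] = fn
        return fn
    return decorator


class ShardsWrapperSketch:
    """Stand-in for a wrapper holding one rank's local shards (lists, not tensors)."""

    def __init__(self, shards, offsets):
        self.shards = shards
        self.offsets = offsets

    def dispatch(self, op_name, *args):
        handler = _HANDLERS.get(op_name)
        if handler is None:
            # Unsupported ops fail loudly instead of silently misbehaving.
            raise NotImplementedError(f"{op_name} is not supported by this wrapper")
        return handler(self, *args)


@register("aten._to_copy.default")
def _to_copy(wrapper):
    # Copy every shard and the offsets into a new wrapper.
    return ShardsWrapperSketch([list(s) for s in wrapper.shards], list(wrapper.offsets))


@register("aten.equal.default")
def _equal(wrapper, other):
    # Equal only if offsets and every shard's contents match.
    return wrapper.offsets == other.offsets and wrapper.shards == other.shards


a = ShardsWrapperSketch([[1.0, 2.0], [3.0]], [0, 2])
b = a.dispatch("aten._to_copy.default")
print(a.dispatch("aten.equal.default", b))
```

Supporting another op in this sketch means registering one more handler; ops without a handler raise NotImplementedError.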

See https://docs.google.com/document/d/16Ptl50mGFJW2cljdF2HQ6FwsiA0scwbAbjx_4dhabJw/edit?usp=drivesdk for more info regarding design and approach.

Differential Revision: D54375878

facebook-github-bot · 2024-05-13T14:49:04Z

This pull request was exported from Phabricator. Differential Revision: D54375878


facebook-github-bot · 2024-05-13T14:51:12Z

This pull request was exported from Phabricator. Differential Revision: D54375878

X-link: pytorch/torchrec#1991

Test Plan:
TODO: add a MAST job using RW and preempt to test checkpointing

```
buck2 test 'fbcode//mode/opt' fbcode//torchrec/distributed/tests:test_model_parallel_nccl
```

```
buck2 test 'fbcode//mode/opt' fbcode//torchrec/distributed/tests:test_model_parallel_nccl_single_rank -- --exact 'torchrec/distributed/tests:test_model_parallel_nccl_single_rank - torchrec.distributed.tests.test_model_parallel_nccl_single_rank.ModelParallelStateDictTestNccl: test_load_state_dict'
```

Sandcastle

iamzainhuda · 2024-05-30T20:47:35Z

creating new PR

facebook-github-bot added the CLA Signed label May 13, 2024

facebook-github-bot added the fb-exported label May 13, 2024

zainhuda force-pushed the export-D54375878 branch from 7089269 to 0e61990 Compare May 13, 2024 14:51

zainhuda mentioned this pull request May 13, 2024

[torchrec] Replacing ShardedTensor with DTensor for RW sharding pytorch/pytorch#126072

Closed

iamzainhuda closed this May 30, 2024
