Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add write_record_metadata to PyTorchFileWriter #125184

Conversation

mikaylagawarecki
Copy link
Contributor

@mikaylagawarecki mikaylagawarecki commented Apr 29, 2024

Add PyTorchFileWriter.write_record_metadata(record_name, num_bytes) that

  • writes the zipfile header/end of central directory metadata for an entry*
  • reserves num_bytes in the zipfile for the payload.

*Since the payload is not provided, the CRC32 computation is skipped and 0s are written in the corresponding entry of the zipfile header

Stack from ghstack (oldest at bottom):

cc @albanD

Copy link

pytorch-bot bot commented Apr 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125184

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7b20c03 with merge base 4d41015 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label Apr 29, 2024
mikaylagawarecki added a commit that referenced this pull request Apr 29, 2024
ghstack-source-id: 51d4369e275e13fe1d93e7286fcd45271a09eb55
Pull Request resolved: #125184
Add `PyTorchFIleWriter.write_record_metadata(record_name, num_bytes)`, that will write the zipfile header/end of central directory metadata for an entry, and reserve `num_bytes` in the zipfile for the payload. 

Since the payload is not provided, the CRC32 computation is skipped and 0s are written in the corresponding entry of the zipfile header




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Apr 29, 2024
ghstack-source-id: 16b3909de68ecaac2c938c9e7347725dddc948cb
Pull Request resolved: #125184
@mikaylagawarecki mikaylagawarecki marked this pull request as ready for review April 29, 2024 20:09
caffe2/serialize/inline_container.cc Outdated Show resolved Hide resolved
caffe2/serialize/inline_container.cc Outdated Show resolved Hide resolved
@@ -649,10 +676,22 @@ void PyTorchStreamWriter::setup(const string& file_name) {
file_stream_.write(static_cast<const char*>(buf), nbytes);
return !file_stream_ ? 0 : nbytes;
};
seek_func_ = [this](size_t nbytes) -> size_t {
file_stream_.seekp(nbytes, std::ios_base::cur);
return !file_stream_ ? -1L : static_cast<size_t>(file_stream_.tellp());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised by this conditions? If (!file_stream_), I guess the line above would have segfaulted? :p

caffe2/serialize/inline_container.cc Outdated Show resolved Hide resolved
test/test_serialization.py Outdated Show resolved Hide resolved
third_party/miniz-2.1.0/miniz.c Outdated Show resolved Hide resolved
third_party/miniz-2.1.0/miniz.c Outdated Show resolved Hide resolved
Add `PyTorchFileWriter.write_record_metadata(record_name, num_bytes)` that
- writes the zipfile header/end of central directory metadata for an entry*
- reserves `num_bytes` in the zipfile for the payload. 

*Since the payload is not provided, the CRC32 computation is skipped and 0s are written in the corresponding entry of the zipfile header




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request May 1, 2024
ghstack-source-id: 1aef1219579c59c85e2c880c251360fc01d262a7
Pull Request resolved: #125184
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit but SGTM!

class TORCH_API PyTorchStreamWriter final {
public:
explicit PyTorchStreamWriter(const std::string& archive_name);
explicit PyTorchStreamWriter(
const std::function<size_t(const void*, size_t)> writer_func);
const std::function<size_t(const void*, size_t)> writer_func,
const std::function<size_t(size_t)> seek_func = default_seek_func);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const std::function<size_t(size_t)> seek_func = default_seek_func);
const std::function<size_t(size_t)> seek_func = {});

If you don't want to deal with the default, std::function has an empty state that works very much like an optional and you can do if (seek_func) on it.

@mikaylagawarecki mikaylagawarecki added topic: not user facing topic category module: python frontend For issues relating to PyTorch's Python frontend topic: improvements topic category and removed release notes: jit release notes category topic: not user facing topic category labels May 2, 2024
@mikaylagawarecki
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 2, 2024
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@mikaylagawarecki
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

Add `PyTorchFileWriter.write_record_metadata(record_name, num_bytes)` that
- writes the zipfile header/end of central directory metadata for an entry*
- reserves `num_bytes` in the zipfile for the payload. 

*Since the payload is not provided, the CRC32 computation is skipped and 0s are written in the corresponding entry of the zipfile header




cc albanD

[ghstack-poisoned]
@mikaylagawarecki mikaylagawarecki added release notes: python_frontend release notes category and removed module: python frontend For issues relating to PyTorch's Python frontend labels May 3, 2024
@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label May 3, 2024
mikaylagawarecki added a commit that referenced this pull request May 3, 2024
ghstack-source-id: 2b57a55587f881fdaa747e7716290e19f0ef0224
Pull Request resolved: #125184
@mikaylagawarecki
Copy link
Contributor Author

@pytorchbot merge

@mikaylagawarecki mikaylagawarecki removed the release notes: jit release notes category label May 3, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@izaitsevfb
Copy link
Contributor

@pytorchmergebot  revert -m "breaks internal builds, see D56962076" -c ghfirst

Copy link

pytorch-bot bot commented May 5, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: ' revert' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

@izaitsevfb
Copy link
Contributor

@pytorchmergebot revert -m "breaks internal builds, see D56962076" -c ghfirst

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Copy link
Collaborator

@mikaylagawarecki your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request May 5, 2024
This reverts commit dd92637.

Reverted #125184 on behalf of https://github.com/izaitsevfb due to breaks internal builds, see D56962076 ([comment](#125184 (comment)))
albanD pushed a commit to albanD/pytorch that referenced this pull request May 7, 2024
ghstack-source-id: 2b57a55587f881fdaa747e7716290e19f0ef0224
Pull Request resolved: pytorch#125184
@albanD albanD mentioned this pull request May 7, 2024
@albanD
Copy link
Collaborator

albanD commented May 7, 2024

Closing this in favor of #125686 to reland myself.

@albanD albanD closed this May 7, 2024
pytorchmergebot pushed a commit that referenced this pull request May 14, 2024
Reland of #125184 with compiler warning fixed by extending `m_pWrite` rather than adding `m_pSeek` to miniz API

Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)

Differential Revision: [D57287327](https://our.internmc.facebook.com/intern/diff/D57287327)
Pull Request resolved: #126087
Approved by: https://github.com/albanD
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: python_frontend release notes category Reverted topic: improvements topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants