Add `write_record_metadata` to PyTorchFileWriter #125184
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125184
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 7b20c03 with merge base 4d41015: This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 51d4369e275e13fe1d93e7286fcd45271a09eb55 Pull Request resolved: #125184
Add `PyTorchFileWriter.write_record_metadata(record_name, num_bytes)`, which writes the zipfile header/end of central directory metadata for an entry and reserves `num_bytes` in the zipfile for the payload. Since the payload is not provided, the CRC32 computation is skipped and 0s are written in the corresponding entry of the zipfile header. [ghstack-poisoned]
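To make the idea concrete, here is a minimal C++ sketch of the local-file-header half of it, written against the standard ZIP local file header layout (30 fixed bytes followed by the file name). `Sink` and `write_header_and_reserve` are hypothetical names, not PyTorch's API; the real writer goes through miniz and also emits the central directory entry and end-of-central-directory record, which this sketch omits. The entry is stored uncompressed, the CRC-32 field is left as 0 because the payload is unknown, and the payload region is reserved by advancing the position instead of writing data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory sink standing in for a writer/seek callback pair.
struct Sink {
  std::vector<unsigned char> buf;
  std::size_t pos = 0;

  // Copy n bytes at the current position, growing the buffer as needed.
  void write(const void* data, std::size_t n) {
    const auto* p = static_cast<const unsigned char*>(data);
    if (pos + n > buf.size()) buf.resize(pos + n);
    std::copy(p, p + n, buf.begin() + static_cast<std::ptrdiff_t>(pos));
    pos += n;
  }

  // Advance the position without copying any data. In this in-memory
  // sketch the skipped region is zero-filled; it stands for space reserved
  // for a payload that this call does not write.
  void seek(std::size_t n) {
    pos += n;
    if (pos > buf.size()) buf.resize(pos);
  }
};

// ZIP stores multi-byte integers little-endian.
void put_u16(Sink& s, std::uint16_t v) {
  const unsigned char b[2] = {static_cast<unsigned char>(v & 0xFF),
                              static_cast<unsigned char>(v >> 8)};
  s.write(b, sizeof b);
}

void put_u32(Sink& s, std::uint32_t v) {
  unsigned char b[4];
  for (int i = 0; i < 4; ++i) b[i] = static_cast<unsigned char>(v >> (8 * i));
  s.write(b, sizeof b);
}

// Hypothetical helper: writes a local file header for a stored (uncompressed)
// entry whose payload is not available, then reserves num_bytes for it.
// Returns the offset at which the payload would later be written.
std::size_t write_header_and_reserve(Sink& s, const std::string& name,
                                     std::uint32_t num_bytes) {
  put_u32(s, 0x04034b50);  // local file header signature "PK\3\4"
  put_u16(s, 20);          // version needed to extract (2.0)
  put_u16(s, 0);           // general purpose bit flags
  put_u16(s, 0);           // compression method: 0 = stored
  put_u16(s, 0);           // last modification time
  put_u16(s, 0);           // last modification date
  put_u32(s, 0);           // CRC-32: payload unknown, so 0 is written
  put_u32(s, num_bytes);   // compressed size
  put_u32(s, num_bytes);   // uncompressed size
  put_u16(s, static_cast<std::uint16_t>(name.size()));  // file name length
  put_u16(s, 0);           // extra field length
  s.write(name.data(), name.size());
  const std::size_t payload_offset = s.pos;
  s.seek(num_bytes);       // reserve the payload bytes instead of writing them
  return payload_offset;
}
```

Because the sizes are fixed in the header up front, exactly `num_bytes` of payload would have to land at the returned offset for the entry to be consistent.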
ghstack-source-id: 16b3909de68ecaac2c938c9e7347725dddc948cb Pull Request resolved: #125184
caffe2/serialize/inline_container.cc (outdated)
@@ -649,10 +676,22 @@ void PyTorchStreamWriter::setup(const string& file_name) {
```cpp
file_stream_.write(static_cast<const char*>(buf), nbytes);
return !file_stream_ ? 0 : nbytes;
};
seek_func_ = [this](size_t nbytes) -> size_t {
file_stream_.seekp(nbytes, std::ios_base::cur);
return !file_stream_ ? -1L : static_cast<size_t>(file_stream_.tellp());
```
I'm a bit surprised by this condition? If `!file_stream_`, I guess the line above would have segfaulted? :p
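For reference on the question above: in standard C++, a failed `std::ostream::seekp` does not crash. It sets `failbit`, after which the stream converts to `false` and `tellp()` returns -1, so a check like `!file_stream_` does observe a failed seek. The sketch below reproduces the lambda's logic as a hypothetical free function, `seek_forward`, and triggers a failure with a `std::ostringstream`, whose buffer cannot be positioned past the characters already written.

```cpp
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>

// Hypothetical free-function version of the seek_func lambda under review:
// advance the put position by nbytes and return the new offset, or
// static_cast<std::size_t>(-1) if the stream is in a failed state.
std::size_t seek_forward(std::ostream& os, std::size_t nbytes) {
  os.seekp(static_cast<std::streamoff>(nbytes), std::ios_base::cur);
  // A failed seekp sets failbit instead of crashing, so this check is live.
  return !os ? static_cast<std::size_t>(-1)
             : static_cast<std::size_t>(os.tellp());
}
```

Note that returning `-1L` from a lambda declared to return `size_t`, as the original does, yields `SIZE_MAX`, so callers have to compare against `static_cast<size_t>(-1)`; the sketch spells that conversion out.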
Add `PyTorchFileWriter.write_record_metadata(record_name, num_bytes)` that
- writes the zipfile header/end of central directory metadata for an entry*
- reserves `num_bytes` in the zipfile for the payload.

*Since the payload is not provided, the CRC32 computation is skipped and 0s are written in the corresponding entry of the zipfile header

[ghstack-poisoned]
ghstack-source-id: 1aef1219579c59c85e2c880c251360fc01d262a7 Pull Request resolved: #125184
nit but SGTM!
```diff
 class TORCH_API PyTorchStreamWriter final {
  public:
   explicit PyTorchStreamWriter(const std::string& archive_name);
   explicit PyTorchStreamWriter(
-      const std::function<size_t(const void*, size_t)> writer_func);
+      const std::function<size_t(const void*, size_t)> writer_func,
+      const std::function<size_t(size_t)> seek_func = default_seek_func);
```
Suggested change
```diff
-      const std::function<size_t(size_t)> seek_func = default_seek_func);
+      const std::function<size_t(size_t)> seek_func = {});
```
If you don't want to deal with the default, `std::function` has an empty state that works very much like an optional, and you can do `if (seek_func)` on it.
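A minimal sketch of that suggestion, using a hypothetical `ReservingWriter` class rather than the real `PyTorchStreamWriter`: the seek callback defaults to an empty `std::function`, and `reserve` tests it with `if (seek_func_)` before calling it. The fallback of writing zeros through `writer_func` when no seek callback is given is an assumption made for this illustration, not behavior described in the PR.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical writer: seek_func defaults to an empty std::function
// instead of a named default function.
class ReservingWriter {
 public:
  explicit ReservingWriter(
      std::function<std::size_t(const void*, std::size_t)> writer_func,
      std::function<std::size_t(std::size_t)> seek_func = {})
      : writer_func_(std::move(writer_func)),
        seek_func_(std::move(seek_func)) {}

  // Reserve nbytes at the current position.
  void reserve(std::size_t nbytes) {
    if (seek_func_) {
      // A seek callback was supplied: skip ahead without writing.
      seek_func_(nbytes);
    } else {
      // Assumed fallback for this sketch: emit zeros through writer_func.
      std::vector<char> zeros(nbytes, 0);
      writer_func_(zeros.data(), zeros.size());
    }
  }

 private:
  std::function<std::size_t(const void*, std::size_t)> writer_func_;
  std::function<std::size_t(std::size_t)> seek_func_;
};
```

A default-constructed `std::function` is empty, so testing it is well defined; calling it while empty would throw `std::bad_function_call`, which is why the check comes first.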
@pytorchbot merge
Merge failed
Reason: This PR needs a
If not, please add the
To add a label, you can comment to pytorchbot, for example
For more information, see
Details for Dev Infra team
Raised by workflow job
@pytorchbot merge
Merge failed
Reason: This PR needs a
If not, please add the
To add a label, you can comment to pytorchbot, for example
For more information, see
Details for Dev Infra team
Raised by workflow job
Add `PyTorchFileWriter.write_record_metadata(record_name, num_bytes)` that
- writes the zipfile header/end of central directory metadata for an entry*
- reserves `num_bytes` in the zipfile for the payload.

*Since the payload is not provided, the CRC32 computation is skipped and 0s are written in the corresponding entry of the zipfile header

cc albanD

[ghstack-poisoned]
ghstack-source-id: 2b57a55587f881fdaa747e7716290e19f0ef0224 Pull Request resolved: #125184
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
@pytorchmergebot revert -m "breaks internal builds, see D56962076" -c ghfirst
❌ 🤖 pytorchbot command failed:
Try
@pytorchmergebot revert -m "breaks internal builds, see D56962076" -c ghfirst
@pytorchbot successfully started a revert job. Check the current status here.
@mikaylagawarecki your PR has been successfully reverted.
This reverts commit dd92637. Reverted #125184 on behalf of https://github.com/izaitsevfb due to breaks internal builds, see D56962076 ([comment](#125184 (comment)))
ghstack-source-id: 2b57a55587f881fdaa747e7716290e19f0ef0224 Pull Request resolved: pytorch#125184
Closing this in favor of #125686 to reland myself.
Reland of #125184 with compiler warning fixed by extending `m_pWrite` rather than adding `m_pSeek` to miniz API

Differential Revision: [D57287327](https://our.internmc.facebook.com/intern/diff/D57287327)
Pull Request resolved: #126087
Approved by: https://github.com/albanD
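The commit message does not say how `m_pWrite` was extended. One hypothetical convention, shown below purely as an illustration, is to let a null buffer pointer mean "reserve `n` bytes at `file_ofs`" instead of "copy `n` bytes". The callback has the same shape as miniz's `mz_file_write_func` (`size_t (*)(void* pOpaque, mz_uint64 file_ofs, const void* pBuf, size_t n)`), but `buffer_write` and the null-buffer rule are assumptions of this sketch, not the reland's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same shape as miniz's mz_file_write_func.
using WriteFunc = std::size_t (*)(void*, std::uint64_t, const void*, std::size_t);

// Hypothetical callback writing into a std::vector passed as the opaque
// pointer. Assumed convention: buf == nullptr means "reserve n bytes".
std::size_t buffer_write(void* opaque, std::uint64_t file_ofs,
                         const void* buf, std::size_t n) {
  auto* out = static_cast<std::vector<unsigned char>*>(opaque);
  const std::size_t end = static_cast<std::size_t>(file_ofs) + n;
  if (out->size() < end) out->resize(end);  // new bytes are zero-initialized
  if (buf != nullptr) {
    const auto* src = static_cast<const unsigned char*>(buf);
    std::copy(src, src + n,
              out->begin() + static_cast<std::ptrdiff_t>(file_ofs));
  }
  return n;  // report all n bytes as handled so the caller continues
}
```

The appeal of this kind of design is that no new callback slot is needed in the writer struct; the trade-off is that a null buffer becomes a sentinel every write callback has to handle.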
Add `PyTorchFileWriter.write_record_metadata(record_name, num_bytes)` that
- writes the zipfile header/end of central directory metadata for an entry*
- reserves `num_bytes` in the zipfile for the payload.

*Since the payload is not provided, the CRC32 computation is skipped and 0s are written in the corresponding entry of the zipfile header
Stack from ghstack (oldest at bottom):
- Add `write_record_metadata` to PyTorchFileWriter #125184

cc @albanD