
Performance optimization of StreamSource #675

Open · wants to merge 5 commits into main

Conversation

magoogli

At the moment, when a call to 'slice(start, end)' occurs on StreamSource, it buffers in memory all data from the current position in the stream up to end.

This is a serious problem if the stream has never been read from and you call slice with a start and end position near the end of the stream. E.g. if I have a 2 gigabyte file on disk that I am reading via a Web ReadableStream, use that as the source of my data for the StreamSource, and ask for the last megabyte at the end of the StreamSource, it will currently buffer the entire 2 gigabytes into a memory buffer before returning the last megabyte!

Obviously with a stream you will always have to read the entire 2 gigabytes to get to the end, but buffering the data in memory is much more problematic and wasteful than just reading through the stream.
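The fix described here amounts to reading through the stream and discarding chunks before `start` instead of concatenating them. A minimal sketch of that idea — the names `discardUntil` and `makeReader` are illustrative only, not tus-js-client's actual StreamSource internals:

```javascript
// Skip ahead in a ReadableStream-style reader without retaining the
// skipped bytes. Only the tail of the chunk that crosses the target
// offset is kept; everything before it is dropped as it is read.
async function discardUntil(reader, bytesToSkip) {
  let skipped = 0
  let leftover = null
  while (skipped < bytesToSkip) {
    const { done, value } = await reader.read()
    if (done) break
    if (skipped + value.byteLength > bytesToSkip) {
      // This chunk straddles the offset: keep only the part past it.
      leftover = value.subarray(bytesToSkip - skipped)
      skipped = bytesToSkip
    } else {
      skipped += value.byteLength // chunk is dropped, not buffered
    }
  }
  return { skipped, leftover }
}

// Tiny mock reader for illustration: yields the given chunks in order.
function makeReader(chunks) {
  let i = 0
  return {
    read: async () =>
      i < chunks.length
        ? { done: false, value: chunks[i++] }
        : { done: true, value: undefined },
  }
}
```

With this approach, slicing near the end of a 2 gigabyte stream still has to read all the data, but peak memory stays bounded by a single chunk rather than the whole prefix.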

@magoogli magoogli marked this pull request as ready for review March 15, 2024 12:21
@Acconut
Member

Acconut commented Mar 15, 2024

This is a serious problem if the stream has never been read from and you call slice with a start and end position near the end of the stream.

In which situation would this occur with real-world usage of tus-js-client? Does that happen when you resume an upload that is nearly finished with a new tus.Upload object?

@magoogli
Author

magoogli commented Mar 15, 2024 via email

@magoogli
Author

magoogli commented Mar 15, 2024

As to your question about real-world use - yes, I am creating a new tus.Upload object. When dealing with a standard JavaScript File object the scenario I described above works without issue (as with a file you can get a slice without having to read any data prior to 'start'). I then started playing around with the idea of on-the-fly compression using the Compression Streams API and started observing the massive slowdown in the scenario above when using a stream. After some debugging I found out that it was all the concatenation and buffering in memory which was causing the issue.

I don't think my solution is correct at the moment however as I am seeing incorrect file size uploads in the scenario I described above. Busy investigating.
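For context, the on-the-fly compression setup mentioned above can be built with the Compression Streams API; in an actual upload, the reader of the compressed stream is what gets handed to tus-js-client as the data source. A rough, self-contained sketch, assuming an environment where `Blob` and `CompressionStream` are available (modern browsers, Node 18+) — `gzipBytes` is a hypothetical helper, and the tus.Upload wiring is omitted:

```javascript
// Compress a byte array through a CompressionStream, collecting the
// compressed chunks. In a real upload you would pass
// compressed.getReader() on to tus.Upload instead of buffering here.
async function gzipBytes(bytes) {
  const compressed = new Blob([bytes])
    .stream()
    .pipeThrough(new CompressionStream('gzip'))
  const reader = compressed.getReader()
  const chunks = []
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    chunks.push(value)
  }
  return new Blob(chunks)
}
```

Because the compressed size is not known up front, such an upload would also need tus-js-client's deferred-length mode rather than a fixed uploadSize.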

@magoogli
Author

@Acconut I fixed the bug in my code and files are uploaded correctly now. The optimization definitely helps for the use case I described (in a real-world application we are developing). Would you like any further clarification or code changes?

@Acconut
Member

Acconut commented Mar 25, 2024

That change itself looks good and makes sense to me. Before we can merge this, we should also add a test case for it. In https://github.com/tus/tus-js-client/blob/f410679e8ae4b3821ffb919a5cfe7f04acd1528d/test/spec/test-browser-specific.js/#L251, we have test cases specific to readers in browsers. A test case can be added there where a stream is supplied for resuming an upload with a non-zero offset. Seeking to this offset should result in multiple calls to the reader. In the end, an assertion should confirm that the data in the PATCH request is correct. Could you look into this?

@magoogli
Author

magoogli commented Apr 4, 2024

Hi @Acconut, thanks for checking out my updates. I will try to push a test for the changes in the next few days.

@magoogli
Author

magoogli commented Apr 4, 2024

@Acconut I have added a test for you - it checks that the data sent in the PATCH request is correct when starting off at a non-zero offset. We could, I guess, add some tests for the StreamSource itself, but I'm not sure what the best way would be to test that the concat method is NOT called when we don't expect it to be, without exposing some internals of the class just for the sake of testing. Please let me know if you would like any further changes.
