
Performance optimization of StreamSource #675

Open · wants to merge 5 commits into main

Conversation

magoogli

At the moment, when a call to 'slice(start, end)' occurs on StreamSource, it buffers in memory all data from the current position in the stream up to end.

This is a serious problem if the stream has never been read from and you call slice with a start and end position near the end of the stream. E.g. if I have a 2 gigabyte file on disk that I am reading via a Web ReadableStream, use that as the source of my data for the StreamSource, and ask for the last megabyte at the end of the StreamSource, it will currently buffer the entire 2 gigabytes into a memory buffer before returning the last megabyte!

Obviously with a stream you will always have to read the entire 2 gigabytes to get to the end, but buffering the data in memory is much more problematic and wasteful than just reading through the stream.
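The fix described here amounts to reading through the stream and discarding chunks before `start` instead of concatenating them. A minimal sketch of that idea — the names `discardUntil` and `makeReader` are illustrative only, not tus-js-client's actual StreamSource internals:

```javascript
// Skip ahead in a ReadableStream-style reader without retaining the
// skipped bytes. Only the tail of the chunk that crosses the target
// offset is kept; everything before it is dropped as it is read.
async function discardUntil(reader, bytesToSkip) {
  let skipped = 0
  let leftover = null
  while (skipped < bytesToSkip) {
    const { done, value } = await reader.read()
    if (done) break
    if (skipped + value.byteLength > bytesToSkip) {
      // This chunk straddles the offset: keep only the part past it.
      leftover = value.subarray(bytesToSkip - skipped)
      skipped = bytesToSkip
    } else {
      skipped += value.byteLength // chunk is dropped, not buffered
    }
  }
  return { skipped, leftover }
}

// Tiny mock reader for illustration: yields the given chunks in order.
function makeReader(chunks) {
  let i = 0
  return {
    read: async () =>
      i < chunks.length
        ? { done: false, value: chunks[i++] }
        : { done: true, value: undefined },
  }
}
```

With this approach, slicing near the end of a 2 gigabyte stream still has to read all the data, but peak memory stays bounded by a single chunk rather than the whole prefix.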

@magoogli magoogli marked this pull request as ready for review March 15, 2024 12:21
@Acconut
Member

Acconut commented Mar 15, 2024

This is a serious problem if the stream has never been read from and you call slice with a start and end position near the end of the stream.

In which situation would this occur with real-world usage of tus-js-client? Does that happen when you resume an upload that is nearly finished with a new tus.Upload object?

@magoogli
Author

magoogli commented Mar 15, 2024 via email

@magoogli
Author

magoogli commented Mar 15, 2024

As to your question about real-world use - yes, I am creating a new tus.Upload object. When dealing with a standard JavaScript File object the scenario I described above works without issue (as with a file you can get a slice without having to read any data prior to 'start'). I then started playing around with the idea of on-the-fly compression using the Compression Streams API and started observing the massive slowdown in the scenario above when using a stream. After some debugging I found out that it was all the concatenation and buffering in memory which was causing the issue.

I don't think my solution is correct at the moment however as I am seeing incorrect file size uploads in the scenario I described above. Busy investigating.
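For context, the on-the-fly compression setup mentioned above can be built with the Compression Streams API; in an actual upload, the reader of the compressed stream is what gets handed to tus-js-client as the data source. A rough, self-contained sketch, assuming an environment where `Blob` and `CompressionStream` are available (modern browsers, Node 18+) — `gzipBytes` is a hypothetical helper, and the tus.Upload wiring is omitted:

```javascript
// Compress a byte array through a CompressionStream, collecting the
// compressed chunks. In a real upload you would pass
// compressed.getReader() on to tus.Upload instead of buffering here.
async function gzipBytes(bytes) {
  const compressed = new Blob([bytes])
    .stream()
    .pipeThrough(new CompressionStream('gzip'))
  const reader = compressed.getReader()
  const chunks = []
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    chunks.push(value)
  }
  return new Blob(chunks)
}
```

Because the compressed size is not known up front, such an upload would also need tus-js-client's deferred-length mode rather than a fixed uploadSize.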

@magoogli
Author

@Acconut I fixed the bug in my code and files are uploaded correctly now. The optimization definitely helps for the use case I described (in a real-world application we are developing). Would you like any further clarification or code changes?

@Acconut
Member

Acconut commented Mar 25, 2024

That change itself looks good and makes sense to me. Before we can merge this, we should also add a test case for it. In https://github.com/tus/tus-js-client/blob/f410679e8ae4b3821ffb919a5cfe7f04acd1528d/test/spec/test-browser-specific.js/#L251, we have test cases specific to readers in browsers. A test case can be added there where a stream is supplied for resuming an upload with a non-zero offset. Seeking to this offset should result in multiple calls to the reader. In the end, an assertion should confirm that the data in the PATCH request is correct. Could you look into this?

@magoogli
Author

magoogli commented Apr 4, 2024

Hi @Acconut, thanks for checking out my updates. I will try to push a test for the changes in the next few days.

@magoogli
Author

magoogli commented Apr 4, 2024

@Acconut I have added a test for you - it checks that the data sent in the PATCH request is correct when starting off at a non-zero offset. We could, I guess, add some tests for the StreamSource itself, but I'm not sure what the best way would be to test that the concat method is NOT called when we don't expect it to be, without exposing some internals of the class just for the sake of testing. Please let me know if you would like any further changes.
