
Streaming requests read full response contents before initial cache write #878

Open
tgrandje opened this issue Sep 19, 2023 · 3 comments

@tgrandje

Problem with streaming queries

Requests do not seem to stream properly: the whole response appears to be read by the CachedSession before the handle is released (the example in the docs doesn't show this, since it completes very quickly).

Expected behavior

Chunks should be available for iteration as soon as they are received.

Steps to reproduce the behavior

import requests
import requests_cache

url = "https://wxs.ign.fr/859x8t863h6a09o9o6fy4v60/telechargement/inspire/BDTOPO-FRANCE-ADMIN-SSDOUBLON-PACK_17.1$BDTOPO_2-2_ADMINISTRATIF_SHP_WGS84G_FRA_2017-01-01/file/BDTOPO_2-2_ADMINISTRATIF_SHP_WGS84G_FRA_2017-01-01.7z"

s = requests.Session()
r = s.get(url, stream=True)
print(r.status_code)
for chunk in r.iter_content(chunk_size=1024):
    print(chunk[:10])
    break  # breaks immediately, as expected

requests_cache.install_cache()
r = s.get(url, stream=True)
print(r.status_code)
for chunk in r.iter_content(chunk_size=1024):
    print(chunk[:10])
    break  # still breaks immediately

# Let's use a new cache
s = requests_cache.CachedSession("dummy")
r = s.get(url, stream=True)
print(r.status_code)  # not printed until the entire response has been downloaded
for chunk in r.iter_content(chunk_size=1024):
    print(chunk[:10])
    break

Workarounds

Using install_cache() seems to be enough to circumvent this behaviour. I checked the behaviour of requests-cache and can confirm that the stream argument is correctly passed on to requests.

Environment

  • requests-cache version: 1.1.0
  • Python version: 3.9
  • Platform: Windows 10 pro
@JWCook
Member

JWCook commented Oct 5, 2023

You're correct: the current level of support for streaming requests only ensures that the stream can be played back correctly when returned from the cache. In other words, the chunking behavior of the underlying file-like object used by urllib3 is the same for original and cached responses. The initial request is expected to be slower, though, since the entire response contents must be read and cached before being returned to the user. I'm not able to reproduce any difference in behavior with install_cache(), however.
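
For illustration, here is a minimal sketch of that behavior (the URL is just a placeholder pointing at a small payload, not the one from the original report):

import requests_cache

session = requests_cache.CachedSession("demo_cache")
url = "https://httpbin.org/bytes/1024"  # placeholder: any small binary payload

# First request: the full body is read and written to the cache before the
# response is returned, so nothing is printed until the download finishes.
r1 = session.get(url, stream=True)
print(r1.from_cache, r1.status_code)

# Second request: served from the cache; iter_content() replays the same
# chunks as the original response.
r2 = session.get(url, stream=True)
print(r2.from_cache, r2.status_code)
for chunk in r2.iter_content(chunk_size=256):
    print(len(chunk))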

I definitely agree that it would be an improvement for large requests like this if we could cache a streaming response only after it reaches the end of the stream. In general, this library isn't optimized for file downloads and other large requests, but it is on my radar (#407). There are a few different ways to approach this, but I can't think of a particularly clean solution right now, so I'll need to give it some more thought. In the meantime, I'll try to at least come up with a workaround you can use.
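
One possible interim workaround (only a sketch, assuming your version exposes CachedSession.cache_disabled(); the URL is a placeholder) is to bypass the cache for large downloads so they stream normally, while other requests stay cached:

import requests_cache

session = requests_cache.CachedSession("dummy")
large_file_url = "https://example.com/large-file.7z"  # placeholder

# Temporarily bypass the cache so the response streams immediately instead
# of being read in full and cached first.
with session.cache_disabled():
    r = session.get(large_file_url, stream=True)
    print(r.status_code)  # available as soon as the headers arrive
    for chunk in r.iter_content(chunk_size=1024):
        print(chunk[:10])
        break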

@JWCook changed the title from "Problem with streaming queries" to "Streaming requests read full response contents before initial cache write" on Oct 23, 2023
@JWCook
Member

JWCook commented Nov 24, 2023

Thanks for the example. I'm guessing Content-Length isn't being set correctly. I'll look into it!

@JWCook
Member

JWCook commented Nov 24, 2023

I don't think that's related, though. Could you create a separate issue for that, please?
