Add support for httpx as backend #1085

Open
wants to merge 14 commits into
base: master

Contributor

@jakkdl jakkdl commented Feb 2, 2024

First step of #749 as described in #749 (comment)

I was tasked with implementing this, but it's been a bit of a struggle not being very familiar with aiohttp, httpx or aiobotocore - and there being ~zero in-line types. But I think I've fixed enough of the major problems that it's probably useful to share my progress.

There's a bunch of random types added. I can split those off into a separate PR or remove if requested. Likewise for from __future__ import annotations.

TODO:

  • exceptions
    • retryable_exceptions: mostly just need to go through all httpx exceptions and decide which ones are fine
    • The mapping between httpx exceptions and aiobotocore exceptions can likely be improved.
      # **previous exception mapping**
      # aiohttp.ClientSSLError -> SSLError
      # aiohttp.ClientProxyConnectionError
      # aiohttp.ClientHttpProxyError -> ProxyConnectionError
      # aiohttp.ServerDisconnectedError
      # aiohttp.ClientPayloadError
      # aiohttp.http_exceptions.BadStatusLine -> ConnectionClosedError
      # aiohttp.ServerTimeoutError -> ConnectTimeoutError|ReadTimeoutError
      # aiohttp.ClientConnectorError
      # aiohttp.ClientConnectionError
      # socket.gaierror -> EndpointConnectionError
      # asyncio.TimeoutError -> ReadTimeoutError
      # **possible httpx exception mapping**
      # httpx.CookieConflict
      # httpx.HTTPError
      # * httpx.HTTPStatusError
      # * httpx.RequestError
      #   * httpx.DecodingError
      #   * httpx.TooManyRedirects
      #   * httpx.TransportError
      #     * httpx.NetworkError
      #       * httpx.CloseError -> ConnectionClosedError
      #       * httpx.ConnectError -> EndpointConnectionError
      #       * httpx.ReadError
      #       * httpx.WriteError
      #     * httpx.ProtocolError
      #       * httpx.LocalProtocolError -> SSLError??
      #       * httpx.RemoteProtocolError
      #     * httpx.ProxyError -> ProxyConnectionError
      #     * httpx.TimeoutException
      #       * httpx.ConnectTimeout -> ConnectTimeoutError
      #       * httpx.PoolTimeout
      #       * httpx.ReadTimeout -> ReadTimeoutError
      #       * httpx.WriteTimeout
      #     * httpx.UnsupportedProtocol
      # httpx.InvalidURL
      except httpx.ConnectError as e:
          raise EndpointConnectionError(endpoint_url=request.url, error=e)
      except (socket.gaierror,) as e:
          raise EndpointConnectionError(endpoint_url=request.url, error=e)
      except asyncio.TimeoutError as e:
          raise ReadTimeoutError(endpoint_url=request.url, error=e)
      except httpx.ReadTimeout as e:
          raise ReadTimeoutError(endpoint_url=request.url, error=e)
      except NotImplementedError:
          raise
      except Exception as e:
          message = 'Exception received when sending urllib3 HTTP request'
          logger.debug(message, exc_info=True)
          raise HTTPClientError(error=e)
  • proxy support
    • postponed to later PR
    • this was previously handled per-request, but AFAICT you can only configure proxies per-client in httpx. So need to move the logic for it, and cannot use botocore.httpsession.ProxyConfiguration.proxy_[url,headers]_for(request.url)
    • raising of ProxyConnectionError is very ugly atm, and probably not "correct"?
    • BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER
      • seems not possible to do when configuring proxies per-client?
  • wrap io.IOBase data in a non-sync-iterable async iterable
    • converted to bytes for now.
  • I have added change info to CHANGES.rst
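
One way to keep the exception mapping from the TODO list maintainable is a lookup table walked along the exception's class hierarchy, instead of a growing chain of except clauses. The following is only a minimal sketch under loud assumptions: every class here is a stand-in that mirrors a name from the comment block (the real ones live in httpx and botocore.exceptions), and `EXCEPTION_MAP` and `translate` are hypothetical names, not part of this PR.

```python
# Stand-ins for a slice of the httpx hierarchy (hypothetical; the real
# classes are httpx.TransportError, httpx.ConnectError, and so on).
class TransportError(Exception): ...
class NetworkError(TransportError): ...
class ConnectError(NetworkError): ...
class CloseError(NetworkError): ...
class TimeoutException(TransportError): ...
class ConnectTimeout(TimeoutException): ...
class ReadTimeout(TimeoutException): ...


# Stand-ins for the botocore-style errors named in the mapping.
class HTTPClientError(Exception):
    def __init__(self, *, error, endpoint_url=None):
        super().__init__(f"{type(self).__name__}: {error!r}")
        self.error = error
        self.endpoint_url = endpoint_url


class EndpointConnectionError(HTTPClientError): ...
class ConnectionClosedError(HTTPClientError): ...
class ConnectTimeoutError(HTTPClientError): ...
class ReadTimeoutError(HTTPClientError): ...


# Only the arrows already present in the comment block are listed.
EXCEPTION_MAP = {
    ConnectError: EndpointConnectionError,
    CloseError: ConnectionClosedError,
    ConnectTimeout: ConnectTimeoutError,
    ReadTimeout: ReadTimeoutError,
}


def translate(exc, endpoint_url):
    """Return the mapped error for exc, falling back to HTTPClientError."""
    # Walk the MRO so the most specific mapped class wins.
    for cls in type(exc).__mro__:
        target = EXCEPTION_MAP.get(cls)
        if target is not None:
            return target(endpoint_url=endpoint_url, error=exc)
    return HTTPClientError(error=exc)
```

With a table like this, deciding how to treat an unmapped class such as ReadError or PoolTimeout becomes a one-line change, and the fallback behaviour stays in one place.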

No longer TODOs after changing the scope to implement httpx alongside aiohttp:

  • test_patches previously cared about aiohttp. That can probably be retired?
  • replace aiohttp with httpx in tests.mock_server.AIOServer?
  • The following connector_args now raise NotImplementedError:
    • use_dns_cache: did not find any mentions of dns caches on a quick skim of httpx docs
    • force_close: same. Can maybe find out more by digging into docs on what this option does in aiohttp.
    • resolver: this is an aiohttp.abc.AbstractResolver which is obviously a no-go.
      • raise error for code passing this
      • figure out equivalent functionality for httpx
  • URLs were previously wrapped with yarl.URL(url, encoded=True). httpx does not support yarl. I don't know what this achieved (maybe the non-normalization??), so skipping it for now.
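
For the connector_args that have no httpx counterpart, the check could look roughly like the sketch below. The option names come from the list above; the function name, the tuple, and the choice to reject an option whenever the key is present (regardless of its value) are assumptions for illustration only.

```python
# Option names are taken from the list above; everything else is hypothetical.
UNSUPPORTED_WITH_HTTPX = ("use_dns_cache", "force_close", "resolver")


def check_httpx_connector_args(connector_args):
    """Raise NotImplementedError for options with no httpx counterpart here."""
    for name in UNSUPPORTED_WITH_HTTPX:
        if name in connector_args:
            raise NotImplementedError(
                f"connector_args[{name!r}] is not supported with the httpx backend"
            )
    # Return a copy so the caller's dict is never mutated later on.
    return dict(connector_args)
```

Failing loudly at configuration time means code that passes an aiohttp-only option finds out immediately rather than silently losing the setting.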

Some extra tests would probably also be good, but not super critical when we're just implementing httpx alongside aiohttp.

aiobotocore/awsrequest.py (outdated)

# previously data was wrapped in _IOBaseWrapper
# github.com/aio-libs/aiohttp/issues/1907
# I haven't researched whether that's relevant with httpx.
Collaborator

ya silly decision of aiohttp, they took over the stream. Most likely httpx does the right thing. I think to get around the sync/async thing we can just make a stream wrapper that hides the relevant methods...I think I did this somewhere...will try to remember

Contributor Author

Would the current tests catch if httpx didn't do the right thing?
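
The "stream wrapper that hides the relevant methods" idea, which is also the "non-sync-iterable async iterable" item in the TODO list, might be sketched as follows. This is an illustration under assumptions: `AsyncOnlyChunks` and `collect` are hypothetical names, the chunk size is arbitrary, and whether the blocking read should go through an executor is a design choice the PR does not settle.

```python
import asyncio
import io


class AsyncOnlyChunks:
    """Expose a sync file-like object as an async iterable of byte chunks.

    Only __aiter__ is defined, so a sync `for` loop over it raises TypeError.
    """

    def __init__(self, fileobj, chunk_size=64 * 1024):
        self._fileobj = fileobj
        self._chunk_size = chunk_size

    async def __aiter__(self):
        loop = asyncio.get_running_loop()
        while True:
            # Run the blocking read in the default executor to keep the loop free.
            chunk = await loop.run_in_executor(
                None, self._fileobj.read, self._chunk_size
            )
            if not chunk:
                break
            yield chunk


async def collect(fileobj, chunk_size):
    """Drain the wrapper into a list; used here only to demonstrate it."""
    return [chunk async for chunk in AsyncOnlyChunks(fileobj, chunk_size)]
```

For example, `asyncio.run(collect(io.BytesIO(b"abcdefg"), 3))` yields the chunks `b"abc"`, `b"def"` and `b"g"`, while calling `iter()` on the wrapper fails, which is the property that keeps a client from picking the sync code path.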

Contributor Author

jakkdl commented Feb 6, 2024

I started wondering whether response.StreamingBody should wrap httpx.Response or one of its iterators (aiter_bytes, aiter_text, aiter_lines or aiter_raw), but am now starting to think that maybe it doesn't make sense to have at all and we should just surface the httpx.Response object to the user and let them handle it as they want.

The way that aiohttp.StreamReader behaves is just different enough that a translation layer handling httpx.Response streams the same way becomes clunky, inefficient and tricky. StreamingBody.iter_chunks should be done by specifying the chunk size when calling httpx.Response.aiter_bytes, and StreamingBody.iter_lines should use httpx.Response.aiter_lines. However, the current API does nothing to stop you from reading one chunk and then one byte, whereas httpx.Response (very reasonably) only lets you initialize the iterators once.
Implementing iter_chunks/iter_lines/etc. as reading one byte at a time with await anext() on an aiter_raw sounds awful, but there's no read() method that can return a set number of bytes. That in general makes StreamingBody.read() quite clunky to implement.
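
To make the clunkiness concrete, here is roughly what a read(n) layered on a one-shot async chunk iterator has to do. It is a sketch under assumptions, not code from this PR: `BufferedAsyncReader` is a hypothetical helper, and `fake_aiter_bytes` is a stand-in for the chunks an httpx response iterator would yield.

```python
import asyncio


class BufferedAsyncReader:
    """read(n) on top of an async iterator of byte chunks (hypothetical helper)."""

    def __init__(self, chunks):
        # The iterator is created exactly once, matching the one-shot constraint.
        self._chunks = chunks.__aiter__()
        self._buffer = bytearray()
        self._eof = False

    async def read(self, amt=None):
        # Pull chunks until enough bytes are buffered or the source is exhausted.
        while not self._eof and (amt is None or len(self._buffer) < amt):
            try:
                self._buffer += await self._chunks.__anext__()
            except StopAsyncIteration:
                self._eof = True
        if amt is None:
            amt = len(self._buffer)
        data = bytes(self._buffer[:amt])
        del self._buffer[:amt]
        return data


async def fake_aiter_bytes():
    # Stand-in for the chunks a response iterator would yield.
    for chunk in (b"hello ", b"wor", b"ld"):
        yield chunk


async def demo():
    reader = BufferedAsyncReader(fake_aiter_bytes())
    return [await reader.read(1), await reader.read(7), await reader.read()]
```

Every read pays for an extra copy through the buffer, and iter_chunks or iter_lines would have to be rebuilt on top of this same buffer rather than delegating to the response's own iterators.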

codecov bot commented Feb 19, 2024

Codecov Report

Attention: Patch coverage is 80.06231%, with 64 lines in your changes missing coverage. Please review.

Project coverage is 86.13%. Comparing base (a904bd1) to head (af293b0).

| Files | Patch % | Lines |
|---|---|---|
| tests/test_basic_s3.py | 67.53% | 25 Missing ⚠️ |
| aiobotocore/httpsession.py | 79.13% | 24 Missing ⚠️ |
| tests/conftest.py | 82.00% | 9 Missing ⚠️ |
| aiobotocore/awsrequest.py | 68.75% | 5 Missing ⚠️ |
| tests/python38/test_eventstreams.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1085      +/-   ##
==========================================
- Coverage   86.34%   86.13%   -0.22%     
==========================================
  Files          62       62              
  Lines        5910     6173     +263     
==========================================
+ Hits         5103     5317     +214     
- Misses        807      856      +49     
Flag Coverage Δ
unittests 86.13% <80.06%> (-0.22%) ⬇️

@jakkdl jakkdl changed the title Replace aiohttp with httpx Add support for httpx as alternate backend Feb 19, 2024
@jakkdl jakkdl changed the title Add support for httpx as alternate backend Add support for httpx as backend Feb 19, 2024
Contributor Author

jakkdl commented Feb 19, 2024

Whooooo, all tests are passing!!!!
though I did an ugly with test_non_normalized_key_paths - I understand nothing about that test so I currently made the test pass if httpx returns a normalized path.

current TODOs:

  • I should add a command-line parameter that sets the http backend to be tested, so I can set up a CI environment without httpx installed to make sure that works.
  • Retryable exceptions.
    • Maybe try to write a test for it
  • figure out the branches in convert_to_response_dict.
    • I think they're fine?
  • ~~figure out proxies, or~~ raise NotImplementedError.
    • There is at least one test that sorta checks it so if raising I need to work around it.
  • Maybe add test for http_session_cls
  • Add documentation - RTD is broken?

codecov is very sad, but most of that is due to me duplicating code that wasn't covered to start with, or extending tests that aren't run in CI. I'll try to make it very slightly less sad, but making it completely unsad is very much out of scope for this PR.

Likewise RTD is failing ... and I think that's unrelated to the PR?
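
For a CI environment without httpx installed, the usual pattern is an optional import plus a backend check. The sketch below is an assumption about how that could look, not the PR's conftest: `load_optional`, `available_backends` and `pick_backend` are hypothetical names, and aiohttp is assumed to always be present because it remains a hard dependency.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# None when httpx is missing, so `if httpx and ...` style checks keep working.
httpx = load_optional("httpx")


def available_backends():
    """List backend names usable in this environment (names are illustrative)."""
    backends = ["aiohttp"]
    if httpx is not None:
        backends.append("httpx")
    return backends


def pick_backend(requested):
    """Validate a backend name, e.g. one passed on the test command line."""
    if requested not in available_backends():
        raise RuntimeError(f"http backend {requested!r} is not available")
    return requested
```

A command-line option in the test suite could then feed its value through `pick_backend`, so a run that asks for httpx in an environment without it fails up front instead of partway through collection.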

Add no-httpx run to CI on 3.12
Tests can now run without httpx installed.
Exclude `if TYPE_CHECKING` blocks from coverage.
various code cleanup
…ive errors on connector args not compatible with httpx. Remove proxy code and raise NotImplementedError. fix/add tests
@jakkdl jakkdl marked this pull request as ready for review February 20, 2024 12:28
Contributor Author

jakkdl commented Feb 21, 2024

@thejcannon @aneeshusa if you wanna do a review pass

@jakkdl jakkdl requested a review from thehesiod March 1, 2024 10:26
Contributor Author

jakkdl commented Mar 20, 2024

Hey @thehesiod what's the feeling on this? It is turning out to be a messier and more disruptive change than initially thought in #749. I can pull out some of the changes to a separate PR to make this a bit smaller at least

@thehesiod
Collaborator

hey sorry been down with a cold, will look asap. I don't mind big PRs

Comment on lines +644 to +646
if httpx and isinstance(aio_session, httpx.AsyncClient):
    async with aio_session.stream("GET", presigned_url) as resp:
        data = await resp.aread()
Collaborator

what do you think of making an adapter class so no test changes are necessary. We can expose the raw stream via a new property if needed

Contributor Author

@jakkdl jakkdl Apr 3, 2024

no test changes is not going to be possible - see #1085 (comment). But I could try to make one that implements a bunch of the basic functionality which would minimize test changes.

Contributor Author

@jakkdl jakkdl Apr 3, 2024

well, translating between await resp.read() and await resp.aread() with an adapter class is trivial.

But having an adapter class turn calls to resp['Body'].close() into await resp['Body'].aclose() is another matter.
I guess it's possible in theory to hook into the currently running async framework and run .aclose() from a sync .close() in an adapter class, but that feels like a very bad idea, especially as we're looking to support anyio and structured concurrency.

I suppose I could write a wrapper class that only translates read->aread and gives specific errors for the other methods? It could help with transitioning existing code, but I think it would be more appropriate, if/when we drop aiohttp, to make the adapter class raise DeprecationError when .read() or .close() is called.
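
The wrapper described here, one that translates read to aread and gives a specific error for the sync close(), might look like this sketch. All names are hypothetical: `FakeHttpxResponse` is a stand-in exposing only the two async methods the adapter uses, `BodyAdapter` is not a class from this PR, and RuntimeError is an arbitrary choice of error type.

```python
import asyncio


class FakeHttpxResponse:
    """Stand-in exposing the two async methods the adapter relies on."""

    def __init__(self, body):
        self._body = body
        self.closed = False

    async def aread(self):
        return self._body

    async def aclose(self):
        self.closed = True


class BodyAdapter:
    """Hypothetical wrapper: read() forwards to aread(), sync close() is refused."""

    def __init__(self, response):
        self._response = response

    async def read(self):
        return await self._response.aread()

    async def aclose(self):
        await self._response.aclose()

    def close(self):
        # A sync close cannot await aclose(), so fail with a pointed message.
        raise RuntimeError(
            "close() is sync; use 'await body.aclose()' with the httpx backend"
        )


async def demo():
    resp = FakeHttpxResponse(b"payload")
    body = BodyAdapter(resp)
    data = await body.read()
    await body.aclose()
    return data, resp.closed
```

Existing `await resp['Body'].read()` calls would keep working through such a wrapper, while any `resp['Body'].close()` call surfaces as an explicit error pointing at the async replacement.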

Comment on lines 459 to 462
if current_http_backend == 'httpx':
    assert key == 'key./name'
else:
    assert key == 'key./././name'
Collaborator

hmm, ya this needs to be fixed. We can't have the response changing. This should match whatever botocore does, if current way is incorrect this is fine, otherwise needs to be fixed

Contributor Author

The current behaviour is indeed how botocore handles it:
https://github.com/boto/botocore/blob/970d577087d404b0927cc27dc57178e01a3371cd/tests/integration/test_s3.py#L599-L606
so I'll have to do some digging

Contributor Author

digging success!... but looks like it needs changes in upstream httpx to get fixed.

When we call self._session.build_request in

httpx_request = self._session.build_request(
    method=request.method,
    url=url,
    headers=headers,
    content=content,
)

httpx turns the string url into a httpx.URL object, which explicitly normalizes the path https://github.com/encode/httpx/blob/392dbe45f086d0877bd288c5d68abf860653b680/httpx/_urlparse.py#L387

We can manually create an httpx.URL object to pass into build_request, but there's no parameter that controls normalization, so at that point we'd have to create a subclass of httpx.URL or something to customize the behaviour.

I can open an issue for it in the httpx repo, but I'll first try to figure out why botocore seems to explicitly not want to normalize the path.

Contributor Author

Okay, so the reason these slashes aren't normalized is that slashes (and periods) are allowed in a key name.
But if we can replace the slashes in the key name with %2F somewhere along the chain, then I think it'll be handled correctly.

Contributor Author

the current percent encoding happens deep within the guts of botocore, where they explicitly mark / as safe for some reason: https://github.com/boto/botocore/blob/970d577087d404b0927cc27dc57178e01a3371cd/botocore/serialize.py#L520
Why? no clue. Can it be marked unsafe? I tried and ran unit tests and some functional tests in botocore/, and some tests did start to fail - though unclear if they're bad errors or just defensive asserts.
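
The interaction between the safe set and path normalization can be reproduced with the standard library alone. This is a simplified sketch: `remove_dot_segments` is a hypothetical, reduced version of RFC 3986 dot-segment removal (it is not httpx's code), the `/bucket/` prefix is made up, and `safe="/~"` is an assumption about the safe set used at the linked botocore line.

```python
from urllib.parse import quote


def remove_dot_segments(path):
    """Simplified dot-segment removal, in the spirit of RFC 3986 section 5.2.4."""
    out = []
    for segment in path.split("/"):
        if segment == ".":
            continue
        if segment == "..":
            # Never pop the leading empty segment that represents the root.
            if len(out) > 1:
                out.pop()
            continue
        out.append(segment)
    return "/".join(out)


key = "key./././name"
slash_safe = quote(key, safe="/~")   # slashes kept literally in the path
slash_encoded = quote(key, safe="")  # slashes become %2F

# With literal slashes, the "." segments are visible to normalization and vanish.
normalized_plain = remove_dot_segments("/bucket/" + slash_safe)
# With %2F, the whole key is one opaque segment and survives untouched.
normalized_encoded = remove_dot_segments("/bucket/" + slash_encoded)
```

Here `normalized_plain` comes out as `/bucket/key./name`, matching the key seen in the httpx branch of the test, while `normalized_encoded` stays `/bucket/key.%2F.%2F.%2Fname`. Whether the service and request signing accept the encoded form is not shown by this sketch.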

Contributor Author

This has been requested in httpx since forever: encode/httpx#1805
let's see if me offering to write a PR can give it some traction.

Collaborator

ya main thing is downstream consumers may be doing string matching and could break logic

Collaborator

appreciate the digging!

Contributor Author

turns out httpx does have a way to do it now! encode/httpx#1805 (reply in thread)

@thehesiod
Collaborator

btw check out: https://awslabs.github.io/aws-crt-python/api/http.html#awscrt.http.HttpClientConnection perhaps we should go in that direction so it will be complete AWS impl

@thehesiod
Collaborator

discussion here: #1106
