Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for PSR-18 HTTP clients #826

Draft
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

jtojnar
Copy link
Contributor

@jtojnar jtojnar commented Mar 20, 2023

This is #777 rebased and commits squashed/reordered for easier reviewing and cleaner Git history.

Similarly to how #815 is rebased #774.

Full diff: https://github.com/Art4/simplepie/compare/c7330e69af805877bf599bf8c48e7082d44d3b84..jtojnar:simplepie:add-psr18-http-client-support-rebased

@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch 2 times, most recently from f8d385f to e5dce13 Compare March 20, 2023 12:37
src/HTTP/Psr18Client.php Outdated Show resolved Hide resolved
src/HTTP/Psr18Client.php Outdated Show resolved Hide resolved
src/HTTP/Psr7Response.php Outdated Show resolved Hide resolved
@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch from e5dce13 to bd54a17 Compare March 20, 2023 12:53
Copy link
Contributor Author

@jtojnar jtojnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to Add SimplePie::set_http_client(), add tests

src/HTTP/Psr18Client.php Outdated Show resolved Hide resolved
src/HTTP/Psr18Client.php Outdated Show resolved Hide resolved
$permanentUrl = $requestedUrl;
}

$request = $request->withUri($this->uriFactory->createUri($requestedUrl));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-absolute URLs are allowed these days: https://httpwg.org/specs/rfc9110.html#field.location

Copy link
Contributor

@Art4 Art4 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's the problem. This is part of the UriFactory and Request class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the location header could have a non-absolute url (without host). But this is not a problem.

$this->uriFactory->createUri() creates a new Uri instance from the non-absolute url. This uri is set to $request->withUri($uri). This will take care of the missing host:

This method MUST update the Host header of the returned request by
* default if the URI contains a host component. If the URI does not
* contain a host component, any pre-existing Host header MUST be carried
* over to the returned request.

See https://github.com/php-fig/http-message/blob/2.0/src/RequestInterface.php#L99-L129

$requestedUrl = $response->getHeaderLine('Location');

if ($statusCode === 301) {
$permanentUrl = $requestedUrl;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can do anything about being passed a client that already handles redirects (like selfoss does).

We might not have other choice than adding support for each specific one. For example:

if (class_exists(\GuzzleHttp\RedirectMiddleware::class) && $response instanceof \GuzzleHttp\Psr7\Response && $response->hasHeader(\GuzzleHttp\RedirectMiddleware::HISTORY_HEADER)) {
    $permanentUrl = self::getGuzzleHistoryPermanentUrl($url, $response);
}


/**
 * Get permanent URL of the response.
 * RedirectMiddleware will need to be enabled for this to work.
 *
 * @param string $url requested URL, to use as a fallback
 * @param GuzzleHttp\Psr7\Response $response response to inspect
 *
 * @return string URL we should change the future requests to
 */
public static function getGuzzleHistoryPermanentUrl(string $url, \GuzzleHttp\Psr7\Response $response) {
    // Sequence of fetched URLs
    $urlStack = array_merge([$url], $response->getHeader(\GuzzleHttp\RedirectMiddleware::HISTORY_HEADER));
    // Let’s consider the first URL a permanent redirect since it is our start point.
    $statusStack = array_merge([301], $response->getHeader(\GuzzleHttp\RedirectMiddleware::STATUS_HISTORY_HEADER));

    // Follow the chain util first non-permanent redirect, since non-permanent
    // redirect could invalidate permanent ones later in the chain.
    $index = 0;
    while (($statusStack[$i] === 301 || $statusStack[$i] === 308) && $index < count($urlStack)) {
        $index++;
    }

    return $urlStack[$index - 1];
}

Copy link
Contributor

@Art4 Art4 Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a PSR-18 client handles the redirects they will return a PSR-7 Response with a 200 (or other then 301||308) status code. So we have no chance to get the final requested url. But this is not a problem because File will be deprecated and removed. I would not put any work into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we need to get the permanent URL for features like fossar/selfoss#1178.

Unless we want to deprecate SimplePie::$permanent_url and expose the Response in SimplePie so that apps could extract the permanent URL themselves.

Or we could instruct consumers to not pass clients with RedirectMiddleware in the docs but that is error-prone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should deprecate also SimplePie::$permanent_url while deprecating File, and we should expose Psr\Http\Message\ResponseInterface instead. The extraction of the permanent URL would be a detail of the provided PSR-18 client. This has to be discussed in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking at the Guzzle docs, their redirect middleware will apparently not be used when using Guzzle as PSR-18 client:

This option has no effect when making requests using GuzzleHttp\Client::sendRequest(). In order to stay compliant with PSR-18 any redirect response is returned as is.

That means the PSR-18 client will not be aware that the requests are connected when we handle redirects ourselves and there will be no way to extract the permanent URL from it.

$filepath = dirname(__FILE__, 3) . '/data/this-file-does-not-exist';

$this->expectException(HttpException::class);
$this->expectExceptionMessage('file_get_contents('.$filepath.'): Failed to open stream: No such file or directory');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should match error messages like this. I do not think they are guaranteed to be stable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm checking the error message to be sure that the test failed for the expected reason. As an alternative we could throw a new FileNotFoundException.

try {
$raw = file_get_contents($path);
} catch (Throwable $th) {
throw new HttpException($th->getMessage(), $th->getCode(), $th);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file_get_contents should not throw an exception by itself. I presume this is to catch it if app throws in the handler passed to set_error_handler? Perhaps add a comment?

Copy link
Contributor

@Art4 Art4 Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AIUI, E_WARNING will not be converted into throwable unless the application defines a custom error handler with set_error_handler, that throws.

That tests catches the throw below:

if ($raw === false) {
    throw new HttpException('file_get_contents() could not read the file', 1);
}

Copy link
Contributor

@Art4 Art4 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user has a custom error handler we can catch the error and throw it as a HttpException. The tests expect the message from the PHP Error as file_get_contents(/path/to/file): Failed to open stream: No such file or directory.

If the user has no custom error handler, $raw will be false and we throw a HttpException with the message file_get_contents() could not read the file. (Saying this I'm realizing that we have no tests for this second case.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a is_readable() check in #848.

try {
$raw = file_get_contents($path);
} catch (Throwable $th) {
throw new HttpException($th->getMessage(), $th->getCode(), $th);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, the error code will probably not be meaningful for HttpException.

@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch 2 times, most recently from 839643d to 1a218b6 Compare March 21, 2023 03:55
src/Sanitize.php Outdated
private function get_http_client(): Client
{
if ($this->http_client === null) {
$this->http_client = new FileClient(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like SimplePie will no longer share input data. Perhaps we should have separate internal set_http_client method that will continue accepting it from SimplePie and a public one set_http_factories for consumers that want to use Sanitize or Locator outside the context of SimplePie?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanitize has already set_http_client() for PSR classes and the (deprecated) pass_file_data(). If one has not called set_http_client(), the http client will be created on the deprecated file data.

Copy link
Contributor Author

@jtojnar jtojnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have finished reordering the commits and noted some more stuff I have noticed.

UriFactoryInterface $uri_factory
): void {
$this->http_client_injected = true;
$this->http_client = new Psr18Client($http_client, $request_factory, $uri_factory);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen some libraries to use Psr18ClientDiscovery. While I dislike this on principle, it is certainly convenient.

src/HTTP/Psr18Client.php Outdated Show resolved Hide resolved
src/HTTP/Psr18Client.php Outdated Show resolved Hide resolved
src/HTTP/Psr18Client.php Outdated Show resolved Hide resolved
src/HTTP/Psr7Response.php Outdated Show resolved Hide resolved
}

/**
* Return the string representation as a URI reference.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just use {@inheritdoc} here for ApiGen.

Weirdly, Symfony started removing it symfony/symfony#47390. Perhaps because they no longer seem to have generated API docs – quick search only found https://symfony.com/blog/new-symfony-api-documentation which returns 404.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of {@inheritdoc} either.

CHANGELOG.md Outdated
Comment on lines 13 to 14
- New method `SimplePie\Locator::set_http_client()` for providing PSR-18 HTTP client and PSR-17 factories by @Art4 in [#777](https://github.com/simplepie/simplepie/pull/777)
- New method `SimplePie\Sanitize::set_http_client()` for providing PSR-18 HTTP client and PSR-17 factories by @Art4 in [#777](https://github.com/simplepie/simplepie/pull/777)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #815, we marked the last two as internal. Do we want to promote them to API? I think that would only make sense if we expect consumers to instantiate these classes directly, rather than only having SimplePie create them internally.

For 2.0, I would like to minimize the API surface marking everything final and private, and only opening methods that have good use cases. So we should probably decide if we want to allow consumers to create Locator and Sanitize instances, and make set_http_client() methods on them internal, so that we do not need to deprecate them again.

Copy link
Contributor

@Art4 Art4 Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last two was set internal because they accept an also internal SimplePie\HTTP\Client. #777 changes this to PSR-18 client and PSR-17 factories.

I agree with you that we should minimize the API surface but atm it is allowed to create Locator and Sanitize instances and we are referencing Sanitize::set_http_client() as an alternative for the deprecated Sanitize::pass_file_data() and maybe should do the same with Locator::set_http_client().

Alternative we could complete deprecate the instantiation of Locator and Sanitize for users. This could be made and discussed in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecating the the instantiation of Locator and Sanitize for consumers in a separate PR sounds good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #849.

*/
public function __construct(File $file)
public function __construct(/* File */ $file)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is no change for PHP variance checker, extending the type that the argument accepts is still a breaking change since subclasses might not expect it.

And there are apparently some 😿 https://github.com/search?q=%2Fextends%20SimplePie.%2BSniffer%2F&type=code

Edit: I am using File::fromResponse here as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the type hint is not a breaking change because we have not released 1.9.0 yet. (refs)

Also the type hint in the docblock was wrong, see https://github.com/simplepie/simplepie/blob/1.8.0/src/Content/Type/Sniffer.php#L71-L79

$locate = $this->registry->create(Locator::class, [&$file, $this->timeout, $this->useragent, $this->max_checked_feeds, $this->force_fsockopen, $this->curl_options]);
$locate->set_http_client($this->get_http_client());
$locate = $this->registry->create(Locator::class, [
(! $file instanceof File) ? File::fromResponse($file) : $file,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to move the instanceof check into the method. Edit: Done here.

$requestFactory->method('createRequest')->willReturn($request);

$body = $this->createMock(StreamInterface::class);
$body->method('getContents')->willReturnCallback(function () {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be changed to __toString. Edit: Done.

Copy link
Contributor

@Art4 Art4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could resolve the git conflicts I could review this PR.

I think most of the notes are addressed in other issues/PRs.

@Art4 Art4 mentioned this pull request Sep 13, 2023
9 tasks
@Art4
Copy link
Contributor

Art4 commented Nov 2, 2023

@jtojnar May I take over this PR? I could resolve the git conflicts and will review the changes there.

jtojnar and others added 4 commits November 6, 2023 02:38
…ent-support"

There are correctness issues as well as disorganized commit history.
The fixed changes are re-applied in the commits that follow.
Providers are meant for data parametrization.
This one basically created a completely different test situation
and what we are going to test next will be even more wild.
Let’s dissolve the provider into individual test cases.
We are going to allow using HTTP Clients that will set this to a different
`Response` subclasses than just `File`.

It was already marked private with PHPDoc, let’s enforce it.
Hopefully, this will produce a more scrutable error message
in case someone wants to access `File` properties on this.
This will allow using PSR-18 HTTP Clients.

Since those return a different `Response` subclass than `File`,
we need to convert `Response` to `File` to pass it to `Locator`, `Sniffer` & co.
Allows passing PSR implementations to Sanitize instead of internal HTTP Client.

TODO: Possibly make the set_http_client method internal again.
Allows passing PSR implementations to Locator instead of internal HTTP Client.

TODO: Possibly make the set_http_client method internal again.
@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch from 1abf153 to 0427439 Compare November 6, 2023 01:42
@jtojnar
Copy link
Contributor Author

jtojnar commented Nov 6, 2023

I will try moving this forward during the week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants