
Fixes #283 : Fixes CPU busy loop(s) seen in request_multiple. #284

Open · wants to merge 4 commits into base: develop

Conversation

@TysonAndre (Contributor) commented Jun 13, 2017

This will require more testing across various libcurl versions. I doubt that the timeout is necessary for curl_multi_select (it calls select() when it can), but I'm leaving one in just in case of bugs, so that the wait will always end.

  • I haven't thoroughly checked for relevant libcurl bugs.

Asynchronously wait for events with a short timeout if CURLM_CALL_MULTI_PERFORM fails

Fixes #283
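
For context, here is a minimal sketch of the idea (not the exact patch in this PR; the URL is just a placeholder): drive curl_multi_exec() until it stops returning CURLM_CALL_MULTI_PERFORM, then wait in curl_multi_select() with a short timeout instead of immediately looping again.

$multihandle = curl_multi_init();
$ch = curl_init('http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_multi_add_handle($multihandle, $ch);

do {
	do {
		// Let libcurl do as much work as it can right now.
		$status = curl_multi_exec($multihandle, $active);
	} while ($status === CURLM_CALL_MULTI_PERFORM);

	// Wait up to 50ms for socket activity instead of calling curl_multi_exec()
	// again immediately; this is what prevents the CPU busy loop.
	curl_multi_select($multihandle, 0.05);
} while ($active);

curl_multi_remove_handle($multihandle, $ch);
curl_multi_close($multihandle);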

@TysonAndre (Contributor, Author) commented on the diff:

			}
			while ($status === CURLM_CALL_MULTI_PERFORM);
			while (true);

Actually, the above approach may be incorrect. Maybe the part that was causing the busy loop wasn't the above do{}while, but the code below it.

The change below is probably a better alternative, and would let us get rid of the above.

                       while ($status === CURLM_CALL_MULTI_PERFORM);
 
+                       // Insert a 50ms sleep here so that we don't busy loop and chew cpu
+                       $selectRes  = curl_multi_select($multihandle, 0.05);
+
                        $to_process = array();

@TysonAndre changed the title from "Fixes #283 : Fixes CPU busy loop when using request_multiple." to "WIP - Fixes #283 : Fixes CPU busy loop when using request_multiple." on Jun 14, 2017
@codecov-io

Codecov Report

Merging #284 into master will decrease coverage by 0.37%.
The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #284      +/-   ##
==========================================
- Coverage   92.11%   91.73%   -0.38%     
==========================================
  Files          21       21              
  Lines        1762     1767       +5     
==========================================
- Hits         1623     1621       -2     
- Misses        139      146       +7
Impacted Files Coverage Δ
library/Requests/Transport/cURL.php 91.94% <33.33%> (-1.75%) ⬇️
library/Requests/Proxy/HTTP.php 86.66% <0%> (-13.34%) ⬇️
library/Requests/Transport/fsockopen.php 94.76% <0%> (+0.58%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 87932f5...c7d8ee4. Read the comment docs.

@TysonAndre (Contributor, Author) commented Oct 23, 2019

I'm encountering brand new issues due to php/php-src#3509 (comment)

In September 2018 (for PHP 7.1.23), after I created this PR, curl_multi_select changed: it can now return -1 without setting any file descriptors in certain error modes. That's because it now uses curl_multi_wait(), which has slightly different issues.

  • I'm using php >= 7.1.23 and curl >= 7.28.0. This issue would not be seen if either of those dependencies were older than that.

I think that the issue might be triggered by timeouts (e.g. a connect timeout on one request, while the other request doesn't time out).

https://bugs.php.net/bug.php?id=76480 mentions the following:

Please note the patch changes the behaviour of curl_multi_select(). Previously curl_multi_select() would return -1 if cURL has no open file descriptors. In the new situation (using curl_multi_wait() with libcurl 7.28 or higher) curl_multi_select() will return 0 if cURL has no open file descriptors.

One fix to consider would be to treat all remaining curl handles in the multi-handle as if they had timed out if curl_multi_select returned -1.

EDIT: Sorry, that would break older PHP versions or code using older libcurl, because curl_multi_select() returned -1 on timeout previously. https://bugs.php.net/bug.php?id=77030 mentions a better approach:

	$Descriptors = curl_multi_select( $Master, 0.1 );

	if( $Descriptors === -1 && curl_multi_errno( $Master ) !== CURLM_OK ) {
		// treat the remaining transfers as having failed (e.g. timed out)
	}
  • It shouldn't be possible to return -1 in older php versions, because it looks like Requests properly checks that there are 1 or more remaining handles
  • So if it does return -1 and curl_multi_errno isn't CURLM_OK, it's probably caused by a timeout, and rmccue/Requests should handle it accordingly, throwing Timeout exceptions for the remaining connections (a rough sketch follows below)
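
A hedged sketch of that check, roughly where it could slot into the request_multiple loop; it assumes $multihandle is the active curl multi handle with at least one transfer still attached, and the RuntimeException is only a stand-in for whatever Requests exception would actually be thrown:

$selectRes = curl_multi_select($multihandle, 0.05);
if ($selectRes === -1) {
	if (function_exists('curl_multi_errno') && curl_multi_errno($multihandle) !== CURLM_OK) {
		// PHP >= 7.1.23 with libcurl >= 7.28.0: -1 means curl_multi_wait() itself
		// failed, so stop looping and surface the error for the remaining transfers.
		$error = curl_multi_strerror(curl_multi_errno($multihandle));
		throw new RuntimeException('curl_multi_select failed: ' . $error);
	}
	// On older PHP/libcurl builds, -1 can simply mean there were no file
	// descriptors yet (or a timeout), so back off briefly instead of busy looping.
	usleep(2000);
}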

Note that I'm not a curl_multi expert (I had a bit of experience using it from university, but that's it), and there may be edge cases I haven't considered

@TysonAndre (Contributor, Author)

The above comment (#284 (comment)) applied to an alternative version I was testing out locally, without the usleep.

  • The usleep is unnecessary if curl_multi_select properly slept for 50 milliseconds while waiting for an event.
  • The usleep would mitigate CPU busy loop issues in newer php + curl versions. However, I think that request_multiple may still fail to return before/after my patch in edge cases, and adding a check for $Descriptors === -1 && curl_multi_errno( $Master ) !== CURLM_OK may be needed.

@TysonAndre (Contributor, Author)

The failure mode I'm seeing in the latest php versions may be related to my use of the NOSIGNAL patch, if nobody else sees it

#264 (comment)

@TysonAndre (Contributor, Author) commented Oct 23, 2019

In order to reproduce the issue I saw, I did the following:

  • Configured two requests: one to localhost, which would connect immediately but wait 3000 milliseconds to respond, and another to a service that would refuse the connection.
  • Set a connect timeout of 3 milliseconds and a request timeout of 3000 milliseconds.
  • Performed both requests with request_multiple (a rough sketch of this setup follows below).
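
A rough sketch of that setup using the Requests 1.x API; the host names, port, and script name are placeholders, and the exact options used are in the gist linked below (note that 'timeout' and 'connect_timeout' are given in seconds):

require 'vendor/autoload.php'; // or however rmccue/requests is autoloaded

$requests = array(
	'slow' => array(
		// Local endpoint that accepts the connection, then takes ~3s to respond.
		'url' => 'http://localhost:8080/slow.php',
	),
	'refused' => array(
		// Something that refuses the connection outright.
		'url' => 'http://localhost:1/',
	),
);

$options = array(
	'timeout'         => 3,     // 3000 ms request timeout
	'connect_timeout' => 0.003, // 3 ms connect timeout
);

$responses = Requests::request_multiple($requests, $options);
var_dump($responses);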

https://gist.github.com/TysonAndre/46a6829d9d8de00e50226890a6a9a63f has the full logs and details about the options used, but the part of interest is that curl_multi_select isn't sleeping for nearly as long as I'd expect it to, and it is neither returning -1 nor reporting an error via curl_multi_errno.

curl_multi_exec   returned 0 after 0.000492 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000010 seconds: curl_multi_errno=0
curl_multi_info_read returned a value:
/path/to/vendor/rmccue/requests/library/Requests/Transport/cURL.php:247:
array(3) {
  'msg' =>
  int(1)
  'result' =>
  int(7)
  'handle' =>
  resource(316) of type (curl)
}
curl error name: Couldn't connect to server
curl_multi_exec   returned 0 after 0.000013 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000005 seconds: curl_multi_errno=0
curl_multi_exec   returned 0 after 0.000081 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000025 seconds: curl_multi_errno=0
curl_multi_exec   returned 0 after 0.000031 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000002 seconds: curl_multi_errno=0
curl_multi_exec   returned 0 after 0.000031 seconds: curl_multi_errno=0
curl_multi_select returned 0 after 0.000004 seconds: curl_multi_errno=0
curl_multi_exec   returned 0 after 0.000008 seconds: curl_multi_errno=0
... 5000 more lines of busy looping

Below are the code changes I added to diagnose the issue, along with the (ineffective) mitigations I'm currently trying. A hardcoded usleep() of 1 millisecond would harm PHP response times when the service would normally complete in less than a millisecond.

	/**
	 * Send multiple requests simultaneously
	 *
	 * @param array $requests Request data
	 * @param array $options Global options
	 * @return array Array of Requests_Response objects (may contain Requests_Exception or string responses as well)
	 */
	public function request_multiple($requests, $options) {
		// If you're not requesting, we can't get any responses ¯\_(ツ)_/¯
		if (empty($requests)) {
			return array();
		}

		$multihandle = curl_multi_init();
		$subrequests = array();
		$subhandles = array();

		$class = get_class($this);
		foreach ($requests as $id => $request) {
			$subrequests[$id] = new $class();
			$subhandles[$id] = $subrequests[$id]->get_subrequest_handle($request['url'], $request['headers'], $request['data'], $request['options']);
			$request['options']['hooks']->dispatch('curl.before_multi_add', array(&$subhandles[$id]));
			curl_multi_add_handle($multihandle, $subhandles[$id]);
		}

		$completed = 0;
		$responses = array();

		$request['options']['hooks']->dispatch('curl.before_multi_exec', array(&$multihandle));

		do {
			$active = false;

			$is_first_multi_exec = true;
			do {
				/*
				if (!$is_first_multi_exec) {
					// Sleep for 1 millisecond to avoid a busy loop that would chew CPU.
					// XXX uncomment this to mitigate this a bit - there are better mitigations such as making this sleep after curl_multi_select if it returns 0 or -1.
					usleep(1000);
				}
				 */
				$start = microtime(true);
				$status = curl_multi_exec($multihandle, $active);
				fprintf(STDERR, "curl_multi_exec   returned %s after %.6f seconds: curl_multi_errno=%s\n", json_encode($status), microtime(true) - $start, curl_multi_errno($multihandle));
				$is_first_multi_exec = false;
			}
			while ($status === CURLM_CALL_MULTI_PERFORM);

			// Wait up to 2 seconds here for activity so that we don't busy loop and chew cpu
			$start = microtime(true);
			$selectRes  = curl_multi_select($multihandle, 2.0);
			fprintf(STDERR, "curl_multi_select returned %s after %.6f seconds: curl_multi_errno=%s\n", json_encode($selectRes), microtime(true) - $start, curl_multi_errno($multihandle));
			fflush(STDERR);

			$to_process = array();

			// Read the information as needed
			while ($done = curl_multi_info_read($multihandle)) {
				echo "curl_multi_info_read returned a value:\n";
				var_dump($done);
				echo "curl error name: " . curl_strerror($done['result']) . "\n";
				$key = array_search($done['handle'], $subhandles, true);
				if (!isset($to_process[$key])) {
					$to_process[$key] = $done;
				}
			}

			// Parse the finished requests before we start getting the new ones
			foreach ($to_process as $key => $done) {
				$options = $requests[$key]['options'];
				if (CURLE_OK !== $done['result']) {
					//get error string for handle.
					$reason = curl_error($done['handle']);
					$exception = new Requests_Exception_Transport_cURL(
									$reason,
									Requests_Exception_Transport_cURL::EASY,
									$done['handle'],
									$done['result']
								);
					$responses[$key] = $exception;
					$options['hooks']->dispatch('transport.internal.parse_error', array(&$responses[$key], $requests[$key]));
				}
				else {
					$responses[$key] = $subrequests[$key]->process_response($subrequests[$key]->response_data, $options);

					$options['hooks']->dispatch('transport.internal.parse_response', array(&$responses[$key], $requests[$key]));
				}

				curl_multi_remove_handle($multihandle, $done['handle']);
				curl_close($done['handle']);

				if (!is_string($responses[$key])) {
					$options['hooks']->dispatch('multiple.request.complete', array(&$responses[$key], $key));
				}
				$completed++;
			}
		}
		while ($active || $completed < count($subrequests));

		$request['options']['hooks']->dispatch('curl.after_multi_exec', array(&$multihandle));

		curl_multi_close($multihandle);

		return $responses;
	}

The endpoint I used for testing, which connected immediately but was slow to respond, running on localhost:

<?php

sleep(3);
echo "Pretending to be slow to respond\n";

EDIT: This occurs whether or not I set CURLOPT_NOSIGNAL. I'm using a recent version of PHP 7.1 (>= 7.1.23) and curl >= 7.28.0.
EDIT: The busy loop only happens for a small fraction of requests under those conditions.

See WordPress#284 for background information

This will require more testing across various libcurl versions.
I doubt that the timeout is necessary for curl_multi_select
(calls select() if it can),
but leaving in one just in case of bugs, so that it will end.
- Haven't thoroughly checked for relevant libcurl bugs.

Asynchronously wait for events with a short timeout if CURLM_CALL_MULTI_PERFORM
fails. Use shorter timeouts when the total time elapsed so far is
shorter. Make the largest possible manual usleep 2ms.
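
A hedged sketch of what that commit message describes, assuming $multihandle is the active multi handle and $start_time was recorded when request_multiple began; the scaling factor and bounds are illustrative, not the exact values from the patch:

// Inside the request_multiple loop, after the curl_multi_exec() loop:
$elapsed = microtime(true) - $start_time;

// Use a shorter select timeout while the overall operation is still young,
// clamped to a sane range.
$select_timeout = min(max($elapsed / 10, 0.001), 0.05);

if (curl_multi_select($multihandle, $select_timeout) <= 0) {
	// Nothing became ready (or select reported an error); sleep manually,
	// but never for more than 2ms so short requests aren't penalized.
	usleep((int) min($select_timeout * 1000000, 2000));
}
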
@codecov-io commented Oct 23, 2019

Codecov Report

Merging #284 into master will increase coverage by 0.01%.
The diff coverage is 94.11%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #284      +/-   ##
============================================
+ Coverage     92.12%   92.13%   +0.01%     
- Complexity      761      764       +3     
============================================
  Files            21       21              
  Lines          1764     1781      +17     
============================================
+ Hits           1625     1641      +16     
- Misses          139      140       +1
Impacted Files Coverage Δ Complexity Δ
library/Requests/Transport/cURL.php 93.72% <94.11%> (+0.03%) 69 <0> (+3) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58a0bbc...34684b1. Read the comment docs.

It seems like long-lived request_multiple operations are more affected
by the sleep statements that were added.
@TysonAndre (Contributor, Author)

It seems like the addition of usleep doesn't hurt short requests, but makes long requests take noticeably longer. I should only be performing the usleep if the number returned by curl_multi_select wasn't positive (sketched below, after the php-src excerpt).

https://curl.haxx.se/libcurl/c/curl_multi_wait.html

On completion, if numfds is non-NULL, it will be populated with the total number of file descriptors on which interesting events occurred. This number can include both libcurl internal descriptors as well as descriptors provided in extra_fds.

The implementation in php-src for curl_multi_select:

#if LIBCURL_VERSION_NUM >= 0x071c00 /* Available since 7.28.0 */
	error = curl_multi_wait(mh->multi, NULL, 0, (unsigned long) timeout * 1000.0, &numfds);
	if (CURLM_OK != error) {
		SAVE_CURLM_ERROR(mh, error);
		RETURN_LONG(-1);
	}

	RETURN_LONG(numfds);
#else
	_make_timeval_struct(&to, timeout);

	FD_ZERO(&readfds);
	FD_ZERO(&writefds);
	FD_ZERO(&exceptfds);

	error = curl_multi_fdset(mh->multi, &readfds, &writefds, &exceptfds, &maxfd);
	SAVE_CURLM_ERROR(mh, error);

	if (maxfd == -1) {
		RETURN_LONG(-1);
	}
	RETURN_LONG(select(maxfd + 1, &readfds, &writefds, &exceptfds, &to));
#endif
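
Putting that together, the follow-up change amounts to something like the following sketch (assuming $multihandle is the active multi handle): only fall back to a manual sleep when curl_multi_select() reports zero or a negative result, i.e. when numfds says nothing interesting happened.

$selectRes = curl_multi_select($multihandle, 0.05);
if ($selectRes <= 0) {
	// numfds was 0 (nothing became ready before the timeout) or the call failed
	// (-1), so back off briefly instead of hammering curl_multi_exec() again.
	usleep(2000);
}
// If $selectRes > 0 there is work to do right now, so skip the sleep entirely
// and short requests are not slowed down.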

@TysonAndre changed the title from "WIP - Fixes #283 : Fixes CPU busy loop when using request_multiple." to "Fixes #283 : Fixes CPU busy loop when using request_multiple." on Oct 23, 2019
@TysonAndre changed the title from "Fixes #283 : Fixes CPU busy loop when using request_multiple." to "Fixes #283 : Fixes CPU busy loop(s) seen in request_multiple." on Oct 23, 2019
@TysonAndre (Contributor, Author)

No further changes are planned. This seems to be working pretty well to mitigate the edge cases causing CPU usage spikes in a fraction of requests.

  • After the latest commit, requestMultiple duration is back to what it was without the sleep for normal requests (durations ranging from 5-1000ms)
  • Overall CPU usage is slightly down on hosts that are using requestMultiple a lot (with php >= 7.1.23 and curl >= 7.28.0). CPU usage no longer spikes when the request duration of the external service spikes.
  • The measured number of calls to curl_multi_exec is a lot lower.

@jrfnl (Member) commented Aug 13, 2021

@TysonAndre Hiya! Sorry for the long wait, but we're finally trying to get through the list of open PRs.

We're currently planning two triage sessions for Requests 2.0 and would like to invite you to join us in one or both of these sessions to talk us through the PR and discuss it.

The triage sessions are currently planned for:

  • Friday August 20, 07:00 - 11:00 UTC
  • Friday September 3, 07:00 - 11:00 UTC

Would you be available to join us during those times on one of those days?

Also: would you be willing and able to add some tests for these changes to the PR?

@jrfnl (Member) commented Aug 13, 2021

PR #493 is possibly a partial duplicate of this PR.

@schlessera added this to the 2.1.0 milestone on Nov 5, 2021
Successfully merging this pull request may close these issues.

performance issue when using request_multiple