Performance #10

Open

aprokofyev opened this issue Jan 3, 2019 · 6 comments

Comments

@aprokofyev

aprokofyev commented Jan 3, 2019

Could you please explain whether rendora + headless Chrome can process concurrent requests in parallel, or whether all requests are handled synchronously? I ran a simple benchmark and this is what I got:

wrk -H 'User-Agent: bot' http://127.0.0.1:3001
Running 10s test @ http://127.0.0.1:3001
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.01s     0.00us   1.01s   100.00%
    Req/Sec     0.11      0.33     1.00     88.89%
  9 requests in 10.09s, 334.15KB read
  Socket errors: connect 0, read 1, write 0, timeout 8
Requests/sec:      0.89
Transfer/sec:     33.10KB

Config (http://backend.d is a simple SPA that fetches dummy data from an API; on its own it handles ~3000 req/s):

listen:
    address: 0.0.0.0
    port: 3001
target:
    url: "http://backend.d"
backend:
    url: "http://backend.d"
headless:
    waitAfterDOMLoad: 1000
    internal:
        url: http://localhost:9222
    timeout: 5
output:
    minify: true
debug: true
cache:
    type: none
filters:
    userAgent:
        defaultPolicy: blacklist
        exceptions:
            keywords:
                - bot
                - bing
                - crawler
                - curl
@geokb
Member

geokb commented Jan 3, 2019

You set waitAfterDOMLoad: 1000, so rendora waits an entire extra second after the initial DOM load. Set it back to 0 and check the latencies again. I see latencies as low as 10ms for hello-world-tier pages and ~200ms for some really complex pages on a website running in production.
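
For reference, the headless section of the config above with that fixed wait removed would look like this (only the headless block is shown; everything else stays the same):

headless:
    waitAfterDOMLoad: 0
    internal:
        url: http://localhost:9222
    timeout: 5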

@aprokofyev
Author

aprokofyev commented Jan 4, 2019

Thank you for the reply. I understand what waitAfterDOMLoad does; what I'm trying to figure out is whether rendora + headless Chrome can process multiple concurrent requests in parallel, or whether requests stack up in a queue and each one is processed only after the previous one finishes. The benchmark shows that 8 requests out of 10 timed out, which makes me believe requests are processed sequentially. Could you please clarify?

wrk -H 'User-Agent: bot' http://127.0.0.1:3001
Running 10s test @ http://127.0.0.1:3001
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.01s     0.00us   1.01s   100.00%
    Req/Sec     0.11      0.33     1.00     88.89%
  9 requests in 10.09s, 334.15KB read
  Socket errors: connect 0, read 1, write 0, timeout 8
Requests/sec:      0.89
Transfer/sec:     33.10KB

@geokb
Member

geokb commented Jan 4, 2019

Ah okay, sorry, I didn't read your post carefully the first time. Yes, rendora currently sends requests to the headless Chrome instance sequentially, not in parallel, although rendora itself can accept as many parallel requests as your OS and resources allow. And since your config sets headless.timeout to a fairly low 5 seconds, some responses will come back with status 500: each queued request has to wait an additional 1000ms (your waitAfterDOMLoad value) for every older request ahead of it.
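
To make the arithmetic concrete: with renders serialized and each taking at least 1s (the waitAfterDOMLoad value), the Nth queued request waits roughly N seconds before it even starts, so most of wrk's 10 concurrent connections exceed the 5s headless.timeout. Below is a rough Go sketch of that kind of mutex-serialized access with a timeout; it is purely illustrative, not rendora's actual code, and renderWithChrome is a hypothetical stand-in for the call to a single shared Chrome tab.

package main

import (
    "fmt"
    "sync"
    "time"
)

// renderWithChrome is a hypothetical stand-in for a render call against a single
// headless Chrome tab; the sleep mirrors the 1000ms waitAfterDOMLoad above.
func renderWithChrome(url string) string {
    time.Sleep(1000 * time.Millisecond)
    return "<html>rendered " + url + "</html>"
}

var chromeMu sync.Mutex // a single shared tab: only one render at a time

// render serializes every render behind one mutex and gives up after a timeout,
// roughly modelling the queue-plus-timeout behaviour described above.
func render(url string, timeout time.Duration) (string, error) {
    done := make(chan string, 1)
    go func() {
        chromeMu.Lock()
        defer chromeMu.Unlock()
        done <- renderWithChrome(url)
    }()
    select {
    case html := <-done:
        return html, nil
    case <-time.After(timeout):
        return "", fmt.Errorf("render of %s timed out", url) // would surface as a 500
    }
}

func main() {
    var wg sync.WaitGroup
    start := time.Now()
    for i := 0; i < 10; i++ { // ten concurrent clients, as in the wrk run above
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            _, err := render(fmt.Sprintf("http://backend.d/page/%d", i), 5*time.Second)
            fmt.Printf("request %d finished after %v, err: %v\n", i, time.Since(start).Round(time.Millisecond), err)
        }(i)
    }
    wg.Wait()
}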

@aprokofyev
Author

Thank you, understood. Do you by any chance know a way to make headless Chrome process requests concurrently?

@geokb
Member

geokb commented Jan 4, 2019

Yes, I think by using the Target domain of the Chrome DevTools Protocol (see https://chromedevtools.github.io/devtools-protocol/tot/Target). It could be done in rendora, but it would mean sacrificing the type-safe RPC layer and sending/receiving raw JSON to the browser. I actually wanted to implement parallelism from the very beginning, but it's not very common to see hundreds, or even tens, of crawler requests per second, except perhaps for top websites with hundreds of thousands of pages. I will probably implement it in some version before v1.0 if there is enough interest.
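
For anyone who wants to experiment with per-request tabs outside rendora, here is a rough Go sketch using the third-party chromedp library, which opens a new tab (a new target via Target.createTarget) for each child context. This is only an illustration, not rendora's code; the URLs are placeholders, and for simplicity it launches its own Chrome instead of attaching to an existing instance at localhost:9222.

package main

import (
    "context"
    "log"
    "sync"
    "time"

    "github.com/chromedp/chromedp"
)

// renderHTML opens a new tab (a new CDP target) for each call, so several
// renders can proceed in parallel inside one Chrome process.
func renderHTML(browserCtx context.Context, url string) (string, error) {
    tabCtx, cancel := chromedp.NewContext(browserCtx)
    defer cancel()

    tabCtx, cancel = context.WithTimeout(tabCtx, 5*time.Second)
    defer cancel()

    var html string
    err := chromedp.Run(tabCtx,
        chromedp.Navigate(url),
        chromedp.OuterHTML("html", &html),
    )
    return html, err
}

func main() {
    // One shared browser; chromedp launches it on the first Run.
    browserCtx, cancel := chromedp.NewContext(context.Background())
    defer cancel()
    if err := chromedp.Run(browserCtx); err != nil { // start the browser
        log.Fatal(err)
    }

    // Placeholder URLs; in the scenario above these would be backend.d pages.
    urls := []string{"http://backend.d/", "http://backend.d/a", "http://backend.d/b"}

    var wg sync.WaitGroup
    for _, u := range urls {
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            html, err := renderHTML(browserCtx, u)
            if err != nil {
                log.Printf("%s: %v", u, err)
                return
            }
            log.Printf("%s: rendered %d bytes", u, len(html))
        }(u)
    }
    wg.Wait()
}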

@agonsalves

Seconding the interest. Even though a single website may not get many crawler requests per second on average, I often see surges of 600+ requests per minute when someone forgets to throttle their bot. And if you have many websites running through this, it definitely adds up.
