CachingHostnameResolver with CONCURRENT_REQUESTS_PER_IP fails #6350
Comments
Where does that message, “Twisted/Scrapy Error Detected”, come from? It does not seem to exist in the Scrapy code base. Is it coming from your own spider code?
If that means some exception was caught (and silenced), you should at least show that exception instead.
Thanks both, you're right; sorry, my bad (it's been years since I last touched this codebase). I thought it was a Twisted/Scrapy error, but yes, it's my generic error catch.
However, it's for the best: I got lucky investigating this exception message, because I've since changed one of my other settings and now it works. Looking further into it, I can confirm:

Breaks with the above exception:

Works:

So it's the combination of
Looks like
Somewhat related to #3867.
Let me investigate further.
It may be only happening with certain domains.
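For context on why a per-IP concurrency limit interacts with the DNS resolver at all: when per-IP limits are enabled, the downloader needs resolved IPs to key its slots, falling back to the hostname when no resolution is cached. The sketch below is a rough, hypothetical simplification; the function and variable names are mine, not Scrapy's actual internals.

```python
# Hypothetical simplification of per-IP slot keying in a downloader.
# If the resolver never populates the shared DNS cache, every host
# falls back to its hostname and the per-IP limit keys slots wrongly.
def get_slot_key(hostname, dns_cache, per_ip=True):
    if per_ip:
        # Prefer the resolved IP; fall back to the hostname if unresolved.
        return dns_cache.get(hostname, hostname)
    return hostname

cache = {"example.com": "93.184.216.34"}
print(get_slot_key("example.com", cache))   # resolved IP used as slot key
print(get_slot_key("unknown.test", cache))  # falls back to the hostname
```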
Scrapy 2.11.1
lxml 5.2.1.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.9.1, w3lib 2.1.2, Twisted 24.3.0
I'm using the following settings:
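The settings block itself was not preserved in this thread. Based on the issue title and the later comments, the combination in question presumably looked something like this (the concurrency value is illustrative, not taken from the report):

```python
# settings.py (sketch; the reporter's exact values are not preserved)
# CachingHostnameResolver is Scrapy's alternative resolver with IPv6 support.
DNS_RESOLVER = "scrapy.resolver.CachingHostnameResolver"
# Any non-zero value enables per-IP download-slot keying.
CONCURRENT_REQUESTS_PER_IP = 8  # illustrative value
```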
When I added the DNS_RESOLVER line, every request except the very first one made by a spider started failing with:
The very first query made by the spider works fine, but 100% of the rest fail with the above error, at a rate of several hundred per second.
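This is not a diagnosis of the actual Scrapy bug, but one classic failure mode consistent with "the first request works, every later one fails" is caching a single-use object (such as an already-fired callback holder) and handing it out again on subsequent lookups. A purely illustrative toy reproduction of that symptom:

```python
# Toy reproduction of the symptom only; hypothetical, not Scrapy code.
class OneShotResult:
    """A result that can be read exactly once, like a consumed callback."""
    def __init__(self, value):
        self._value = value
        self._consumed = False

    def get(self):
        if self._consumed:
            raise RuntimeError("result already consumed")
        self._consumed = True
        return self._value

cache = {}

def resolve(host):
    # Caching the one-shot object (instead of its value) means only the
    # first lookup per host succeeds; every later lookup raises.
    if host not in cache:
        cache[host] = OneShotResult("93.184.216.34")
    return cache[host].get()

print(resolve("example.com"))   # first lookup succeeds
try:
    resolve("example.com")      # every later lookup fails
except RuntimeError as exc:
    print(exc)
```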
I'm using ProxyMiddleware, so I wonder if it's because all the requests are to localhost. It works absolutely fine with the default resolver.
Edit: Further testing shows it also fails with the proxy off. It seems to work for the first request to any given domain, but all later requests return the same error:

Twisted/Scrapy Error Detected