Crawler should honor the Crawl-Delay if obeyRobotsTxt:true #194
Comments
@yujiosaka You are right, this is not part of the standard. But it looks like everyone agrees that it is expected to be a number of seconds, and if the crawler does not obey it out of the box, we should have some way to enforce it. It would be sad to be banned from accessing a site because we did not obey its rules :) I do not quite see how providing a robots.txt could be a solution. Or did you mean that I could configure the |
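One way to enforce the delay, independent of how the crawler is configured, is to throttle the request function itself. This is only an illustrative sketch, not part of the crawler's API: `withCrawlDelay` and `requestFn` are hypothetical names, and the delay is taken as seconds to match how `Crawl-Delay` is usually interpreted.

```javascript
// Hypothetical sketch: wrap an async request function so that
// successive calls are spaced at least `delaySeconds` apart,
// mirroring what honoring Crawl-Delay would look like.
function withCrawlDelay(requestFn, delaySeconds) {
  let lastCall = 0; // timestamp (ms) of the previous request
  return async (...args) => {
    const wait = lastCall + delaySeconds * 1000 - Date.now();
    if (wait > 0) {
      // Sleep until the crawl delay has elapsed.
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
    lastCall = Date.now();
    return requestFn(...args);
  };
}
```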
What is the current behavior?

The `Crawl-Delay` is ignored.

What is the expected behavior?

The `Crawl-Delay` should be honored; it can be retrieved using `getCrawlDelay()` on the robots parser.

What is the motivation / use case for changing the behavior?

A bot is bound to respect all the directives of the robots.txt.
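For context, extracting the delay does not require anything exotic. The sketch below is a minimal, self-contained illustration of parsing `Crawl-delay` out of a robots.txt body for a given user agent, falling back to the `*` group; it is not the parser the crawler actually uses, and the function name mirrors the `getCrawlDelay()` mentioned above only for readability.

```javascript
// Illustrative only: return the Crawl-delay (in seconds) declared in
// a robots.txt body for `userAgent`, falling back to the '*' group,
// or 0 when no delay is declared.
function getCrawlDelay(robotsTxt, userAgent) {
  const delays = {};      // user-agent (lowercased) -> delay in seconds
  let agents = [];        // user-agents of the group being parsed
  let inDirectives = false; // true once a group's directives have begun
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const m = line.match(/^([A-Za-z-]+)\s*:\s*(.*)$/);
    if (!m) continue;
    const field = m[1].toLowerCase();
    const value = m[2].trim();
    if (field === 'user-agent') {
      // A User-agent line after directives starts a new group.
      if (inDirectives) {
        agents = [];
        inDirectives = false;
      }
      agents.push(value.toLowerCase());
    } else {
      inDirectives = true;
      if (field === 'crawl-delay') {
        for (const agent of agents) delays[agent] = Number(value);
      }
    }
  }
  return delays[userAgent.toLowerCase()] ?? delays['*'] ?? 0;
}
```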