
Update bad bots #3678

Open · wants to merge 6 commits into master
Conversation

SandakovMM

I propose updating the bad-bots list in the Apache configuration. While we are waiting for a complex solution in #1950, I would like to see the relevant list of bad-bots as soon as possible.
Therefore, we could update the currently used configuration file. This change is based on https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker and inspired by a comment in the noted issue.

@sebres
Contributor

sebres commented Feb 14, 2024

I have never understood the necessity for filters like this (I don't like the idea of banning by user agent alone, and even less so as a blacklist). Not to mention that it is easy for the other party to change the agent to something more browser-like.

In my opinion, protection against bothersome bots should look like this:

  • on the web-server side:
    • restrict user agents via robots.txt, robot meta tags, etc.;
    • set maximal connection and request limits for the critical locations or the whole site;
    • generate 401 (Unauthorized), 403 (Forbidden) and other 40x responses for evildoers violating the aforementioned policies and a few other internal business-logic-relevant settings;
    • thereby comply with wiki :: Best practice / Reduce parasitic log-traffic
  • on the fail2ban side:
    • monitor this special log/journal for 40x responses (and additionally by user agent if necessary, though with a whitelist if possible);
    • find and ban connection and request limit violations;
    • optionally create a jail that catches clients generating a lot of 404s.
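The web-server-side part of this strategy can be sketched for Apache 2.4 along the following lines (a minimal, hypothetical example: "EvilScraper" is a made-up agent, and mod_setenvif plus mod_authz_core are assumed to be enabled; connection/request limiting would need an additional module and is not shown):

```apache
# Flag requests from a known-bad agent (hypothetical name).
SetEnvIfNoCase User-Agent "EvilScraper" bad_bot

<Location "/">
    <RequireAll>
        # Allow everyone except clients flagged above; they get a 403,
        # which then appears in the access log for fail2ban to match.
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Location>
```

The point is that the web server itself produces the 40x responses, and fail2ban only has to watch for them rather than carrying the full agent blacklist.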

But OK, since we provided this filter already, one can also update it occasionally.

@SandakovMM
Author

Yeah, I agree. However, it seems many people are using the list because of issue #1950, so it might be a good idea to update it.
By the way, since the mechanism doesn't look great, are you planning to make further improvements, for example as discussed in this commentary?

@sebres
Contributor

sebres commented Feb 16, 2024

are you planning to make further improvements, for example as discussed in #1950 (comment)?

This is not something that can be improved in fail2ban... This is more or less an individual solution.

@SandakovMM
Author

This is not something that can be improved in fail2ban... This is more or less an individual solution.

Ah, I see now. Ok, thank you.

Would you please merge the PR, or is there something else we should do?

@sebres
Contributor

sebres commented Feb 19, 2024

Would you please merge the PR, or is there something else we should do?

Well, possibly one could update the RE (to make it a bit less "vulnerable", and to accept other HTTP methods than GET, POST, HEAD):

- failregex = ^<HOST> -.*"(GET|POST|HEAD).*HTTP.*"(?:%(badbots)s|%(badbotscustom)s)"$
+ requri = /\S*
+ rescode = \d+
+ failregex = ^<ADDR> [^"]*"[A-Z]+\s+%(requri)s\s+[^"]*" %(rescode)s \d+ "[^"]*" "(?:%(badbots)s|%(badbotscustom)s)"$

(where requri and rescode are additional filter parameters allowing one to restrict the request URI and the response code; by default they match everything)
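The proposed pattern can be exercised outside fail2ban with a quick sketch. Here the `<ADDR>` tag and the `%(...)s` parameter interpolation are emulated by hand, and the bot names are made up for illustration:

```python
import re

# Stand-ins for the filter parameters (hypothetical values):
badbots = r"(?:EvilScraper/[0-9.]+|NastyBot)"  # the generated bad-bots alternation
requri = r"/\S*"                                # default: any request URI
rescode = r"\d+"                                # default: any response code

# The proposed failregex, with <ADDR> replaced by a named group:
failregex = (
    r'^(?P<addr>\S+) [^"]*"[A-Z]+\s+' + requri + r'\s+[^"]*" '
    + rescode + r' \d+ "[^"]*" "' + badbots + r'"$'
)

# A sample Apache combined-format access-log line:
line = (
    '192.0.2.10 - - [27/Feb/2024:10:00:00 +0000] '
    '"GET /index.html HTTP/1.1" 403 199 "-" "EvilScraper/1.0"'
)

m = re.match(failregex, line)
print(m.group("addr") if m else "no match")  # prints the banned address
```

Note how anchoring on the full log-line structure (response code, sizes, referrer) makes it harder to smuggle a fake agent string into some other field than with the old `.*`-based pattern.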

@SandakovMM
Author

Apologies for the delay.
I have inserted the lines you recommended into gen_badbots and updated the configuration file. I hope I have correctly understood your previous comment =)

@sebres
Contributor

sebres commented Feb 27, 2024

Regarding the last change (the REs) - it looks good, but...

I'm still unsure about this PR as is: there are a lot of new bots now, and therefore:

  • it is harder than ever to check all those bots;
  • it is easy to get a false positive for some "good" bot;
  • the filter becomes slow (a secondary concern).

For instance, since when exactly have AhrefsBot or GPTBot been bad bots?
At least they are in the list of https://radar.cloudflare.com/traffic/verified-bots as "Verified" good bots.

I know people use this filter, etc... And it is already an ugly filter right now... But we surely have no intention of making it uglier.
