
Update bad bots #3678

Open · wants to merge 6 commits into master
Conversation

SandakovMM

I propose updating the bad-bots list in the Apache configuration. While we are waiting for a complex solution in #1950, I would like to see the relevant list of bad-bots as soon as possible.
Therefore, we could update the currently used configuration file. This change is based on https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker and inspired by a comment in the noted issue.

@sebres
Contributor

sebres commented Feb 14, 2024

I have never understood the necessity for filters like this (I don't like the idea of banning by user agent alone, and even less so as a blacklist). Not to mention that it is easy for the other party to change the agent to something more browser-like.

In my opinion, protection against bothersome bots should look like this:

  • on the web-server side:
    • restrict user agents via robots.txt, robot meta tags, etc.;
    • set maximal connection and request limits for the critical locations or the whole site;
    • generate 401 (Unauthorized), 403 (Forbidden) and other 40x responses for evildoers violating the aforementioned policies and a few other internal business-logic-relevant settings;
    • thereby comply with wiki :: Best practice / Reduce parasitic log-traffic
  • on the fail2ban side:
    • monitor this special log/journal for 40x responses (and additionally by user agent if necessary, though with a whitelist if possible);
    • find and ban connection and request limit violations;
    • optionally create a jail that catches clients generating a lot of 404s.
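The web-server-side part of this strategy can be sketched for Apache 2.4 along the following lines (a minimal, hypothetical example: "EvilScraper" is a made-up agent, and mod_setenvif plus mod_authz_core are assumed to be enabled; connection/request limiting would need an additional module and is not shown):

```apache
# Flag requests from a known-bad agent (hypothetical name).
SetEnvIfNoCase User-Agent "EvilScraper" bad_bot

<Location "/">
    <RequireAll>
        # Allow everyone except clients flagged above; they get a 403,
        # which then appears in the access log for fail2ban to match.
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Location>
```

The point is that the web server itself produces the 40x responses, and fail2ban only has to watch for them rather than carrying the full agent blacklist.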

But OK, since we provided this filter already, one can also update it occasionally.

@SandakovMM
Author

Yeah, I agree. However, it seems many people are using the list because of issue #1950, so it might be a good idea to update it.
By the way, since the mechanism doesn't look great, are you planning to make further improvements, for example as discussed in this commentary?

@sebres
Contributor

sebres commented Feb 16, 2024

are you planning to make further improvements, for example as discussed in #1950 (comment)?

This is not something that can be improved in fail2ban... This is more or less an individual solution.

@SandakovMM
Author

This is not something that can be improved in fail2ban... This is more or less an individual solution.

Ah, I see now. Ok, thank you.

Would you please merge the PR, or is there something else we should do?

@sebres
Contributor

sebres commented Feb 19, 2024

Would you please merge the PR, or is there something else we should do?

Well, possibly one could update the RE (to make it a bit less "vulnerable", and to accept other HTTP methods than GET, POST, HEAD):

- failregex = ^<HOST> -.*"(GET|POST|HEAD).*HTTP.*"(?:%(badbots)s|%(badbotscustom)s)"$
+ requri = /\S*
+ rescode = \d+
+ failregex = ^<ADDR> [^"]*"[A-Z]+\s+%(requri)s\s+[^"]*" %(rescode)s \d+ "[^"]*" "(?:%(badbots)s|%(badbotscustom)s)"$

(where requri and rescode are additional filter parameters allowing one to restrict the request URI and the response code; by default they match everything)
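The proposed pattern can be exercised outside fail2ban with a quick sketch. Here the `<ADDR>` tag and the `%(...)s` parameter interpolation are emulated by hand, and the bot names are made up for illustration:

```python
import re

# Stand-ins for the filter parameters (hypothetical values):
badbots = r"(?:EvilScraper/[0-9.]+|NastyBot)"  # the generated bad-bots alternation
requri = r"/\S*"                                # default: any request URI
rescode = r"\d+"                                # default: any response code

# The proposed failregex, with <ADDR> replaced by a named group:
failregex = (
    r'^(?P<addr>\S+) [^"]*"[A-Z]+\s+' + requri + r'\s+[^"]*" '
    + rescode + r' \d+ "[^"]*" "' + badbots + r'"$'
)

# A sample Apache combined-format access-log line:
line = (
    '192.0.2.10 - - [27/Feb/2024:10:00:00 +0000] '
    '"GET /index.html HTTP/1.1" 403 199 "-" "EvilScraper/1.0"'
)

m = re.match(failregex, line)
print(m.group("addr") if m else "no match")  # prints the banned address
```

Note how anchoring on the full log-line structure (response code, sizes, referrer) makes it harder to smuggle a fake agent string into some other field than with the old `.*`-based pattern.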

@SandakovMM
Author

Apologies for the delay.
I have inserted the lines you recommended into gen_badbots and updated the configuration file. I hope I have correctly understood your previous comment =)

@sebres
Contributor

sebres commented Feb 27, 2024

Regarding the last change (the REs) - it looks good, but...

I'm still unsure about this PR as is: there are a lot of new bots now, and therefore:

  • it is harder than ever to check all those bots;
  • it is easy to get a false positive for some "good" bot;
  • the filter becomes slow (a secondary concern).

For instance, since when exactly have AhrefsBot or GPTBot been bad bots?
At least they are in the list of https://radar.cloudflare.com/traffic/verified-bots as "Verified" good bots.

I know people use this filter, etc... And it is already an ugly filter right now... But we surely have no intention of making it uglier.
