Bot Detection¶

X-Forwarded-For ¶

Attention

The HTTP request headers X-Forwarded-For and X-Real-IP must be set up correctly so that each request can be assigned to the right IP:

searx.botdetection.get_real_ip(request: Request) → str[source]¶

Returns the real IP of the request. Since not all proxies set all the HTTP headers, and incoming headers can be faked, the IP cannot always be determined correctly.

This function tries to get the remote IP in the order listed below. In addition, some tests are done, and any inconsistencies or errors detected are logged.

The remote IP of the request is taken from (first match):
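
As an illustration of a first-match lookup, the sketch below assumes the order X-Forwarded-For, then X-Real-IP, then the remote address of the connection. The function name, the ordering, and the choice of the first entry of a forwarded chain are assumptions made for this example, not the actual implementation of get_real_ip.

```python
from ipaddress import ip_address


def first_match_ip(headers: dict, remote_addr: str) -> str:
    """Hypothetical helper: return the first candidate that parses as an IP."""
    candidates = []

    forwarded_for = headers.get("X-Forwarded-For", "")
    if forwarded_for:
        # The header may carry a comma separated chain of proxies.
        # Assumption: the first entry is the client.
        candidates.append(forwarded_for.split(",")[0].strip())

    real_ip = headers.get("X-Real-IP", "")
    if real_ip:
        candidates.append(real_ip.strip())

    # Fall back to the address of the TCP peer.
    candidates.append(remote_addr)

    for candidate in candidates:
        try:
            return str(ip_address(candidate))
        except ValueError:
            # Faked or malformed header value: try the next candidate.
            continue
    return "0.0.0.0"
```

Because the headers can be faked, a lookup like this is only as trustworthy as the proxy in front of the application.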

Limiter ¶

Bot protection / IP rate limitation. The intention of rate limitation is to limit suspicious requests from an IP. The motivation is that SearXNG passes requests from bots through to the search engines and is thus classified as a bot itself. As a result, SearXNG then receives a CAPTCHA or is blocked by the search engine (the origin) in some other way.

To avoid being blocked, requests from bots to SearXNG must themselves be blocked; this is the task of the limiter. To perform this task, the limiter uses the methods from searx.botdetection.

To enable the limiter, activate:

server:
  ...
  limiter: true  # rate limit the number of requests on the instance, block some bots

and set the redis URL connection. Check the value; it depends on your redis DB (see redis:), for example:

redis:
  url: unix:///usr/local/searxng-redis/run/redis.sock?db=0

searx.botdetection.limiter.LIMITER_CFG = PosixPath('/etc/searxng/limiter.toml')¶: Local limiter configuration.

searx.botdetection.limiter.LIMITER_CFG_SCHEMA = PosixPath('/home/runner/work/searxng/searxng/searx/botdetection/limiter.toml')¶: Base configuration (schema) of the botdetection.

Method `ip_lists`¶

The ip_lists method implements IP block- and pass-lists.

[botdetection.ip_lists]

pass_ip = [
 '140.238.172.132', # IPv4 of check.searx.space
 '192.168.0.0/16',  # IPv4 private network
 'fe80::/10'        # IPv6 linklocal
]
block_ip = [
   '93.184.216.34', # IPv4 of example.org
   '257.1.1.1',     # invalid IP --> will be ignored, logged in ERROR class
]

searx.botdetection.ip_lists.block_ip(real_ip: IPv4Address | IPv6Address, cfg: config.Config) → Tuple[bool, str][source]¶: Checks whether the IP falls within one of the entries (single addresses or subnets) of the botdetection.ip_lists.block_ip list.

searx.botdetection.ip_lists.pass_ip(real_ip: IPv4Address | IPv6Address, cfg: config.Config) → Tuple[bool, str][source]¶: Checks whether the IP falls within one of the entries (single addresses or subnets) of the botdetection.ip_lists.pass_ip list.
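
A list membership test of this kind can be sketched with the standard ipaddress module. The helper name ip_in_list and the message strings are hypothetical; the sketch only mirrors the documented behaviour that invalid entries such as '257.1.1.1' are ignored and logged as errors.

```python
import logging
from ipaddress import ip_address, ip_network

logger = logging.getLogger("ip_lists_sketch")


def ip_in_list(real_ip, entries):
    """Hypothetical helper: return (matched, message) for an IP and a list
    of addresses or networks given as strings."""
    for entry in entries:
        try:
            # A single address is treated as a /32 (IPv4) or /128 (IPv6) network.
            network = ip_network(entry, strict=False)
        except ValueError:
            logger.error("invalid IP %s in list, entry ignored", entry)
            continue
        if real_ip.version == network.version and real_ip in network:
            return True, f"IP matches {network.compressed}"
    return False, "IP is not a member of the list"


pass_ip = ["140.238.172.132", "192.168.0.0/16", "fe80::/10"]
block_ip = ["93.184.216.34", "257.1.1.1"]
```

With these lists, ip_in_list(ip_address("192.168.1.20"), pass_ip) reports a match on the private IPv4 network, while the invalid block_ip entry is skipped.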

searx.botdetection.ip_lists.SEARXNG_ORG = ['140.238.172.132', '2603:c022:0:4900::/56']¶: Passlist of IPs from the SearXNG organization, e.g. check.searx.space.

Rate limit ¶

Method `ip_limit`¶

The ip_limit method counts requests from an IP in sliding windows. If there are too many requests in a sliding window, the request is evaluated as a bot request. This method requires a redis DB and needs an HTTP X-Forwarded-For header. To protect privacy, only the hash value of an IP is stored in the redis DB, and for a maximum of 10 minutes.
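
The counting idea can be sketched without redis. The class below is an in-memory stand-in written for illustration only: the class name, the secret, and the use of SHA-256 are assumptions, and the limiter itself keeps its counters in the redis DB.

```python
import hashlib
import time
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Hypothetical in-memory sliding window keyed by a hash of the IP."""

    def __init__(self, window, max_requests, secret=b"sketch-secret"):
        self.window = window              # window length in seconds
        self.max_requests = max_requests  # allowed requests per window
        self.secret = secret
        self._hits = defaultdict(deque)   # hashed IP -> request timestamps

    def _key(self, ip):
        # Only the hash of the IP is kept, never the IP itself.
        return hashlib.sha256(self.secret + ip.encode()).hexdigest()

    def hit(self, ip, now=None):
        """Record one request; return True while the IP is within its limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[self._key(ip)]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        hits.append(now)
        return len(hits) <= self.max_requests
```

A counter created as SlidingWindowCounter(window=20, max_requests=15) accepts fifteen requests inside twenty seconds and rejects the sixteenth.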

The link_token method can be used to investigate whether a request is suspicious. To activate the link_token method in the ip_limit method add the following to your /etc/searxng/limiter.toml:

[botdetection.ip_limit]
link_token = true

If the link_token method is activated and a request is suspicious the request rates are reduced:

BURST_MAX -> BURST_MAX_SUSPICIOUS
LONG_MAX -> LONG_MAX_SUSPICIOUS

To intercept bots that get their IPs from a range of IPs, there is a SUSPICIOUS_IP_WINDOW. In this window the suspicious IPs are stored for a longer time. IPs stored in this sliding window have a maximum of SUSPICIOUS_IP_MAX accesses before they are blocked. As soon as the IP makes a request that is not suspicious, the sliding window for this IP is dropped.

searx.botdetection.ip_limit.API_MAX = 4¶: Maximum requests from one IP in the API_WONDOW

searx.botdetection.ip_limit.API_WONDOW = 3600¶: Time (sec) before sliding window for API requests (format != html) expires.

searx.botdetection.ip_limit.BURST_MAX = 15¶: Maximum requests from one IP in the BURST_WINDOW

searx.botdetection.ip_limit.BURST_MAX_SUSPICIOUS = 2¶: Maximum of suspicious requests from one IP in the BURST_WINDOW

searx.botdetection.ip_limit.BURST_WINDOW = 20¶: Time (sec) before sliding window for burst requests expires.

searx.botdetection.ip_limit.LONG_MAX = 150¶: Maximum requests from one IP in the LONG_WINDOW

searx.botdetection.ip_limit.LONG_MAX_SUSPICIOUS = 10¶: Maximum suspicious requests from one IP in the LONG_WINDOW

searx.botdetection.ip_limit.LONG_WINDOW = 600¶: Time (sec) before the longer sliding window expires.

searx.botdetection.ip_limit.SUSPICIOUS_IP_MAX = 3¶: Maximum requests from one suspicious IP in the SUSPICIOUS_IP_WINDOW.

searx.botdetection.ip_limit.SUSPICIOUS_IP_WINDOW = 2592000¶: Time (sec) before sliding window for one suspicious IP expires.
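
The reduction of the request rates for suspicious requests amounts to selecting a different pair of limits. The sketch below reuses the constant values listed above; the function select_limits is hypothetical and not part of searx.botdetection.ip_limit.

```python
from typing import Tuple

# Values as documented above.
BURST_MAX = 15
BURST_MAX_SUSPICIOUS = 2
LONG_MAX = 150
LONG_MAX_SUSPICIOUS = 10


def select_limits(suspicious: bool) -> Tuple[int, int]:
    """Hypothetical helper: return (burst limit, long limit) for a request."""
    if suspicious:
        return BURST_MAX_SUSPICIOUS, LONG_MAX_SUSPICIOUS
    return BURST_MAX, LONG_MAX
```

A suspicious client is thus limited to 2 requests per BURST_WINDOW and 10 per LONG_WINDOW instead of 15 and 150.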

Method `link_token`¶

The link_token method evaluates a request as suspicious if the URL /client<token>.css is not requested by the client. Because the URL contains a random component (the token), a bot cannot send a ping by requesting a static URL.

Note

This method requires a redis DB and needs an HTTP X-Forwarded-For header.

To use this method, a Flask URL route needs to be added:

@app.route('/client<token>.css', methods=['GET', 'POST'])
def client_token(token=None):
    link_token.ping(request, token)
    return Response('', mimetype='text/css')

The Flask HTML template also needs a stylesheet link (the value of link_token comes from get_token):

<link rel="stylesheet"
      href="{{ url_for('client_token', token=link_token) }}"
      type="text/css" />

searx.botdetection.link_token.get_ping_key(network: IPv4Network | IPv6Network, request: flask.Request) → str[source]¶: Generates a hashed key that corresponds (more or less) to a web browser session in a network.

searx.botdetection.link_token.get_token() → str[source]¶

Returns current token. If there is no currently active token a new token is generated randomly and stored in the redis DB.

TOKEN_LIVE_TIME
TOKEN_KEY

searx.botdetection.link_token.is_suspicious(network: IPv4Network | IPv6Network, request: flask.Request, renew: bool = False)[source]¶: Checks whether a valid ping exists for this (client) network; if not, the request is rated as suspicious. If a valid ping exists and the argument renew is True, the expire time of this ping is reset to PING_LIVE_TIME.

searx.botdetection.link_token.ping(request: Request, token: str)[source]¶: This function is called by a request to URL /client<token>.css. If token is valid a PING_KEY for the client is stored in the DB. The expire time of this ping-key is PING_LIVE_TIME.

searx.botdetection.link_token.PING_KEY = 'SearXNG_limiter.ping'¶: Prefix of all ping-keys generated by get_ping_key

searx.botdetection.link_token.PING_LIVE_TIME = 3600¶: Lifetime (sec) of the ping-key from a client (request)

searx.botdetection.link_token.TOKEN_KEY = 'SearXNG_limiter.token'¶: Key under which the current token is stored in the DB

searx.botdetection.link_token.TOKEN_LIVE_TIME = 600¶: Lifetime (sec) of limiter’s CSS token.
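
The interplay of token, ping and expiry can be sketched with a small in-memory store in place of the redis DB. Everything below is a simplified illustration: ExpiringStore, the explicit now argument, the plain client_key string (instead of a hashed key from get_ping_key) and the 16 letter token are assumptions, and the functions only share their names with the ones documented above.

```python
import secrets
import string

TOKEN_LIVE_TIME = 600   # seconds, as documented above
PING_LIVE_TIME = 3600   # seconds, as documented above
TOKEN_KEY = "SearXNG_limiter.token"
PING_KEY = "SearXNG_limiter.ping"


class ExpiringStore:
    """Hypothetical in-memory stand-in for a DB whose keys expire."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)

    def get(self, key, now):
        value, expires = self._data.get(key, (None, 0))
        return value if now < expires else None


def get_token(store, now):
    """Return the active token, creating a random one if none is active."""
    token = store.get(TOKEN_KEY, now)
    if token is None:
        token = "".join(secrets.choice(string.ascii_letters) for _ in range(16))
        store.set(TOKEN_KEY, token, TOKEN_LIVE_TIME, now)
    return token


def ping(store, client_key, token, now):
    """Store a ping for the client, but only if the token is the active one."""
    if token == store.get(TOKEN_KEY, now):
        store.set(PING_KEY + client_key, 1, PING_LIVE_TIME, now)


def is_suspicious(store, client_key, now):
    """A client without a valid (unexpired) ping is rated as suspicious."""
    return store.get(PING_KEY + client_key, now) is None
```

A client that never loads the stylesheet URL with the current token stays suspicious; one that does is trusted until its ping expires.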

Probe HTTP headers ¶

Method `http_accept`¶

The http_accept method evaluates a request as the request of a bot if the Accept header does not contain text/html.
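
A check of this kind might look like the sketch below. The function accepts_html is hypothetical, and it looks only for a literal text/html entry; how wildcards such as */* are treated by the real method is not covered here.

```python
def accepts_html(accept_header: str) -> bool:
    """Hypothetical helper: True if text/html is listed in the Accept header."""
    media_types = [
        part.split(";")[0].strip().lower()   # drop parameters such as q=0.9
        for part in accept_header.split(",")
    ]
    return "text/html" in media_types
```

A request for which accepts_html returns False would be treated as a bot request under this rule.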

Method `http_accept_encoding`¶

The http_accept_encoding method evaluates a request as the request of a bot if the Accept-Encoding header contains neither gzip nor deflate (that is, both values are missing).
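
The gzip and deflate condition can be sketched as follows. The function has_common_encoding is hypothetical and only illustrates the rule that a request is flagged when both values are missing.

```python
def has_common_encoding(accept_encoding: str) -> bool:
    """Hypothetical helper: True if gzip or deflate is offered by the client."""
    encodings = [
        part.split(";")[0].strip().lower()   # drop parameters such as q=1.0
        for part in accept_encoding.split(",")
    ]
    return "gzip" in encodings or "deflate" in encodings
```

Under this rule a header of "br" alone is flagged, while "deflate" or "gzip, br" passes.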

Method `http_user_agent`¶

searx.botdetection.http_user_agent.USER_AGENT = '(unknown|[Cc][Uu][Rr][Ll]|[wW]get|Scrapy|splash|JavaFX|FeedFetcher|python-requests|Go-http-client|Java|Jakarta|okhttp|HttpClient|Jersey|Python|libwww-perl|Ruby|SynHttpClient|UniversalFeedParser|Googlebot|GoogleImageProxy|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT|Sogou|Abonti|Pixray|Spinn3r|SemrushBot|Exabot|ZmEu|BLEXBot|bitlybot|Mozilla/5\\.0\\ \\(compatible;\\ Farside/0\\.1\\.0;\\ \\+https://farside\\.link\\)|.*PetalBot.*)'¶: Regular expression that matches the User-Agent of known bots.
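
To show how such a pattern can be applied, the sketch below uses an abridged subset of the alternatives above. The name ABRIDGED_USER_AGENT, the helper is_bot_user_agent and the use of re.match (which anchors at the start of the string) are assumptions made for this example.

```python
import re

# Abridged subset of the documented pattern, for illustration only.
ABRIDGED_USER_AGENT = re.compile(
    r"(unknown|[Cc][Uu][Rr][Ll]|[wW]get|python-requests|Go-http-client|.*PetalBot.*)"
)


def is_bot_user_agent(user_agent: str) -> bool:
    """Hypothetical helper: True if the User-Agent starts with a known bot name."""
    return ABRIDGED_USER_AGENT.match(user_agent) is not None
```

With this subset, "curl/8.4.0" and "python-requests/2.31.0" are classified as bots, while a typical browser string starting with "Mozilla/5.0" is not.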
