[network]: Add redundant parallel proxy requests.

Anecdotally, using SearX over unreliable proxies,
like Tor, seems to be quite error-prone.
SearX puts considerable effort into measuring the
performance and reliability of engines, most
likely owing to those aspects being of
significant concern.

This patch proposes to mitigate the related
problems by issuing redundant requests
concurrently through the specified proxies and
returning the first response that is not an error.
The functionality is enabled with the
`proxy_request_redundancy` parameter in the
outgoing network settings or the engine settings.

Example:

```yaml

outgoing:
    request_timeout: 8.0
    proxies:
        "all://":
            - socks5h://tor:9050
            - socks5h://tor1:9050
            - socks5h://tor2:9050
            - socks5h://tor3:9050
    proxy_request_redundancy: 4
```

In this example, each network request will be
sent 4 times, once through every proxy. The
first (non-error) response wins.

In my testing environment, using several Tor proxy
endpoints, this approach almost entirely
removes engine errors related to timeouts
and denied requests. The latency of the
network system is also improved.

The implementation uses an
`AsyncParallelTransport(httpx.AsyncBaseTransport)`
wrapper to wrap multiple sub-transports,
and `asyncio.wait` to wait for the first completed
request.
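
The "first non-error response wins" idea can be
sketched as follows. This is a minimal illustration,
not the code from this patch: `first_successful` and
`fake_request` are hypothetical names, and the
simulated requests stand in for real sub-transports.

```python
import asyncio


async def first_successful(coros):
    """Run all coroutines concurrently and return the first non-error result.

    Hypothetical helper for illustration; not part of the patch.
    """
    pending = {asyncio.ensure_future(c) for c in coros}
    last_error = None
    try:
        while pending:
            # Wake up as soon as at least one request has finished.
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                error = task.exception()
                if error is None:
                    return task.result()
                last_error = error
        # Every request failed (or none was issued).
        raise last_error or RuntimeError("no requests were issued")
    finally:
        # Cancel the requests that lost the race.
        for task in pending:
            task.cancel()


async def fake_request(proxy, delay, fail=False):
    # Stand-in for sending one request through one proxy (assumption).
    await asyncio.sleep(delay)
    if fail:
        raise ConnectionError(f"{proxy} failed")
    return f"response via {proxy}"


async def demo():
    return await first_successful([
        fake_request("tor", 0.01, fail=True),   # fails quickly
        fake_request("tor1", 0.05),             # first success
        fake_request("tor2", 0.2),              # slower, gets cancelled
    ])


result = asyncio.run(demo())
```

Here the fast failure from `tor` is skipped and the
reply from `tor1` is returned, while the slower
request is cancelled. An error is raised only when
every request fails.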

The existing implementation of the network
proxy cycling has also been moved into the
`AsyncParallelTransport` class, which should
improve network client memoization and
performance.
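
One way cycling and redundancy could be combined is
sketched below. This is an assumption for
illustration only, not taken from the patch:
`make_proxy_picker` is a hypothetical helper, and
clamping the redundancy to the number of proxies is
my own choice.

```python
from itertools import cycle, islice


def make_proxy_picker(proxies, redundancy=1):
    """Return a function that yields the next `redundancy` proxies in turn.

    Hypothetical helper for illustration; not part of the patch.
    """
    # Assumption: never use more proxies per request than are configured.
    redundancy = max(1, min(redundancy, len(proxies)))
    ring = cycle(proxies)

    def pick():
        # Consume the next `redundancy` entries of the round-robin ring.
        return list(islice(ring, redundancy))

    return pick


pick = make_proxy_picker(["tor", "tor1", "tor2"], redundancy=2)
first = pick()
second = pick()
```

With a redundancy of 1 this degenerates to plain
round-robin cycling; with a larger value each
request gets the next group of proxies from the
ring.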

TESTED:
- unit tests for the new functions and classes.
- tested on desktop PC with 10+ upstream proxies
    and comparable request redundancy.
czaky 2024-05-17 02:09:29 +00:00
parent 2f2d93b292
commit 122a9568de
10 changed files with 382 additions and 59 deletions

@ -47,6 +47,7 @@ engine is shown. Most of the options have a default value or even are optional.
max_keepalive_connections: 10
keepalive_expiry: 5.0
using_tor_proxy: false
proxy_request_redundancy: 1
proxies:
http:
- http://proxy1:8080
@ -154,6 +155,9 @@ engine is shown. Most of the options have a default value or even are optional.
``proxies`` :
Overwrites proxy settings from :ref:`settings outgoing`.
``proxy_request_redundancy`` :
Overwrites proxy settings from :ref:`settings outgoing`.
``using_tor_proxy`` :
Using tor proxy (``true``) or not (``false``) for this engine. The default is
taken from ``using_tor_proxy`` of the :ref:`settings outgoing`.
@ -241,4 +245,3 @@ Example configuration in settings.yml for a German and English speaker:
When searching, the default google engine will return German results and
"google english" will return English results.

@ -22,9 +22,9 @@ Communication with search engines.
# and https://www.python-httpx.org/compatibility/#ssl-configuration
# verify: ~/.mitmproxy/mitmproxy-ca-cert.cer
#
# uncomment below section if you want to use a proxyq see: SOCKS proxies
# Uncomment below section if you want to use a proxy. See:
# https://2.python-requests.org/en/latest/user/advanced/#proxies
# are also supported: see
# SOCKS proxies are also supported. See:
# https://2.python-requests.org/en/latest/user/advanced/#socks
#
# proxies:
@ -34,6 +34,11 @@ Communication with search engines.
#
# using_tor_proxy: true
#
# Uncomment below if you want to make multiple request in parallel
# through all the proxies at once:
#
# proxy_request_redundancy: 10
#
# Extra seconds to add in order to account for the time taken by the proxy
#
# extra_proxy_timeout: 10.0
@ -70,6 +75,10 @@ Communication with search engines.
If there are more than one proxy for one protocol (http, https),
requests to the engines are distributed in a round-robin fashion.
``proxy_request_redundancy`` :
Cycle the proxies (``1``) on by one or use them in parallel (``> 1``) for all engines.
The default is ``1`` and can be overwritten in the :ref:`settings engine`
``source_ips`` :
If you use multiple network interfaces, define from which IP the requests must
be made. Example:
@ -106,5 +115,3 @@ Communication with search engines.
``using_tor_proxy`` :
Using tor proxy (``true``) or not (``false``) for all engines. The default is
``false`` and can be overwritten in the :ref:`settings engine`