Anecdotally, using SearX over unreliable proxies,
like Tor, seems to be quite error prone.
SearX puts quite some effort into measuring the
performance and reliability of engines, most
likely owing to those aspects being of
significant concern.
The patch proposes to mitigate the related
problems by issuing redundant requests through
all of the specified proxies at once and returning
the first response that is not an error.
The functionality is enabled with the
`proxy_request_redundancy` parameter in the
outgoing network settings or in the engine settings.
Example:
```yaml
outgoing:
  request_timeout: 8.0
  proxies:
    "all://":
      - socks5h://tor:9050
      - socks5h://tor1:9050
      - socks5h://tor2:9050
      - socks5h://tor3:9050
  proxy_request_redundancy: 4
```
In this example, each network request will be
sent 4 times, once through every proxy. The
first (non-error) response wins.
In my testing environment, using several Tor proxy
endpoints, this approach almost entirely
removes engine errors related to timeouts
and denied requests. The latency of the
network system is also improved.
The implementation uses an
`AsyncParallelTransport(httpx.AsyncBaseTransport)`
wrapper around multiple sub-transports
and `asyncio.wait` to wait for the first completed
request.
The existing implementation of the network
proxy cycling has also been moved into the
`AsyncParallelTransport` class, which should
improve network client memoization and
performance.
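For illustration, a minimal sketch of this approach, assuming httpx transports
raced with `asyncio.wait`; the class name `AsyncParallelTransportSketch`, the
`redundancy` argument and the non-error check are illustrative assumptions,
not the actual patch:
```python
# Minimal sketch only -- NOT the actual SearXNG patch.  It shows the core
# idea: keep the proxy cycle inside the transport wrapper and race each
# request over the next `redundancy` sub-transports, returning the first
# non-error response.  SOCKS transports require the `httpx[socks]` extra,
# and socks5h support depends on the httpx/httpcore version.
import asyncio
import itertools

import httpx


class AsyncParallelTransportSketch(httpx.AsyncBaseTransport):
    """Race one request over several sub-transports (one per proxy)."""

    def __init__(self, transports: list[httpx.AsyncBaseTransport], redundancy: int = 2):
        self._transports = list(transports)
        self._redundancy = min(redundancy, len(self._transports))
        self._cycle = itertools.cycle(self._transports)  # proxy cycling lives here

    async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
        # Safe for body-less GET requests; streamed request bodies would need care.
        tasks = {
            asyncio.ensure_future(transport.handle_async_request(request))
            for transport in (next(self._cycle) for _ in range(self._redundancy))
        }
        response = None
        try:
            while tasks and response is None:
                done, tasks = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
                for task in done:
                    if task.exception() is None and not task.result().is_error:
                        response = task.result()  # first non-error response wins
                        break
        finally:
            for task in tasks:  # drop the slower redundant requests
                task.cancel()
        if response is None:
            raise httpx.TransportError("all redundant proxy requests failed")
        return response


# Hypothetical usage mirroring the YAML example above:
proxies = [
    "socks5h://tor:9050",
    "socks5h://tor1:9050",
    "socks5h://tor2:9050",
    "socks5h://tor3:9050",
]
transport = AsyncParallelTransportSketch(
    [httpx.AsyncHTTPTransport(proxy=proxy) for proxy in proxies],
    redundancy=4,
)
client = httpx.AsyncClient(transport=transport, timeout=8.0)
```
Cancelling the slower tasks once a winner arrives keeps redundant requests from
piling up on the proxies; a real implementation would also handle request
bodies and error bookkeeping more carefully.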
TESTED:
- unit tests for the new functions and classes.
- tested on a desktop PC with 10+ upstream proxies
and a comparable request redundancy.
In the "Engines" tab on searx.space [1] nearly all engines report a
TimeoutException: yep engine
As documented in issue #2444 [2], this problem can be fixed by increasing the
timeout. Note: on a local instance (`make run`) a timeout of 3 sec was
sufficient, at least in my local tests, but the overall picture on searx.space
leads me to believe that this tight timeout is usually not sufficient.
[1] https://searx.space/
[2] https://github.com/searxng/searxng/issues/2444
Closes https://github.com/searxng/searxng/issues/3421
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In commit 8af181533 in PR:
- https://github.com/searxng/searxng/pull/3321
the category `journal_article` has been removed; `book_any` was removed a
longer time ago.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Changed the value of "extra_proxy_timeout" from 10.0 to 10, as the variable expects an int.
Uncommenting this setting with a non-int value will throw many errors and crash all engines.
timeout: 4.0
The timeout of presearch-WEB is raised from the default of 3 sec to 4 sec. The
engine has to send two HTTP requests, which often exceed the default timeout of
3 sec. Since all other presearch categories (images, videos, news) already have
a timeout of 4 sec, the WEB search should have the same timeout.
network: presearch
Place all HTTP requests in the same network, named ``presearch``.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
All the environment settings defined in ./utils/brand.env are generated on the
fly, so there is no longer a need to define the brand environment in this file,
nor to keep all the workflows that handle it.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>