Commit graph

23 commits

Author SHA1 Message Date
czaky
9441c97a26 cleanup 2024-05-18 00:27:37 +00:00
czaky
5fe7e42d1a Add preference for non 404 responses fix test RC.
Fixed race condition with the 404 test.
2024-05-17 12:58:47 +00:00
czaky
122a9568de [network]: Add redundant parallel proxy requests.
Anecdotally, using SearX over unreliable proxies,
like tor, seems to be quite error prone.
SearX puts quite an effort to measure the
performance and reliability of engines, most
likely owning to those aspects being of
significant concern.

The patch here proposes to mitigate related
problems, by issuing concurrent redundant requests
through the specified proxies at once, returning
the first response that is not an error.
The functionality is enabled using the:
`proxy_request_redundancy` parameter within the
outgoing network settings or the engine settings.

Example:

```yaml

outgoing:
    request_timeout: 8.0
    proxies:
        "all://":
            - socks5h://tor:9050
            - socks5h://tor1:9050
            - socks5h://tor2:9050
            - socks5h://tor3:9050
    proxy_request_redundancy: 4
```

In this example, each network request will be
send 4 times, once through every proxy. The
first (non-error) response wins.

In my testing environment using several tor proxy
end-points, this approach almost entirely
removes engine errors related to timeouts
and denied requests. The latency of the
network system is also improved.

The implementation, uses a
`AsyncParallelTransport(httpx.AsyncBaseTransport)`
wrapper to wrap multiple sub-trasports,
and `asyncio.wait` to wait on the first completed
request.

The existing implementation of the network
proxy cycling has also been moved into the
`AsyncParallelTransport` class, which should
improve network client memoization and
performance.

TESTED:
- unit tests for the new functions and classes.
- tested on desktop PC with 10+ upstream proxies
    and comparable request redundancy.
2024-05-17 02:09:29 +00:00
Markus Heiser
542f7d0d7b [mod] pylint all files with one profile / drop PYLINT_SEARXNG_DISABLE_OPTION
In the past, some files were tested with the standard profile, others with a
profile in which most of the messages were switched off ... some files were not
checked at all.

- ``PYLINT_SEARXNG_DISABLE_OPTION`` has been abolished
- the distinction ``# lint: pylint`` is no longer necessary
- the pylint tasks have been reduced from three to two

  1. ./searx/engines -> lint engines with additional builtins
  2. ./searx ./searxng_extra ./tests -> lint all other python files

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-11 14:55:38 +01:00
Alexandre Flament
97b1df1629 [mod] searx.network: memory optimization
Avoid to create a SSLContext in AsyncHTTPTransportNoHttp

See:
* 0f61aa58d6/httpx/_transports/default.py (L271)
* https://github.com/encode/httpx/issues/2298
2023-08-27 11:49:40 +02:00
Alexandre Flament
b4e4cfc026 Bump httpx 0.21.2 from to 0.24.1 2023-08-21 22:05:12 +02:00
Markus Heiser
8fa54ffddf [mod] Shuffle httpx's default ciphers of a SSL context randomly.
From the analyse of @9Ninety [1] we know that DDG (and may be other engines / I
have startpage in mind) does some kind of TLS fingerprint to block bots.

This patch shuffles the default ciphers from httpx to avoid a cipher profile
that is known to httpx (and blocked by DDG).

[1] https://github.com/searxng/searxng/issues/2246#issuecomment-1467895556

----

From `What Is TLS Fingerprint and How to Bypass It`_

> When implementing TLS fingerprinting, servers can't operate based on a
> locked-in whitelist database of fingerprints.  New fingerprints appear
> when web clients or TLS libraries release new versions. So, they have to
> live off a blocklist database instead.
> ...
> It's safe to leave the first three as is but shuffle the remaining ciphers
> and you can bypass the TLS fingerprint check.

.. _What Is TLS Fingerprint and How to Bypass It:
   https://www.zenrows.com/blog/what-is-tls-fingerprint#how-to-bypass-tls-fingerprinting

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Closes: https://github.com/searxng/searxng/issues/2246
2023-03-19 13:40:31 +01:00
Alexandre Flament
32e8c2cf09 searx.network: add "verify" option to the networks
Each network can define a verify option:
* false to disable certificate verification
* a path to existing certificate.

SearXNG uses SSL_CERT_FILE and SSL_CERT_DIR when they are defined
see https://www.python-httpx.org/environment_variables/#ssl_cert_file
2022-10-14 13:59:22 +00:00
Martin Fischer
def62c3a47 [typing] add type hints for dictionaries 2022-01-17 11:42:48 +01:00
Alexandre Flament
e64c3deab7 [mod] upgrade httpx 0.21.2
httpx 0.21.2 and httpcore 0.14.4 fix multiple issues:
* https://github.com/encode/httpx/releases/tag/0.21.2
* https://github.com/encode/httpcore/releases/tag/0.14.4

so most of the workarounds in searx.network have been removed.
2022-01-05 18:46:00 +01:00
Markus Heiser
3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Alexandre Flament
29893cf816 [fix] searx.network.stream: fix memory leak 2021-09-28 19:28:12 +02:00
Alexandre Flament
dc74df3a55
Merge pull request #261 from dalf/upgrade_httpx
[upd] upgrade httpx 0.19.0
2021-09-17 11:48:37 +02:00
Markus Heiser
443bf35e09 [pylint] fix global-variable-not-assigned issues
If there is no write access, there is no need for global.  Remove global
statement if there is no assignment.

global-variable-not-assigned:
  Using global for names but no assignment is done Used when a variable is
  defined through the "global" statement but no assignment to this variable is
  done.

In Pylint 2.11 the global-variable-not-assigned checker now catches global
variables that are never reassigned in a local scope and catches (reassigned)
functions [1][2]

[1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html
[2] https://github.com/PyCQA/pylint/issues/1375

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-17 10:14:27 +02:00
Alexandre Flament
b10403d3a1 [mod] searx.network: remove redundant code
searx.client.new_client: the proxies parameter is a dictonnary,
and the protocol (key of the dictionnary) is already normalized
(see usage of searx.network.network.PROXY_PATTERN_MAPPING)
2021-09-17 10:06:24 +02:00
Alexandre Flament
8e73438cbe [upd] upgrade httpx 0.19.0
adjust searx.network module to the new internal API
see https://github.com/encode/httpx/pull/1522
2021-09-17 10:06:22 +02:00
Alexandre Flament
91a6d80e82 [mod] debug mode: log HTTP requests with network name
For example wikipedia requests use the logger name "searx.network.wikipedia"

Log is disable when searx_debug is False
2021-09-11 10:13:14 +02:00
Markus Heiser
2a3b9a2e26 [pylint] searx: drop no longer needed 'missing-function-docstring'
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-07 13:34:35 +02:00
Markus Heiser
1499002ceb [coding-style] searx/network/client.py - normalized indentations
No functional change!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-24 17:44:43 +02:00
Markus Heiser
b595c482d0 [pylint] searx/network/client.py & add global (TRANSPORT_KWARGS)
No functional change!

- fix messages from pylint
- add ``global TRANSPORT_KWARGS``
- normalized python_socks imports

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-24 17:39:37 +02:00
Alexandre Flament
0f4e995ab4 [mod] searx.network.client: the same configuration reuses the same ssl.SSLContext
before there was one ssl.SSLContext per client.

see https://github.com/encode/httpx/issues/978
2021-05-05 20:36:37 +02:00
Alexandre Flament
283ae7bfad [fix] searx.network: fix rare cases where LOOP is None
* searx.network.client.LOOP is initialized in a thread
* searx.network.__init__ imports LOOP which may happen
  before the thread has initialized LOOP

This commit adds a new function "searx.network.client.get_loop()"
to fix this issue
2021-04-27 17:47:36 +02:00
Alexandre Flament
d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contains network definition
   * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * either reference an existing network

* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
2021-04-12 17:25:56 +02:00