mirror of
https://github.com/searxng/searxng
synced 2024-01-01 19:24:07 +01:00

Anecdotally, using SearX over unreliable proxies such as Tor seems to be quite error-prone. SearX puts considerable effort into measuring the performance and reliability of engines, most likely owing to those aspects being of significant concern.

This patch proposes to mitigate related problems by issuing redundant requests concurrently through the specified proxies and returning the first response that is not an error.

The functionality is enabled using the `proxy_request_redundancy` parameter within the outgoing network settings or the engine settings.

Example:

```yaml
outgoing:
  request_timeout: 8.0
  proxies:
    "all://":
      - socks5h://tor:9050
      - socks5h://tor1:9050
      - socks5h://tor2:9050
      - socks5h://tor3:9050
  proxy_request_redundancy: 4
```

In this example, each network request will be sent 4 times, once through every proxy. The first (non-error) response wins.

In my testing environment using several Tor proxy endpoints, this approach almost entirely removes engine errors related to timeouts and denied requests. The latency of the network system is also improved.

The implementation uses an `AsyncParallelTransport(httpx.AsyncBaseTransport)` wrapper to wrap multiple sub-transports, and `asyncio.wait` to wait on the first completed request.

The existing implementation of the network proxy cycling has also been moved into the `AsyncParallelTransport` class, which should improve network client memoization and performance.

TESTED:
- unit tests for the new functions and classes.
- tested on desktop PC with 10+ upstream proxies and comparable request redundancy.
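The "first non-error response wins" idea from the commit message can be sketched with `asyncio.wait` alone. This is not the patch's `AsyncParallelTransport` code: `first_successful`, `fake_request`, and `demo` are hypothetical names, and the simulated delays stand in for real proxy round trips.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


async def first_successful(senders: Sequence[Callable[[], Awaitable[str]]]) -> str:
    """Start every sender concurrently and return the first non-error result."""
    pending = {asyncio.ensure_future(send()) for send in senders}
    last_error: Optional[BaseException] = None
    try:
        while pending:
            # Wake up as soon as at least one request has finished.
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
                last_error = task.exception()
        # Every request failed (or no sender was given).
        raise last_error or RuntimeError("no senders given")
    finally:
        # Drop the redundant requests that are still in flight.
        for task in pending:
            task.cancel()


async def fake_request(proxy: str, delay: float, fail: bool) -> str:
    """Hypothetical stand-in for one HTTP request sent through one proxy."""
    await asyncio.sleep(delay)
    if fail:
        raise ConnectionError(f"{proxy} failed")
    return f"response via {proxy}"


def demo() -> str:
    senders = [
        lambda: fake_request("socks5h://tor:9050", 0.01, True),    # fails quickly
        lambda: fake_request("socks5h://tor1:9050", 0.05, False),  # first success
        lambda: fake_request("socks5h://tor2:9050", 0.20, False),  # gets cancelled
    ]
    return asyncio.run(first_successful(senders))
```

In this sketch a fast failure does not end the wait: the loop keeps waiting until some request succeeds or all of them have failed, and only then re-raises the last error.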
148 lines
3.9 KiB
Python
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Implementations of the framework for the SearXNG engines.

.. hint::

   The long term goal is to modularize all implementations of the engine
   framework here in this Python package.  ToDo:

   - move implementations of the :ref:`searx.engines loader` to a new module in
     the :py:obj:`searx.enginelib` namespace.

"""


from __future__ import annotations
from typing import List, Callable, TYPE_CHECKING

if TYPE_CHECKING:
    from searx.enginelib import traits


class Engine:  # pylint: disable=too-few-public-methods
    """Class of engine instances built from YAML settings.

    For further documentation see :ref:`general engine configuration`.

    .. hint::

       This class is currently never initialized and only used for type hinting.
    """

    # Common options in the engine module

    engine_type: str
    """Type of the engine (:ref:`searx.search.processors`)"""

    paging: bool
    """Engine supports multiple pages."""

    time_range_support: bool
    """Engine supports search time range."""

    safesearch: bool
    """Engine supports SafeSearch"""

    language_support: bool
    """Engine supports languages (locales) search."""

    language: str
    """For an engine, when there is ``language: ...`` in the YAML settings the engine
    supports only this one language:

    .. code:: yaml

      - name: google french
        engine: google
        language: fr
    """

    region: str
    """For an engine, when there is ``region: ...`` in the YAML settings the engine
    supports only this one region:

    .. code:: yaml

      - name: google belgium
        engine: google
        region: fr-BE
    """

    fetch_traits: Callable
    """Function to fetch the engine's traits from origin."""

    traits: traits.EngineTraits
    """Traits of the engine."""

    # settings.yml

    categories: List[str]
    """Specifies to which :ref:`engine categories` the engine should be added."""

    name: str
    """Name that will be used across SearXNG to define this engine.  In settings, on
    the result page .."""

    engine: str
    """Name of the python file used to handle requests and responses to and from
    this search engine (file name from :origin:`searx/engines` without
    ``.py``)."""

    enable_http: bool
    """Enable HTTP (by default only HTTPS is enabled)."""

    shortcut: str
    """Code used to execute bang requests (``!foo``)"""

    timeout: float
    """Specific timeout for search-engine."""

    display_error_messages: bool
    """Display error messages on the web UI."""

    proxies: dict
    """Set proxies for a specific engine (YAML):

    .. code:: yaml

       proxies:
         http:  socks5://proxy:port
         https: socks5://proxy:port
    """

    proxy_request_redundancy: int
    """Number of proxies each request of this engine is sent through.  With ``1``
    the proxies are cycled one by one; with a value ``> 1`` the request is sent
    through that many proxies in parallel and the first non-error response is
    used."""

    disabled: bool
    """Disable the engine by default without deleting it.  The user can still
    activate it manually in the settings."""

    inactive: bool
    """Remove the engine from the settings (*disabled & removed*)."""

    about: dict
    """Additional fields describing the engine.

    .. code:: yaml

       about:
          website: https://example.com
          wikidata_id: Q306656
          official_api_documentation: https://example.com/api-doc
          use_official_api: true
          require_api_key: true
          results: HTML
    """

    using_tor_proxy: bool
    """Using tor proxy (``true``) or not (``false``) for this engine."""

    send_accept_language_header: bool
    """When this option is activated, the language (locale) that is selected by
    the user is used to build and send an ``Accept-Language`` header in the
    request to the origin search engine."""

    tokens: List[str]
    """A list of secret tokens to make this engine *private*, for more details see
    :ref:`private engines`."""
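The class docstring notes that `Engine` is never initialized and only serves type hinting. A minimal sketch of what that means at runtime follows. The reduced `Engine` class, the `summarize` helper, and the `SimpleNamespace` stand-in for a loaded engine are all hypothetical and exist only for illustration.

```python
from types import SimpleNamespace
from typing import List


class Engine:  # pylint: disable=too-few-public-methods
    """Annotation-only class: a hypothetical subset of the options above."""

    name: str
    paging: bool
    categories: List[str]
    proxy_request_redundancy: int


def summarize(engine: Engine) -> str:
    """Read attributes that the annotations promise to a type checker."""
    mode = "parallel" if engine.proxy_request_redundancy > 1 else "cycling"
    return f"{engine.name}: paging={engine.paging}, proxies={mode}"


# Any object with matching attributes works at runtime; Engine itself is
# never instantiated.  A static type checker may ask for a cast here.
fake_engine = SimpleNamespace(
    name="google french",
    paging=True,
    categories=["general"],
    proxy_request_redundancy=4,
)
summary = summarize(fake_engine)
```

Bare annotations such as `paging: bool` are recorded in `Engine.__annotations__` but create no class attribute, so the class carries no default values.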