Many WEB-search engines (e.g. startpage) response best results if a **region**
is selected, most often a language filter is also in the properties of the
WEB-search engine.
hint::
The **search language** should not be mixed: sometimes the language argument
is just the language of the UI with none effect on the result list.
To summarize:
Some WEB-search engines have language codes (e.g. `ca`) in their properties,
other have a region codes (e.g. `ca-ES`), some have regions and languages in
their properties (e.g. startpage) and other engine do not have any language or
region support.
In the past we generalized *language* over all kind of engines without taking
into mind that several WEB-search engines have best results when there is a
region selected.
This *language-centric* view in SearXNG is misleading when we need
region-codes to parameterize a engine request!
This patch replaces the *language-centric* view by a "language / region" view.
Conclusions:
With regions we can't say any longer that a engine supports *this or that*
language. By example: when the user selects 'zh' and a engine supports only
region codes like 'zh-TW' or 'zh-CN' we do not what results the user expects /
similar with 'en' or 'fr when the engine needs a region tag.
- Since it is unclear what the user expects by his language selection, we can't
assert a property that says: "supports_selected_language"
The feature is replaced in the UI by the wider sense of "language_support",
what stands for:
The engine has some kind of language support, either
by a region tag or by a language tag.
- A list of "supported_languages" does not make sense when there are regions
responsible for the result of an engine.
The "supported_languages" has been removed from the /config URL.
- The `has_language` test in the `searx/search/checker/impl.py` has been removed
since it does not cover engines with region support.
If there is a need for such a test we can implement new tests after all
engines with language (region) support has been moved to the *supported
properites* scheme (`'type': 'engine_properties'`) / see commit previous
commit:
[mod] engines_languages.json: add new type EngineProperties
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Most engines that support languages (and regions) use the Accept-Language from
the WEB browser to build a response that fits to the language (and region).
- add new engine option: send_accept_language_header
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Disable the python code formatting from python-black, where the readability of
code suffers by formatting.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
it prepares the new architecture change,
everything about multithreading in moved in the searx.search.* packages
previously the call to the "init" function of the engines was done in searx.engines:
* the network was not set (request not sent using the defined proxy)
* it requires to monkey patch the code to avoid HTTP requests during the tests
Some error won't stop the engine:
* additional HTTP redirects for example
* some invalid results
secondary=True allows to flag these errors as not important.
Report to the user suspended engines.
searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
settings.yml:
* outgoing.networks:
* can contains network definition
* propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
* retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
* local_addresses can be "192.168.0.1/24" (it supports IPv6)
* support_ipv4 & support_ipv6: both True by default
see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
* either a full network description
* either reference an existing network
* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
* searx understand "!ddg !g time" as : send "!g time" to DDG
* !g a DDG bang for Google: DDG return a HTTP redirect to Google
This commit adds a the allows_redirect param not to follow HTTP redirect.
The DDG engine returns a empty result as before without HTTP redirect.
the query "time" is convinient because most of the search engine will return some results,
but some engines in the general category will return documentation about the HTML tags <time> or <input type="time">
see searx.search.processors.abstract.EngineProcessor
First the method searx call the get_params method.
If the return value is not None, then the searx call the method search.