One reason for the frequent CAPTCHAs on startpage requests is that the requests
SearXNG sends to startpage.com are incomplete. To avoid the CAPTCHA we need to
send a well-formed HTTP POST request with a cookie, i.e. a request that is
identical to the one built by startpage.com itself:
- in the cookie the **region** is selected
- in the POST arguments the **language** is selected
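A minimal sketch of that split, not the engine's actual code. The key names
`region_cookie`, `query` and `language` are placeholders invented for
illustration and are not Startpage's real parameter names:

```python
def build_request_params(query, region, language):
    """Return the parts of a POST request: region in the cookie, language in the form data.

    All key names are hypothetical placeholders, not Startpage's real ones.
    """
    return {
        "method": "POST",
        # the region travels in the cookie
        "cookies": {"region_cookie": region},
        # the language travels in the POST arguments
        "data": {"query": query, "language": language},
    }


params = build_request_params("searxng", "en-US", "english")
```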
Based on the *engine_properties* boilerplate, SearXNG's startpage engine now
implements a `_fetch_engine_properties()` function to fetch regions & languages
from startpage.com.
This patch is a completely new implementation of the request() function,
reverse-engineered from the startpage.com page. The new implementation adds
- time-range support
- safe-search support
to the startpage engine, both of which were missing in the past.
The locale code 'no_NO' from startpage does not exist and is mapped to nb-NO.
For reference see the language-subtag registry at IANA [1]; `no` is the macrolanguage::
Type: language
Subtag: nb
Description: Norwegian Bokmål
Added: 2005-10-16
Suppress-Script: Latn
Macrolanguage: no
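The mapping can be sketched as a small lookup, assuming a hypothetical helper
`normalize_locale` and a hypothetical table `LOCALE_FIXES` (neither name is
taken from the patch):

```python
# Hypothetical table of locale codes that need a manual fix.
LOCALE_FIXES = {"no_NO": "nb-NO"}


def normalize_locale(code):
    """Map an engine locale code such as 'no_NO' to a tag such as 'nb-NO'."""
    if code in LOCALE_FIXES:
        return LOCALE_FIXES[code]
    # default: only swap the separator, e.g. 'de_DE' -> 'de-DE'
    return code.replace("_", "-")
```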
Additional hints:
- To fetch languages from startpage, this patch makes use of the
EngineProperties implemented in 7bf0d46c
- To get Startpage's locale & language, the function get_engine_locale from
9ae409a is used.
[1] https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
[2] https://www.w3.org/International/questions/qa-choosing-language-tags#langsubtag
Closes: https://github.com/searxng/searxng/issues/1081
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Many WEB-search engines (e.g. startpage) return the best results if a **region**
is selected; most often a language filter is also among the properties of the
WEB-search engine.
hint::
The **search language** should not be confused with the UI language: sometimes
the language argument is just the language of the UI, with no effect on the
result list.
To summarize:
Some WEB-search engines have language codes (e.g. `ca`) in their properties,
others have region codes (e.g. `ca-ES`), some have both regions and languages in
their properties (e.g. startpage) and other engines do not have any language or
region support.
In the past we generalized *language* over all kinds of engines without taking
into account that several WEB-search engines give the best results when a
region is selected.
This *language-centric* view in SearXNG is misleading when we need
region codes to parameterize an engine request!
This patch replaces the *language-centric* view by a "language / region" view.
Conclusions:
With regions we can no longer say that an engine supports *this or that*
language. For example: when the user selects 'zh' and an engine supports only
region codes like 'zh-TW' or 'zh-CN', we do not know what results the user
expects. The same applies to 'en' or 'fr' when the engine needs a region tag.
- Since it is unclear what the user expects from their language selection, we
can't assert a property that says: "supports_selected_language"
The feature is replaced in the UI by the wider sense of "language_support",
which stands for:
The engine has some kind of language support, either
by a region tag or by a language tag.
- A list of "supported_languages" does not make sense when there are regions
responsible for the result of an engine.
The "supported_languages" has been removed from the /config URL.
- The `has_language` test in the `searx/search/checker/impl.py` has been removed
since it does not cover engines with region support.
If there is a need for such a test we can implement new tests after all
engines with language (region) support have been moved to the *supported
properties* scheme (`'type': 'engine_properties'`) / see the previous
commit:
[mod] engines_languages.json: add new type EngineProperties
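The wider sense of "language_support" described in the first conclusion could
be derived roughly like this. This is a sketch under the assumption that the
engine's properties are a dictionary with `regions` and `languages` keys; the
function name is hypothetical:

```python
def language_support(engine_properties):
    """True if the engine has any region tag or language tag at all."""
    regions = engine_properties.get("regions")
    languages = engine_properties.get("languages")
    return bool(regions or languages)
```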
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
"type": "engine_properties"
Supported languages in qwant are locales with a territory tag (aka regions).
Moved `supported_languages` to `EngineProperties.regions`.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch adds the boilerplate code, needed to fetch properties from engines.
In the past we only fetched *languages* but some engines need *regions* to
parameterize the engine request.
To fit into our *fetch language* procedures the boilerplate is implemented in
the `searxng_extra/update/update_languages.py` and the *engine_properties* are
stored along in the `searx/data/engines_languages.json`.
This implementation is backward compatible with the `_fetch_fetch_languages()`
infrastructure we have. Once all `_fetch_fetch_languages()` implementations
have been moved to `_fetch_engine_properties()` implementations, we can rename
the files and scripts.
The new type `EngineProperties` is a dictionary with keys `languages` and
`regions`. The values are dictionaries that map from SearXNG's language & region
to the option values the engine uses::
engine_properties = {
'type' : 'engine_properties', # <-- !!!
'regions': {
# 'ca-ES' : <engine's region name>
},
'languages': {
# 'ca' : <engine's language name>
},
}
Similar to the `supported_languages`, in the engine the properties are available
under the name `supported_properties`.
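How an engine might look up its option value from such a dictionary can be
sketched as follows. This is not the `get_engine_locale` function; the helper
`engine_option` and the engine-side values `es_ca` and `catalan` are invented
for illustration:

```python
engine_properties = {
    "type": "engine_properties",
    "regions": {"ca-ES": "es_ca"},    # invented engine value
    "languages": {"ca": "catalan"},   # invented engine value
}


def engine_option(properties, searxng_tag):
    """Prefer a region match, fall back to the bare language, else None."""
    if searxng_tag in properties["regions"]:
        return ("region", properties["regions"][searxng_tag])
    language = searxng_tag.split("-")[0]
    if language in properties["languages"]:
        return ("language", properties["languages"][language])
    return None
```

With these sample values, `ca-ES` resolves to a region, `ca-FR` falls back to
the language `ca`, and `de-DE` yields `None`.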
Initially we start with languages & regions, but in a wider sense the type is
named *engine properties*. Engines can store whatever options they need, and
in the future there may be a need to fetch additional or completely different
properties.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
With the POST method, autocomplete.js does not URL encode the values.
For example "1+1" is sent as "1+1", which is read as "1 1" since spaces are URL-encoded as a plus.
There is no clean way to fix the bug since autocomplete.js seems abandoned.
The commit monkey patches the ajax function of autocomplete.js
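
The encoding problem itself can be reproduced outside the browser. The sketch
below uses Python's `urllib.parse` only to illustrate form encoding; it is not
part of the JavaScript fix, and the field name `q` is an assumption:

```python
from urllib.parse import parse_qs, urlencode

# Unencoded body, as sent by the unpatched POST request.
raw_body = "q=1+1"
decoded = parse_qs(raw_body)["q"][0]         # the plus is read as a space

# Properly encoded body: the plus becomes %2B and survives decoding.
encoded_body = urlencode({"q": "1+1"})
round_trip = parse_qs(encoded_body)["q"][0]
```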
Related to #1695
Only raise "suspicious Accept-Encoding" when both "gzip" and "deflate" are missing from Accept-Encoding.
This prevents browsers that implement only one compression method from being blocked by the limiter plugin.
Example of a browser that is currently blocked: Lynx (https://lynx.invisible-island.net)
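
The rule can be sketched as below. The function name and the header parsing
are assumptions made for illustration, not the limiter plugin's actual code:

```python
def is_suspicious_accept_encoding(header_value):
    """Suspicious only when BOTH 'gzip' and 'deflate' are missing."""
    # Drop optional parameters such as ';q=0.5' and normalise case.
    encodings = [
        part.split(";")[0].strip().lower()
        for part in header_value.split(",")
    ]
    return "gzip" not in encodings and "deflate" not in encodings
```

A client that sends only `gzip` or only `deflate` passes this check; a client
that sends neither is flagged.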