One reason for the often seen CAPTCHA of the startpage requests are the
incomplete requests SearXNG sends to startpage.com. To avoid CAPTCHA we need to
send a well formed HTTP POST request with a cookie, we need to form a request
that is identical to the request build by startpage.com itself:
- in the cookie the **region** is selected
- in the POST arguments the **language** is selected
Based on the *engine_properties* boilerplate, SearXNG's startpage engine now
implements a `_fetch_engine_properties()` function to fetch regions & languages
from startpage.com.
This patch is a complete new implementation of the request() function, reversed
engineered from the startpage.com page. The new implementation adds
- time-range support
- save-search support
to the startpage engine which has been missed in the past.
The locale code 'no_NO' from startpage does not exists and is mapped to nb-NO.
For reference see languages-subtag at iana [1], `no` is the macrolanguage::
type: language
Subtag: nb
Description: Norwegian Bokmål
Added: 2005-10-16
Suppress-Script: Latn
Macrolanguage: no
Additional hints:
- To fetch languages from startpage, this patch makes use of the
EngineProperties implemented in 7bf0d46c
- Te get Startpage's locale & language, the function get_engine_locale from
9ae409a is used.
[1] https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
[2] https://www.w3.org/International/questions/qa-choosing-language-tags#langsubtag
Closes: https://github.com/searxng/searxng/issues/1081
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
"type": "engine_properties"
Supported languages in qwant are locales with a territory tag (aka regions).
Moved `supported_languages` to `EngineProperties.regions`.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The request function should not request a language (aka locale) that is not
supported by qwant. Select a locale like zh-TW ends in qwant's API error:
ERROR searx.engines.qwant news: exception : \
API error::locale must be one of the following values: \
en_gb, en_ie, en_us, en_ca, en_my, en_au, en_nz, de_de, de_ch, de_at, fr_fr, \
fr_be, fr_ch, fr_ca, fr_ad, fc_ca, co_fr, es_es, es_ar, es_cl, es_co, es_mx, \
es_pe, es_ad, ca_es, ca_ad, ca_fr, eu_es, eu_fr, it_it, it_ch, pt_pt, pt_ad, \
nl_be, nl_nl
The existing searx.utils.match_language function is unsuitable for this purpose,
it is replaced by function searx.locales.get_engine_locale that is based on the
methods from the babel package.
The quant's _fetch_supported_languages function has been revised to filter out
languages 8aka locales) not supported by qwant.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>