One reason for the often seen CAPTCHA of the startpage requests are the
incomplete requests SearXNG sends to startpage.com. To avoid CAPTCHA we need to
send a well formed HTTP POST request with a cookie, we need to form a request
that is identical to the request build by startpage.com itself:
- in the cookie the **region** is selected
- in the POST arguments the **language** is selected
Based on the *engine_properties* boilerplate, SearXNG's startpage engine now
implements a `_fetch_engine_properties()` function to fetch regions & languages
from startpage.com.
This patch is a complete new implementation of the request() function, reversed
engineered from the startpage.com page. The new implementation adds
- time-range support
- save-search support
to the startpage engine which has been missed in the past.
The locale code 'no_NO' from startpage does not exists and is mapped to nb-NO.
For reference see languages-subtag at iana [1], `no` is the macrolanguage::
type: language
Subtag: nb
Description: Norwegian Bokmål
Added: 2005-10-16
Suppress-Script: Latn
Macrolanguage: no
Additional hints:
- To fetch languages from startpage, this patch makes use of the
EngineProperties implemented in 7bf0d46c
- Te get Startpage's locale & language, the function get_engine_locale from
9ae409a is used.
[1] https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
[2] https://www.w3.org/International/questions/qa-choosing-language-tags#langsubtag
Closes: https://github.com/searxng/searxng/issues/1081
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
"type": "engine_properties"
Supported languages in qwant are locales with a territory tag (aka regions).
Moved `supported_languages` to `EngineProperties.regions`.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch adds the boilerplate code, needed to fetch properties from engines.
In the past we only fetched *languages* but some engines need *regions* to
parameterize the engine request.
To fit into our *fetch language* procedures the boilerplate is implemented in
the `searxng_extra/update/update_languages.py` and the *engine_properties* are
stored along in the `searx/data/engines_languages.json`.
This implementation is downward compatible to the `_fetch_fetch_languages()`
infrastructure we have. If there comes the day we have all
`_fetch_fetch_languages()` implementations moved to `_fetch_engine_properties()`
implementations, we can rename the files and scripts.
The new type `EngineProperties` is a dictionary with keys `languages` and
`regions`. The values are dictionaries to map from SearXNG's language & region
to option values the engine does use::
engine_properties = {
'type' : 'engine_properties', # <-- !!!
'regions': {
# 'ca-ES' : <engine's region name>
},
'languages': {
# 'ca' : <engine's language name>
},
}
Similar to the `supported_languages`, in the engine the properties are available
under the name `supported_properties`.
Initial we start with languages & regions, but in a wider sense the type is
named *engine properties*. Engines can store in whatever options they need and
may be in the future there is a need to fetch additional or complete different
properties.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The request function should not request a language (aka locale) that is not
supported by qwant. Select a locale like zh-TW ends in qwant's API error:
ERROR searx.engines.qwant news: exception : \
API error::locale must be one of the following values: \
en_gb, en_ie, en_us, en_ca, en_my, en_au, en_nz, de_de, de_ch, de_at, fr_fr, \
fr_be, fr_ch, fr_ca, fr_ad, fc_ca, co_fr, es_es, es_ar, es_cl, es_co, es_mx, \
es_pe, es_ad, ca_es, ca_ad, ca_fr, eu_es, eu_fr, it_it, it_ch, pt_pt, pt_ad, \
nl_be, nl_nl
The existing searx.utils.match_language function is unsuitable for this purpose,
it is replaced by function searx.locales.get_engine_locale that is based on the
methods from the babel package.
The quant's _fetch_supported_languages function has been revised to filter out
languages 8aka locales) not supported by qwant.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
By using new property `qwant_categ:` the category of qwant is no longer bound to
the category of SearXNG.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Most engines that support languages (and regions) use the Accept-Language from
the WEB browser to build a response that fits to the language (and region).
- add new engine option: send_accept_language_header
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The errors make pyright usage useless since a new error won't be seen [1].
[1] https://github.com/searxng/searxng/pull/1569
```
searx/compat.py:11:27 - error: Expression of type "Type[cached_property[_T@cached_property]]" cannot be assigned to declared type "Type[cached_property]"
"Type[cached_property[_T@cached_property]]" is incompatible with "Type[cached_property]"
Type "Type[cached_property[_T@cached_property]]" cannot be assigned to type "Type[cached_property]" (reportGeneralTypeIssues)
searx/utils.py:69:36 - error: Expression of type "None" cannot be assigned to parameter of type "str"
Type "None" cannot be assigned to type "str" (reportGeneralTypeIssues)
searx/utils.py:573:85 - error: Expression of type "None" cannot be assigned to parameter of type "int"
Type "None" cannot be assigned to type "int" (reportGeneralTypeIssues)
searx/webapp.py:1306:22 - error: Argument of type "str" cannot be assigned to parameter "__a" of type "BytesPath" in function "join"
Type "str" cannot be assigned to type "BytesPath"
"str" is incompatible with "bytes"
"str" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:68 - error: Argument of type "Literal['themes']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
Type "Literal['themes']" cannot be assigned to type "BytesPath"
"Literal['themes']" is incompatible with "bytes"
"Literal['themes']" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:78 - error: Argument of type "str | Any | None" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
Type "str | Any | None" cannot be assigned to type "BytesPath"
Type "str" cannot be assigned to type "BytesPath"
"str" is incompatible with "bytes"
"str" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:85 - error: Argument of type "Literal['img']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
Type "Literal['img']" cannot be assigned to type "BytesPath"
"Literal['img']" is incompatible with "bytes"
"Literal['img']" is incompatible with protocol "PathLike[bytes]"
"__fspath__" is not present (reportGeneralTypeIssues)
searx/engines/mongodb.py:8:6 - warning: Import "pymongo" could not be resolved (reportMissingImports)
searx/engines/mysql_server.py:9:8 - warning: Import "mysql.connector" could not be resolved (reportMissingImports)
searx/engines/postgresql.py:9:8 - warning: Import "psycopg2" could not be resolved from source (reportMissingModuleSource)
searx/engines/xpath.py:187:28 - warning: "categories" is not defined (reportUndefinedVariable)
searx/search/__init__.py:184:82 - warning: "flask" is not defined (reportUndefinedVariable)
searx/search/checker/background.py:19:26 - error: Type of "schedule" is partially unknown
Type of "schedule" is "(delay: Any, func: Any, *args: Any) -> Literal[True]" (reportUnknownVariableType)
searx/shared/__init__.py:8:12 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
searx/shared/shared_uwsgi.py:5:8 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
```
The engine name is not only a *name* its also a identifier that is used in
logs, HTTP headers and more. Unicode characters in the name of an engine could
cause various issues.
Closes: https://github.com/searxng/searxng/issues/1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Emojipedia is an emoji reference website which documents the meaning and
common usage of emoji characters in the Unicode Standard. It is owned by Zedge
since 2021. Emojipedia is a voting member of The Unicode Consortium.[1]
Cherry picked from @james-still [2[3] and slightly modified to fit SearXNG's
quality gates.
[1] https://en.wikipedia.org/wiki/Emojipedia
[2] 2fc01eb20f
[3] https://github.com/searx/searx/pull/3278