Commit Graph

43 Commits

Author SHA1 Message Date
Émilien (perso)
b6e3cf6ced wikipedia wikidata infobox + disable wikisource (#2806)
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2023-10-19 18:03:21 +02:00
Markus Heiser
e8706fb738 [fix] engine & network issues / documentation and type annotations
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotaions of the engine
modules has also been completed so that error messages from the type checker are
no longer reported.

Related and (partial) fixed issue:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-25 13:58:26 +02:00
Markus Heiser
27369ebec2 [fix] searxng_extra/update/update_engine_descriptions.py (part 1)
Follow up of #2269

The script to update the descriptions of the engines does no longer work since
PR #2269 has been merged.

searx/engines/wikipedia.py
==========================

1. There was a misusage of zh-classical.wikipedia.org:

   - `zh-classical` is dedicate to classical Chinese [1] which is not
     traditional Chinese [2].

   - zh.wikipedia.org has LanguageConverter enabled [3] and is going to
     dynamically show simplified or traditional Chinese according to the
     HTTP Accept-Language header.

2. The update_engine_descriptions.py needs a list of all wikipedias.  The
   implementation from #2269 included only a reduced list:

   - https://meta.wikimedia.org/wiki/Wikipedia_article_depth
   - https://meta.wikimedia.org/wiki/List_of_Wikipedias

searxng_extra/update/update_engine_descriptions.py
==================================================

Before PR #2269 there was a match_language() function that did an approximation
using various methods.  With PR #2269 there are only the types in the data model
of the languages, which can be recognized by babel.  The approximation methods,
which are needed (only here) in the determination of the descriptions, must be
replaced by other methods.

[1] https://en.wikipedia.org/wiki/Classical_Chinese
[2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters
[3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter

Closes: https://github.com/searxng/searxng/issues/2330
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-15 16:03:59 +02:00
Markus Heiser
858aa3e604 [mod] wikipedia & wikidata: upgrade to data_type: traits_v1
BTW this fix an issue in wikipedia: SearXNG's locales zh-TW and zh-HK are now
using language `zh-classical` from wikipedia (and not `zh`).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
7daf4f95ef [mod] Wikipedia: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Wikipedia engines.

.. note::

   Does not include migration of the request methode from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Alexandre Flament
bfca63c536 wikipedia engine: update _fetch_supported_languages
the layout https://meta.wikimedia.org/wiki/List_of_Wikipedias has changed
2023-01-29 10:01:58 +00:00
Markus Heiser
8df1f0c47e [mod] add 'Accept-Language' HTTP header to online processores
Most engines that support languages (and regions) use the Accept-Language from
the WEB browser to build a response that fits to the language (and region).

- add new engine option: send_accept_language_header

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-01 17:01:59 +02:00
Markus Heiser
3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Alexandre Flament
d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contains network definition
   * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * either reference an existing network

* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
2021-04-12 17:25:56 +02:00
Alexandre Flament
fcfcf662ff [fix] wikipedia: remove HTML from the title
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)

The commit uses api_result['title']
2021-03-25 08:31:39 +01:00
Marc Abonce Seguin
d6681fd33b remove articles number from engines_languages.json 2021-02-25 23:54:21 -07:00
Alexandre Flament
7d6e69e2f9 [upd] wikipedia engine: return an empty result on query with illegal characters
on some queries (like an IT error message), wikipedia returns an HTTP error 400.
this commit returns an empty result instead of showing an error to the user.
2021-02-11 12:29:21 +01:00
Marc Abonce Seguin
64e81794fe add support for Chinese variants in Wikipedia 2021-02-08 21:56:45 -07:00
Alexandre Flament
a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so the preferences, the documentation can show these information
2021-01-14 20:57:17 +01:00
Alexandre Flament
d703119d3a [enh] add raise_for_httperror
check HTTP response:
* detect some comme CAPTCHA challenge (no solving). In this case the engine is suspended for long a time.
* otherwise raise HTTPError as before

the check is done in poolrequests.py (was before in search.py).

update qwant, wikipedia, wikidata to use raise_for_httperror instead of raise_for_status
2020-12-11 14:37:08 +01:00
Alexandre Flament
58d51e082d [fix] wikipedia: minor fix: return no result instead of crash in some very few cases.
In few cases, the JSON results doesn't contains the key 'type'.
2020-12-07 17:42:05 +01:00
Alexandre Flament
f0054d67f1 [fix] wikipedia engine: don't raise an error when the query is not found
Add a new parameter "raise_for_status", set by default to True.
When True, any HTTP status code >= 300 raise an exception ( #2332 )
When False, the engine can manage the HTTP status code by itself.
2020-12-04 20:04:39 +01:00
Dalf
1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
Marc Abonce Seguin
ab20ca182c use Wikipedia's REST v1 API 2020-09-10 09:54:30 +02:00
Marc Abonce Seguin
77b9faa8df fix Wikipedia's paragraph extraction 2020-07-26 23:53:40 -07:00
Marc Abonce Seguin
5706c12fba remove empty parenthesis in wikipedia's summary
They're usually IPA pronunciations which are removed
by the API.
2019-12-21 22:47:08 -06:00
Marc Abonce Seguin
c18048e045 exclude disambiguation pages from wikipedia infobox 2019-12-21 22:47:08 -06:00
Adam Tauber
00512e36c1 [fix] handle empty response from wikipedia engine - closes #1114 2019-12-21 21:01:08 +01:00
Noémi Ványi
97351a2c72 fix after rebase 2019-01-07 21:28:58 +01:00
Noémi Ványi
b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00
Marc Abonce Seguin
5568f24d6c [fix] check language aliases when setting search language 2019-01-06 20:31:57 -06:00
Marc Abonce Seguin
772c048d01 refactor engine's search language handling
Add match_language function in utils to match any user given
language code with a list of engine's supported languages.

Also add language_aliases dict on each engine to translate
standard language codes into the custom codes used by the engine.
2018-03-27 00:08:03 -06:00
marc
4d1770398a remove 'all' option from search languages 2017-12-06 01:20:15 -06:00
Adam Tauber
52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
marc
1175b3906f change language list to only include languages with a minimum of engines
that support them.
users can still query lesser supported through the :lang_code bang.
2016-12-29 01:55:30 -06:00
marc
4a1ff56389 minor fixes in utils/fetch_languages.py 2016-12-16 22:14:14 -06:00
marc
af35eee10b tests for _fetch_supported_languages in engines
and refactor method to make it testable without making requests
2016-12-15 00:40:21 -06:00
marc
f62ce21f50 [mod] fetch supported languages for several engines
utils/fetch_languages.py gets languages supported by each engine and
generates engines_languages.json with each engine's supported language.
2016-12-13 19:58:10 -06:00
marc
149802c569 [enh] add supported_languages on engines and auto-generate languages.py 2016-12-13 19:32:00 -06:00
marc
c2e4014287 [fix] urls merge in infobox (#593)
TODO:
    merge attributes
2016-08-05 23:51:04 -05:00
a01200356
8d335dbdae [enh] wikipedia infobox
creates simple multilingual infobox using wikipedia's api
2016-04-17 16:22:19 -05:00
Thomas Pointhuber
52ad49ccba using general mediawiki-engine
* writing general mediawiki-engine
* using this engine for wikipedia
* using this engine for uncyclopedia
2014-09-03 11:40:29 +02:00
Thomas Pointhuber
bb628469d3 fix wikipedia engine and add comments
* add paging support
* make number_of_results changable
* make result calculation more clear
* add comments
2014-09-02 21:01:24 +02:00
asciimoo
2a788c8f29 [enh] search language support init 2014-01-31 04:35:23 +01:00
asciimoo
e7792d77a7 [mod] wikipedia engine removed 2013-10-23 23:46:33 +02:00
asciimoo
a0037313ea [mod] wikipedia limited to first result 2013-10-16 23:21:04 +02:00
asciimoo
4bf44076d4 [enh] proper urls 2013-10-15 22:28:27 +02:00
asciimoo
e4b768b6cc [enh] wikipedia search added 2013-10-15 20:51:35 +02:00