Commit Graph

53 Commits

Author SHA1 Message Date
Markus Heiser
4016ee8842 [doc] add documentation of Mwmbl engine & autocompleter
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-08-28 10:25:44 +02:00
Bnyro
607be42b17 [mod] autocomplete.py: add support for mwmbl completions 2023-08-28 10:25:44 +02:00
Markus Heiser
4d4aa13e1f [mod] remove obsolete EngineTraits.supported_languages
All engines have been migrated from ``supported_languages`` to the
``fetch_traits`` concept.  There is no longer a need for the obsolete code that
implements the ``supported_languages`` concept.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
2499899554 [mod] Google: reverse engineered & upgrade to data_type: traits_v1
Partial reverse engineering of the Google engines, including improved language
and region handling based on the engine.traits_v1 data.

Whenever possible, the implementations of the Google engines try to make use of
the async REST APIs.  The get_lang_info() function has been generalized to a
get_google_info() function; in particular, the region handling has been improved
by adding the cr parameter.

searx/data/engine_traits.json
  Add data type "traits_v1" generated by the fetch_traits() functions from:

  - Google (WEB),
  - Google images,
  - Google news,
  - Google scholar and
  - Google videos

  and remove data from obsolete data type "supported_languages".

  A traits.custom type that maps region codes to *supported_domains* is fetched
  from https://www.google.com/supported_domains

searx/autocomplete.py:
  Reverse engineered autocomplete from Google WEB.  Supports Google's languages and
  subdomains.  The old API suggestqueries.google.com/complete has been replaced
  by the async REST API: https://{subdomain}/complete/search?{args}
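
A minimal sketch of how the URL template above could be filled in.  The
``SUPPORTED_DOMAINS`` table, the fallback to ``www.google.com`` and the argument
names ``q`` and ``hl`` are illustrative assumptions, not taken from the patch.

```python
from urllib.parse import urlencode

# Hypothetical excerpt of a region -> domain table (illustrative values only).
SUPPORTED_DOMAINS = {"DE": "www.google.de", "FR": "www.google.fr"}


def google_complete_url(query: str, region: str, lang: str) -> str:
    """Fill in https://{subdomain}/complete/search?{args} for one query."""
    # Assumed fallback when the region has no entry in the table.
    subdomain = SUPPORTED_DOMAINS.get(region, "www.google.com")
    # Assumed argument names; the real request may send more or different ones.
    args = urlencode({"q": query, "hl": lang})
    return f"https://{subdomain}/complete/search?{args}"
```

For example, ``google_complete_url("searx ng", "DE", "de")`` selects the German
subdomain and URL-encodes the query.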

searx/engines/google.py
  Reverse engineering and extensive testing ..
  - fetch_traits():  Fetch languages & regions from Google properties.
  - always use the async REST API (formerly known as 'use_mobile_ui')
  - use *supported_domains* from traits
  - improved the result list by fetching './/div[@data-content-feature]'
    and parsing the type of the various *content features* --> thumbnails are
    added

searx/engines/google_images.py
  Reverse engineering and extensive testing ..
  - fetch_traits():  Fetch languages & regions from Google properties.
  - use *supported_domains* from traits
  - if exists, freshness_date is added to the result
  - issue 1864: result list has been improved a lot (due to the new cr parameter)

searx/engines/google_news.py
  Reverse engineering and extensive testing ..
  - fetch_traits():  Fetch languages & regions from Google properties.
    *supported_domains* is not needed but a ceid list has been added.
  - different region handling compared to Google WEB
  - fixed for various languages & regions (due to the new ceid parameter) /
    avoid CONSENT page
  - Google News no longer supports time range
  - result list has been fixed: XPath of pub_date and pub_origin

searx/engines/google_videos.py
  - fetch_traits():  Fetch languages & regions from Google properties.
  - use *supported_domains* from traits
  - add paging support
  - implement an async request ('asearch': 'arc' & 'async':
    'use_ac:true,_fmt:html')
  - simplified code (thanks to '_fmt:html' request)
  - issue 1359: fixed xpath of video length data
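
A rough sketch of a request URL carrying the async arguments listed above.  Only
``asearch`` and ``async`` come from the commit message; the base URL and the
``q`` and ``start`` arguments are assumptions made for illustration.

```python
from urllib.parse import urlencode


def google_videos_url(query: str, start: int = 0,
                      base: str = "https://www.google.com/search") -> str:
    """Build a search URL that asks for the async HTML response format."""
    args = {
        "q": query,          # assumed name of the query argument
        "start": start,      # assumed paging offset
        "asearch": "arc",    # from the commit message
        "async": "use_ac:true,_fmt:html",  # from the commit message
    }
    return f"{base}?{urlencode(args)}"
```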

searx/engines/google_scholar.py
  - fetch_traits():  Fetch languages & regions from Google properties.
  - use *supported_domains* from traits
  - request(): include patents & citations
  - response(): fixed CAPTCHA detection (Scholar has its own CAPTCHA manager)
  - hardened XPath to iterate over results
  - fixed XPath of pub_type (has been changed from gs_ct1 to gs_cgt2 class)
  - issue 1769 fixed: new request implementation is no longer incompatible

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
c80e82a855 [mod] DuckDuckGo: reverse engineered & upgrade to data_type: traits_v1
Partial reverse engineering of the DuckDuckGo (DDG) engines, including
improved language and region handling based on the engine.traits_v1 data.

- DDG Lite
- DDG Instant Answer API
- DDG Images
- DDG Weather

docs/src/searx.engine.duckduckgo.rst:
  Online documentation of the DDG engines (make docs.live)

searx/data/engine_traits.json
  Add data type "traits_v1" generated by the fetch_traits() functions from:

  - "duckduckgo" (WEB),
  - "duckduckgo images" and
  - "duckduckgo weather"

  and remove data from obsolete data type "supported_languages".

searx/autocomplete.py:
  Reverse engineered autocomplete from DDG.  Supports DDG's languages.

searx/engines/duckduckgo.py:
  - fetch_traits():  Fetch languages & regions from DDG.

  - get_ddg_lang(): Get DDG's language identifier from SearXNG's locale.  DDG
    defines its languages by region codes.  DDG-Lite does not offer a language
    selection to the user, only a region can be selected by the user.

  - Cache ``vqd`` value: The vqd value depends on the query string and is needed
    for the follow-up pages or the images loaded by an XMLHttpRequest (DDG
    images).  The ``vqd`` value of a search term is stored for 10min in the
    redis DB.

  - DDG Lite engine: reverse engineered request method with improved language
    and region support and better ``vqd`` handling.
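
The ``vqd`` caching described above can be sketched as a small time-limited
cache.  This is an in-memory stand-in for the redis DB; the class name, its
methods and the injectable clock are hypothetical and only illustrate the
10 minute lifetime.

```python
import time
from typing import Callable, Dict, Optional, Tuple

VQD_TTL_SECONDS = 10 * 60  # 10 minutes, as stated in the commit message


class VqdCache:
    """In-memory stand-in for the redis DB: maps a query string to its vqd."""

    def __init__(self, ttl: float = VQD_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def set(self, query: str, vqd: str) -> None:
        # Remember the value together with its expiry time.
        self._store[query] = (self._clock() + self._ttl, vqd)

    def get(self, query: str) -> Optional[str]:
        entry = self._store.get(query)
        if entry is None:
            return None
        expires_at, vqd = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so a fresh vqd has to be fetched.
            del self._store[query]
            return None
        return vqd
```

Keying the cache by the query string mirrors the statement that the ``vqd``
value depends on the query.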

searx/engines/duckduckgo_definitions.py: DDG Instant Answer API
  The *instant answers* API does not support languages, or at least we could not
  find out how language support should work.  It seems that most of the features
  are based on English terms.

searx/engines/duckduckgo_images.py: DDG Images
  Reverse engineered request method.  Improved language and region handling
  based on cookies and the engine.traits_v1 data.  Response: add image format to
  the result list

searx/engines/duckduckgo_weather.py: DDG Weather
  Improved language and region handling based on cookies and the
  engine.traits_v1 data.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
e9afc4f8ce [mod] Startpage: reverse engineered & upgrade to data_type: traits_v1
One reason for the frequent CAPTCHAs on Startpage requests is the incomplete
requests SearXNG sends to startpage.com: this patch is a completely new
implementation of the ``request()`` function, reverse engineered from
Startpage's search form.  The new implementation:

- uses traits of data_type: traits_v1 and drops deprecated data_type: supported_languages
- adds time-range support
- adds safe-search support
- fixes searxng/searxng/issues 1884
- fixes searxng/searxng/issues 1081 --> improvements to avoid CAPTCHA

In preparation for more categories (News, Images, Videos ..) from Startpage, the
variable ``startpage_categ`` was set up.  The default value is ``web`` and other
categories from Startpage are not yet implemented.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
858aa3e604 [mod] wikipedia & wikidata: upgrade to data_type: traits_v1
BTW this fixes an issue in wikipedia: SearXNG's locales zh-TW and zh-HK are now
using language `zh-classical` from wikipedia (and not `zh`).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
c1ae2ef57c [mod] qwant: fetch engine traits (data_type: traits_v1)
Implements a fetch_traits function for the Qwant engines.

.. note::

   Includes migration of the request method from 'supported_languages' to
   'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
6e5f22e558 [mod] replace engines_languages.json by engines_traits.json
Implementations of the *traits* of the engines.

Engine's traits are fetched from the origin engine and stored in a JSON file in
the *data folder*.  Most often traits are languages and region codes and their
mapping from SearXNG's representation to the representation in the origin search
engine.

To load traits from the persistence::

    searx.enginelib.traits.EngineTraitsMap.from_data()

For new traits new properties can be added to the class::

    searx.enginelib.traits.EngineTraits

.. hint::

   Implementation is backward compatible with the deprecated *supported_languages
   method* from the vintage implementation.

   The vintage code is tagged as *deprecated* and can be removed once all engines
   have been ported to the *traits method*.
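
The mapping idea behind the traits can be sketched as follows.  ``TraitsSketch``
is a hypothetical, simplified stand-in and not the ``EngineTraits`` class itself;
the fallback from a full locale to its language part is an assumption made for
this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TraitsSketch:
    """Maps SearXNG locales to the codes used by the origin engine."""

    languages: Dict[str, str] = field(default_factory=dict)
    regions: Dict[str, str] = field(default_factory=dict)

    def get_language(self, locale: str, default: Optional[str] = None) -> Optional[str]:
        # Exact match first, then (assumed) fallback to the language part.
        if locale in self.languages:
            return self.languages[locale]
        return self.languages.get(locale.split("-")[0], default)

    def get_region(self, locale: str, default: Optional[str] = None) -> Optional[str]:
        return self.regions.get(locale, default)


# Illustrative data only, not the content of the JSON file.
traits = TraitsSketch(languages={"zh": "zh", "zh-TW": "zh-classical"})
```

With this data, ``traits.get_language("zh-TW")`` yields ``zh-classical`` while
``traits.get_language("zh-CN")`` falls back to ``zh``.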

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
4c06837a50 [mod] make python code pylint 2.16.1 compliant
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-10 13:59:21 +01:00
LencoDigitexer
bc28091557 remove the print statement 2022-09-17 11:25:14 +03:00
LencoDigitexer
7b8d6015e3 add yandex autocompleter 2022-09-09 23:42:44 +03:00
Markus Heiser
4326009d00 [format.python] based on bugfix in 9ed626130 2022-05-07 18:23:10 +02:00
Vojtěch Fošnár
de4af2fefd [enh] add seznam autocomplete 2022-04-14 03:02:05 +02:00
Markus Heiser
e9588b70a6 [fix] brave autocompleter: charset_normalizer issues
Use httpx.Response.json() [1] to avoid charset_normalizer issues:

DEBUG   charset_normalizer            : override steps (5) and chunk_size (512) as content does not fit (153 byte(s) given) parameters.
INFO    charset_normalizer            : ascii passed initial chaos probing. Mean measured chaos is 0.000000 %
DEBUG   charset_normalizer            : ascii should target any language(s) of ['Latin Based']
INFO    charset_normalizer            : ascii is most likely the one. Stopping the process.

[1] https://www.python-httpx.org/api/#response
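
A sketch of the underlying idea, using only the standard library instead of
httpx: when the body is parsed as JSON directly from bytes, no charset guessing
on the text is needed, because ``json.loads()`` accepts bytes and detects
UTF-8/16/32 itself.  The payload and its ``[query, [suggestions]]`` shape are
made up for this example.

```python
import json

# Made-up response body; the real payload may look different.
raw_body = b'["how to", ["how to tie a tie", "how to draw"]]'


def parse_suggestions(body: bytes) -> list:
    """Parse a JSON body from bytes and return the assumed suggestion list."""
    data = json.loads(body)  # bytes in: no separate text decoding step
    return data[1]
```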

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-23 17:22:13 +01:00
Markus Heiser
9c5bac4c43 [pylint] searx/autocomplete.py
Fix remarks from pylint, BTW set SPDX-License-Identifier.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-23 09:12:03 +01:00
Allen
b8c98c4c0d [enh] Add autocompleter from Brave
Raw response example: https://search.brave.com/api/suggest?q=how%20to:%20with%20j

Headers are needed in order to get a 200 response, thus Searx user-agent is used.

Other URL param could be  '&rich=false' or  '&rich=true'.
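
A minimal sketch of how such a request URL could be built.  The endpoint and the
``q`` and ``rich`` parameters come from the notes above; the function name is
hypothetical, and no request is sent here (a real request would also need the
User-Agent header mentioned above).

```python
from urllib.parse import quote, urlencode

BRAVE_SUGGEST_URL = "https://search.brave.com/api/suggest"


def brave_suggest_url(query: str, rich: bool = False) -> str:
    """Build the suggest URL; spaces are percent-encoded as %20."""
    args = {"q": query, "rich": "true" if rich else "false"}
    return f"{BRAVE_SUGGEST_URL}?{urlencode(args, quote_via=quote)}"
```

Note that ``urlencode`` also percent-encodes the colon, so the result differs
slightly from the hand-written example URL while still encoding the same query.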

Cherry-pick: 71786bf9cb
2022-01-21 14:39:10 +01:00
Markus Heiser
3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Alexandre Flament
3c3599c9e6 [fix] startpage autocompletion 2021-11-13 13:26:47 +01:00
Alexandre Flament
d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contain network definitions
   * properties: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * or a reference to an existing network

* all HTTP requests of an engine use the same HTTP configuration (this was not the case before, see proxy configuration in master)
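
The two ways an engine can define its "network" section can be sketched like
this.  The function, the ``DEFAULT_NETWORK`` dict and the example values are
hypothetical; only the property names and the stated defaults (retries 0,
support_ipv4 and support_ipv6 True) come from the notes above.

```python
from typing import Dict, Optional, Union

# Defaults named in the commit message; other properties are omitted here.
DEFAULT_NETWORK = {"retries": 0, "support_ipv4": True, "support_ipv6": True}


def resolve_network(engine_network: Optional[Union[str, dict]],
                    networks: Dict[str, dict]) -> dict:
    """Return the effective network settings for one engine."""
    if engine_network is None:
        return dict(DEFAULT_NETWORK)
    if isinstance(engine_network, str):
        # A string is treated as a reference to an entry in outgoing.networks.
        return {**DEFAULT_NETWORK, **networks[engine_network]}
    # Otherwise the engine carries a full network description of its own.
    return {**DEFAULT_NETWORK, **engine_network}


# Illustrative values only.
networks = {"slow": {"retries": 2, "max_redirects": 5}}
```

Both ``resolve_network("slow", networks)`` and
``resolve_network({"retries": 1}, networks)`` return a complete settings dict.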
2021-04-12 17:25:56 +02:00
Alexandre Flament
eaa694fb7d [enh] replace requests by httpx 2021-04-10 15:38:33 +02:00
Alexandre Flament
63f17d2e4c [enh] autocomplete refactoring, autocomplete on external bangs 2021-03-01 19:12:32 +01:00
Alexandre Flament
0226ae69d3 [fix] dbpedia autocomplete (and use HTTPS) 2020-12-04 16:47:43 +01:00
Alexandre Flament
edd8dccd07 [mod] searx.query.RawTextQuery: getSearchQuery and changeSearchQuery rename to getQuery and changeQuery
getSearchQuery is confusing, the method returns a str not a SearchQuery object
2020-09-22 12:36:26 +02:00
Dalf
1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
piplongrun
b136480546 Add Swisscows autocomplete option 2020-02-14 19:19:24 +01:00
Marc Abonce Seguin
40272b0044 [fix] never pass bangs to autocomplete suggestions 2019-07-01 17:16:02 -05:00
Adam Tauber
36af8f9d67 [fix] use py2/3 compatibility layer 2017-07-10 11:42:44 +02:00
Adam Tauber
52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
Adam Tauber
7b1daf254e [fix] autocomplete unicode issue - closes #808 2017-01-03 13:11:38 +01:00
marc
149802c569 [enh] add supported_languages on engines and auto-generate languages.py 2016-12-13 19:32:00 -06:00
a01200356
94cb3a7f11 [enh] multilingual autocomplete
implemented for wikipedia, qwant and google
2016-03-29 19:10:13 -06:00
Alexandre Flament
6ab91515df [enh] autocompletion : add qwant 2016-03-02 19:54:06 +08:00
Adam Tauber
bd22e9a336 [fix] pep8 compatibility 2016-01-18 12:47:31 +01:00
Adam Tauber
1fcf066a81 [mod] change settings file structure according to #314 2015-08-02 20:32:22 +02:00
Dalf
9d10277c22 remove 'print' 2015-06-02 10:50:49 +02:00
Cqoicebordel
633c7b6a5f Add startpage as an autocompleter engine 2015-06-01 20:45:18 +02:00
Adam Tauber
93fd1e4c76 Merge pull request #308 from dalf/versions_upgrade
update versions.cfg to use the current up-to-date packages
2015-05-02 14:58:32 -04:00
Alexandre Flament
4689fe341c update versions.cfg to use the current up-to-date packages 2015-05-02 15:45:17 +02:00
Alexandre Flament
78edc16e66 [enh] reduce the number of http outgoing connections.
engines that still use http : gigablast, bing image for thumbnails, 1x and dbpedia autocompleter
2015-05-02 11:43:12 +02:00
Adam Tauber
9d11b36b5b [fix] timeout to autocompleters 2015-04-10 00:59:25 +02:00
Adam Tauber
10891bdeab Merge pull request #192 from dalf/connection-pool
[enh] improve response time. close #100
2015-01-21 19:44:20 +01:00
dalf
d07cfd9089 [enh] use one single http connection pool : improve response time. close #100 2015-01-21 11:33:16 +01:00
Cqoicebordel
4d0aeae567 Thanks @pointhi ! 2015-01-19 22:17:12 +01:00
Cqoicebordel
bc2d5bf88c Add '?' bang to the autocompleter 2015-01-19 19:47:32 +01:00
Thomas Pointhuber
c19b0899a4 [fix] little autocompleter fix 2015-01-10 19:55:21 +01:00
Thomas Pointhuber
4e2dae30f0 [enh] add autocompletion for searx-specific strings 2015-01-10 16:42:57 +01:00
Adam Tauber
b422788eb4 [fix] wikipedia autocompleter url param 2014-11-04 19:53:42 +01:00
Thomas Pointhuber
53dc92b0d7 update comments in autocomplete.py
* update comments
* add licence-header
2014-09-13 18:47:28 +02:00
Adam Tauber
cd3a52e189 [enh] duckduckgo autocomplete added 2014-09-07 23:56:06 +02:00