Commit Graph

64 Commits

Author SHA1 Message Date
Markus Heiser f4f2af7db6 [fix] Revision of the Bing engines
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-10-19 18:04:52 +02:00
jazzzooo ae91b993b1 [fix] engine - bing fix search, pagination, remove safesearch 2023-10-19 18:04:52 +02:00
Alexandre Flament 21ba135aae [mod] bing: resolve redirect without additional requests
Remove the usage of searx.network.multi_requests
The results from Bing contains the target URL encoded in base64
See the u parameter, remove the first two character "a1", and done.

Also add a comment the check of the result_len / pageno
( from https://github.com/searx/searx/pull/1387 )
2023-09-04 17:25:30 +02:00
Markus Heiser e8706fb738 [fix] engine & network issues / documentation and type annotations
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotaions of the engine
modules has also been completed so that error messages from the type checker are
no longer reported.

Related and (partial) fixed issue:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-25 13:58:26 +02:00
Markus Heiser 23ac964e35 [fix] Bing-WEB: use <span class='algoSlug_icon'> for the description
On some result items from Bing-WEB the `<span class='algoSlug_icon'>` tag is the
only tag that contains a description.  The issue can be reproduced by [1]::

    !bi vmware

[1] https://github.com/searxng/searxng/issues/1764#issuecomment-1417990531

Reported-by: @AlyoshaVasilieva
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-08 09:43:04 +02:00
Markus Heiser e0a6ca96cc [doc] add a description of bing engines (web, news, video, images)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser d0f465e6fa [mod] bing: add time_range support & upgrade to data_type: traits_v1
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser d3aa690a7a [mod] bing: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Bing engines.

.. note::

   Does not include migration of the request methode from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Ahmad Alkadri 7fc8d72889 [fix] bing: parsing result; check to see if the element contains links
This patch is to hardening the parsing of the bing response:

1. To fix [2087] check if the selected result item contains a link, otherwise
   skip result item and continue in the result loop.  Increment the result
   pointer when a result has been added / the enumerate that counts for skipped
   items is no longer valid when result items are skipped.

   To test the bugfix use:   ``!bi :all cerbot``

2. Limit the XPath selection of result items to direct children nodes (list
   items ``li``) of the ordered list (``ol``).

   To test the selector use: ``!bi :en pontiac aztek wiki``

   .. in the result list you should find the wikipedia entry on top,
   compare [2068]

[2087] https://github.com/searxng/searxng/issues/2087
[2068] https://github.com/searxng/searxng/issues/2068
2023-01-09 15:08:24 +01:00
ahmad-alkadri 9ee99423fe [fix] Bing-Web engine: XPath to get the wikipedia result
Modify the XPath selector to get the wikipedia result plus small fixes.

About result content: especially with the Wikipedia result, we'd get several
paragraph elements, only the first paragraph would be taken and displayed on the
search result
2023-01-08 09:11:16 +01:00
Markus Heiser 8df1f0c47e [mod] add 'Accept-Language' HTTP header to online processores
Most engines that support languages (and regions) use the Accept-Language from
the WEB browser to build a response that fits to the language (and region).

- add new engine option: send_accept_language_header

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-01 17:01:59 +02:00
Alexandre Flament a1e8af0796 bing.py: resolve bing.com/ck/a redirections
add a new function searx.network.multi_requests to send multiple HTTP requests at once
2022-07-08 22:02:21 +02:00
Alexandre FLAMENT f00cdb5e51 bing engine: _fetch_supported_languages: don't use the language code as a country
ref #1029
2022-03-31 20:03:34 +00:00
Martin Fischer b02f762687 [enh] add more categories 2022-01-05 11:00:11 +01:00
Markus Heiser 61ce0c2244 [fix] bing engines: fetch_supported_languages
The Request to and the Response from https://www.bing.com/account/general has
been changed.

[1] https://github.com/searxng/searxng/pull/672#discussion_r777104919

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-01 17:31:38 +01:00
Markus Heiser 3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Markus Heiser 6b85607274 [fix] bing engine: fix paging support, show inital page.
Follow up queries for the pages needed to be fixed.

- Split search-term in one for initial query and one for following queries.
- Set some headers in HTTP requests, bing needs for paging support.
- IMO //div[@class="sa_cc"] does no longer match in a bing response.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-18 13:50:38 +01:00
Markus Heiser b2177e5916 [pylint] Bing (Web) engine
Fix remarks from pylint and improved code-style.  In preparation for a bug-fix
of the Bing (Web) engine I add this engine to the pylint-list.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-18 13:40:36 +01:00
Markus Heiser aecfb2300d [mod] one logger per engine - drop obsolete logger.getChild
Remove the no longer needed `logger = logger.getChild(...)` from engines.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-06 18:05:46 +02:00
Alexandre Flament ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except the documentation and the /config URL, this variable is not used.

This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
Alexandre Flament a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so the preferences, the documentation can show these information
2021-01-14 20:57:17 +01:00
Alexandre Flament 3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Alexandre Flament 2006eb4680 [mod] move extract_text, extract_url to searx.utils 2020-10-02 18:13:56 +02:00
Dalf c225db45c8 Drop Python 2 (4/n): SearchQuery.query is a str instead of bytes 2020-09-10 10:49:42 +02:00
Dalf 1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
Markus Heiser 1c853f9573 bing_news: parital rollback of c89c05bc
The bing_news bug (discussed in #1838) was caused by wrong language tags, which
was fixed e0c99d9d / no need to change the bing_news search string.

closes: https://github.com/asciimoo/searx/issues/1838

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-03-01 11:07:59 +01:00
Markus Heiser e0c99d9dcb bugfix: fetch_supported_languages bing, -news, -videos, -images
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-03-01 08:01:36 +01:00
Adam Tauber 17b6faa4c3 [fix] pep8 2020-01-02 22:38:12 +01:00
Adam Tauber 2292e6e130 [fix] handle missing result size 2020-01-02 22:28:47 +01:00
Dalf 85b3723345 [mod] speed optimization
compile XPath only once
avoid redundant call to urlparse
get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
2019-11-15 09:33:15 +01:00
Léo Bourrel 88261e111c Fix bing engine results count (#1387)
This PR fixes the result count from bing which was throwing an (hidden) error and add a validation to avoid reading more results than avalaible.

For example :
If there is 100 results from some search and we try to get results from 120 to 130, Bing will send back the results from 0 to 10 and no error. If we compare results count with the first parameter of the request we can avoid this "invalid" results.
2019-08-05 16:15:40 +02:00
Dalf 1cee2c1796 [fix] bing engine
before this commit, sometimes there are no results
use a generic user-agent instead of one with the OS "Windows NT 6.3; WOW64"
2019-08-05 15:46:40 +02:00
Noémi Ványi b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00
Marc Abonce Seguin 75b276f408 fix bing "garbage" results (issue #1275) 2018-05-20 18:13:32 -05:00
Marc Abonce Seguin 772c048d01 refactor engine's search language handling
Add match_language function in utils to match any user given
language code with a list of engine's supported languages.

Also add language_aliases dict on each engine to translate
standard language codes into the custom codes used by the engine.
2018-03-27 00:08:03 -06:00
marc 4d1770398a remove 'all' option from search languages 2017-12-06 01:20:15 -06:00
Adam Tauber 52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
marc fd65c12921 make search language handling less strict
languages.py can change, so users may query on a language that is not
on the list anymore, even if it is still recognized by a few engines.

also made no and nb the same because they seem to return the same,
though most engines will only support one or the other.
2017-03-18 23:44:21 +01:00
Adam Tauber 6bf9c398a7 [fix] use english as default language in bing
If no language is specified, bing returns results with multiple languages
for one query which isn't really useful. Setting english as default
insted if nothing.
2016-12-30 18:17:14 +01:00
marc af35eee10b tests for _fetch_supported_languages in engines
and refactor method to make it testable without making requests
2016-12-15 00:40:21 -06:00
marc f62ce21f50 [mod] fetch supported languages for several engines
utils/fetch_languages.py gets languages supported by each engine and
generates engines_languages.json with each engine's supported language.
2016-12-13 19:58:10 -06:00
marc 149802c569 [enh] add supported_languages on engines and auto-generate languages.py 2016-12-13 19:32:00 -06:00
Adam Tauber 16bdc0baf4 [mod] do not escape html content in engines 2016-12-09 18:59:19 +01:00
Adam Tauber 43ddbc60da [fix] pep8 2016-11-14 16:09:16 +01:00
Adam Tauber 16f2e346b3 [fix] bing unicode issue part III. 2016-11-14 15:52:29 +01:00
Adam Tauber 1176505fa4 [fix] bing character encoding - closes #760 2016-11-14 15:47:42 +01:00
Adam Tauber 17b08d096c [fix] unicode search expression for bing 2016-11-07 22:33:17 +01:00
Adam Tauber 16ff8d06c7 [fix] bing paging and language support
see https://msdn.microsoft.com/en-us/library/ff795620.aspx for bing
specific search operators

closes #755
2016-11-07 22:30:20 +01:00
Adam Tauber 2f7752b410 [enh] display number of results 2016-06-28 00:06:50 +02:00
Adam Tauber 604f32f672 [fix] bing unicode encode error - fixes #408 2015-08-28 14:51:32 +02:00