on some queries (like an IT error message), wikipedia returns an HTTP error 400.
this commit returns an empty result instead of showing an error to the user.
Some JSON API returns HTML in either in the HTML or the content.
This commit adds two new parameters to the json_engine:
content_html_to_text and title_html_to_text, False by default.
If True, then the searx.utils.html_to_text removes the HTML tags.
Update crossref, openairedatasets and openairepublications engines
The language_support variable is set to True by default,
and set to False in only 5 engines.
Except the documentation and the /config URL, this variable is not used.
This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.
Close#2485
Avoid SearxEngineXPathException errors when parsing non valid results::
.//div[@class="yuRUbf"]//a/@href index 0 not found
Traceback (most recent call last):
File "./searx/engines/google.py", line 274, in response
url = eval_xpath_getindex(result, href_xpath, 0)
File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
BTW: make the engines ready for search.checker:
- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The 'video.html' template from the 'oscar' design supports replacement
for *author* and *length*. Google-videos does not have an author, alternatively
the publisher info from is used for the *author*.
Hint: these replacements are not supported by the 'simple' design.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
see searx.search.processors.abstract.EngineProcessor
First the method searx call the get_params method.
If the return value is not None, then the searx call the method search.
check HTTP response:
* detect some comme CAPTCHA challenge (no solving). In this case the engine is suspended for long a time.
* otherwise raise HTTPError as before
the check is done in poolrequests.py (was before in search.py).
update qwant, wikipedia, wikidata to use raise_for_httperror instead of raise_for_status
before commit 58d72f2, category was not set in xpath.py,
so searx/engines/__init__py was setting the category to ['general']
the commit 58d72f2 set the category to [] which is not replaced by searx/engines/__init__.py
consequence: the mojeek engine is hidden in the preferences.
this commit revert the xpath.py change.
close#2368