- add to list of pylint scripts
- add debug log messages
- move API key into `settings.yml`
- improve readability
- add some metadata to results
Signed-off-by: Markus Heiser <markus@darmarit.de>
Springer Nature is a global publisher dedicated to serving the research
community [1]. It offers an official API [2].
To test this PR, first get your API key following this page:
https://dev.springernature.com/signup
In searx/engines/springer.py at line 24, add this API key. I left my own key,
commented out, in the line above. Feel free to use it if needed.
[1] https://www.springernature.com/
[2] https://dev.springernature.com/
The new sci-hub URLs are coming from @aurora-vasiliev [1].
[1] https://github.com/searx/searx/pull/2706
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking. To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com
This patch adds a CONSENT Cookie to the youtube request to avoid redirection.
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
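A minimal sketch of the idea, assuming the usual engine shape where
`request(query, params)` fills a `params` dict that carries a `cookies`
mapping. The cookie value `YES+` and the search URL are illustrative
assumptions, not taken from this message:

```python
from urllib.parse import urlencode

# Assumed search URL, for illustration only.
SEARCH_URL = 'https://www.youtube.com/results?'


def request(query, params):
    """Build the outgoing request and pre-set the CONSENT cookie."""
    params['url'] = SEARCH_URL + urlencode({'search_query': query})
    # Sending the cookie up front leaves nothing to confirm, so the
    # redirect to consent.youtube.com should not be triggered.
    params.setdefault('cookies', {})['CONSENT'] = 'YES+'
    return params
```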
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
* display the median time instead of the average.
* add a "Reliability" column (sum up the metrics and the checker results).
* the "selected language", "SafeSearch", "Time range" values are displayed as "broken" when the checker tests fail.
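Why the median is the more useful figure, sketched with made-up response
times (the numbers are illustrative, not measurements): a single slow request
drags the average up, while the median stays close to the typical case.

```python
from statistics import mean, median

# Hypothetical response times of one engine, in seconds.
response_times = [0.4, 0.5, 0.5, 0.6, 9.0]

average_time = mean(response_times)   # pulled up by the 9.0 s outlier
median_time = median(response_times)  # middle value of the sorted list
```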
Some errors won't stop the engine:
* additional HTTP redirects, for example
* some invalid results
secondary=True flags these errors as not important.
Report suspended engines to the user.
searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
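The suspend handling listed above could look roughly like this. It is a
sketch under assumptions: `SuspendState` and the `unresponsive` list are
hypothetical stand-ins, not the searx classes, and only the method name
`extend_container_if_suspended` is taken from the list.

```python
import time


class SuspendState:
    """Hypothetical per-network suspend bookkeeping (not the searx class)."""

    def __init__(self):
        self.suspend_end_time = 0.0
        self.suspend_reason = None

    def suspend(self, seconds, reason, now=None):
        """Mark the network as suspended for `seconds`, counted from `now`."""
        now = time.monotonic() if now is None else now
        self.suspend_end_time = now + seconds
        self.suspend_reason = reason

    def is_suspended(self, now=None):
        now = time.monotonic() if now is None else now
        return now < self.suspend_end_time


def extend_container_if_suspended(state, unresponsive, engine_name, now=None):
    """Record the suspension for the user; True means: skip the request."""
    if state.is_suspended(now):
        unresponsive.append((engine_name, state.suspend_reason))
        return True
    return False
```

Because the state is shared per network, every engine using that network
would see the same suspension.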
Some engines set result.img_src, others return a result.thumbnail. If
result.img_src is unset and a result.thumbnail is given, show it in the UI.
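The fallback rule, sketched as a small helper. This assumes results are plain
dicts and the helper name is hypothetical; the real change may well live in
the templates instead.

```python
def image_for_result(result):
    """Return the image URL to display: img_src first, else thumbnail."""
    return result.get('img_src') or result.get('thumbnail')
```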
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
I also found some items missing a thumbnail, and I used extract_text for content
and title to remove unneeded whitespace.
BTW: added bandcamp's favicon
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
settings.yml:
* outgoing.networks:
  * can contain network definitions
  * properties: enable_http, verify, http2, max_connections, max_keepalive_connections,
    keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
  * retries: 0 by default, number of times searx retries to send the HTTP request (using a different IP & proxy each time)
  * local_addresses can be "192.168.0.1/24" (it supports IPv6)
  * support_ipv4 & support_ipv6: both True by default,
    see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
  * either a full network description
  * or a reference to an existing network
* all HTTP requests of an engine use the same HTTP configuration (this was not the case before, see the proxy configuration in master)
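How an engine's "network" section might be resolved, as a sketch. The
assumptions: a string value is taken as the name of an existing network, a
dict as a full description, and only the defaults named in the list above are
modelled. `resolve_network` and the `proxied` network are hypothetical.

```python
# Hypothetical named networks, mirroring `outgoing.networks`.
NETWORKS = {
    'proxied': {'retries': 2, 'http2': True},
}

# Defaults stated in the list above; other properties are omitted here.
DEFAULTS = {'retries': 0, 'support_ipv4': True, 'support_ipv6': True}


def resolve_network(engine_network, networks=NETWORKS):
    """Return the effective network settings for one engine."""
    if isinstance(engine_network, str):
        # A string references an existing network by name.
        engine_network = networks[engine_network]
    # A dict is a full description; unset properties fall back to defaults.
    return {**DEFAULTS, **(engine_network or {})}
```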
This patch is an addition to PR #2656, which removed all usage of `base_url` from
the templates except one that was forgotten: the cookie URL of the preferences.
closes: 2740
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
I can't set `default_doi_resolver` in `settings.yml` if I'm using
`use_default_settings`. Searx seems to try to interpret all settings at root
level in `settings.yml` as dict, which is correct except for
`default_doi_resolver` which is at root level and a string::
      File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 125, in load_settings
        update_settings(default_settings, user_settings)
      File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 61, in update_settings
        update_dict(default_settings[k], v)
      File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 48, in update_dict
        for k, v in user_dict.items():
    AttributeError: 'str' object has no attribute 'items'
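One way to make the merge tolerate such a value, sketched under assumptions:
`merge_settings` is a hypothetical helper, not the function from
`settings_loader.py`, and the setting values in the usage are made up. It
only recurses when both sides are mappings and assigns everything else
directly.

```python
from collections.abc import Mapping


def merge_settings(defaults, user):
    """Recursively merge `user` into `defaults`, in place."""
    for key, value in user.items():
        current = defaults.get(key)
        if isinstance(value, Mapping) and isinstance(current, dict):
            # Both sides are sections: merge them key by key.
            merge_settings(current, value)
        else:
            # Plain values (e.g. a root-level string) simply replace the default.
            defaults[key] = value
    return defaults
```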
Signed-off-by: Markus Heiser <markus@darmarit.de>
Suggested-by: @0xhtml https://github.com/searx/searx/issues/2722#issuecomment-813391659
The `url_for` function in the template context is not the one from Flask, it is
the one from `webapp`. The `webapp.url_for_theme` is different from its
namesake of Flask and has its quirks when called with argument `_external=True`.
The `webapp.url_for_theme` can't handle absolute URLs since it only strips a
leading '/'. Here is the snippet of the old code::
    url = url_for(endpoint, **values)
    if settings['server']['base_url']:
        if url.startswith('/'):
            url = url[1:]
        url = urljoin(settings['server']['base_url'], url)
The next drawback of (Flask's) `_external=True` is that it will not return the
right HTTP scheme when searx (the Flask app) listens on http and is proxied by
a https server.
To get the right scheme, `HTTP_X_SCHEME` is needed by Flask (werkzeug). Since
this is not provided in every environment (e.g. behind Apache mod_wsgi, or the
HTTP header is not fully set for some other reason), it is recommended to
get *script_name*, *server* and *scheme* from the configured `base_url`. If
`base_url` is specified, then these values are given preference over Flask's
generic ones.
BTW this patch normalizes `opensearch.xml` to use `url_for` and drops the
need for `host` and `urljoin` in the template's context.
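Taking script_name, server and scheme from the configured `base_url` can be
sketched as follows. The helper name and the example URLs are hypothetical;
the dictionary keys are the standard WSGI environ names, and how the patch
actually applies them is not shown here.

```python
from urllib.parse import urlparse


def wsgi_overrides(base_url):
    """Derive scheme, server and script name from a configured base_url."""
    parts = urlparse(base_url)
    return {
        'wsgi.url_scheme': parts.scheme,
        'HTTP_HOST': parts.netloc,
        # No trailing slash; an empty string means the app sits at the root.
        'SCRIPT_NAME': parts.path.rstrip('/'),
    }
```

For a `base_url` of `https://example.org/searx/` this yields the scheme
`https`, the host `example.org` and the script name `/searx`, independent of
what the proxy passes on.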
Signed-off-by: Markus Heiser <markus@darmarit.de>
Instead of a hard-coded `oadoi.org` default, use the default value from
`settings.yml`.
Fix an issue in the themes: The replacement 'current_doi_resolver' contains the
doi_resolver_url, not the name of the DOI resolver. Compare return value of::
    searx.plugins.oa_doi_rewrite.get_doi_resolver(...)
Fix a typo in `get_doi_resolver(..)`, suggested by @kvch:
    *L32 should set doi_resolver not doi_resolvers*
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
fr.wikipedia.org (and apparently no other Wikipedia site)
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example.)
The commit uses api_result['title'] instead.
The JSON response has changed and now contains HTML chunks, which is
not compatible with our json engine, so we have to switch to html/xpath
parsing.
The get_client_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.
This commit fetches the javascript URLs in reverse order: the client id is in the last javascript URL.
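A sketch of the reverse-order lookup. Both regular expressions and the
`fetch` callable are assumptions made for illustration; the real page markup
and the way searx performs the HTTP requests may differ.

```python
import re

# Assumed patterns, for illustration; the real markup may differ.
SCRIPT_SRC_RE = re.compile(r'<script[^>]+src="([^"]+\.js)"')
CLIENT_ID_RE = re.compile(r'client_id:"([0-9A-Za-z]+)"')


def find_client_id(html, fetch):
    """Scan referenced scripts last-to-first; return the first client id found.

    `fetch` is any callable that maps a URL to the response text.
    """
    script_urls = SCRIPT_SRC_RE.findall(html)
    for url in reversed(script_urls):
        match = CLIENT_ID_RE.search(fetch(url))
        if match:
            return match.group(1)
    return None
```

If the id really sits in the last script, the loop stops after a single
request instead of fetching every script first.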
Added a line to the yacy entry to enable HTTP if the local yacy instance isn't using HTTPS. Otherwise, an error will be thrown in the logs: "No connection adapters were found for 'http://localhost:8090/yacysearch.json...'". This is likely related to ticket #2641 that forces HTTPS by default.
See https://github.com/requirejs/requirejs/issues/1816
requirejs loads one file: leaflet.
This commit:
* removes requirejs
* loads leaflet using a <script src...> HTML tag in searx/templates/oscar/base.html
Many things have changed since the last review of this engine. This patch fixes
the xpath selectors, implements suggestions, and is a complete review / rewrite
of the engine.
Signed-off-by: Markus Heiser <markus@darmarit.de>
When initializing engines, a "SearxEngineResponseException" is logged very
verbosely, including full traceback information:
    ERROR:searx.engines:yggtorrent engine: Fail to initialize
    Traceback (most recent call last):
      File "share/searx/searx/engines/__init__.py", line 293, in engine_init
        init_fn(get_engine_from_settings(engine_name))
      File "share/searx/searx/engines/yggtorrent.py", line 42, in init
        resp = http_get(url, allow_redirects=False)
      File "share/searx/searx/poolrequests.py", line 197, in get
        return request('get', url, **kwargs)
      File "share/searx/searx/poolrequests.py", line 190, in request
        raise_for_httperror(response)
      File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
        raise_for_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
        raise_for_cloudflare_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
        raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
    searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000
For SearxEngineResponseException this is not needed. Exceptions of this type
can occur in normal operation, e.g. CAPTCHA errors like the one shown in the
example above. It should be enough to log a warning for such issues:
    WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000
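The difference in handling can be sketched like this. The exception class is
a local stand-in for the searx one, and `engine_init` here is a simplified
illustration with a made-up signature, not the function from
`searx/engines/__init__.py`.

```python
import logging

logger = logging.getLogger('searx.engines')


class SearxEngineResponseException(Exception):
    """Local stand-in for the searx exception of the same name."""


def engine_init(engine_name, init_fn):
    """Run init_fn; return True on success, False if initialization failed."""
    try:
        init_fn()
    except SearxEngineResponseException as exc:
        # Expected failure (e.g. a CAPTCHA): one warning line, no traceback.
        logger.warning('%s engine: Fail to initialize // %s', engine_name, exc)
        return False
    except Exception:
        # Unexpected failure: keep the full traceback in the log.
        logger.exception('%s engine: Fail to initialize', engine_name)
        return False
    return True
```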
closes: #2612
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- unittest2 is a backport of the new features added to the unittest testing
framework in Python 2.7
- unittest2 was only needed in py2 and can be dropped now
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Bing has a list of regions that it supports and some of these regions
may have more than one possible language.
In some cases, like Switzerland, these languages are always shown as
options, so there is no issue. But in other cases, like Andorra, Bing
will only show one language at a time, either the region's default or
the request's language if the latter is supported by that region.
For example, if the HTTP request is in French, Andorra will appear as
fr-AD but if the same page is requested in any other language Andorra
will appear as ca-AD.
This is especially a problem when Bing assumes that the request is in
English because it overrides enough language codes to make several major
languages like Arabic disappear from the languages.py file.
To avoid that issue, I set the Accept-Language header to a language
that's only supported in one region to hopefully avoid these overrides.
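A toy model of the behaviour described above, to make the override concrete.
The table covers only the Andorra example from this message, and both the
table and the function name are hypothetical.

```python
# Toy data: regions that are listed with a single language at a time.
SINGLE_LANGUAGE_REGIONS = {
    'AD': {'default': 'ca', 'supported': {'ca', 'fr'}},
}


def displayed_code(region, request_language):
    """Return the one language code such a region is listed with."""
    info = SINGLE_LANGUAGE_REGIONS[region]
    if request_language in info['supported']:
        # The request's language wins when the region supports it.
        language = request_language
    else:
        language = info['default']
    return f'{language}-{region}'
```

A request language that no such region supports always falls through to the
defaults, which is the effect the chosen Accept-Language header aims for.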