Loading an engine should not exit the application (*). Instead
of exit, return None.
(*) RuntimeError still exit the application: syntax error, etc...
BTW: add documentation and normalize indentation (no functional change)
Suggested-by: @dalf https://github.com/searxng/searxng/pull/116#issuecomment-851865627
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Slightly modified merge of commit [1cb1d3ac] from searx [PR 2543]:
This adds Docker Hub .. as a search engine .. the engine's favicon was
downloaded from the Docker Hub website with wget and converted to a PNG
with ImageMagick .. It supports the parsing of URLs, titles, content,
published dates, and thumbnails of Docker images.
[1cb1d3ac] https://github.com/searx/searx/pull/2543/commits/1cb1d3ac
[PR 2543] https://github.com/searx/searx/pull/2543
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Access to formats can be denied by settings configuration::
search:
formats: [html, csv, json, rss]
Closes: https://github.com/searxng/searxng/issues/95
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To test & demonstrate this implementation download:
https://liste.mediathekview.de/filmliste-v2.db.bz2
and unpack into searx/data/filmliste-v2.db, in your settings.yml define a sqlite
engine named "demo"::
- name : demo
engine : sqlite
shortcut: demo
categories: general
result_template: default.html
database : searx/data/filmliste-v2.db
query_str : >-
SELECT title || ' (' || time(duration, 'unixepoch') || ')' AS title,
COALESCE( NULLIF(url_video_hd,''), NULLIF(url_video_sd,''), url_video) AS url,
description AS content
FROM film
WHERE title LIKE :wildcard OR description LIKE :wildcard
ORDER BY duration DESC
disabled : False
Query to test: "!demo concert"
This is a rewrite of the implementation from commit [1]
[1] searx/searx@8e90a21
Suggested-by: @virtadpt searx/searx#2808
No functional change, just some linting.
- fix messages from pylint (see below)
- log where general Exceptions are catched (broad-except)
- normalized various indentation
- To avoid clashes with common names, add prefix 'route_' to all @app.route
decorated functions.
Fixed messages::
searx/webapp.py:744:0: C0301: Line too long (146/120) (line-too-long)
searx/webapp.py:756:0: C0301: Line too long (132/120) (line-too-long)
searx/webapp.py:730:9: W0511: TODO, check if timezone is calculated right (fixme)
searx/webapp.py:1:0: C0114: Missing module docstring (missing-module-docstring)
searx/webapp.py:126:8: I1101: Module 'setproctitle' has no 'setthreadtitle' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
searx/webapp.py:126:36: W0212: Access to a protected member _name of a client class (protected-access)
searx/webapp.py:131:4: R1722: Consider using sys.exit() (consider-using-sys-exit)
searx/webapp.py:141:4: R1722: Consider using sys.exit() (consider-using-sys-exit)
searx/webapp.py:255:38: W0621: Redefining name 'request' from outer scope (line 32) (redefined-outer-name)
searx/webapp.py:307:4: W0702: No exception type(s) specified (bare-except)
searx/webapp.py:374:24: W0621: Redefining name 'theme' from outer scope (line 155) (redefined-outer-name)
searx/webapp.py:420:8: R1705: Unnecessary "else" after "return" (no-else-return)
searx/webapp.py:544:4: W0621: Redefining name 'preferences' from outer scope (line 917) (redefined-outer-name)
searx/webapp.py:551:4: W0702: No exception type(s) specified (bare-except)
searx/webapp.py:566:15: W0703: Catching too general exception Exception (broad-except)
searx/webapp.py:613:4: R1705: Unnecessary "elif" after "return" (no-else-return)
searx/webapp.py:690:8: W0621: Redefining name 'search' from outer scope (line 661) (redefined-outer-name)
searx/webapp.py:661:0: R0914: Too many local variables (22/20) (too-many-locals)
searx/webapp.py:674:8: R1705: Unnecessary "else" after "return" (no-else-return)
searx/webapp.py:697:11: W0703: Catching too general exception Exception (broad-except)
searx/webapp.py:748:4: R1705: Unnecessary "elif" after "return" (no-else-return)
searx/webapp.py:661:0: R0911: Too many return statements (9/6) (too-many-return-statements)
searx/webapp.py:661:0: R0912: Too many branches (29/12) (too-many-branches)
searx/webapp.py:661:0: R0915: Too many statements (74/50) (too-many-statements)
searx/webapp.py:931:4: W0621: Redefining name 'image_proxy' from outer scope (line 1072) (redefined-outer-name)
searx/webapp.py:946:4: W0621: Redefining name 'stats' from outer scope (line 1132) (redefined-outer-name)
searx/webapp.py:917:0: R0914: Too many local variables (34/20) (too-many-locals)
searx/webapp.py:917:0: R0912: Too many branches (19/12) (too-many-branches)
searx/webapp.py:917:0: R0915: Too many statements (65/50) (too-many-statements)
searx/webapp.py:1063:44: W0621: Redefining name 'preferences' from outer scope (line 917) (redefined-outer-name)
searx/webapp.py:1072:0: R0911: Too many return statements (9/6) (too-many-return-statements)
searx/webapp.py:1151:4: C0103: Variable name "SORT_PARAMETERS" doesn't conform to '(([a-z][a-zA-Z0-9_]{2,30})|(_[a-z0-9_]*)|([a-z]))$' pattern (invalid-name)
searx/webapp.py:1297:0: R1721: Unnecessary use of a comprehension (unnecessary-comprehension)
searx/webapp.py:1303:0: C0103: Argument name "e" doesn't conform to '(([a-z][a-zA-Z0-9_]{2,30})|(_[a-z0-9_]*))$' pattern (invalid-name)
searx/webapp.py:1303:19: W0613: Unused argument 'e' (unused-argument)
searx/webapp.py:1338:23: W0621: Redefining name 'app' from outer scope (line 162) (redefined-outer-name)
searx/webapp.py:1318:0: R0903: Too few public methods (1/2) (too-few-public-methods)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
pylint message: wrong-import-order
Respect PEP8 import order (standard imports first, then third-party libraries,
then local imports).
pylint message: wrong-import-position
Do not mix code & imports
BTW:
- only one import per line
- replace licence text by SPDX tag
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Remove extension of the sys.path (aka PYTHONPATH). Running instance directly
from repository's folder is a relict from the early beginning in
2014 (fd651083f) and is no longer supported.
Since commit dd46629 was merged the command line 'searx-run' exists and should
be used.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- Use result 'alt_description' as title, if not given use
default title 'unknown'.
- Use result 'description' from unsplash as 'content'
Fix error::
DEBUG:searx:result: invalid title: {..., 'title': None, 'content': '', 'engine': 'unsplash'}
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
No functional change!
- fix messages from pylint
- add ``global THREADLOCAL``
- normalized various indentation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Remove 'template' from result. Engine genius should
not use the video template. BTW: fix indentations
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- pylint searx/engines/xpath.py
- fix indentation of some long lines
- add logging
- add doc-strings
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- init stat values by None
- drop round_or_none
- don't try to get percentage if base is 'None'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Add a new environment variable SEARX_DISABLE_ETC_SETTINGS
to disable loading of /etc/searx/settings.yml
unit tests:
* set SEARX_DISABLE_ETC_SETTINGS to 1
* remove SEARX_SETTINGS_PATH if it exists
Based on commits
- 0507e185 [fix] bar graph and rename CSS class engine-scores -> engine-score
- 3e9ad7ae [fix] make /stats more CSP compliant - github issue form
- 34859d0e [fix] make /stats more CSP compliant - oscar theme
- 0a6c4884 [fix] make /stats more CSP compliant - simple theme
- cdfb4b7f [fix] make /stats more CSP compliant - bar graph
- 965817f2 [fix] simple theme - generate missing sourceMap file
this patch is generated by::
make themes.all
Reported-by: https://github.com/searxng/searxng/issues/57
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- drop #main_stats selector in stats.less
- 'engine-score' exists before this PR.
- untabify searx/static/themes/__common__/less/stats.less
for details see comment at: d93bec7638..1204e4f07e (r633571496)
Suggested-by: @dalf in commit 1204e4f0
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Upgraded [v3.3.0] otherwise::
` width: calc(100% - 5rem);`
becomes `width: 95%` once compiled by less version 1.4.1.
[v3.3.0] https://github.com/gruntjs/grunt-contrib-less/releases/tag/v3.0.0
Suggested-by: @dalf in commit 1204e4f0
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Make 'soft_max_redirects' configurable per Xpath engine::
- name : <engine-name>
engine : xpath
soft_max_redirects: 1
...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
it prepares the new architecture change,
everything about multithreading in moved in the searx.search.* packages
previously the call to the "init" function of the engines was done in searx.engines:
* the network was not set (request not sent using the defined proxy)
* it requires to monkey patch the code to avoid HTTP requests during the tests
* [mod] option to enable or disable "proxy" button next to each result
Closes: https://github.com/searxng/searxng/issues/51
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: Alexandre Flament <alex@al-f.net>
When there is at least one errors or one failed checker test:
* the warning icon is displayed in the reliability column
* the link "View error logs and submit a bug report" is displayed on engine name tooltip.
Before:
* the warning icon was displayed only when one or more checker test(s) failed.
* the link "View error logs and submit a bug report" was not shown when a checker test failed but there were no error.
In the preference page, in the 'about' toolbox of an engine, add a link to the
stats page of the engine, if the engine had one or more errors.
Condition is::
reliabilities[<engine.name>].errors
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Inline styles are blocked by default with Content Security Policy (CSP). Move
the inline styles from 'new_issue.html' to::
searx/static/themes/__common__/less/new_issue.less
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
File searx/static/themes/simple/less/stats.less is not used (imported) in any
other less file. I can't say when it's usage was dropped or if it has ever been
used. ATM this file is without any usage.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* searx.network.client.LOOP is initialized in a thread
* searx.network.__init__ imports LOOP which may happen
before the thread has initialized LOOP
This commit adds a new function "searx.network.client.get_loop()"
to fix this issue
The issue exists only in the debug log::
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.9/logging/__init__.py", line 1086, in emit
stream.write(msg + self.terminator)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 79-89: ordinal not in range(128)
Call stack:
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 2464, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/searx/searx-src/searx/webapp.py", line 1316, in __call__
return self.app(environ, start_response)
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/werkzeug/middleware/proxy_fix.py", line 169, in __call__
return self.app(environ, start_response)
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/searx/searx-src/searx/webapp.py", line 766, in search
number_of_results=format_decimal(number_of_results),
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask_babel/__init__.py", line 458, in format_decimal
locale = get_locale()
File "/usr/local/searx/searx-pyenv/lib/python3.9/site-packages/flask_babel/__init__.py", line 226, in get_locale
rv = babel.locale_selector_func()
File "/usr/local/searx/searx-src/searx/webapp.py", line 249, in get_locale
logger.debug("%s uses locale `%s` from %s", request.url, locale, locale_source)
Unable to print the message and arguments - possible formatting error.
Use the traceback above to help find the error.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- add to list of pylint scripts
- add debug log messages
- move API key int `settings.yml`
- improved readability
- add some metadata to results
Signed-off-by: Markus Heiser <markus@darmarit.de>
Springer Nature is a global publisher dedicated to providing service to research
community [1] with official API [2].
To test this PR, first get your API key following this page:
https://dev.springernature.com/signup
In searx/engines/springer.py at line 24, add this API key. I left my own key,
commented out in the line aboce. Feel free to use it, if needed.
[1] https://www.springernature.com/
[2] https://dev.springernature.com/
The new sci-hub URLs are comming from @aurora-vasiliev [1].
[1] https://github.com/searx/searx/pull/2706
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking. To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com
This patch adds a CONSENT Cookie to the youtube request to avoid redirection.
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
* display the median time instead of the average.
* add a "Reliability" column (sum up the metrics and the checker results).
* the "selected language", "SafeSearch", "Time range" values are displayed as "broken" when the checker tests fail.
Some error won't stop the engine:
* additional HTTP redirects for example
* some invalid results
secondary=True allows to flag these errors as not important.
Report to the user suspended engines.
searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
Some engine do have set result.img_src, other return a result.thumbnail. If
result.img_src is unset and a result.thumbnail is given, show it to the UI.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
I also found some items missing a thumbnail and I used text_extract for content
and title, to remove unneeded whitespaces.
BTW: added bandcamp's favicon
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
settings.yml:
* outgoing.networks:
* can contains network definition
* propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
* retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
* local_addresses can be "192.168.0.1/24" (it supports IPv6)
* support_ipv4 & support_ipv6: both True by default
see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
* either a full network description
* either reference an existing network
* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
This patch is an addition to PR #2656 which removed all usage of `base_url` from
the templates, except one was forgotten in the cookie URL of the preferences.
closes: 2740
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
I can't set `default_doi_resolver` in `settings.yml` if I'm using
`use_default_settings`. Searx seems to try to interpret all settings at root
level in `settings.yml` as dict, which is correct except for
`default_doi_resolver` which is at root level and a string::
File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 125, in load_settings
update_settings(default_settings, user_settings)
File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 61, in update_settings
update_dict(default_settings[k], v)
File "/usr/lib/python3.9/site-packages/searx/settings_loader.py", line 48, in update_dict
for k, v in user_dict.items():
AttributeError: 'str' object has no attribute 'items'
Signed-off-by: Markus Heiser <markus@darmarit.de>
Suggested-by: @0xhtml https://github.com/searx/searx/issues/2722#issuecomment-813391659
The `url_for` function in the template context is not the one from Flask, it is
the one from `webapp`. The `webapp.url_for_theme` is different from its
namesake of Flask and has it quirks, when called with argument `_external=True`.
The `webapp.url_for_theme` can't handle absolute URLs since it pokes a leading
'/', here is the snippet of the old code::
url = url_for(endpoint, **values)
if settings['server']['base_url']:
if url.startswith('/'):
url = url[1:]
url = urljoin(settings['server']['base_url'], url)
Next drawback of (Flask's) `_external=True` is, that it will not return the HTTP
scheme when searx (the Flask app) listens on http and is proxied by a https
server.
To get the right scheme `HTTP_X_SCHEME` is needed by Flask (werkzeug). Since
this is not provided in every environment (e.g. behind Apache mod_wsgi or the
HTTP header is not fully set for some other reasons) it is recommended to
get *script_name*, *server* and *scheme* from the configured `base_url`. If
`base_url` is specified, then these values from are given preference over any
Flask's generics.
BTW this patch normalize to use `url_for` in the `opensearch.xml` and drop the
need of `host` and `urljoin` in template's context.
Signed-off-by: Markus Heiser <markus@darmarit.de>
Instead of a hard-coded `oadoi.org` default, use the default value from
`settings.yml`.
Fix an issue in the themes: The replacement 'current_doi_resolver' contains the
doi_resolver_url, not the name of the DOI resolver. Compare return value of::
searx.plugins.oa_doi_rewrite.get_doi_resolver(...)
Fix a typo in `get_doi_resolver(..)`: suggested by @kvch:
*L32 should set doi_resolver not doi_resolvers*
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)
The commit uses api_result['title']
the json response has been changed and it contains html chunks which is
not compatible with our json engine, so we have to switch to html/xpath
parsing
The get_cliend_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.
This commit fetches the javascript URLs in the reverse order: the client id is in the last javascript URL.
Added a line to the yacy entry to enable HTTP if the local yacy instance isn't using HTTPS. Otherwise, an error will be thrown in the logs: "No connection adapters were found for 'http://localhost:8090/yacysearch.json...'". This is likely related to ticket #2641 that forces HTTPS by default.
See https://github.com/requirejs/requirejs/issues/1816
requirejs loads one file: leaflet.
This commit:
* removes requirejs
* load leaflet using <script src...> HTML tag in searx/templates/oscar/base.html
Many things have been changed since last review of this engine. This patch fix
xpath selectors, implements suggestion and is a complete review / rewrite of the
engine.
Signed-off-by: Markus Heiser <markus@darmarit.de>
When initing engines a "SearxEngineResponseException" is logged very verbose,
including full traceback information:
ERROR:searx.engines:yggtorrent engine: Fail to initialize
Traceback (most recent call last):
File "share/searx/searx/engines/__init__.py", line 293, in engine_init
init_fn(get_engine_from_settings(engine_name))
File "share/searx/searx/engines/yggtorrent.py", line 42, in init
resp = http_get(url, allow_redirects=False)
File "share/searx/searx/poolrequests.py", line 197, in get
return request('get', url, **kwargs)
File "share/searx/searx/poolrequests.py", line 190, in request
raise_for_httperror(response)
File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
raise_for_captcha(resp)
File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
raise_for_cloudflare_captcha(resp)
File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000
For SearxEngineResponseException this is not needed. Those types of exceptions
can be a normal use case. E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:
WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000
closes: #2612
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- unittest2 is a backport of the new features added to the unittest testing
framework in Python 2.7
- unittest2 was only needed in py2 and can be dropped now
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>