In PR #1071 the language catalog of dailymotion was cleaned up; before that,
there had been over 7000 "languages" in the catalog.
As a side effect of this clean-up the language & region catalog in SearXNG has
been reduced [1].
This patch reduces ``min_engines_per_lang`` from 13 to 12 to get the missing
languages back into the language & region catalog of SearXNG.
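The effect of the threshold can be sketched as follows. This is only an
illustration, not the code that builds the catalog: the function name and the
sample engine data are assumptions made up for this example.

```python
from collections import Counter


def languages_in_catalog(engine_languages, min_engines_per_lang):
    """Return the languages supported by at least `min_engines_per_lang` engines.

    `engine_languages` maps a (hypothetical) engine name to its language list.
    """
    counter = Counter()
    for languages in engine_languages.values():
        # count every language at most once per engine
        counter.update(set(languages))
    return sorted(
        lang for lang, count in counter.items() if count >= min_engines_per_lang
    )


# made-up sample data, only for illustration
engines = {
    "engine_a": ["en", "de", "fr"],
    "engine_b": ["en", "de"],
    "engine_c": ["en"],
}

strict = languages_in_catalog(engines, 3)
relaxed = languages_in_catalog(engines, 2)
print(strict, relaxed)
```

Lowering the threshold by one lets languages back in that just missed the
previous limit.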
[1] 3bb62823ec (diff-f3f00db0f87f95b882624a192e0aac21525638af0b18c9514e765fcf1991678d)
Requested-by: @tiekoetter in a Matrix chat
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- fix the issue of fetching more than 7000 *languages*
- improve the request function and filter by language & country
- implement time_range_support & safesearch
- add more fields to the response from dailymotion (allow_embed, length)
- better clean up of HTML tags in the 'content' field.
This is more or less a complete rework based on the '/videos' API from [1].
This patch cleans up the language list in SearXNG that has been polluted by the
ISO-639-3 2 and 3 letter codes from dailymotion languages which have never been
used.
[1] https://developers.dailymotion.com/tools/
Closes: https://github.com/searxng/searxng/issues/1065
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
There are several reasons why we should prefer markdown-it-py over mistletoe:
- Get identical rendering results in SearXNG's `/info` pages and in SearXNG's
project documentation, which is built by Sphinx-doc.
In the Sphinx-doc we use the MyST parser to render Markdown and the MyST
parser itself is built on top of the markdown-it-py package.
- markdown-it-py has a typographer that supports *replacements*
and *smartquotes* (e.g. em-dash, copyright, ellipsis, ...) [1]
- markdown-it-py is much more flexible compared to mistletoe [2]
- markdown-it-py is the fastest CommonMark compliant parser in Python [3]
[1] https://markdown-it-py.readthedocs.io/en/latest/using.html#typographic-components
[2] https://markdown-it-py.readthedocs.io/en/latest/plugins.html
[3] https://markdown-it-py.readthedocs.io/en/latest/other.html#performance
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* update search input form params; inspired by whoogle
* remove autofocus from result page input form (JS impl. as well as input param)
-> autofocus on landing page still works only on desktop and tablet with JS impl.
* update landing page margins on mobile
* rework border and radius for search form to 0.8rem and outline
* remove positioning from autocomplete JS lib and use CSS impl.
* match search box and autocomplete width
* rework search form to a Google-like design on mobile
* fix settings icon display with RTL on mobile on result page when search input is empty
* less-plugin-clean-css: no updated version.
@wikipedia/less-plugin-clean-css might be an alternative.
* stylelint & stylelint-config-standard
the new versions require configuration and source code changes
Since PR 932 [1][2] static files can't be delivered by the HTTP server any longer.
This patch makes the hash parameter in the URL of static files:
/static/themes/simple/css/searxng.min.css?5fde34a74bc438c7b56ec8c6501e131cc9914bd8
optional. By default the hash parameter is disabled.
HINT:
Instances that do not deliver static files by their HTTP server and have a
long expire time [3] should enable this option.
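A minimal sketch of how an optional hash parameter could be applied when the
URL of a static file is built. The function name ``static_url`` and the
``use_hash`` flag are hypothetical and only stand in for the new option:

```python
def static_url(path, file_hash, use_hash=False):
    """Build the URL of a static file, optionally with the hash as query string.

    `use_hash` is a hypothetical stand-in for the new option; it defaults to
    False because the hash parameter is disabled by default.
    """
    url = "/static/" + path
    if use_hash and file_hash:
        # the hash changes whenever the file changes and busts long-lived caches
        url += "?" + file_hash
    return url


css = "themes/simple/css/searxng.min.css"
sha = "5fde34a74bc438c7b56ec8c6501e131cc9914bd8"
print(static_url(css, sha))
print(static_url(css, sha, use_hash=True))
```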
----
This is only an interim solution. In the long run:
make static.build.commit
creates files including the file name:
css/searxng-5fde34a74bc438c7b56ec8c6501e131cc9914bd8.min.css
and a mapping.json with this content [4]
[1] https://github.com/searxng/searxng/issues/964
[2] https://github.com/searxng/searxng/pull/932#issuecomment-1067039518
[3] 5583336440
[4] https://github.com/searxng/searxng/pull/932#issuecomment-1067216426
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The <input type="reset"> introduced in PR 894 restores the default value.
It works on the index page, but it doesn't work on the /search page:
the reset button restores the initial query.
This PR:
* fixes the JS version: the reset button clears the text
* keeps the clear button on the / page
* hides the clear button on the /search page
SearXNG shows two different things:
region:
"de-CH" is the equivalent of "Schweiz (de)" in DDG.
languages:
"en" doesn't say anything about the location. It is up to the engines to do their
best to select English results without a region.
Suggested-by: @dalf https://github.com/searxng/searxng/pull/967#issuecomment-1072979693
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
With this patch ``searxng.msg`` files can be added to SearXNG. In
``searxng.msg`` files messages can be defined which are not captured by babel's
gettext, like the generic names of the categories or messages that are stored in
constants.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
cached_property has been added in py3.8 [1]
To support cached_property in py3.7 the implementation from 3.8 has been
copied to compat.py. This code can be removed once py3.7 reaches EOL.
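The compat pattern can be sketched like this. The fallback class below is a
simplified backport written for this example (no locking, no error handling);
it is not the implementation that was copied from py3.8, and the ``Catalog``
class is made up for the demonstration.

```python
try:
    from functools import cached_property  # available since Python 3.8
except ImportError:

    class cached_property:  # simplified fallback for Python 3.7
        """Non-data descriptor that caches the result in the instance dict."""

        def __init__(self, func):
            self.func = func
            self.attrname = None
            self.__doc__ = func.__doc__

        def __set_name__(self, owner, name):
            self.attrname = name

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            value = self.func(instance)
            # later lookups find the value in __dict__ and bypass the descriptor
            instance.__dict__[self.attrname] = value
            return value


class Catalog:
    """Made-up example class with an expensive property."""

    def __init__(self):
        self.calls = 0

    @cached_property
    def languages(self):
        self.calls += 1
        return ["de", "en"]


catalog = Catalog()
first = catalog.languages
second = catalog.languages
print(catalog.calls)
```

The property body runs only once per instance; the second access is served
from the cache.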
[1] https://docs.python.org/3/library/functools.html#functools.cached_property
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch implements a boilerplate to share content from info-pages of the
SearXNG instance (URL /info) with the project documentation (path /docs/user).
The info pages use Markdown (CommonMark). To include them in the project
documentation (reST), the myst-parser [1] is used in the Sphinx-doc build chain.
If base_url is known (defined in settings.yml) links to the instance are also
inserted into the project documentation::
searxng_extra/docs_prebuild
[1] https://www.sphinx-doc.org/en/master/usage/markdown.html
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The previous value was 80em (1280px).
Some desktop screens have this resolution,
and the tablet layout takes too much space in this configuration.
This PR switches to the tablet layout for screen widths strictly below 1280px.
Close https://github.com/searxng/searxng/issues/874
* drop image_layout.js from simple theme
* move image_layout.js to oscar theme and delete common js dir (since it's empty now)
* align top position of image detail modal with bottom position of search header
* use flexbox to display images; row height can be set via @results-image-row-height in definitions.less
* display span title underneath each image with a max width of 12rem
* increase margin and padding around image article on desktop and tablet
* make article height smaller on phone layout (height of 6rem) to display more content on current view
* remove content from result, if the title and content matches
* use a group that contains the flex image article, if images are mixed with other categories
* fix pylint issues in webapp.py
* use the default.html result template in unit tests (thanks @return42)
Add player:
- The players only play 30 seconds of the title. Some of the players will be
blocked because of a cross-origin request and some players will link to Apple
when you press the play button.
Avoid exceptions (and BTW improve results)
- ERROR searx.engines.genius : list index out of range
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The 'scrap_img_by_id' function no longer returned anything useful. This
fix allows the google images engine to present the full source image instead of
only the thumbnail.
The function scrap_img_by_id() is replaced by a full rewrite that parses image
URLs with a regular expression. The new function parse_urls_img_from_js(dom)
returns a mapping of data-id to image URL.
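The idea of building such a mapping with a regular expression can be sketched
as below. The input format and the pattern are purely hypothetical (the real
JavaScript payload looks different and this is not the pattern used by the
engine), and the sketch works on a plain string instead of a DOM.

```python
import re

# Hypothetical payload format: pairs like ["<data-id>", "<image URL>"]
_IMG_RE = re.compile(r'\["(?P<data_id>[^"]+)",\s*"(?P<url>https?://[^"]+)"\]')


def parse_img_urls(js_text):
    """Return a mapping of data-id to image URL found in `js_text`."""
    return {m.group("data_id"): m.group("url") for m in _IMG_RE.finditer(js_text)}


# made-up sample, only to exercise the pattern
sample = (
    'var d = [["id1", "https://example.org/a.jpg"], '
    '["id2", "https://example.org/b.png"]];'
)
mapping = parse_img_urls(sample)
print(mapping)
```

A result parser can then look up the full-size URL by the data-id of each
thumbnail and fall back to the thumbnail when the id is missing.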
Closes: https://github.com/searxng/searxng/issues/909
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This commit sets an appropriate height for the (embedded) players from:
- soundcloud
- mixcloud
- deezer
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This is a rewrite of hostname_replace.py that:
- doesn't stop replacing URLs in the fields ('data_src', 'audio_src') if there
is no 'parsed_url',
- adds a comment about keeping or removing a result from the result list
- adds a loop over ['data_src', 'audio_src'] instead of doubling code lines
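The loop over the two fields can be sketched as follows. This is a simplified
illustration and not the plugin code: the function name, the sample result and
the target host ``invidious.example.org`` are assumptions.

```python
import re
from urllib.parse import urlparse

# hypothetical replacement table: compiled hostname pattern -> new hostname
replacements = {
    re.compile(r"(.*\.)?youtube\.com$"): "invidious.example.org",
}


def replace_hostnames(result):
    """Rewrite the hostname of the embedded-content URLs of one result."""
    for field in ("data_src", "audio_src"):
        url = result.get(field)
        if not url:
            continue  # a missing field must not stop the other replacements
        parsed = urlparse(url)
        for pattern, replacement in replacements.items():
            if pattern.search(parsed.netloc):
                netloc = pattern.sub(replacement, parsed.netloc)
                result[field] = parsed._replace(netloc=netloc).geturl()
                break
    return result


result = replace_hostnames({"data_src": "https://www.youtube.com/embed/abc"})
print(result)
```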
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Embedded HTML breaks SearXNG architecture. To modularize, HTML is generated in
the templates (oscar & simple) and result parameter 'embedded' is replaced by
'data_src' (and 'audio_src'), a URL for embedded content (<iframe>).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
To test, you need to redirect embedded videos (e.g. from youtube) to an Invidious
instance. Search for videos using engine `!youtube lebowski`. The result URLs
and the embedded videos should link to the Invidious instance.
Here is an example of such a `hostname_replace` configuration::
hostname_replace:
# youtube --> Invidious
'(.*\.)?youtube-nocookie\.com': 'invidio.xamh.de'
'(.*\.)?youtube\.com$': 'invidio.xamh.de'
'(.*\.)?invidious\.snopyta\.org$': 'invidio.xamh.de'
'(.*\.)?vid\.puffyan\.us': 'invidio.xamh.de'
'(.*\.)?invidious\.kavin\.rocks$': 'invidio.xamh.de'
'(.*\.)?inv\.riverside\.rocks$': 'invidio.xamh.de'
Closes: https://github.com/searxng/searxng/issues/873
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Embedded HTML breaks SearXNG architecture. To modularize, HTML is generated in
the templates (oscar & simple) and result parameter 'embedded' is replaced by
'data_src', a URL for embedded content (<iframe>).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
OpenStreetMap images are now loaded from uploads.wikimedia.org instead of
commons.wikimedia.org to prevent redirects.
With `image_proxy` enabled, images from commons.wikimedia.org can't be loaded
since they are redirected. We already discussed this issue [875] and
@tiekoetter fixed this issue in PR [878].
Related-to:
- [875] https://github.com/searxng/searxng/issues/875
- [878] https://github.com/searxng/searxng/pull/878
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Wikidata info box images are now loaded from uploads.wikimedia.org instead of commons.wikimedia.org to prevent redirects
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Two different threads ( = two different user queries) can call the request
function in a row and then the response function. The namespace will be the same
since this is the same engine.
To keep exactly the same value, ``base_url`` must be stored in params and then
retrieved using ``resp.search_params["base_url"]``.
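A minimal sketch of the pattern, assuming an engine that picks one of several
instances per request. The URL list, the ``paths`` attribute and the
``SimpleNamespace`` object standing in for the HTTP response are made up for
this example.

```python
import random
from types import SimpleNamespace
from urllib.parse import urlencode

# hypothetical list of instances the engine can choose from
base_urls = ["https://a.example.org", "https://b.example.org"]


def request(query, params):
    base_url = random.choice(base_urls)
    # store the choice per request instead of in a module-level global,
    # which another thread could overwrite before response() runs
    params["base_url"] = base_url
    params["url"] = base_url + "/search?" + urlencode({"q": query})
    return params


def response(resp):
    # read back exactly the value that was used for this request
    base_url = resp.search_params["base_url"]
    return [{"url": base_url + path} for path in resp.paths]


params = request("foo", {})
resp = SimpleNamespace(search_params=params, paths=["/watch?v=1"])
results = response(resp)
print(results)
```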
Suggested-by: @dalf https://github.com/searxng/searxng/pull/862#discussion_r799324861
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
can replace filtron:
* rate limit the number of requests per IP and per (IP, User-Agent)
* block some bots
use Redis
data stored in Redis never contains the IP addresses, only HMAC using the secret_key
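How counters can be kept without storing IP addresses is sketched below. This
is not the limiter implementation: a plain dict stands in for Redis, there is
no time window, and all names are assumptions made for this example.

```python
import hashlib
import hmac


def anonymize(secret_key, *parts):
    """Return an HMAC-SHA256 hex digest so that the raw IP is never stored."""
    message = "|".join(parts).encode("utf-8")
    return hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()


class Limiter:
    """Counts requests per IP and per (IP, User-Agent)."""

    def __init__(self, secret_key, max_requests):
        self.secret_key = secret_key
        self.max_requests = max_requests
        self.counters = {}  # stand-in for Redis in this sketch

    def allow(self, ip, user_agent):
        allowed = True
        for parts in ((ip,), (ip, user_agent)):
            key = anonymize(self.secret_key, *parts)
            self.counters[key] = self.counters.get(key, 0) + 1
            if self.counters[key] > self.max_requests:
                allowed = False
        return allowed


limiter = Limiter("secret", max_requests=2)
answers = [limiter.allow("192.0.2.1", "curl/7.0") for _ in range(3)]
print(answers)
```

Only the digests end up in the store, so a dump of the database does not
reveal which addresses made the requests.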
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Now that about.html extends page_with_header.html
it already has a link to the start page and removing
the link makes it easier to extract the page title
from the Markdown for the following commit.
Currency engine has DuckDuckGo metadata
In the engine selector of the preferences window, the currency search engine has
the same metadata and wikidata url as duckduckgo, I'd assume there should be a
difference of some sort there clarifying what source the currency uses or, if
it's a duckduckgo service, at least clarifying that it's a currency service by
duck duck go.
Closes: https://github.com/searxng/searxng/issues/787
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Previously the preferences & stats templates contained the markup:
<a href="{{ url_for('index') }}"><h1><span>SearXNG</span></h1></a>
There are many things wrong with this:
1. the markup was duplicated
2. the CSS needed to be changed whenever a new page wanted to use this
header (since the CSS used page-specific selectors)
3. h1 should be reserved for the actual page title
(e.g. Preferences or Engine stats)
4. the image was set via CSS which also set:
span { visibility: hidden; }
which however removes the alternative text from the accessibility
tree (meaning screen readers will ignore it).
This commit fixes all these problems.
Other optional parameters:
`&sort=crawl_date`
can be appended to search_string to sort results by date.
`&domain=example.org`
can be appended to search_string to get results from just one domain.
Public instances can get timed out for 3600s relatively quickly.
--
Merged from @allendema's commit [1] and slightly modified / see [2].
Related-to: [1] 455b2b4460
Related-to: [2] https://github.com/searx/searx/pull/3040
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Check 'using_tor_proxy' for each engine individually instead of checking globally
[fix] searx.network: update _rdns test to the last httpx version
Co-authored-by: Alexandre Flament <alex@al-f.net>
The macro "checkbox" in macros.html uses the macro "icon_small"
from icons.html
The commit imports icon_small in macros.html to fix the issue.
It works because the macros in macros.html are imported with the Jinja2 context.
See https://jinja.palletsprojects.com/en/3.0.x/templates/#import-visibility
close #819
Engine description can be configured, this is needed e.g. by custom search
engines. Here is an example of a command engine with a description in the about
section::
- name: locate
engine: command
command: ['locate', '{{QUERY}}']
disabled: true
categories: files
about:
description: local files
website: 'https://www.man7.org/linux/man-pages/man1/locate.1.html'
delimiter:
chars: ' '
keys: ['line']
Closes: https://github.com/searxng/searxng/issues/788
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Use httpx.Response.json() to avoid charset_normalizer issues:
DEBUG charset_normalizer : override steps (5) and chunk_size (512) as content does not fit (153 byte(s) given) parameters.
INFO charset_normalizer : ascii passed initial chaos probing. Mean measured chaos is 0.000000 %
DEBUG charset_normalizer : ascii should target any language(s) of ['Latin Based']
INFO charset_normalizer : ascii is most likely the one. Stopping the process.
[1] https://www.python-httpx.org/api/#response
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Currently we have two kinds of user documentation:
* the about page[1] which is written in HTML and part of the web
application and can therefore link instance-specific pages
(like e.g. the preferences) via Jinja variables
* the Sphinx documentation[2] which is written in reStructuredText
and cannot link instance-specific pages since it doesn't know
which instance the user is using
The plan is to integrate the user documentation currently in Sphinx
into the application, so that it can also link instance specific pages.
We also want to enable the user documentation to be translated.
This commit implements the first step in this endeavor (see #722).
[1]: searx/templates/__common__/about.html
[2]: docs/user/ (currently served at https://docs.searxng.org/user/)
Since https://github.com/searxng/searxng/pull/354
the searx.network.stream(...) returns a tuple
This commit updates the checker code according to
this function signature change.
webapp.py monkey-patches the Flask request global.
This commit adds a type cast so that e.g. Pyright[1]
doesn't show "Cannot access member" errors everywhere.
[1]: https://github.com/microsoft/pyright
* mirror all inline SVGs so that direction SVGs display correctly on RTL
* set the bold list element in info box to RTL so the colon gets displayed on the right side
* set correct .ltr function for the left border on the search button in #q
* move text to the right in autocomplete
* move search form in line with result article on RTL
* add the correct padding for img thumbnails in categories like music on RTL
* apply RTL to result table for map results
* align text in tables part of /preferences on RTL
* move burger menu on index page to the left on RTL
* fix positioning of drop down arrow on select boxes on RTL
* align result URL on the right (written LTR)
* align vim hotkeys help on the left since it is not translated
* image detail:
* labels (author, format, URL, etc...) are written on the right,
values are on the left.
* URLs are written LTR and overflow on the right
The less grunt runner silently ignores missing files and continues with the build [1]::
Running "less:production" (less) task
>> Destination css/searxng.min.css not written because no source files were found.
>> 1 stylesheet created.
>> 1 sourcemap created.
Add a filter function that calls grunt.fail() if the src file does not exist.
[1] https://github.com/searxng/searxng/pull/750#discussion_r784357031
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Bangs with a `*` suffix (e.g. `!!d*`) overwrite Bangs with the same
prefix (e.g. `!!d`) [1]. This can be avoided when a non-printable character is
used to tag a LEAF_KEY.
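The collision and its fix can be sketched with a small trie. The helper
functions and the sample definitions are made up for this example, and the
concrete value ``chr(16)`` is an assumption; any character that cannot be
typed as part of a bang serves the purpose.

```python
LEAF_KEY = chr(16)  # non-printable, so it can never be part of a bang


def insert(trie, bang, definition):
    """Store `definition` under `bang`, one trie level per character."""
    node = trie
    for char in bang:
        node = node.setdefault(char, {})
    node[LEAF_KEY] = definition


def lookup(trie, bang):
    """Return the definition stored for exactly `bang`, or None."""
    node = trie
    for char in bang:
        node = node.get(char)
        if node is None:
            return None
    return node.get(LEAF_KEY)


trie = {}
insert(trie, "d", "definition of !!d")
insert(trie, "d*", "definition of !!d*")
print(lookup(trie, "d"), "/", lookup(trie, "d*"))
```

With a printable leaf marker like ``*``, the bang ``d*`` would occupy the very
slot that marks the leaf of ``d``, so one of the two entries would be lost.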
[1] https://github.com/searxng/searxng/pull/740#issuecomment-1010411888
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
There is an issue with redis v4.1.0 [1] / for the interim let's remove this
python dependency.
[1] https://github.com/searxng/searxng/issues/741
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
An ambiguous bang like `!!d` raises an exception in function get_bang_url(). A
bang is only unique when the bang_definition from get_bang_definition_and_ac() is
a string / for an ambiguous bang the returned bang_definition is a dictionary.
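The distinction can be sketched with a type check. The function below is a
hypothetical simplification and not get_bang_url() itself; the ``{query}``
placeholder and the sample definitions are assumptions.

```python
from typing import Optional, Union


def resolve_bang(bang_definition: Union[str, dict, None], query: str) -> Optional[str]:
    """Return the redirect URL for a unique bang, or None if it is ambiguous."""
    if isinstance(bang_definition, str):
        # unique bang: the definition is a string holding the URL template
        return bang_definition.format(query=query)
    # ambiguous bang (dictionary of candidates) or unknown bang (None)
    return None


unique = resolve_bang("https://example.org/?q={query}", "foo")
ambiguous = resolve_bang({"d": "...", "dd": "..."}, "foo")
print(unique, ambiguous)
```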
Reported-by: user prg at #searxng:matrix.org on 2022/01/11
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In case of CAPTCHA raise a SearxEngineCaptchaException and suspend for 7 days.
When get_sc_code() fails raise a SearxEngineResponseException and suspend for 7
days.
[1] https://github.com/searxng/searxng/pull/695
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Startpage has introduced new anti-scraping measures that make SearXNG instances
run into captchas:
1. some arguments have been removed and a new `sc` argument has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed
Closes: https://github.com/searxng/searxng/issues/692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
api.openverse.engineering is a little picky and wants to have a trailing slash
in the path:
/v1/images? --> /v1/images/?
otherwise it redirects, here is the debug log:
DEBUG searx.network.openverse : HTTP Request: GET https://api.openverse.engineering/v1/images?&page=1&page_size=20&format=json&q=foo "HTTP/2 301 Moved Permanently" (text/html; charset=utf-8)
DEBUG searx.network.openverse : HTTP Request: GET https://api.openverse.engineering/v1/images/?&page=1&page_size=20&format=json&q=foo "HTTP/2 200 OK" (application/json)
WARNING searx.engines.openverse : ErrorContext('searx/search/processors/online.py', 105, 'count_error(', None, '1 redirects, maximum: 0', ('200', 'OK', 'api.openverse.engineering')) True
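Building the request URL with the trailing slash can be sketched like this.
The function name is hypothetical; the query arguments are taken from the
debug log above.

```python
from urllib.parse import urlencode

# the path ends with a slash, so the API does not answer with a 301 redirect
base_url = "https://api.openverse.engineering/v1/images/"


def build_url(query, page=1, page_size=20):
    args = {"page": page, "page_size": page_size, "format": "json", "q": query}
    return base_url + "?" + urlencode(args)


url = build_url("foo")
print(url)
```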
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The implementation of the etools engine is poor. No date-range support, no
language support and it is broken by a CAPTCHA.
etools is a metasearch engine; the major search engines it supports (google,
bing, wikipedia, Yahoo) are already available in SearXNG.
While etools does support several engines we currently don't support directly,
support for them should be added directly to SearXNG if there is demand.
In practice: in SearXNG the worse etools results will be mixed with good results
from other engines we have (as long as there is no captcha).
In the best case, what we gain from etools is, e.g., results from de.ask.com for
a German query; in all other cases worse results bubble up in
SearXNG's result list.
[1] https://github.com/searxng/searxng/issues/696#issuecomment-1005855499
Closes: https://github.com/searxng/searxng/issues/696
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The previous implementation used two hash sets and a list.
... that's not necessary ... a single hash map suffices.
And it's also less error prone ... because the previous data structure
allowed a setting to be enabled and disabled at the same time.
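The single-map idea can be sketched as follows. The function and the sample
engine names are assumptions for illustration only, not the preferences code.

```python
def apply_switches(defaults, enabled=(), disabled=()):
    """Return one name -> bool map; each name has exactly one state."""
    state = dict(defaults)
    for name in disabled:
        if name in state:
            state[name] = False
    for name in enabled:
        if name in state:
            state[name] = True
    return state


defaults = {"google": True, "bing": False}
state = apply_switches(defaults, enabled=["bing"], disabled=["google"])
print(state)
```

Because every name maps to a single boolean, the contradictory state
"enabled and disabled at the same time" cannot be represented at all.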
Previously the default_value was abused for the cookie name.
Having SwitchableSetting subclass Setting doesn't even make sense
in the first place since none of the Setting methods apply.
The ? search operator has been broken for some time and
currently only raises the question why it's still there.
## Context ##
The query "Paris !images" searches for "Paris" in the "images" category.
Once upon a time Searx supported "Paris ?images" to search for "Paris"
in the currently enabled categories and the "images" category.
The feature makes sense ... the ? syntax does not.
We will hopefully introduce a +!images syntax in the future.
Fixes #702.
* allow not to record metrics (response time, etc...)
* this commit doesn't change the UI. If the metrics are disabled,
/stats and /stats/errors will return an empty response.
In /preferences, the columns response time and reliability will be empty.
The tab icon names are currently hard coded in the templates.
This commit lets us introduce an icon property in the future, e.g:
categories_as_tabs:
general:
icon: search-outline
These dictionaries are no longer part of the general category,
so they're no longer queried by default -> we can enable them
by default without degrading general query performance.
The general category is the category that is searched by default.
From a privacy standpoint it doesn't make sense to send all general
queries to specialized search engines that cannot deal with those
queries anyway.
Previously we didn't have a good place to put search engines that don't
fit into any of the tab categories. This commit automatically puts
search engines that don't belong to any tab category in an "other"
category, that is only displayed in the user preferences (and not above
search results).
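The automatic assignment can be sketched like this. The function name and the
sample engines are made up for this example; only the rule itself (no tab
category means "other") comes from the description above.

```python
def add_other_category(engines, tab_categories, other="other"):
    """Append `other` to engines that belong to none of the tab categories."""
    for engine in engines:
        if not any(cat in tab_categories for cat in engine["categories"]):
            engine["categories"].append(other)
    return engines


engines = [
    {"name": "engine_a", "categories": ["general"]},
    {"name": "engine_b", "categories": ["dictionaries"]},
]
add_other_category(engines, tab_categories=["general", "images"])
print(engines)
```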
Add a redis connector, the default DB connector is a socket at::
unix:///usr/local/searxng-redis/run/redis.sock?db=0
To set up a redis instance simply use::
$ ./manage redis.build
$ sudo -H ./manage redis.install
A hint for developers:
To get access rights to this instance, your developer account needs to be added
to the *searxng-redis* group::
$ sudo -H ./manage redis.addgrp "${USER}"
# don't forget to log out & log in again to become a member of the group
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Previously all categories were displayed as search engine tabs.
This commit changes that so that only the categories listed under
categories_as_tabs in settings.yml are displayed.
This lets us introduce more categories without cluttering up the UI.
Categories not displayed as tabs can still be searched with !bangs.
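Selecting the tabs from the settings can be sketched as below. The function
name and the sample data are assumptions; only the ``categories_as_tabs`` key
comes from settings.yml.

```python
def tab_categories(settings, available_categories):
    """Return the categories to display as tabs, in the configured order."""
    configured = settings.get("categories_as_tabs", {})
    return [name for name in configured if name in available_categories]


settings = {"categories_as_tabs": {"general": {}, "images": {}}}
available = {"general", "images", "dictionaries"}
tabs = tab_categories(settings, available)
print(tabs)
```

Here "dictionaries" gets no tab, but it remains a regular category that can
still be addressed with a bang.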
Add event listener to query selector::
'#urls img.image'
From the user's point of view, I think it is better to hide the image:
img_load_error.svg is helpful in the image category because it still allows
selecting the image. IMO, in the news category, the fact that an image is missing
won't help to choose the links. From a developer's point of view, the placeholder
is a signal that the engine may need to be updated (or at least looked at). The
browser console should show the same information too, but it requires some
additional steps. [1]
[1] https://github.com/searxng/searxng/pull/610#issuecomment-997640132
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>