Commit Graph

4240 Commits

Author SHA1 Message Date
Noémi Ványi
ffaf785f82
Merge pull request #2533 from mrwormo/ccengine
[Engine] Add Creative Commons search engine
2021-02-04 22:35:08 +01:00
mrwormo
c4c1636b18 Add Creative Commons search engine 2021-02-04 11:31:35 +01:00
Noémi Ványi
006f206dc9
Merge pull request #2530 from return42/fix-user-hb
[fix] make books/user.pdf
2021-02-02 20:50:35 +01:00
Markus Heiser
89554e42a9 [fix] make books/user.pdf
Error:

  Configuration error:
  There is a programmable error in your configuration file:
  ...
  NameError: name 'DOCS_URL' is not defined
  make: *** [utils/makefile.sphinx:156: books/user.latex] Error 2

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-02 20:14:07 +01:00
Alexandre Flament
90b9d0d6a8 [mod] CI: minor changes
* utils/makefile.python: travis-gh-pages renamed ci-gh-pages
2021-02-02 08:53:57 +01:00
Alexandre Flament
34de715e62
Merge pull request #2500 from dalf/github-action-data
[enh] every Sunday, call utils/fetch_*.py scripts and create a PR automatically
2021-02-01 17:16:58 +01:00
Alexandre Flament
1742355eb8
Merge pull request #2499 from dalf/remove-language-support-variable
[mod] dynamically set language_support variable
2021-02-01 17:16:18 +01:00
Alexandre Flament
ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except for the documentation and the /config URL, this variable is not used.

This commit removes the variable definition from the engines and
sets the value according to the length of supported_languages: False when the
length is 0, True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
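The rule described in the commit above can be sketched in a few lines. This is a minimal illustration, not the code from the commit: the helper name `set_language_support` and the `SimpleNamespace` stand-ins for engine modules are assumptions made for the example.

```python
from types import SimpleNamespace


def set_language_support(engine):
    """Derive language_support from the length of supported_languages."""
    # Assumption: an engine without the attribute is treated as having no languages.
    supported = getattr(engine, "supported_languages", [])
    engine.language_support = len(supported) > 0
    return engine


# Hypothetical engines standing in for loaded engine modules.
with_langs = set_language_support(SimpleNamespace(supported_languages=["en", "de"]))
without_langs = set_language_support(SimpleNamespace(supported_languages=[]))
print(with_langs.language_support, without_langs.language_support)
```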
Alexandre Flament
99244440e4
Merge pull request #2514 from return42/fix-gh-pages
[fix] Makefile target gh-pages & flatten history of branch gh.pages
2021-02-01 17:07:08 +01:00
Alexandre Flament
0a8799b834
Merge pull request #2517 from dalf/debug-ci
Update pyenv pyenvinstall Make targets
2021-02-01 17:01:34 +01:00
Markus Heiser
8c45f1149d [hardening] github workflows - corrupted cache
aka: ensure that 'make test' works as expected

The cache contains a copy of './local' which is, under some circumstances,
corrupted.  It is not possible to clear the cache [1] (see the top of the page).

Ensure that 'make test' works as expected [2] even if

- the python interpreter is missing
- the virtualenv exists but pyyaml is missing

To harden the workflow against cache failures, this patch adds the new target
'travis.test' to the workflow.  This target tries to import the python module
'yaml'.  If the import fails, the virtualenv is rebuilt from scratch.

[1] https://github.com/actions/cache/issues/2#issuecomment-673493515
[2] https://github.com/searx/searx/pull/2517#discussion_r567240235

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-01 16:58:04 +01:00
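The probe described in the commit above (try to import 'yaml', rebuild the virtualenv if that fails) might look roughly like this in Python. The real target lives in the Makefile; the function names `module_importable` and `needs_rebuild` are hypothetical and only illustrate the idea.

```python
import subprocess
import sys


def module_importable(python_exe, module):
    """Return True if `python_exe -c "import <module>"` exits successfully."""
    try:
        result = subprocess.run(
            [python_exe, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The interpreter is missing or cannot be executed.
        return False
    return result.returncode == 0


def needs_rebuild(python_exe, module="yaml"):
    """A virtualenv should be rebuilt when the probe import fails."""
    return not module_importable(python_exe, module)


print(needs_rebuild(sys.executable, "sys"))
print(needs_rebuild("/nonexistent/python"))
```

Both failure cases listed in the commit message (missing interpreter, missing module) end up on the same "rebuild" path in this sketch.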
Markus Heiser
38b39ef0ae [fix] re-add 'pip-exe' target - partial revert 9b48ae47
Target pip-exe is a prerequisite of the targets:

  - pyinstall
  - pyuninstall

and was accidentally deleted in commit 9b48ae47.

HINT:
  do not confuse pyinstall with pyenvinstall

pyinstall & pyuninstall
    Installing into user's HOME using pip from OS,
    therefore the message is needed.

pyenvinstall & pyenvuninstall
    Installing into virtualenv (./local) using pip which is provided by
    prerequisite 'pyenv' in the virtualenv.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-01 16:58:04 +01:00
Alexandre Flament
d70c5a621a [mod] more robust make pyenv / make pyenvinstall
"make pyenv" ensures that ./local/py3/bin/python is an executable
2021-02-01 16:58:04 +01:00
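A check like the one "make pyenv" is said to perform above can be expressed with the standard library. This is only a sketch of the idea; the helper name `is_executable_file` is an assumption, and the temporary file stands in for ./local/py3/bin/python.

```python
import os
import tempfile


def is_executable_file(path):
    """True only if path is an existing regular file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Demonstration on a temporary file instead of a real interpreter.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o644)
before = is_executable_file(demo_path)   # no execute bit set
os.chmod(demo_path, 0o755)
after = is_executable_file(demo_path)    # execute bit set
os.remove(demo_path)
missing = is_executable_file(demo_path)  # file no longer exists
print(before, after, missing)
```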
Alexandre Flament
806af50738
Merge pull request #2494 from return42/rm-fabfile
[fix] remove Fabric file
2021-02-01 15:09:35 +01:00
Markus Heiser
40d2a116e1 [fix] Makefile target gh-pages & flatten history of branch gh.pages
1. This patch fixes error:

    rm -rf gh-pages/
    make V=1 gh-pages
    make[1]: Leaving directory '/800GBPCIex4/share/searx'
    [ -d "gh-pages/.git" ] || git clone  gh-pages
    fatal: repository 'gh-pages' does not exist

2. The gh-pages build has been moved to ./build/gh-pages; this also affects
   'travis-gh-pages'

3. The gh-pages commit messages now include a ref to the repository and commit

4. Since keeping a gh-pages history has no benefit and only makes the repository
   grow fast, this patch also flattens the history:

    cd build/gh-pages/; git log --oneline
    bash: cd: build/gh-pages/: No such file or directory
    026126be (HEAD -> gh-pages, origin/gh-pages) make gh-pages: from https://github.com/return42/searx.git@71d66979c2935312e0aed7fc7c3cf6199fbe88a2

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-29 11:41:48 +01:00
Alexandre Flament
71d66979c2
Merge pull request #2482 from return42/fix-google-video
[fix] revise of the google-Video engine
2021-01-28 11:11:07 +01:00
Markus Heiser
7f505bdc6f [fix] google: avoid unnecessary SearxEngineXPathException errors
Avoid SearxEngineXPathException errors when parsing invalid results::

    .//div[@class="yuRUbf"]//a/@href index 0 not found
    Traceback (most recent call last):
      File "./searx/engines/google.py", line 274, in response
        url = eval_xpath_getindex(result, href_xpath, 0)
      File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
        raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
    searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
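The general pattern behind the commit above is to fall back to a default when an indexed lookup finds nothing, and to skip that result instead of raising. The sketch below uses only the standard library; `find_index` is a hypothetical stand-in written for this example and is not the `eval_xpath_getindex` function from searx.utils.

```python
import xml.etree.ElementTree as ET


def find_index(element, path, index, default=None):
    """Return the match at `index` for `path`, or `default` if it is absent."""
    matches = element.findall(path)
    if 0 <= index < len(matches):
        return matches[index]
    return default


# Two hypothetical results: one with a link, one without.
DOC = ET.fromstring(
    "<results>"
    "<div class='g'><a href='https://example.org/'>ok</a></div>"
    "<div class='g'><span>no link here</span></div>"
    "</results>"
)

urls = []
for result in DOC.findall("./div"):
    link = find_index(result, ".//a", 0)
    if link is None:
        # Invalid result: skip it instead of raising an exception.
        continue
    urls.append(link.get("href"))
print(urls)
```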
Markus Heiser
e436287385 [mod] checker: add some additional tests
BTW: fix indentation by 2 spaces

The additional tests have been commented out in the google engines so as not to
trigger any CAPTCHA issues.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser
b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Alexandre Flament
0f18e885bf
Merge pull request #2479 from Tobi823/master
Document workaround for using 2 languages simultaneously #1508
2021-01-27 21:29:42 +01:00
Alexandre Flament
b661c3f5d4
Merge pull request #2509 from return42/fix-morty-key
[doc] improve admin-docs about result proxy (morty) configuration
2021-01-27 15:31:29 +01:00
Markus Heiser
a69a8a3ed5 [doc] improve admin-docs about result proxy (morty) configuration
[1] https://github.com/searx/searx/pull/1872#issuecomment-768107138

Suggested-by @dalf [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-27 09:58:06 +01:00
Markus Heiser
923b490022 [mod] add Makefile targets for search.checker.<engine_name>
To check all engines:

    make search.checker

To check a single engine such as 'google news', replace the space with an underscore:

    make search.checker.google_news

To see HTTP requests and more use SEARX_DEBUG:

    make SEARX_DEBUG=1 search.checker.google_news

To filter out HTTP redirects:

    make SEARX_DEBUG=1 search.checker.google_news | grep -A1 "HTTP/1.1\" 3[0-9][0-9]"
    ...
    Engine google news                   Checking
    https://news.google.com:443 "GET /search?q=life&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=life&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --
    https://news.google.com:443 "GET /search?q=computer&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-26 11:46:36 +01:00
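Two details of the commit above lend themselves to a short sketch: the target name is the engine name with spaces replaced by underscores, and the grep expression picks out 3xx responses together with the line that follows. The helper names below are assumptions, and the sample log lines are shortened versions of the ones in the commit message.

```python
import re

# Same pattern as the grep expression: any 3xx status after HTTP/1.1"
REDIRECT = re.compile(r'HTTP/1\.1" 3[0-9][0-9]')


def checker_target(engine_name):
    """Build the make target name for one engine."""
    return "search.checker." + engine_name.replace(" ", "_")


def redirects_with_next(lines):
    """Roughly like `grep -A1`: pair each redirect line with the next line."""
    pairs = []
    for i, line in enumerate(lines):
        if REDIRECT.search(line):
            following = lines[i + 1] if i + 1 < len(lines) else None
            pairs.append((line, following))
    return pairs


LOG = [
    "Engine google news                   Checking",
    'https://news.google.com:443 "GET /search?q=life&hl=en HTTP/1.1" 302 0',
    'https://news.google.com:443 "GET /search?q=life&hl=en-US HTTP/1.1" 200 None',
]

pairs = redirects_with_next(LOG)
print(checker_target("google news"))
for redirect, following in pairs:
    print(redirect)
    print(following)
```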
Alexandre Flament
6047087aac [mod] utils/fetch_languages.py: write files at the right location 2021-01-24 14:25:27 +01:00
Alexandre Flament
3330cf4a46 [enh] every monday, call utils/fetch_*.py scripts and create a PR automatically 2021-01-24 13:32:39 +01:00
Markus Heiser
ff6804e545 [data] make engines.languages
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:52:32 +01:00
Markus Heiser
8cdad5d85d [fix] google-videos: parse values for 'length' & 'author'
The 'video.html' template from the 'oscar' design supports replacements
for *author* and *length*.  Google-videos does not have an author; the
publisher info is used for the *author* instead.

Hint: these replacements are not supported by the 'simple' design.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:51:24 +01:00
Markus Heiser
89b3050b5c [fix] revise of the google-Video engine
This revision is based on the methods developed in the revision of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:39:30 +01:00
Alexandre Flament
f4a17acb7a
Merge pull request #2498 from dalf/minor-fix-google-news
[fix] google_news: avoid one HTTP redirect except for the English results
2021-01-24 09:13:48 +01:00
Alexandre Flament
96c2996857
Merge pull request #2497 from return42/fix-test.sh
[fix] lxc.sh - SC2034: ubu2010_boilerplate appears unused.
2021-01-24 09:06:11 +01:00
Alexandre Flament
8c46b767d0 [fix] google_news: avoid one HTTP redirect except for the English results
also add
params['soft_max_redirects'] = 1
to avoid false error reporting in /stats/errors
2021-01-24 08:53:35 +01:00
Markus Heiser
ea5c992d4f [fix] lxc.sh - SC2034: ubu2010_boilerplate appears unused.
$ make test.sh
  In utils/lxc.sh line 42:
  ubu2010_boilerplate="$ubu1904_boilerplate"
  ^-----------------^ SC2034: ubu2010_boilerplate appears unused. Verify use (or export if used externally).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 08:29:13 +01:00
Alexandre Flament
7d24850d49
Merge pull request #2483 from return42/fix-google-news
[fix] revise of the google-News engine
2021-01-23 20:21:09 +01:00
Markus Heiser
5f92dfcdbe [fix] google-news: query uses locale without country tag
Without a country-region tag, google will redirect to correct the country tag [1]:

    SEARX_DEBUG=1 searx-checker -v "google news"
    ...
    https://news.google.com:443 "GET /search?q=computer&hl=en...      HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&.... HTTP/1.1" 200 None
    ...

[1] https://github.com/searx/searx/pull/2483#issuecomment-765600849

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-23 11:37:14 +01:00
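The redirect shown in the commit above goes from `hl=en` to `hl=en-US`, so sending the full locale up front avoids it. The following is a rough sketch of that idea only; the function `build_news_url` and its reduced parameter set are assumptions, not the engine's actual request code.

```python
from urllib.parse import urlencode


def build_news_url(query, language, country):
    """Build a search URL whose hl parameter already carries the country tag."""
    args = {
        "q": query,
        "hl": f"{language}-{country}",  # e.g. en-US instead of en
        "lr": "lang_" + language,
        "gl": country,
    }
    return "https://news.google.com/search?" + urlencode(args)


url = build_news_url("computer", "en", "US")
print(url)
```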
Markus Heiser
baec54c492 [fix] revise of the google-news engine
This revision is based on the methods developed in the revision of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Markus Heiser
a8544798ec [fix] remove Fabric file
The fabfile.py has not been updated for 5 years.  I also asked [1] if someone
still uses Fabric, without any response.  Let's drop the outdated Fabric file.

[1] https://github.com/searx/searx/discussions/2400

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 17:57:55 +01:00
Adam Tauber
f310305c54
Merge pull request #2481 from dalf/mod-check
Mod check
2021-01-20 18:48:29 +00:00
Alexandre Flament
73c86f9bf2 [mod] checker: disable by default 2021-01-19 21:44:48 +01:00
Alexandre Flament
3b7b852aa8 [fix] checker: minor fix about language detection 2021-01-19 21:29:31 +01:00
Alexandre Flament
aa887eb375 [mod] checker : replace pycld3 by langdetect
pycld3 requires the native library cld3
langdetect is a pure python package
2021-01-19 21:26:04 +01:00
Tobi823
16a0a01553 Document workaround for using 2 languages simultaneously #1508 2021-01-18 17:23:09 +01:00
Alexandre Flament
0495e15df4
Merge pull request #2476 from dalf/fix-error-recording-and-checker
Fix error recording and checker
2021-01-18 08:29:25 +01:00
Alexandre Flament
67a1aab0d5 [fix] /stats/checker : remove the timestamp field when the checker is disabled 2021-01-18 08:19:53 +01:00
Alexandre Flament
d473407ec9 [fix] checker: fix engine statistics
Without this commit, the URL /stats/errors shows percentages above 100% after the checker has run.
2021-01-18 08:19:44 +01:00
Alexandre Flament
ca76f3119a [fix] error_recorder: record code and lineno about the engine
Since PR #2225, code and lineno were sometimes meaningless
(see /stats/errors).
2021-01-17 16:25:11 +01:00
Alexandre Flament
80d7411f2c
Merge pull request #2452 from kvch/add-wilby-engine
Add wiby.me engine
2021-01-16 22:36:31 +01:00
Alexandre Flament
b405646749
Merge pull request #2451 from mrwormo/invidious-engine
[Fix] Invidious Engine
2021-01-16 19:25:45 +01:00
Alexandre Flament
709dd960f1
Merge pull request #2473 from return42/fix-setup.py
[fix] setup.py requires pyyaml installed
2021-01-16 19:05:36 +01:00
Alexandre Flament
1d13ad8452
Merge pull request #2460 from dalf/engine-about
[enh] engines: add about variable
2021-01-16 19:05:17 +01:00
Markus Heiser
c4a98862bf [fix] setup.py requires pyyaml installed
pip install -e .
...
Obtaining file:///usr/local/searx/searx-src
    ERROR: Command errored out with exit status 1:
     command: /usr/local/searx/searx-pyenv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/usr/local/searx/searx-src/setup.py'"'"'; __file__='"'"'/usr/local/searx/searx-src/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-vzer91m2
         cwd: /usr/local/searx/searx-src/
    Complete output (9 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/local/searx/searx-src/setup.py", line 10, in <module>
        from searx.version import VERSION_STRING
      File "/usr/local/searx/searx-src/searx/__init__.py", line 19, in <module>
        import searx.settings_loader
      File "/usr/local/searx/searx-src/searx/settings_loader.py", line 8, in <module>
        import yaml
    ModuleNotFoundError: No module named 'yaml'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-16 08:58:13 +01:00
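The traceback above shows the underlying problem: importing a submodule runs the package's `__init__.py`, which pulls in `yaml`. One general way around this kind of failure is to load the version file directly by path so the package `__init__.py` never runs. The sketch below illustrates that technique on a throwaway package; it is not necessarily what this commit changed, and all names in it (`demo_pkg`, `load_module_from_path`) are made up for the example.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
    """Execute a single source file as a module, bypassing its package."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "demo_pkg")
    os.mkdir(pkg)
    # The package __init__ depends on a module that is not installed.
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("import module_that_is_not_installed\n")
    with open(os.path.join(pkg, "version.py"), "w") as f:
        f.write('VERSION_STRING = "1.2.3"\n')

    # Loading version.py by path does not trigger demo_pkg/__init__.py.
    version = load_module_from_path("demo_version", os.path.join(pkg, "version.py"))
    print(version.VERSION_STRING)
```

This only works when the version file itself has no imports that depend on the package.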