while PR #2357 [1] was being implemented the question came up:
would be better to change the PING resource from CSS to an image so that
some terminal based browser may still able to pass the test [1]
This patch implements a POC in where a <img src=token> tag is loaded instaed a
CSS.
To test this patch activate limiter and link_token method [3] and start a
developer instance::
make run
In your terminal browser open http://127.0.0.1:8888/search?q=foo
If the browser is suitable for the link_token method, it loads the image and the
following messages appear::
DEBUG searx.botdetection.limiter : OK 127.0.0.1/32: /clientft61aak7fzyu6o6v.svg ...
DEBUG searx.botdetection.link_token : token is valid --> True
DEBUG searx.botdetection.link_token : store ping_key for (client) network 127.0.0.1/32 (IP 127.0.0.1) -> SearXNG_limiter.ping[...]
Browsers that do not load images will be blocked: If you try by example::
lynx http://127.0.0.1:8888/search?q=foo
you will see a WARNING message like::
WARNING searx.botdetection.link_token : missing ping (IP: 127.0.0.1/32) / request: SearXNG_limiter.ping[...]
Modern terminal WEB browser do support `<img>` tag as well as CSS:
browsh http://127.0.0.1:8888/search?q=foo
----
[1] 80aaef6c95
[2] https://github.com/searxng/searxng/pull/2357#issuecomment-1574898834
[3] activate limiter and link_token method
```diff
diff --git a/searx/botdetection/limiter.toml b/searx/botdetection/limiter.toml
index 71a231e8f..7e1dba755 100644
--- a/searx/botdetection/limiter.toml
+++ b/searx/botdetection/limiter.toml
@@ -17,6 +17,6 @@ ipv6_prefix = 48
filter_link_local = false
# acrivate link_token method in the ip_limit method
-link_token = false
+link_token = true
diff --git a/searx/settings.yml b/searx/settings.yml
index a82a3432d..e7b983afc 100644
--- a/searx/settings.yml
+++ b/searx/settings.yml
@@ -73,7 +73,7 @@ server:
# public URL of the instance, to ensure correct inbound links. Is overwritten
# by ${SEARXNG_URL}.
base_url: false # "http://example.com/location"
- limiter: false # rate limit the number of request on the instance, block some bots
+ limiter: true # rate limit the number of request on the instance, block some bots
# If your instance owns a /etc/searxng/settings.yml file, then set the following
# values there.
```
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The entire source code of the duckduckgo engine has been reengineered and
purified.
1. DDG used the URL https://html.duckduckgo.com/html for no-JS requests whose
response is also easier to parse than the previous
https://lite.duckduckgo.com/lite/ URL
2. the bot detection of DDG has so far caused problems and often led to a
CAPTCHA, this can be circumvented using `'Sec-Fetch-Mode'] = “navigate”`
Closes: https://github.com/searxng/searxng/issues/3927
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The previous implementation could not distinguish a CAPTCHA response from an
ordinary result list. In the previous implementation a CAPTCHA was taken as a
result list where no items are in.
DDG does not block IPs. Instead, a CAPTCHA wall is placed in front of request
on a dubious request.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch adds an additional *isinstance* check within the ast parser to check
for float along with int, fixing the underlying issue.
Co-Authored: Markus Heiser <markus.heiser@darmarit.de>
Improve region and language detection / all locale
Testing has shown the following behaviour for the different
default and empty values of Mojeeks parameters:
| param | idx | value | behaviour |
| -------- | --- | ------ | ------------------------- |
| region | 0 | '' | detect region based on IP |
| region | 1 | 'none' | all regions |
| language | 0 | '' | all languages |
Without this patch the Gitea Search Engine is only partially compatible with
modern gitea or forgejo:
- Fixing some JSON Fields
- Using Repository Avatar when Available
To Verify My results you can look at the Modern API doc and results, its
available on all Gitea and Forgejo instance by Default. Heres an Search API
result of Mine:
- https://git.euph.dev/api/v1/repos/search?q=ccna
All favicons implementations have been documented and moved to the Python
package:
searx.favicons
There is a configuration (based on Pydantic) for the favicons and all its
components:
searx.favicons.config
A solution for caching favicons has been implemented:
searx.favicon.cache
If the favicon is already in the cache, the returned URL is a data URL [1]
(something like `data:image/png;base64,...`). By generating a data url from
the FaviconCache, additional HTTP roundtripps via the favicon_proxy are saved:
favicons.proxy.favicon_url
The favicon proxy service now sets a HTTP header "Cache-Control: max-age=...":
favicons.proxy.favicon_proxy
The resolvers now also provide the mime type (data, mime):
searx.favicon.resolvers
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- for tests which perform the same arrange/act/assert pattern but with different
data, the data portion has been moved to the ``paramaterized.expand`` fields
- for monolithic tests which performed multiple arrange/act/asserts,
they have been broken up into different unit tests.
- when possible, change generic assert statements to more concise
asserts (i.e. ``assertIsNone``)
This work ultimately is focused on creating smaller and more concise tests.
While paramaterized may make adding new configurations for existing tests
easier, that is just a beneficial side effect. The main benefit is that smaller
tests are easier to reason about, meaning they are easier to debug when they
start failing. This improves the developer experience in debugging what went
wrong when refactoring the project.
Total number of tests went from 192 -> 259; or, broke apart larger tests into 69
more concise ones.