searxng

Table of Contents

Searx milestones

Milestone 1.1 - Clean up
Milestone 1.2 - async
Milestone 1.3 - network
Milestone 1.4 - better statistics about the engines
Milestone 1.5 - infoboxes engines
Milestone 1.6 - better engine framework
Milestone 1.7 - framework for the on_result plugins
Milestone 1.8 - autocomplete
Milestone 1.9 - data upgrade & build process
Milestone 1.10 - upgrade oscar theme

Documentation & packaging milestones

Milestone 2.1 - common configuration files

Searx milestones

Milestone 1.1 - Clean up

#1471 : Drop Python 2. This is more than calling 2to3: there are encoding issues here and there related to Python 2 support.
Clean up:
- webapp.py
- the dependencies. For example pyopenssl can be optional now (see https://requests.readthedocs.io/en/master/community/updates/#id1 version 2.24.0).
Backport from the different forks:
- https://gitlab.e.foundation/e/cloud/my-spot ( searx#1674 )
- https://github.com/entropage/mijisou (not sure which commits, but at least have a look)
Add typing to the core components.

Milestone 1.2 - async

Replace requests by httpx or aiohttp

See https://github.com/searx/searx/pull/1856 : use httpx instead of requests. Drop source IP rotation and proxy support in this milestone.

But also read #503#issuecomment-647025488
Related to #899
See also: https://bugs.python.org/issue36098 and encode/httpx#1031
Solution: monkey patch for a time (for example encode/httpcore#107)
https://github.com/searx/searx/wiki/New-architecture-proposal (see also PR #1724): switch to async (starlette + httpx) instead of one thread per engine. After this task (and perhaps the previous one), the master branch would switch to maintenance mode for a time (a few weeks?). Incoming new things can go to feature branches.
- replace uwsgi by uvicorn
- fork at runtime:
  - one for front-end with one asyncio loop.
  - one for back-end with another asyncio loop.
make sure translation works as before (see https://github.com/encode/starlette/issues/279#issuecomment-505243515 )
make sure performance is at least equal (on low-end machines and on up-to-date hardware) <-- this one will take time.
make sure the dev environment works (reload)
define deployment configuration: encode/uvicorn#517
update documentation / installation scripts.
- https://github.com/searx/searx/blob/master/utils/searx.sh
- https://github.com/searx/searx/blob/master/dockerfiles/docker-entrypoint.sh
- https://github.com/searx/searx/tree/master/docs
- safety net: if uwsgi is detected, then stop.
Optional (settings.yml can disable this feature): implement "Extend global timeout when there is no result" #948
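
The move from one thread per engine to a single event loop can be sketched roughly as follows. This is only an illustration using the standard library: query_engine is a hypothetical stand-in for a request made with an async HTTP client such as httpx, and the engine names and delays are made up. It also shows one possible way to enforce a global timeout across all engines.

```python
import asyncio


async def query_engine(name, delay):
    # Hypothetical stand-in for an HTTP request made with an async client.
    await asyncio.sleep(delay)
    return name, [f"result from {name}"]


async def search(engines, timeout):
    """Query every engine concurrently on one event loop under a global timeout."""
    tasks = {
        asyncio.create_task(query_engine(name, delay)): name
        for name, delay in engines.items()
    }
    done, pending = await asyncio.wait(list(tasks), timeout=timeout)

    # Engines that did not answer in time are cancelled, not waited for.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    results = dict(task.result() for task in done)
    timed_out = sorted(tasks[task] for task in pending)
    return results, timed_out
```

Calling asyncio.run(search({"fast": 0.01, "slow": 5.0}, timeout=0.5)) returns the results of the fast engine and reports the slow one as timed out, without starting any thread.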

Milestone 1.3 - network

IP rotation per engine (for now it is per request, even across different engines).
Allow specifying an IP range (useful for IPv6). Related to searx#1034 (it would be better to detect IPv6 support to avoid maintenance; dnspython can help).
Allow specifying a list of proxies.
Be able to define a retry policy.
What will take time here is test, test, test.
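
As a rough sketch of two of the items above, the following shows one way per-engine rotation over an IP range and a simple retry policy could look. PerEngineRotation and RetryPolicy are hypothetical names, not existing searx classes, and the range is a documentation prefix. Binding the chosen address to an actual HTTP connection is left out.

```python
import ipaddress
import itertools
from dataclasses import dataclass


class PerEngineRotation:
    """Hand out source addresses from a range, one independent cycle per engine."""

    def __init__(self, cidr):
        self.network = ipaddress.ip_network(cidr)
        self._cycles = {}

    def next_address(self, engine_name):
        # Each engine gets its own iterator, so a request to one engine
        # does not advance the rotation of another engine.
        if engine_name not in self._cycles:
            self._cycles[engine_name] = itertools.cycle(self.network.hosts())
        return next(self._cycles[engine_name])


@dataclass
class RetryPolicy:
    """Retry a callable a fixed number of times on selected exceptions."""

    max_attempts: int = 3
    retry_on: tuple = (ConnectionError, TimeoutError)

    def run(self, func):
        last_error = None
        for attempt in range(self.max_attempts):
            try:
                return func(attempt)
            except self.retry_on as exc:
                last_error = exc
        raise last_error
```

Keeping the rotation state per engine is the point of the first item: two engines queried in the same search each start from the beginning of the range.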

Milestone 1.4 - better statistics about the engines

Updated version of the PR "measure response time with more details" #447 (see #162#issuecomment-76623027 ).

Record accurate statistics and display graphs about them (produce the graphs on the server to avoid JavaScript usage).
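
A minimal sketch of what recording response times and rendering a graph on the server might look like, assuming nothing about the existing statistics code. EngineStats and render_svg_bars are hypothetical names. The chart is a plain SVG string, so the browser needs no JavaScript to display it. Engine names are assumed to be safe to embed in markup.

```python
import math
import statistics
from collections import defaultdict


class EngineStats:
    """Collect response times per engine and summarise them."""

    def __init__(self):
        self._times = defaultdict(list)

    def record(self, engine, seconds):
        self._times[engine].append(seconds)

    def summary(self, engine):
        times = sorted(self._times[engine])
        rank = math.ceil(0.95 * len(times))  # nearest-rank 95th percentile
        return {
            "count": len(times),
            "median": statistics.median(times),
            "p95": times[rank - 1],
        }


def render_svg_bars(medians, width=200, bar_height=12):
    """Render one horizontal bar per engine, scaled to the slowest engine."""
    longest = max(medians.values())
    rows = []
    for index, (engine, value) in enumerate(sorted(medians.items())):
        bar = round(width * value / longest)
        y = index * (bar_height + 4)
        rows.append(
            f'<rect x="0" y="{y}" width="{bar}" height="{bar_height}">'
            f"<title>{engine}: {value:.2f}s</title></rect>"
        )
    height = len(rows) * (bar_height + 4)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(rows)
        + "</svg>"
    )
```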

Milestone 1.5 - infoboxes engines

The wikidata engine is the default engine to display infoboxes. Unfortunately, it is slow. The duckduckgo_definition engine is faster but requires some work to provide more user-friendly information.

Improve the response time of the wikidata engine. Define helpers:
- load all property name translations at load time (one SPARQL request).
- build one big SPARQL request template at load time.
- use the big SPARQL request instead of asking for the HTML version.
- parse JSON with a functionToApply[propertyName]
Improve the results of the duckduckgo_definition (ddd) engine:
- define common data_type
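
The wikidata helpers listed above (one big SPARQL template built at load time, then a functionToApply[propertyName] lookup when parsing the JSON) might be sketched like this. The property selection, the variable names and the formatting functions are assumptions made for illustration. The input follows the standard SPARQL JSON results layout (results, bindings, value).

```python
from datetime import datetime
from string import Template

# Assumed selection of properties: variable name -> Wikidata property id.
PROPERTIES = {"birth_date": "P569", "population": "P1082", "official_website": "P856"}

# Built once at load time, then reused for every item.
QUERY_TEMPLATE = Template(
    "SELECT "
    + " ".join(f"?{name}" for name in PROPERTIES)
    + " WHERE { "
    + " ".join(
        "OPTIONAL { wd:$item wdt:%s ?%s . }" % (pid, name)
        for name, pid in PROPERTIES.items()
    )
    + " }"
)


def format_date(value):
    # Timestamps are expected in the form "1952-03-11T00:00:00Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").date().isoformat()


def format_number(value):
    return f"{int(value):,}"


# One formatting function per property name.
FUNCTION_TO_APPLY = {
    "birth_date": format_date,
    "population": format_number,
    "official_website": str.strip,
}


def parse_bindings(sparql_json):
    """Convert SPARQL JSON bindings into plain dicts of display values."""
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        row = {}
        for name, cell in binding.items():
            convert = FUNCTION_TO_APPLY.get(name)
            if convert is not None:
                row[name] = convert(cell["value"])
        rows.append(row)
    return rows
```

A query for one item is then QUERY_TEMPLATE.substitute(item="Q42"), and adding a property means adding one entry to each dictionary.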

Milestone 1.6 - better engine framework

Apply this : #302#issuecomment-565828553 (the issue, but not the comment, is included in the version 1.0). See this gist : https://gist.github.com/dalf/3c3904699153a741f27842d8ea30b449
#1802 : Engine code: describe which XPath can fail, which must not. The idea: if an engine fails, we should know why: missing XPath result, missing JSON result, internal error, unexpected data, etc. To sum up: the purpose is to create a better framework / toolset for the engines. It will take some time to review all the engines and find what kind of error to report (the purpose is not to fix them, but to be able to report the errors). For example: issue a warning if there is an unexpected HTTP redirect.
Integrate https://github.com/searx/searx-checker into searx: see #1559 : Add some code directly into the engine to make sure that it is working as expected. For example, list some requests that should work, and the expected results. Most probably it should be code rather than data because each engine behaves differently. So CI can include a report.
Expose the errors to a public API so searx-stats2 can collect them (for example: this XPath in this engine fails 40% of the time). Triple check that everything is anonymous.
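
One possible shape for the "which XPath can fail, which must not" idea is sketched below. It uses the standard library's ElementTree (which only supports a subset of XPath) to stay self-contained. EngineXPathError, ErrorLog and select are hypothetical names, and only counters are stored, never queries or results.

```python
import xml.etree.ElementTree as ET
from collections import Counter


class EngineXPathError(Exception):
    """Raised when an XPath that must match returns nothing."""

    def __init__(self, engine, xpath):
        super().__init__(f"{engine}: required XPath {xpath!r} matched nothing")
        self.engine = engine
        self.xpath = xpath


class ErrorLog:
    """Anonymous counters: how often each (engine, XPath) pair was used and failed."""

    def __init__(self):
        self.calls = Counter()
        self.failures = Counter()

    def failure_rate(self, engine, xpath):
        key = (engine, xpath)
        return self.failures[key] / self.calls[key] if self.calls[key] else 0.0


def select(root, engine, xpath, log, required=True):
    """Evaluate an XPath; an empty result is an error only if it is required."""
    key = (engine, xpath)
    log.calls[key] += 1
    nodes = root.findall(xpath)
    if not nodes and required:
        log.failures[key] += 1
        raise EngineXPathError(engine, xpath)
    return nodes
```

An engine would mark the result title as required and the snippet as optional, for example select(root, "example", ".//p[@class='snippet']", log, required=False). The failure rates in ErrorLog are the kind of figure a public API could expose.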

Milestone 1.7 - framework for the on_result plugins

Related issue: #2080

The on_result plugins can define some triggers: searx calls the "on_result" functions only when the host matches.
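
A host trigger could be sketched as a small registry keyed by hostname. The decorator form and the one-argument plugin signature are simplifications assumed for this example, not the current plugin API, and example.org is a placeholder host.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical registry: hostname -> plugin callbacks interested in that host.
_plugins_by_host = defaultdict(list)


def on_result(*hosts):
    """Register a callback that only runs for results from the given hosts."""

    def register(func):
        for host in hosts:
            _plugins_by_host[host].append(func)
        return func

    return register


def apply_plugins(result):
    """Call only the plugins whose trigger matches the host of the result URL."""
    host = urlparse(result["url"]).hostname
    for plugin in _plugins_by_host.get(host, []):
        plugin(result)
    return result


@on_result("example.org", "www.example.org")
def tag_example(result):
    # Illustrative plugin: annotate results coming from the matching hosts.
    result["tags"] = ["example"]
```

With this layout, a result from any other host costs one dictionary lookup and no plugin call.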

Milestone 1.8 - autocomplete

#392 : include answers in the autocomplete results.
autocomplete with the external bangs.
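
Autocompleting external bangs can be sketched as a prefix lookup in a sorted list. The bang names below are made-up sample data and complete_bang is a hypothetical helper. How the real bang list is loaded is not shown.

```python
import bisect

# Made-up sample data, kept sorted so that a prefix maps to a contiguous slice.
EXTERNAL_BANGS = sorted(["!g", "!gh", "!gi", "!w", "!wd", "!yt"])


def complete_bang(prefix, limit=5):
    """Return up to `limit` bangs that start with the typed prefix."""
    if not prefix.startswith("!"):
        return []
    start = bisect.bisect_left(EXTERNAL_BANGS, prefix)
    matches = []
    for bang in EXTERNAL_BANGS[start:]:
        if not bang.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(bang)
    return matches
```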

Milestone 1.9 - data upgrade & build process

Related issues:

#2052
#2034

Check with the different searx package maintainers.

Milestone 1.10 - upgrade oscar theme

Upgrade the dependencies (jQuery, Bootstrap, Leaflet, etc.)
Drop IE support
Optimize some of the HTML <--- see performance on FF, Chrome, mobile, desktop:
- perhaps merge some files
- /translations.js slows down Searx. #2064
- reduce file size if possible (partial bootstrap).

Documentation & packaging milestones

Milestone 2.1 - common configuration files

Find a way to have a reference configuration. Currently there are 3 versions of the filtron configuration:

The purpose is to ensure the default setup is secure (HTTP headers, known working filtron configuration, etc.). If it is updated, it is updated everywhere.

Same can be done about the reverse proxy configuration / HTTP headers.