<!DOCTYPE html>
|
||
|
||
<html lang="en">
|
||
<head>
|
||
<meta charset="utf-8" />
|
||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||
<title>Startpage Engines — SearXNG Documentation (2023.8.11+905ce2a6f)</title>
|
||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=4f649999" />
|
||
<link rel="stylesheet" type="text/css" href="../../../_static/searxng.css?v=52e4ff28" />
|
||
<link rel="stylesheet" type="text/css" href="../../../_static/tabs.css?v=a5c4661c" />
|
||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js?v=3c88bde0"></script>
|
||
<script src="../../../_static/doctools.js?v=888ff710"></script>
|
||
<script src="../../../_static/sphinx_highlight.js?v=4825356b"></script>
|
||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||
<link rel="search" title="Search" href="../../../search.html" />
|
||
<link rel="next" title="Tagesschau API" href="tagesschau.html" />
|
||
<link rel="prev" title="Recoll Engine" href="recoll.html" />
|
||
</head><body>
|
||
<div class="related" role="navigation" aria-label="related navigation">
|
||
<h3>Navigation</h3>
|
||
<ul>
|
||
<li class="right" style="margin-right: 10px">
|
||
<a href="../../../genindex.html" title="General Index"
|
||
accesskey="I">index</a></li>
|
||
<li class="right" >
|
||
<a href="../../../py-modindex.html" title="Python Module Index"
|
||
>modules</a> |</li>
|
||
<li class="right" >
|
||
<a href="tagesschau.html" title="Tagesschau API"
|
||
accesskey="N">next</a> |</li>
|
||
<li class="right" >
|
||
<a href="recoll.html" title="Recoll Engine"
|
||
accesskey="P">previous</a> |</li>
|
||
<li class="nav-item nav-item-0"><a href="../../../index.html">SearXNG Documentation (2023.8.11+905ce2a6f)</a> »</li>
|
||
<li class="nav-item nav-item-1"><a href="../../index.html" >Developer documentation</a> »</li>
|
||
<li class="nav-item nav-item-2"><a href="../index.html" accesskey="U">Engine Implementations</a> »</li>
|
||
<li class="nav-item nav-item-this"><a href="">Startpage Engines</a></li>
|
||
</ul>
|
||
</div>
|
||
|
||
<div class="document">
|
||
<div class="documentwrapper">
|
||
<div class="bodywrapper">
|
||
<div class="body" role="main">
|
||
|
||
<section id="startpage-engines">
|
||
<span id="id1"></span><h1>Startpage Engines<a class="headerlink" href="#startpage-engines" title="Permalink to this heading">¶</a></h1>
|
||
<nav class="contents local" id="contents">
|
||
<ul class="simple">
|
||
<li><p><a class="reference internal" href="#startpage-regions" id="id9">Startpage regions</a></p></li>
|
||
<li><p><a class="reference internal" href="#startpage-languages" id="id10">Startpage languages</a></p></li>
|
||
<li><p><a class="reference internal" href="#startpage-categories" id="id11">Startpage categories</a></p></li>
|
||
</ul>
|
||
</nav>
|
||
<span class="target" id="module-searx.engines.startpage"></span><p>Startpage’s language and region selectors are inconsistent, so some of their tags need special handling.</p>
|
||
<section id="startpage-regions">
|
||
<span id="id2"></span><h2><a class="toc-backref" href="#id9" role="doc-backlink">Startpage regions</a><a class="headerlink" href="#startpage-regions" title="Permalink to this heading">¶</a></h2>
|
||
<p>In the list of regions there are tags we need to map to common region tags:</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pt</span><span class="o">-</span><span class="n">BR_BR</span> <span class="o">--&gt;</span> <span class="n">pt_BR</span>
<span class="n">zh</span><span class="o">-</span><span class="n">CN_CN</span> <span class="o">--&gt;</span> <span class="n">zh_Hans_CN</span>
<span class="n">zh</span><span class="o">-</span><span class="n">TW_TW</span> <span class="o">--&gt;</span> <span class="n">zh_Hant_TW</span>
<span class="n">zh</span><span class="o">-</span><span class="n">TW_HK</span> <span class="o">--&gt;</span> <span class="n">zh_Hant_HK</span>
<span class="n">en</span><span class="o">-</span><span class="n">GB_GB</span> <span class="o">--&gt;</span> <span class="n">en_GB</span>
</pre></div>
|
||
</div>
|
||
<p>and there is at least one tag with a three-letter language tag (ISO 639-2):</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">fil_PH</span> <span class="o">--&gt;</span> <span class="n">fil_PH</span>
</pre></div>
|
||
</div>
|
||
<p>The locale code <code class="docutils literal notranslate"><span class="pre">no_NO</span></code> from Startpage does not exist and is mapped to
<code class="docutils literal notranslate"><span class="pre">nb-NO</span></code>:</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">babel</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">UnknownLocaleError</span><span class="p">:</span> <span class="n">unknown</span> <span class="n">locale</span> <span class="s1">'no_NO'</span>
</pre></div>
|
||
</div>
|
||
<p>For reference, see the IANA language subtag registry: <code class="docutils literal notranslate"><span class="pre">no</span></code> is a macrolanguage <a class="footnote-reference brackets" href="#id5" id="id3" role="doc-noteref"><span class="fn-bracket">[</span>1<span class="fn-bracket">]</span></a>, and the
W3C recommends using the more specific subtag instead of the macrolanguage <a class="footnote-reference brackets" href="#id6" id="id4" role="doc-noteref"><span class="fn-bracket">[</span>2<span class="fn-bracket">]</span></a>.</p>
|
||
<aside class="footnote-list brackets">
|
||
<aside class="footnote brackets" id="id5" role="note">
|
||
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="#id3">1</a><span class="fn-bracket">]</span></span>
|
||
<p><a class="reference external" href="https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry">iana: language-subtag-registry</a></p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="nb">type</span><span class="p">:</span> <span class="n">language</span>
<span class="n">Subtag</span><span class="p">:</span> <span class="n">nb</span>
<span class="n">Description</span><span class="p">:</span> <span class="n">Norwegian</span> <span class="n">Bokmål</span>
<span class="n">Added</span><span class="p">:</span> <span class="mi">2005</span><span class="o">-</span><span class="mi">10</span><span class="o">-</span><span class="mi">16</span>
<span class="n">Suppress</span><span class="o">-</span><span class="n">Script</span><span class="p">:</span> <span class="n">Latn</span>
<span class="n">Macrolanguage</span><span class="p">:</span> <span class="n">no</span>
</pre></div>
|
||
</div>
|
||
</aside>
|
||
<aside class="footnote brackets" id="id6" role="note">
|
||
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="#id4">2</a><span class="fn-bracket">]</span></span>
|
||
<p>Use macrolanguages with care. Some language subtags have a Scope field set to
macrolanguage, i.e. this primary language subtag encompasses a number of more
specific primary language subtags in the registry. … As we recommended for
the collection subtags mentioned above, in most cases you should try to use
the more specific subtags … <a class="reference external" href="https://www.w3.org/International/questions/qa-choosing-language-tags#langsubtag">W3: The primary language subtag</a></p>
|
||
</aside>
|
||
</aside>
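<p>The region mappings described in this section can be sketched as a plain lookup table. This is a minimal illustration under stated assumptions, not the engine’s actual code: the names <code class="docutils literal notranslate"><span class="pre">SPECIAL_REGION_TAGS</span></code> and <code class="docutils literal notranslate"><span class="pre">normalize_region_tag</span></code> are hypothetical, and the Norwegian entry is written with an underscore (<code class="docutils literal notranslate"><span class="pre">nb_NO</span></code>) only to match the style of the other targets.</p>

```python
# Hypothetical helper, not taken from searx.engines.startpage:
# map Startpage's irregular region tags to common locale tags.
SPECIAL_REGION_TAGS = {
    "pt-BR_BR": "pt_BR",
    "zh-CN_CN": "zh_Hans_CN",
    "zh-TW_TW": "zh_Hant_TW",
    "zh-TW_HK": "zh_Hant_HK",
    "en-GB_GB": "en_GB",
    # 'no' is a macrolanguage, so the more specific 'nb' subtag is used
    "no_NO": "nb_NO",
}


def normalize_region_tag(sp_tag):
    """Return the common tag for a Startpage region tag.

    Tags without a special case (for example fil_PH) are returned unchanged.
    """
    return SPECIAL_REGION_TAGS.get(sp_tag, sp_tag)


for tag in ("pt-BR_BR", "zh-TW_HK", "no_NO", "fil_PH"):
    print(tag, normalize_region_tag(tag))
```

<p>A lookup table keeps the special cases in one place; any tag that is already well formed passes through untouched.</p>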
|
||
</section>
|
||
<section id="startpage-languages">
|
||
<span id="id7"></span><h2><a class="toc-backref" href="#id10" role="doc-backlink">Startpage languages</a><a class="headerlink" href="#startpage-languages" title="Permalink to this heading">¶</a></h2>
|
||
<dl>
|
||
<dt><a class="reference internal" href="#searx.engines.startpage.send_accept_language_header" title="searx.engines.startpage.send_accept_language_header"><code class="xref py py-obj docutils literal notranslate"><span class="pre">send_accept_language_header</span></code></a>:</dt><dd><p>The names displayed on Startpage’s settings page depend on the location of the
requesting IP when the <code class="docutils literal notranslate"><span class="pre">Accept-Language</span></code> HTTP header is unset. In <a class="reference internal" href="#searx.engines.startpage.fetch_traits" title="searx.engines.startpage.fetch_traits"><code class="xref py py-obj docutils literal notranslate"><span class="pre">fetch_traits</span></code></a>
we use:</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="s1">'Accept-Language'</span><span class="p">:</span> <span class="s2">"en-US,en;q=0.5"</span><span class="p">,</span>
<span class="o">..</span>
</pre></div>
|
||
</div>
|
||
<p>to get uniform names independent of the IP.</p>
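<p>As a minimal sketch of pinning the header, the following builds a request object with the standard library only. The URL is an assumption chosen for illustration, the object is only constructed (nothing is sent), and this is not how the engine itself issues requests.</p>

```python
import urllib.request

# Assumption: illustrative URL only; the request is built but never sent.
SETTINGS_URL = "https://www.startpage.com/do/settings"

headers = {
    # Fixed value documented above, so names do not depend on the client IP
    "Accept-Language": "en-US,en;q=0.5",
}

req = urllib.request.Request(SETTINGS_URL, headers=headers)

# urllib stores header names capitalised as "Accept-language"
print(req.get_header("Accept-language"))
```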
|
||
</dd>
|
||
</dl>
|
||
</section>
|
||
<section id="startpage-categories">
|
||
<span id="id8"></span><h2><a class="toc-backref" href="#id11" role="doc-backlink">Startpage categories</a><a class="headerlink" href="#startpage-categories" title="Permalink to this heading">¶</a></h2>
|
||
<p>Startpage’s category (web search, news, videos, ...) is set by
<a class="reference internal" href="#searx.engines.startpage.startpage_categ" title="searx.engines.startpage.startpage_categ"><code class="xref py py-obj docutils literal notranslate"><span class="pre">startpage_categ</span></code></a> in settings.yml:</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">startpage</span>
  <span class="n">engine</span><span class="p">:</span> <span class="n">startpage</span>
  <span class="n">startpage_categ</span><span class="p">:</span> <span class="n">web</span>
  <span class="o">...</span>
</pre></div>
|
||
</div>
|
||
<div class="admonition hint">
|
||
<p class="admonition-title">Hint</p>
|
||
<p>The default category is <code class="docutils literal notranslate"><span class="pre">web</span></code>; categories other than <code class="docutils literal notranslate"><span class="pre">web</span></code> are not
yet implemented.</p>
|
||
</div>
|
||
</section>
|
||
<dl class="py function">
|
||
<dt class="sig sig-object py" id="searx.engines.startpage.fetch_traits">
|
||
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">fetch_traits</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">engine_traits</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="../enginelib.html#searx.enginelib.traits.EngineTraits" title="searx.enginelib.traits.EngineTraits"><span class="pre">EngineTraits</span></a></span></em><span class="sig-paren">)</span><a class="reference internal" href="../../../_modules/searx/engines/startpage.html#fetch_traits"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#searx.engines.startpage.fetch_traits" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Fetch <a class="reference internal" href="#startpage-languages"><span class="std std-ref">languages</span></a> and <a class="reference internal" href="#startpage-regions"><span class="std std-ref">regions</span></a> from Startpage.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="py function">
|
||
<dt class="sig sig-object py" id="searx.engines.startpage.get_sc_code">
|
||
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">get_sc_code</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">searxng_locale</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../../../_modules/searx/engines/startpage.html#get_sc_code"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#searx.engines.startpage.get_sc_code" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Get a current <code class="docutils literal notranslate"><span class="pre">sc</span></code> argument from Startpage’s search form (HTML page).</p>
<p>Startpage puts a <code class="docutils literal notranslate"><span class="pre">sc</span></code> argument on every HTML <a class="reference internal" href="#searx.engines.startpage.search_form_xpath" title="searx.engines.startpage.search_form_xpath"><code class="xref py py-obj docutils literal notranslate"><span class="pre">search</span> <span class="pre">form</span></code></a>. Without this argument Startpage treats the request
as coming from a bot. We do not know what is encoded in the value of the <code class="docutils literal notranslate"><span class="pre">sc</span></code>
argument, but it seems to be a kind of <em>time-stamp</em>.</p>
|
||
<p>Startpage’s search form generates a new sc-code on each request. This
function scrapes a new sc-code from Startpage’s home page every
<a class="reference internal" href="#searx.engines.startpage.sc_code_cache_sec" title="searx.engines.startpage.sc_code_cache_sec"><code class="xref py py-obj docutils literal notranslate"><span class="pre">sc_code_cache_sec</span></code></a> seconds.</p>
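<p>The caching behaviour can be sketched as a small time-based cache. This is an illustration under stated assumptions rather than the engine’s implementation: the class <code class="docutils literal notranslate"><span class="pre">ScCodeCache</span></code> and the <code class="docutils literal notranslate"><span class="pre">fetch</span></code> callable are hypothetical, and the scraping step itself is replaced by a stand-in function.</p>

```python
import time


class ScCodeCache:
    """Keep one sc-code in memory for a limited number of seconds (sketch)."""

    def __init__(self, fetch, cache_sec=30):
        self.fetch = fetch          # hypothetical callable returning a fresh sc-code
        self.cache_sec = cache_sec  # mirrors sc_code_cache_sec (30 on this page)
        self.code = None
        self.timestamp = 0.0

    def get(self, now=None):
        """Return the cached code, fetching a new one once the cache expired."""
        now = time.time() if now is None else now
        expired = now - self.timestamp >= self.cache_sec
        if self.code is None or expired:
            self.code = self.fetch()
            self.timestamp = now
        return self.code


# Stand-in for the scraper: returns a new dummy code on every call.
counter = iter(range(1, 100))
cache = ScCodeCache(lambda: "sc-%d" % next(counter))
print(cache.get(now=0.0), cache.get(now=10.0), cache.get(now=30.0))
```

<p>Passing <code class="docutils literal notranslate"><span class="pre">now</span></code> explicitly makes the expiry logic easy to check without waiting; in the usage above the first two calls share one code and the third call, 30 seconds later, fetches a new one.</p>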
|
||
</dd></dl>
|
||
|
||
<dl class="py function">
|
||
<dt class="sig sig-object py" id="searx.engines.startpage.request">
|
||
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">request</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../../../_modules/searx/engines/startpage.html#request"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#searx.engines.startpage.request" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Assemble a Startpage request.</p>
|
||
<p>To avoid a CAPTCHA we need to send a well-formed HTTP POST request with a
cookie. We need to form a request that is identical to the request built by
Startpage’s search form:</p>
|
||
<ul class="simple">
|
||
<li><p>in the cookie the <strong>region</strong> is selected</p></li>
|
||
<li><p>in the HTTP POST data the <strong>language</strong> is selected</p></li>
|
||
</ul>
|
||
<p>Additionally, the arguments from Startpage’s search form need to be set in the
HTTP POST data; compare the <code class="docutils literal notranslate"><span class="pre">&lt;input&gt;</span></code> elements: <a class="reference internal" href="#searx.engines.startpage.search_form_xpath" title="searx.engines.startpage.search_form_xpath"><code class="xref py py-obj docutils literal notranslate"><span class="pre">search_form_xpath</span></code></a>.</p>
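<p>The split between cookie and POST data can be sketched as follows. All field and cookie names here (<code class="docutils literal notranslate"><span class="pre">query</span></code>, <code class="docutils literal notranslate"><span class="pre">language</span></code>, <code class="docutils literal notranslate"><span class="pre">region</span></code>) and the function <code class="docutils literal notranslate"><span class="pre">build_request_params</span></code> are hypothetical placeholders; the real names come from Startpage’s search form and are not reproduced here.</p>

```python
def build_request_params(query, form_args, language, region):
    """Assemble POST data and cookies for one search request (sketch).

    form_args holds the arguments scraped from the search form; every
    key added below is a hypothetical placeholder name.
    """
    data = dict(form_args)        # copy, so the caller's dict is not modified
    data["query"] = query         # hypothetical field name
    data["language"] = language   # the language is selected in the POST data
    cookies = {"region": region}  # the region is selected in the cookie
    return {"method": "POST", "data": data, "cookies": cookies}


params = build_request_params("searxng", {"sc": "abc"}, "english", "en-US")
print(params["method"], sorted(params["data"]), params["cookies"])
```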
|
||
</dd></dl>
|
||
|
||
<dl class="py data">
|
||
<dt class="sig sig-object py" id="searx.engines.startpage.sc_code_cache_sec">
|
||
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">sc_code_cache_sec</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">30</span></em><a class="headerlink" href="#searx.engines.startpage.sc_code_cache_sec" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Time in seconds the sc-code is cached in memory, see <a class="reference internal" href="#searx.engines.startpage.get_sc_code" title="searx.engines.startpage.get_sc_code"><code class="xref py py-obj docutils literal notranslate"><span class="pre">get_sc_code</span></code></a>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="py data">
|
||
<dt class="sig sig-object py" id="searx.engines.startpage.search_form_xpath">
|
||
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">search_form_xpath</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">'//form[@id="search"]'</span></em><a class="headerlink" href="#searx.engines.startpage.search_form_xpath" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>XPath of Startpage’s original search form.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="py data">
|
||
<dt class="sig sig-object py" id="searx.engines.startpage.send_accept_language_header">
|
||
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">send_accept_language_header</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">True</span></em><a class="headerlink" href="#searx.engines.startpage.send_accept_language_header" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Startpage tries to guess the user’s language and territory from the HTTP
<code class="docutils literal notranslate"><span class="pre">Accept-Language</span></code> header. Optionally, the user can select a search language (which can
differ from the UI language) and a region filter.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="py data">
|
||
<dt class="sig sig-object py" id="searx.engines.startpage.startpage_categ">
|
||
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">startpage_categ</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">'web'</span></em><a class="headerlink" href="#searx.engines.startpage.startpage_categ" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Startpage’s category, see <a class="reference internal" href="#startpage-categories"><span class="std std-ref">Startpage categories</span></a>.</p>
|
||
</dd></dl>
|
||
|
||
</section>
|
||
|
||
|
||
<div class="clearer"></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<span id="sidebar-top"></span>
|
||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||
<div class="sphinxsidebarwrapper">
|
||
|
||
|
||
<p class="logo"><a href="../../../index.html">
|
||
<img class="logo" src="../../../_static/searxng-wordmark.svg" alt="Logo"/>
|
||
</a></p>
|
||
|
||
|
||
<h3><a href="../../../index.html">Table of Contents</a></h3>
|
||
<ul class="current">
|
||
<li class="toctree-l1"><a class="reference internal" href="../../../user/index.html">User information</a></li>
|
||
<li class="toctree-l1"><a class="reference internal" href="../../../own-instance.html">Why use a private instance?</a></li>
|
||
<li class="toctree-l1"><a class="reference internal" href="../../../admin/index.html">Administrator documentation</a></li>
|
||
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Developer documentation</a><ul class="current">
|
||
<li class="toctree-l2"><a class="reference internal" href="../../quickstart.html">Development Quickstart</a></li>
|
||
<li class="toctree-l2"><a class="reference internal" href="../../contribution_guide.html">How to contribute</a></li>
|
||
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">Engine Implementations</a></li>
|
||
<li class="toctree-l2"><a class="reference internal" href="../../search_api.html">Search API</a></li>
|
||
<li class="toctree-l2"><a class="reference internal" href="../../plugins.html">Plugins</a></li>
|
||
<li class="toctree-l2"><a class="reference internal" href="../../translation.html">Translation</a></li>
|
||
<li class="toctree-l2"><a class="reference internal" href="../../lxcdev.html">Developing in Linux Containers</a></li>
|
||
<li class="toctree-l2"><a class="reference internal" href="../../makefile.html">Makefile &amp; <code class="docutils literal notranslate"><span class="pre">./manage</span></code></a></li>
|
||
<li class="toctree-l2"><a class="reference internal" href="../../reST.html">reST primer</a></li>
|
||
<li class="toctree-l2"><a class="reference internal" href="../../searxng_extra/index.html">Tooling box <code class="docutils literal notranslate"><span class="pre">searxng_extra</span></code></a></li>
|
||
</ul>
|
||
</li>
|
||
<li class="toctree-l1"><a class="reference internal" href="../../../utils/index.html">DevOps tooling box</a></li>
|
||
<li class="toctree-l1"><a class="reference internal" href="../../../src/index.html">Source-Code</a></li>
|
||
</ul>
|
||
|
||
<h3>Project Links</h3>
|
||
<ul>
|
||
<li><a href="https://github.com/searxng/searxng/tree/master">Source</a>
|
||
|
||
<li><a href="https://github.com/searxng/searxng/wiki">Wiki</a>
|
||
|
||
<li><a href="https://searx.space">Public instances</a>
|
||
|
||
<li><a href="https://github.com/searxng/searxng/issues">Issue Tracker</a>
|
||
</ul><h3>Navigation</h3>
|
||
<ul>
|
||
<li><a href="../../../index.html">Overview</a>
|
||
<ul>
|
||
<li><a href="../../index.html">Developer documentation</a>
|
||
<ul>
|
||
<li><a href="../index.html">Engine Implementations</a>
|
||
<ul>
|
||
<li>Previous: <a href="recoll.html" title="previous chapter">Recoll Engine</a>
|
||
<li>Next: <a href="tagesschau.html" title="next chapter">Tagesschau API</a></ul>
|
||
</li></ul>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<div id="searchbox" style="display: none" role="search">
|
||
<h3 id="searchlabel">Quick search</h3>
|
||
<div class="searchformwrapper">
|
||
<form class="search" action="../../../search.html" method="get">
|
||
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
|
||
<input type="submit" value="Go" />
|
||
</form>
|
||
</div>
|
||
</div>
|
||
<script>document.getElementById('searchbox').style.display = "block"</script>
|
||
<div role="note" aria-label="source link">
|
||
<h3>This Page</h3>
|
||
<ul class="this-page-menu">
|
||
<li><a href="../../../_sources/dev/engines/online/startpage.rst.txt"
|
||
rel="nofollow">Show Source</a></li>
|
||
</ul>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="clearer"></div>
|
||
</div>
|
||
|
||
<div class="footer" role="contentinfo">
|
||
© Copyright SearXNG team.
|
||
</div>
|
||
<script src="../../../_static/version_warning_offset.js"></script>
|
||
|
||
</body>
|
||
</html>