searxngRebrandZaclys/dev/engines/online/startpage.html
2023-08-11 10:34:02 +00:00

281 lines
22 KiB
HTML
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Startpage Engines &#8212; SearXNG Documentation (2023.8.11+905ce2a6f)</title>
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=4f649999" />
<link rel="stylesheet" type="text/css" href="../../../_static/searxng.css?v=52e4ff28" />
<link rel="stylesheet" type="text/css" href="../../../_static/tabs.css?v=a5c4661c" />
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js?v=3c88bde0"></script>
<script src="../../../_static/doctools.js?v=888ff710"></script>
<script src="../../../_static/sphinx_highlight.js?v=4825356b"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Tagesschau API" href="tagesschau.html" />
<link rel="prev" title="Recoll Engine" href="recoll.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../../../genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="../../../py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="tagesschau.html" title="Tagesschau API"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="recoll.html" title="Recoll Engine"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../../../index.html">SearXNG Documentation (2023.8.11+905ce2a6f)</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="../../index.html" >Developer documentation</a> &#187;</li>
<li class="nav-item nav-item-2"><a href="../index.html" accesskey="U">Engine Implementations</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Startpage Engines</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="startpage-engines">
<span id="id1"></span><h1>Startpage Engines<a class="headerlink" href="#startpage-engines" title="Permalink to this heading"></a></h1>
<nav class="contents local" id="contents">
<ul class="simple">
<li><p><a class="reference internal" href="#startpage-regions" id="id9">Startpage regions</a></p></li>
<li><p><a class="reference internal" href="#startpage-languages" id="id10">Startpage languages</a></p></li>
<li><p><a class="reference internal" href="#startpage-categories" id="id11">Startpage categories</a></p></li>
</ul>
</nav>
<span class="target" id="module-searx.engines.startpage"></span><p>Startpages language &amp; region selectors are a mess ..</p>
<section id="startpage-regions">
<span id="id2"></span><h2><a class="toc-backref" href="#id9" role="doc-backlink">Startpage regions</a><a class="headerlink" href="#startpage-regions" title="Permalink to this heading"></a></h2>
<p>In the list of regions there are tags we need to map to common region tags:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pt</span><span class="o">-</span><span class="n">BR_BR</span> <span class="o">--&gt;</span> <span class="n">pt_BR</span>
<span class="n">zh</span><span class="o">-</span><span class="n">CN_CN</span> <span class="o">--&gt;</span> <span class="n">zh_Hans_CN</span>
<span class="n">zh</span><span class="o">-</span><span class="n">TW_TW</span> <span class="o">--&gt;</span> <span class="n">zh_Hant_TW</span>
<span class="n">zh</span><span class="o">-</span><span class="n">TW_HK</span> <span class="o">--&gt;</span> <span class="n">zh_Hant_HK</span>
<span class="n">en</span><span class="o">-</span><span class="n">GB_GB</span> <span class="o">--&gt;</span> <span class="n">en_GB</span>
</pre></div>
</div>
<p>and there is at least one tag with a three letter language tag (ISO 639-2):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">fil_PH</span> <span class="o">--&gt;</span> <span class="n">fil_PH</span>
</pre></div>
</div>
<p>The locale code <code class="docutils literal notranslate"><span class="pre">no_NO</span></code> from Startpage does not exists and is mapped to
<code class="docutils literal notranslate"><span class="pre">nb-NO</span></code>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">babel</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">UnknownLocaleError</span><span class="p">:</span> <span class="n">unknown</span> <span class="n">locale</span> <span class="s1">&#39;no_NO&#39;</span>
</pre></div>
</div>
<p>For reference see languages-subtag at iana; <code class="docutils literal notranslate"><span class="pre">no</span></code> is the macrolanguage <a class="footnote-reference brackets" href="#id5" id="id3" role="doc-noteref"><span class="fn-bracket">[</span>1<span class="fn-bracket">]</span></a> and
W3C recommends subtag over macrolanguage <a class="footnote-reference brackets" href="#id6" id="id4" role="doc-noteref"><span class="fn-bracket">[</span>2<span class="fn-bracket">]</span></a>.</p>
<aside class="footnote-list brackets">
<aside class="footnote brackets" id="id5" role="note">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="#id3">1</a><span class="fn-bracket">]</span></span>
<p><a class="reference external" href="https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry">iana: language-subtag-registry</a></p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="nb">type</span><span class="p">:</span> <span class="n">language</span>
<span class="n">Subtag</span><span class="p">:</span> <span class="n">nb</span>
<span class="n">Description</span><span class="p">:</span> <span class="n">Norwegian</span> <span class="n">Bokmål</span>
<span class="n">Added</span><span class="p">:</span> <span class="mi">2005</span><span class="o">-</span><span class="mi">10</span><span class="o">-</span><span class="mi">16</span>
<span class="n">Suppress</span><span class="o">-</span><span class="n">Script</span><span class="p">:</span> <span class="n">Latn</span>
<span class="n">Macrolanguage</span><span class="p">:</span> <span class="n">no</span>
</pre></div>
</div>
</aside>
<aside class="footnote brackets" id="id6" role="note">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="#id4">2</a><span class="fn-bracket">]</span></span>
<p>Use macrolanguages with care. Some language subtags have a Scope field set to
macrolanguage, i.e. this primary language subtag encompasses a number of more
specific primary language subtags in the registry. … As we recommended for
the collection subtags mentioned above, in most cases you should try to use
the more specific subtags … <a class="reference external" href="https://www.w3.org/International/questions/qa-choosing-language-tags#langsubtag">W3: The primary language subtag</a></p>
</aside>
</aside>
</section>
<section id="startpage-languages">
<span id="id7"></span><h2><a class="toc-backref" href="#id10" role="doc-backlink">Startpage languages</a><a class="headerlink" href="#startpage-languages" title="Permalink to this heading"></a></h2>
<dl>
<dt><a class="reference internal" href="#searx.engines.startpage.send_accept_language_header" title="searx.engines.startpage.send_accept_language_header"><code class="xref py py-obj docutils literal notranslate"><span class="pre">send_accept_language_header</span></code></a>:</dt><dd><p>The displayed name in Startpages settings page depend on the location of the
IP when <code class="docutils literal notranslate"><span class="pre">Accept-Language</span></code> HTTP header is unset. In <a class="reference internal" href="#searx.engines.startpage.fetch_traits" title="searx.engines.startpage.fetch_traits"><code class="xref py py-obj docutils literal notranslate"><span class="pre">fetch_traits</span></code></a>
we use:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="s1">&#39;Accept-Language&#39;</span><span class="p">:</span> <span class="s2">&quot;en-US,en;q=0.5&quot;</span><span class="p">,</span>
<span class="o">..</span>
</pre></div>
</div>
<p>to get uniform names independent from the IP).</p>
</dd>
</dl>
</section>
<section id="startpage-categories">
<span id="id8"></span><h2><a class="toc-backref" href="#id11" role="doc-backlink">Startpage categories</a><a class="headerlink" href="#startpage-categories" title="Permalink to this heading"></a></h2>
<p>Startpages category (for Web-search, News, Videos, ..) is set by
<a class="reference internal" href="#searx.engines.startpage.startpage_categ" title="searx.engines.startpage.startpage_categ"><code class="xref py py-obj docutils literal notranslate"><span class="pre">startpage_categ</span></code></a> in settings.yml:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">startpage</span>
<span class="n">engine</span><span class="p">:</span> <span class="n">startpage</span>
<span class="n">startpage_categ</span><span class="p">:</span> <span class="n">web</span>
<span class="o">...</span>
</pre></div>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>The default category is <code class="docutils literal notranslate"><span class="pre">web</span></code> .. and other categories than <code class="docutils literal notranslate"><span class="pre">web</span></code> are not
yet implemented.</p>
</div>
</section>
<dl class="py function">
<dt class="sig sig-object py" id="searx.engines.startpage.fetch_traits">
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">fetch_traits</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">engine_traits</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference internal" href="../enginelib.html#searx.enginelib.traits.EngineTraits" title="searx.enginelib.traits.EngineTraits"><span class="pre">EngineTraits</span></a></span></em><span class="sig-paren">)</span><a class="reference internal" href="../../../_modules/searx/engines/startpage.html#fetch_traits"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#searx.engines.startpage.fetch_traits" title="Permalink to this definition"></a></dt>
<dd><p>Fetch <a class="reference internal" href="#startpage-languages"><span class="std std-ref">languages</span></a> and <a class="reference internal" href="#startpage-regions"><span class="std std-ref">regions</span></a> from Startpage.</p>
</dd></dl>
<dl class="py function">
<dt class="sig sig-object py" id="searx.engines.startpage.get_sc_code">
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">get_sc_code</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">searxng_locale</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../../../_modules/searx/engines/startpage.html#get_sc_code"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#searx.engines.startpage.get_sc_code" title="Permalink to this definition"></a></dt>
<dd><p>Get an actual <code class="docutils literal notranslate"><span class="pre">sc</span></code> argument from Startpages search form (HTML page).</p>
<p>Startpage puts a <code class="docutils literal notranslate"><span class="pre">sc</span></code> argument on every HTML <a class="reference internal" href="#searx.engines.startpage.search_form_xpath" title="searx.engines.startpage.search_form_xpath"><code class="xref py py-obj docutils literal notranslate"><span class="pre">search</span> <span class="pre">form</span></code></a>. Without this argument Startpage considers the request
is from a bot. We do not know what is encoded in the value of the <code class="docutils literal notranslate"><span class="pre">sc</span></code>
argument, but it seems to be a kind of a <em>time-stamp</em>.</p>
<p>Startpages search form generates a new sc-code on each request. This
function scrap a new sc-code from Startpages home page every
<a class="reference internal" href="#searx.engines.startpage.sc_code_cache_sec" title="searx.engines.startpage.sc_code_cache_sec"><code class="xref py py-obj docutils literal notranslate"><span class="pre">sc_code_cache_sec</span></code></a> seconds.</p>
</dd></dl>
<dl class="py function">
<dt class="sig sig-object py" id="searx.engines.startpage.request">
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">request</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">query</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">params</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../../../_modules/searx/engines/startpage.html#request"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#searx.engines.startpage.request" title="Permalink to this definition"></a></dt>
<dd><p>Assemble a Startpage request.</p>
<p>To avoid CAPTCHA we need to send a well formed HTTP POST request with a
cookie. We need to form a request that is identical to the request build by
Startpages search form:</p>
<ul class="simple">
<li><p>in the cookie the <strong>region</strong> is selected</p></li>
<li><p>in the HTTP POST data the <strong>language</strong> is selected</p></li>
</ul>
<p>Additionally the arguments form Startpages search form needs to be set in
HTML POST data / compare <code class="docutils literal notranslate"><span class="pre">&lt;input&gt;</span></code> elements: <a class="reference internal" href="#searx.engines.startpage.search_form_xpath" title="searx.engines.startpage.search_form_xpath"><code class="xref py py-obj docutils literal notranslate"><span class="pre">search_form_xpath</span></code></a>.</p>
</dd></dl>
<dl class="py data">
<dt class="sig sig-object py" id="searx.engines.startpage.sc_code_cache_sec">
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">sc_code_cache_sec</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">30</span></em><a class="headerlink" href="#searx.engines.startpage.sc_code_cache_sec" title="Permalink to this definition"></a></dt>
<dd><p>Time in seconds the sc-code is cached in memory <a class="reference internal" href="#searx.engines.startpage.get_sc_code" title="searx.engines.startpage.get_sc_code"><code class="xref py py-obj docutils literal notranslate"><span class="pre">get_sc_code</span></code></a>.</p>
</dd></dl>
<dl class="py data">
<dt class="sig sig-object py" id="searx.engines.startpage.search_form_xpath">
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">search_form_xpath</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">'//form[&#64;id=&quot;search&quot;]'</span></em><a class="headerlink" href="#searx.engines.startpage.search_form_xpath" title="Permalink to this definition"></a></dt>
<dd><p>XPath of Startpages origin search form</p>
</dd></dl>
<dl class="py data">
<dt class="sig sig-object py" id="searx.engines.startpage.send_accept_language_header">
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">send_accept_language_header</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">True</span></em><a class="headerlink" href="#searx.engines.startpage.send_accept_language_header" title="Permalink to this definition"></a></dt>
<dd><p>Startpage tries to guess users language and territory from the HTTP
<code class="docutils literal notranslate"><span class="pre">Accept-Language</span></code>. Optional the user can select a search-language (can be
different to the UI language) and a region filter.</p>
</dd></dl>
<dl class="py data">
<dt class="sig sig-object py" id="searx.engines.startpage.startpage_categ">
<span class="sig-prename descclassname"><span class="pre">searx.engines.startpage.</span></span><span class="sig-name descname"><span class="pre">startpage_categ</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">'web'</span></em><a class="headerlink" href="#searx.engines.startpage.startpage_categ" title="Permalink to this definition"></a></dt>
<dd><p>Startpages category, visit <a class="reference internal" href="#startpage-categories"><span class="std std-ref">Startpage categories</span></a>.</p>
</dd></dl>
</section>
<div class="clearer"></div>
</div>
</div>
</div>
<span id="sidebar-top"></span>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<p class="logo"><a href="../../../index.html">
<img class="logo" src="../../../_static/searxng-wordmark.svg" alt="Logo"/>
</a></p>
<h3><a href="../../../index.html">Table of Contents</a></h3>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../../user/index.html">User information</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../own-instance.html">Why use a private instance?</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../admin/index.html">Administrator documentation</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Developer documentation</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../../quickstart.html">Development Quickstart</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../contribution_guide.html">How to contribute</a></li>
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">Engine Implementations</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../search_api.html">Search API</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../plugins.html">Plugins</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../translation.html">Translation</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../lxcdev.html">Developing in Linux Containers</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../makefile.html">Makefile &amp; <code class="docutils literal notranslate"><span class="pre">./manage</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="../../reST.html">reST primer</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../searxng_extra/index.html">Tooling box <code class="docutils literal notranslate"><span class="pre">searxng_extra</span></code></a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../../utils/index.html">DevOps tooling box</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../src/index.html">Source-Code</a></li>
</ul>
<h3>Project Links</h3>
<ul>
<li><a href="https://github.com/searxng/searxng/tree/master">Source</a>
<li><a href="https://github.com/searxng/searxng/wiki">Wiki</a>
<li><a href="https://searx.space">Public instances</a>
<li><a href="https://github.com/searxng/searxng/issues">Issue Tracker</a>
</ul><h3>Navigation</h3>
<ul>
<li><a href="../../../index.html">Overview</a>
<ul>
<li><a href="../../index.html">Developer documentation</a>
<ul>
<li><a href="../index.html">Engine Implementations</a>
<ul>
<li>Previous: <a href="recoll.html" title="previous chapter">Recoll Engine</a>
<li>Next: <a href="tagesschau.html" title="next chapter">Tagesschau API</a></ul>
</li></ul>
</li>
</ul>
</li>
</ul>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="../../../search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>document.getElementById('searchbox').style.display = "block"</script>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="../../../_sources/dev/engines/online/startpage.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright SearXNG team.
</div>
<script src="../../../_static/version_warning_offset.js"></script>
</body>
</html>