searxngRebrandZaclys/_modules/searx/engines/bing.html
2023-08-11 10:34:02 +00:00


<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>searx.engines.bing &#8212; SearXNG Documentation (2023.8.11+905ce2a6f)</title>
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=4f649999" />
<link rel="stylesheet" type="text/css" href="../../../_static/searxng.css?v=52e4ff28" />
<link rel="stylesheet" type="text/css" href="../../../_static/tabs.css?v=a5c4661c" />
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js?v=3c88bde0"></script>
<script src="../../../_static/doctools.js?v=888ff710"></script>
<script src="../../../_static/sphinx_highlight.js?v=4825356b"></script>
<script src="../../../_static/tabs.js?v=3030b3cb"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../../../genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="../../../py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="../../../index.html">SearXNG Documentation (2023.8.11+905ce2a6f)</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="../../index.html" >Module code</a> &#187;</li>
<li class="nav-item nav-item-2"><a href="../engines.html" accesskey="U">searx.engines</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">searx.engines.bing</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<h1>Source code for searx.engines.bing</h1><div class="highlight"><pre>
<span></span><span class="c1"># SPDX-License-Identifier: AGPL-3.0-or-later</span>
<span class="c1"># lint: pylint</span>
<span class="sd">&quot;&quot;&quot;This is the implementation of the Bing-WEB engine. Some of this</span>
<span class="sd">implementation is shared by other engines:</span>
<span class="sd">- :ref:`bing images engine`</span>
<span class="sd">- :ref:`bing news engine`</span>
<span class="sd">- :ref:`bing videos engine`</span>
<span class="sd">On the `preference page`_ Bing offers a lot of languages and regions (see section</span>
<span class="sd">&#39;Search results languages&#39; and &#39;Country/region&#39;). However, the abundant choice</span>
<span class="sd">does not correspond to reality, where Bing has a full-text indexer only for a</span>
<span class="sd">limited number of languages. For example: you can select a language like Māori</span>
<span class="sd">but you never get a result in this language.</span>
<span class="sd">What comes a bit closer to the truth are the `search-APIs`_ but they don&#39;t seem</span>
<span class="sd">to be completely correct either (if you take a closer look you will find some</span>
<span class="sd">inaccuracies there too):</span>
<span class="sd">- :py:obj:`searx.engines.bing.bing_traits_url`</span>
<span class="sd">- :py:obj:`searx.engines.bing_videos.bing_traits_url`</span>
<span class="sd">- :py:obj:`searx.engines.bing_images.bing_traits_url`</span>
<span class="sd">- :py:obj:`searx.engines.bing_news.bing_traits_url`</span>
<span class="sd">.. _preference page: https://www.bing.com/account/general</span>
<span class="sd">.. _search-APIs: https://learn.microsoft.com/en-us/bing/search-apis/</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="c1"># pylint: disable=too-many-branches, invalid-name</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">TYPE_CHECKING</span>
<span class="kn">import</span> <span class="nn">datetime</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">uuid</span>
<span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urlencode</span>
<span class="kn">from</span> <span class="nn">lxml</span> <span class="kn">import</span> <span class="n">html</span>
<span class="kn">import</span> <span class="nn">babel</span>
<span class="kn">import</span> <span class="nn">babel.languages</span>
<span class="kn">from</span> <span class="nn">searx.utils</span> <span class="kn">import</span> <span class="n">eval_xpath</span><span class="p">,</span> <span class="n">extract_text</span><span class="p">,</span> <span class="n">eval_xpath_list</span><span class="p">,</span> <span class="n">eval_xpath_getindex</span>
<span class="kn">from</span> <span class="nn">searx.locales</span> <span class="kn">import</span> <span class="n">language_tag</span><span class="p">,</span> <span class="n">region_tag</span>
<span class="kn">from</span> <span class="nn">searx.enginelib.traits</span> <span class="kn">import</span> <span class="n">EngineTraits</span>
<span class="k">if</span> <span class="n">TYPE_CHECKING</span><span class="p">:</span>
<span class="kn">import</span> <span class="nn">logging</span>
<span class="n">logger</span><span class="p">:</span> <span class="n">logging</span><span class="o">.</span><span class="n">Logger</span>
<span class="n">traits</span><span class="p">:</span> <span class="n">EngineTraits</span>
<span class="n">about</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">&quot;website&quot;</span><span class="p">:</span> <span class="s1">&#39;https://www.bing.com&#39;</span><span class="p">,</span>
<span class="s2">&quot;wikidata_id&quot;</span><span class="p">:</span> <span class="s1">&#39;Q182496&#39;</span><span class="p">,</span>
<span class="s2">&quot;official_api_documentation&quot;</span><span class="p">:</span> <span class="s1">&#39;https://www.microsoft.com/en-us/bing/apis/bing-web-search-api&#39;</span><span class="p">,</span>
<span class="s2">&quot;use_official_api&quot;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s2">&quot;require_api_key&quot;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s2">&quot;results&quot;</span><span class="p">:</span> <span class="s1">&#39;HTML&#39;</span><span class="p">,</span>
<span class="p">}</span>
<span class="n">send_accept_language_header</span> <span class="o">=</span> <span class="kc">True</span>
<span class="sd">&quot;&quot;&quot;Bing tries to guess the user&#39;s language and territory from the HTTP</span>
<span class="sd">Accept-Language header. Optionally, the user can select a search language (which</span>
<span class="sd">can differ from the UI language) and a region (market code).&quot;&quot;&quot;</span>
<span class="c1"># engine dependent config</span>
<span class="n">categories</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;general&#39;</span><span class="p">,</span> <span class="s1">&#39;web&#39;</span><span class="p">]</span>
<span class="n">paging</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">time_range_support</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">safesearch</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">safesearch_types</span> <span class="o">=</span> <span class="p">{</span><span class="mi">2</span><span class="p">:</span> <span class="s1">&#39;STRICT&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">:</span> <span class="s1">&#39;DEMOTE&#39;</span><span class="p">,</span> <span class="mi">0</span><span class="p">:</span> <span class="s1">&#39;OFF&#39;</span><span class="p">}</span> <span class="c1"># cookie: ADLT=STRICT</span>
<span class="n">base_url</span> <span class="o">=</span> <span class="s1">&#39;https://www.bing.com/search&#39;</span>
<span class="sd">&quot;&quot;&quot;Bing (Web) search URL&quot;&quot;&quot;</span>
<span class="n">bing_traits_url</span> <span class="o">=</span> <span class="s1">&#39;https://learn.microsoft.com/en-us/bing/search-apis/bing-web-search/reference/market-codes&#39;</span>
<span class="sd">&quot;&quot;&quot;Bing (Web) search API description&quot;&quot;&quot;</span>
<span class="k">def</span> <span class="nf">_get_offset_from_pageno</span><span class="p">(</span><span class="n">pageno</span><span class="p">):</span>
<span class="k">return</span> <span class="p">(</span><span class="n">pageno</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="mi">10</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">def</span> <span class="nf">set_bing_cookies</span><span class="p">(</span><span class="n">params</span><span class="p">,</span> <span class="n">engine_language</span><span class="p">,</span> <span class="n">engine_region</span><span class="p">,</span> <span class="n">SID</span><span class="p">):</span>
<span class="c1"># set cookies</span>
<span class="c1"># -----------</span>
<span class="n">params</span><span class="p">[</span><span class="s1">&#39;cookies&#39;</span><span class="p">][</span><span class="s1">&#39;_EDGE_V&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;1&#39;</span>
<span class="c1"># _EDGE_S: F=1&amp;SID=3A5253BD6BCA609509B741876AF961CA&amp;mkt=zh-tw</span>
<span class="n">_EDGE_S</span> <span class="o">=</span> <span class="p">[</span>
<span class="s1">&#39;F=1&#39;</span><span class="p">,</span>
<span class="s1">&#39;SID=</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">SID</span><span class="p">,</span>
<span class="s1">&#39;mkt=</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">engine_region</span><span class="o">.</span><span class="n">lower</span><span class="p">(),</span>
<span class="s1">&#39;ui=</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">engine_language</span><span class="o">.</span><span class="n">lower</span><span class="p">(),</span>
<span class="p">]</span>
<span class="n">params</span><span class="p">[</span><span class="s1">&#39;cookies&#39;</span><span class="p">][</span><span class="s1">&#39;_EDGE_S&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;&amp;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">_EDGE_S</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;cookie _EDGE_S=</span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">params</span><span class="p">[</span><span class="s1">&#39;cookies&#39;</span><span class="p">][</span><span class="s1">&#39;_EDGE_S&#39;</span><span class="p">])</span>
<span class="c1"># &quot;_EDGE_CD&quot;: &quot;m=zh-tw&quot;,</span>
<span class="n">_EDGE_CD</span> <span class="o">=</span> <span class="p">[</span> <span class="c1"># pylint: disable=invalid-name</span>
<span class="s1">&#39;m=</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">engine_region</span><span class="o">.</span><span class="n">lower</span><span class="p">(),</span> <span class="c1"># search region: zh-cn</span>
<span class="s1">&#39;u=</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">engine_language</span><span class="o">.</span><span class="n">lower</span><span class="p">(),</span> <span class="c1"># UI: en-us</span>
<span class="p">]</span>
<span class="n">params</span><span class="p">[</span><span class="s1">&#39;cookies&#39;</span><span class="p">][</span><span class="s1">&#39;_EDGE_CD&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;&amp;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">_EDGE_CD</span><span class="p">)</span> <span class="o">+</span> <span class="s1">&#39;;&#39;</span>
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;cookie _EDGE_CD=</span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">params</span><span class="p">[</span><span class="s1">&#39;cookies&#39;</span><span class="p">][</span><span class="s1">&#39;_EDGE_CD&#39;</span><span class="p">])</span>
<span class="n">SRCHHPGUSR</span> <span class="o">=</span> <span class="p">[</span> <span class="c1"># pylint: disable=invalid-name</span>
<span class="s1">&#39;SRCHLANG=</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">engine_language</span><span class="p">,</span>
<span class="c1"># Trying to set ADLT cookie here seems not to have any effect, I assume</span>
<span class="c1"># there is some age verification by a cookie (and/or session ID) needed,</span>
<span class="c1"># to disable the SafeSearch.</span>
<span class="s1">&#39;ADLT=</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">safesearch_types</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;safesearch&#39;</span><span class="p">],</span> <span class="s1">&#39;DEMOTE&#39;</span><span class="p">),</span>
<span class="p">]</span>
<span class="n">params</span><span class="p">[</span><span class="s1">&#39;cookies&#39;</span><span class="p">][</span><span class="s1">&#39;SRCHHPGUSR&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;&amp;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">SRCHHPGUSR</span><span class="p">)</span>
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;cookie SRCHHPGUSR=</span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">params</span><span class="p">[</span><span class="s1">&#39;cookies&#39;</span><span class="p">][</span><span class="s1">&#39;SRCHHPGUSR&#39;</span><span class="p">])</span>
<div class="viewcode-block" id="request"><a class="viewcode-back" href="../../../dev/engines/online/bing.html#searx.engines.bing.request">[docs]</a><span class="k">def</span> <span class="nf">request</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">params</span><span class="p">):</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;Assemble a Bing-Web request.&quot;&quot;&quot;</span>
<span class="n">engine_region</span> <span class="o">=</span> <span class="n">traits</span><span class="o">.</span><span class="n">get_region</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;searxng_locale&#39;</span><span class="p">],</span> <span class="s1">&#39;en-US&#39;</span><span class="p">)</span>
<span class="n">engine_language</span> <span class="o">=</span> <span class="n">traits</span><span class="o">.</span><span class="n">get_language</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="s1">&#39;searxng_locale&#39;</span><span class="p">],</span> <span class="s1">&#39;en&#39;</span><span class="p">)</span>
<span class="n">SID</span> <span class="o">=</span> <span class="n">uuid</span><span class="o">.</span><span class="n">uuid1</span><span class="p">()</span><span class="o">.</span><span class="n">hex</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span>
<span class="n">CVID</span> <span class="o">=</span> <span class="n">uuid</span><span class="o">.</span><span class="n">uuid1</span><span class="p">()</span><span class="o">.</span><span class="n">hex</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span>
<span class="n">set_bing_cookies</span><span class="p">(</span><span class="n">params</span><span class="p">,</span> <span class="n">engine_language</span><span class="p">,</span> <span class="n">engine_region</span><span class="p">,</span> <span class="n">SID</span><span class="p">)</span>
<span class="c1"># build URL query</span>
<span class="c1"># ---------------</span>
<span class="c1"># query term</span>
<span class="n">page</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">params</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;pageno&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">query_params</span> <span class="o">=</span> <span class="p">{</span>
<span class="c1"># fmt: off</span>
<span class="s1">&#39;q&#39;</span><span class="p">:</span> <span class="n">query</span><span class="p">,</span>
<span class="s1">&#39;pq&#39;</span><span class="p">:</span> <span class="n">query</span><span class="p">,</span>
<span class="s1">&#39;cvid&#39;</span><span class="p">:</span> <span class="n">CVID</span><span class="p">,</span>
<span class="s1">&#39;qs&#39;</span><span class="p">:</span> <span class="s1">&#39;n&#39;</span><span class="p">,</span>
<span class="s1">&#39;sp&#39;</span><span class="p">:</span> <span class="s1">&#39;-1&#39;</span>
<span class="c1"># fmt: on</span>
<span class="p">}</span>
<span class="c1"># page</span>
<span class="k">if</span> <span class="n">page</span> <span class="o">&gt;</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">referer</span> <span class="o">=</span> <span class="n">base_url</span> <span class="o">+</span> <span class="s1">&#39;?&#39;</span> <span class="o">+</span> <span class="n">urlencode</span><span class="p">(</span><span class="n">query_params</span><span class="p">)</span>
<span class="n">params</span><span class="p">[</span><span class="s1">&#39;headers&#39;</span><span class="p">][</span><span class="s1">&#39;Referer&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">referer</span>
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;headers.Referer --&gt; </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">referer</span><span class="p">)</span>
<span class="n">query_params</span><span class="p">[</span><span class="s1">&#39;first&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">_get_offset_from_pageno</span><span class="p">(</span><span class="n">page</span><span class="p">)</span>
<span class="k">if</span> <span class="n">page</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
<span class="n">query_params</span><span class="p">[</span><span class="s1">&#39;FORM&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;PERE&#39;</span>
<span class="k">elif</span> <span class="n">page</span> <span class="o">&gt;</span> <span class="mi">2</span><span class="p">:</span>
<span class="n">query_params</span><span class="p">[</span><span class="s1">&#39;FORM&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;PERE</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">page</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">filters</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span>
<span class="k">if</span> <span class="n">params</span><span class="p">[</span><span class="s1">&#39;time_range&#39;</span><span class="p">]:</span>
<span class="n">query_params</span><span class="p">[</span><span class="s1">&#39;filt&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;custom&#39;</span>
<span class="k">if</span> <span class="n">params</span><span class="p">[</span><span class="s1">&#39;time_range&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;day&#39;</span><span class="p">:</span>
<span class="n">filters</span> <span class="o">=</span> <span class="s1">&#39;ex1:&quot;ez1&quot;&#39;</span>
<span class="k">elif</span> <span class="n">params</span><span class="p">[</span><span class="s1">&#39;time_range&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;week&#39;</span><span class="p">:</span>
<span class="n">filters</span> <span class="o">=</span> <span class="s1">&#39;ex1:&quot;ez2&quot;&#39;</span>
<span class="k">elif</span> <span class="n">params</span><span class="p">[</span><span class="s1">&#39;time_range&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;month&#39;</span><span class="p">:</span>
<span class="n">filters</span> <span class="o">=</span> <span class="s1">&#39;ex1:&quot;ez3&quot;&#39;</span>
<span class="k">elif</span> <span class="n">params</span><span class="p">[</span><span class="s1">&#39;time_range&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;year&#39;</span><span class="p">:</span>
<span class="n">epoch_1970</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="p">(</span><span class="mi">1970</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">today_no</span> <span class="o">=</span> <span class="p">(</span><span class="n">datetime</span><span class="o">.</span><span class="n">date</span><span class="o">.</span><span class="n">today</span><span class="p">()</span> <span class="o">-</span> <span class="n">epoch_1970</span><span class="p">)</span><span class="o">.</span><span class="n">days</span>
<span class="n">filters</span> <span class="o">=</span> <span class="s1">&#39;ex1:&quot;ez5_</span><span class="si">%s</span><span class="s1">_</span><span class="si">%s</span><span class="s1">&quot;&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">today_no</span> <span class="o">-</span> <span class="mi">365</span><span class="p">,</span> <span class="n">today_no</span><span class="p">)</span>
<span class="n">params</span><span class="p">[</span><span class="s1">&#39;url&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">base_url</span> <span class="o">+</span> <span class="s1">&#39;?&#39;</span> <span class="o">+</span> <span class="n">urlencode</span><span class="p">(</span><span class="n">query_params</span><span class="p">)</span>
<span class="k">if</span> <span class="n">filters</span><span class="p">:</span>
<span class="n">params</span><span class="p">[</span><span class="s1">&#39;url&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">params</span><span class="p">[</span><span class="s1">&#39;url&#39;</span><span class="p">]</span> <span class="o">+</span> <span class="s1">&#39;&amp;filters=&#39;</span> <span class="o">+</span> <span class="n">filters</span>
<span class="k">return</span> <span class="n">params</span></div>
<span class="k">def</span> <span class="nf">response</span><span class="p">(</span><span class="n">resp</span><span class="p">):</span>
<span class="c1"># pylint: disable=too-many-locals,import-outside-toplevel</span>
<span class="kn">from</span> <span class="nn">searx.network</span> <span class="kn">import</span> <span class="n">Request</span><span class="p">,</span> <span class="n">multi_requests</span> <span class="c1"># see https://github.com/searxng/searxng/issues/762</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">result_len</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">dom</span> <span class="o">=</span> <span class="n">html</span><span class="o">.</span><span class="n">fromstring</span><span class="p">(</span><span class="n">resp</span><span class="o">.</span><span class="n">text</span><span class="p">)</span>
<span class="c1"># parse results again if nothing is found yet</span>
<span class="n">url_to_resolve</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">url_to_resolve_index</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">eval_xpath_list</span><span class="p">(</span><span class="n">dom</span><span class="p">,</span> <span class="s1">&#39;//ol[@id=&quot;b_results&quot;]/li[contains(@class, &quot;b_algo&quot;)]&#39;</span><span class="p">):</span>
<span class="n">link</span> <span class="o">=</span> <span class="n">eval_xpath_getindex</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="s1">&#39;.//h2/a&#39;</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="k">if</span> <span class="n">link</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">continue</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">link</span><span class="o">.</span><span class="n">attrib</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;href&#39;</span><span class="p">)</span>
<span class="n">title</span> <span class="o">=</span> <span class="n">extract_text</span><span class="p">(</span><span class="n">link</span><span class="p">)</span>
<span class="n">content</span> <span class="o">=</span> <span class="n">eval_xpath</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="s1">&#39;(.//p)[1]&#39;</span><span class="p">)</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">content</span><span class="p">:</span>
<span class="c1"># Make sure that the element is free of &lt;a href&gt; links</span>
<span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">p</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="s1">&#39;.//a&#39;</span><span class="p">):</span>
<span class="n">e</span><span class="o">.</span><span class="n">getparent</span><span class="p">()</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
<span class="n">content</span> <span class="o">=</span> <span class="n">extract_text</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>
<span class="c1"># get the real URL either using the URL shown to user or following the Bing URL</span>
<span class="k">if</span> <span class="n">url</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">&#39;https://www.bing.com/ck/a?&#39;</span><span class="p">):</span>
<span class="n">url_cite</span> <span class="o">=</span> <span class="n">extract_text</span><span class="p">(</span><span class="n">eval_xpath</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="s1">&#39;.//div[@class=&quot;b_attribution&quot;]/cite&#39;</span><span class="p">))</span>
<span class="c1"># Bing can shorten the URL either at the end or in the middle of the string</span>
<span class="k">if</span> <span class="p">(</span>
<span class="n">url_cite</span>
<span class="ow">and</span> <span class="n">url_cite</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">&#39;https://&#39;</span><span class="p">)</span>
<span class="ow">and</span> <span class="s1">&#39;…&#39;</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">url_cite</span>
<span class="ow">and</span> <span class="s1">&#39;...&#39;</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">url_cite</span>
<span class="ow">and</span> <span class="s1">&#39;›&#39;</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">url_cite</span>
<span class="p">):</span>
<span class="c1"># no need for an additional HTTP request</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">url_cite</span>
<span class="k">else</span><span class="p">:</span>
<span class="c1"># resolve the URL with an additional HTTP request</span>
<span class="n">url_to_resolve</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">url</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">&#39;&amp;ntb=1&#39;</span><span class="p">,</span> <span class="s1">&#39;&amp;ntb=F&#39;</span><span class="p">))</span>
<span class="n">url_to_resolve_index</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="n">url</span> <span class="o">=</span> <span class="kc">None</span> <span class="c1"># remove the result if the HTTP Bing redirect raises an exception</span>
<span class="c1"># append result</span>
        <span class="n">results</span><span class="o">.</span><span class="n">append</span><span class="p">({</span><span class="s1">&#39;url&#39;</span><span class="p">:</span> <span class="n">url</span><span class="p">,</span> <span class="s1">&#39;title&#39;</span><span class="p">:</span> <span class="n">title</span><span class="p">,</span> <span class="s1">&#39;content&#39;</span><span class="p">:</span> <span class="n">content</span><span class="p">})</span>
        <span class="c1"># increment result pointer for the next iteration in this loop</span>
        <span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>

    <span class="c1"># resolve all Bing redirections in parallel</span>
    <span class="n">request_list</span> <span class="o">=</span> <span class="p">[</span>
        <span class="n">Request</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">u</span><span class="p">,</span> <span class="n">allow_redirects</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">resp</span><span class="o">.</span><span class="n">search_params</span><span class="p">[</span><span class="s1">&#39;headers&#39;</span><span class="p">])</span> <span class="k">for</span> <span class="n">u</span> <span class="ow">in</span> <span class="n">url_to_resolve</span>
    <span class="p">]</span>
    <span class="n">response_list</span> <span class="o">=</span> <span class="n">multi_requests</span><span class="p">(</span><span class="n">request_list</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">redirect_response</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">response_list</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">redirect_response</span><span class="p">,</span> <span class="ne">Exception</span><span class="p">):</span>
            <span class="n">results</span><span class="p">[</span><span class="n">url_to_resolve_index</span><span class="p">[</span><span class="n">i</span><span class="p">]][</span><span class="s1">&#39;url&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">redirect_response</span><span class="o">.</span><span class="n">headers</span><span class="p">[</span><span class="s1">&#39;location&#39;</span><span class="p">]</span>

    <span class="c1"># get number_of_results</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">result_len_container</span> <span class="o">=</span> <span class="s2">&quot;&quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">eval_xpath</span><span class="p">(</span><span class="n">dom</span><span class="p">,</span> <span class="s1">&#39;//span[@class=&quot;sb_count&quot;]//text()&#39;</span><span class="p">))</span>
        <span class="k">if</span> <span class="s2">&quot;-&quot;</span> <span class="ow">in</span> <span class="n">result_len_container</span><span class="p">:</span>
            <span class="c1"># Remove the part &quot;from-to&quot; for paginated request ...</span>
            <span class="n">result_len_container</span> <span class="o">=</span> <span class="n">result_len_container</span><span class="p">[</span><span class="n">result_len_container</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s2">&quot;-&quot;</span><span class="p">)</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">2</span> <span class="p">:]</span>
        <span class="n">result_len_container</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s1">&#39;[^0-9]&#39;</span><span class="p">,</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="n">result_len_container</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">result_len_container</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">result_len</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">result_len_container</span><span class="p">)</span>
    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>  <span class="c1"># pylint: disable=broad-except</span>
        <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s1">&#39;result error :</span><span class="se">\n</span><span class="si">%s</span><span class="s1">&#39;</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">result_len</span> <span class="ow">and</span> <span class="n">_get_offset_from_pageno</span><span class="p">(</span><span class="n">resp</span><span class="o">.</span><span class="n">search_params</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;pageno&quot;</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span> <span class="o">&gt;</span> <span class="n">result_len</span><span class="p">:</span>
        <span class="k">return</span> <span class="p">[]</span>
    <span class="n">results</span><span class="o">.</span><span class="n">append</span><span class="p">({</span><span class="s1">&#39;number_of_results&#39;</span><span class="p">:</span> <span class="n">result_len</span><span class="p">})</span>
    <span class="k">return</span> <span class="n">results</span>
<div class="viewcode-block" id="fetch_traits"><a class="viewcode-back" href="../../../dev/engines/online/bing.html#searx.engines.bing.fetch_traits">[docs]</a><span class="k">def</span> <span class="nf">fetch_traits</span><span class="p">(</span><span class="n">engine_traits</span><span class="p">:</span> <span class="n">EngineTraits</span><span class="p">):</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Fetch languages and regions from Bing-Web.&quot;&quot;&quot;</span>
    <span class="n">xpath_market_codes</span> <span class="o">=</span> <span class="s1">&#39;//table[1]/tbody/tr/td[3]&#39;</span>
    <span class="c1"># xpath_country_codes = &#39;//table[2]/tbody/tr/td[2]&#39;</span>
    <span class="n">xpath_language_codes</span> <span class="o">=</span> <span class="s1">&#39;//table[3]/tbody/tr/td[2]&#39;</span>
    <span class="n">_fetch_traits</span><span class="p">(</span><span class="n">engine_traits</span><span class="p">,</span> <span class="n">bing_traits_url</span><span class="p">,</span> <span class="n">xpath_language_codes</span><span class="p">,</span> <span class="n">xpath_market_codes</span><span class="p">)</span></div>
<span class="k">def</span> <span class="nf">_fetch_traits</span><span class="p">(</span><span class="n">engine_traits</span><span class="p">:</span> <span class="n">EngineTraits</span><span class="p">,</span> <span class="n">url</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">xpath_language_codes</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">xpath_market_codes</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
    <span class="c1"># pylint: disable=too-many-locals,import-outside-toplevel</span>
    <span class="kn">from</span> <span class="nn">searx.network</span> <span class="kn">import</span> <span class="n">get</span>  <span class="c1"># see https://github.com/searxng/searxng/issues/762</span>
    <span class="c1"># insert alias to map from a language (zh) to a language + script (zh_Hans)</span>
    <span class="n">engine_traits</span><span class="o">.</span><span class="n">languages</span><span class="p">[</span><span class="s1">&#39;zh&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;zh-hans&#39;</span>
    <span class="n">resp</span> <span class="o">=</span> <span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">resp</span><span class="o">.</span><span class="n">ok</span><span class="p">:</span>  <span class="c1"># type: ignore</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;ERROR: response from Bing is not OK.&quot;</span><span class="p">)</span>
    <span class="n">dom</span> <span class="o">=</span> <span class="n">html</span><span class="o">.</span><span class="n">fromstring</span><span class="p">(</span><span class="n">resp</span><span class="o">.</span><span class="n">text</span><span class="p">)</span>  <span class="c1"># type: ignore</span>
    <span class="n">map_lang</span> <span class="o">=</span> <span class="p">{</span><span class="s1">&#39;jp&#39;</span><span class="p">:</span> <span class="s1">&#39;ja&#39;</span><span class="p">}</span>
    <span class="k">for</span> <span class="n">td</span> <span class="ow">in</span> <span class="n">eval_xpath</span><span class="p">(</span><span class="n">dom</span><span class="p">,</span> <span class="n">xpath_language_codes</span><span class="p">):</span>
        <span class="n">eng_lang</span> <span class="o">=</span> <span class="n">td</span><span class="o">.</span><span class="n">text</span>
        <span class="k">if</span> <span class="n">eng_lang</span> <span class="ow">in</span> <span class="p">(</span><span class="s1">&#39;en-gb&#39;</span><span class="p">,</span> <span class="s1">&#39;pt-br&#39;</span><span class="p">):</span>
            <span class="c1"># language &#39;en&#39; is already in the list and a language &#39;en-gb&#39; can&#39;t</span>
            <span class="c1"># be handled in SearXNG, same with pt-br which is covered by pt-pt.</span>
            <span class="k">continue</span>
        <span class="n">babel_lang</span> <span class="o">=</span> <span class="n">map_lang</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">eng_lang</span><span class="p">,</span> <span class="n">eng_lang</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">&#39;-&#39;</span><span class="p">,</span> <span class="s1">&#39;_&#39;</span><span class="p">)</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">sxng_tag</span> <span class="o">=</span> <span class="n">language_tag</span><span class="p">(</span><span class="n">babel</span><span class="o">.</span><span class="n">Locale</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">babel_lang</span><span class="p">))</span>
        <span class="k">except</span> <span class="n">babel</span><span class="o">.</span><span class="n">UnknownLocaleError</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;ERROR: language (</span><span class="si">%s</span><span class="s2">) is unknown by babel&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">eng_lang</span><span class="p">))</span>
            <span class="k">continue</span>
        <span class="n">conflict</span> <span class="o">=</span> <span class="n">engine_traits</span><span class="o">.</span><span class="n">languages</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">sxng_tag</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">conflict</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">conflict</span> <span class="o">!=</span> <span class="n">eng_lang</span><span class="p">:</span>
                <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;CONFLICT: babel </span><span class="si">%s</span><span class="s2"> --&gt; </span><span class="si">%s</span><span class="s2">, </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">sxng_tag</span><span class="p">,</span> <span class="n">conflict</span><span class="p">,</span> <span class="n">eng_lang</span><span class="p">))</span>
            <span class="k">continue</span>
        <span class="n">engine_traits</span><span class="o">.</span><span class="n">languages</span><span class="p">[</span><span class="n">sxng_tag</span><span class="p">]</span> <span class="o">=</span> <span class="n">eng_lang</span>
    <span class="n">map_region</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">&#39;en-ID&#39;</span><span class="p">:</span> <span class="s1">&#39;id_ID&#39;</span><span class="p">,</span>
        <span class="s1">&#39;no-NO&#39;</span><span class="p">:</span> <span class="s1">&#39;nb_NO&#39;</span><span class="p">,</span>
    <span class="p">}</span>
    <span class="k">for</span> <span class="n">td</span> <span class="ow">in</span> <span class="n">eval_xpath</span><span class="p">(</span><span class="n">dom</span><span class="p">,</span> <span class="n">xpath_market_codes</span><span class="p">):</span>
        <span class="n">eng_region</span> <span class="o">=</span> <span class="n">td</span><span class="o">.</span><span class="n">text</span>
        <span class="n">babel_region</span> <span class="o">=</span> <span class="n">map_region</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">eng_region</span><span class="p">,</span> <span class="n">eng_region</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">&#39;-&#39;</span><span class="p">,</span> <span class="s1">&#39;_&#39;</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">eng_region</span> <span class="o">==</span> <span class="s1">&#39;en-WW&#39;</span><span class="p">:</span>
            <span class="n">engine_traits</span><span class="o">.</span><span class="n">all_locale</span> <span class="o">=</span> <span class="n">eng_region</span>
            <span class="k">continue</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">sxng_tag</span> <span class="o">=</span> <span class="n">region_tag</span><span class="p">(</span><span class="n">babel</span><span class="o">.</span><span class="n">Locale</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">babel_region</span><span class="p">))</span>
        <span class="k">except</span> <span class="n">babel</span><span class="o">.</span><span class="n">UnknownLocaleError</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;ERROR: region (</span><span class="si">%s</span><span class="s2">) is unknown by babel&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">eng_region</span><span class="p">))</span>
            <span class="k">continue</span>
        <span class="n">conflict</span> <span class="o">=</span> <span class="n">engine_traits</span><span class="o">.</span><span class="n">regions</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">sxng_tag</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">conflict</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">conflict</span> <span class="o">!=</span> <span class="n">eng_region</span><span class="p">:</span>
                <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;CONFLICT: babel </span><span class="si">%s</span><span class="s2"> --&gt; </span><span class="si">%s</span><span class="s2">, </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">sxng_tag</span><span class="p">,</span> <span class="n">conflict</span><span class="p">,</span> <span class="n">eng_region</span><span class="p">))</span>
            <span class="k">continue</span>
        <span class="n">engine_traits</span><span class="o">.</span><span class="n">regions</span><span class="p">[</span><span class="n">sxng_tag</span><span class="p">]</span> <span class="o">=</span> <span class="n">eng_region</span>
</pre></div>
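<p>The result-count handling at the end of <code>response()</code> can be tried in isolation. The following is a minimal sketch, not part of the SearXNG module: the helper name <code>parse_result_count</code>, the fallback value of 0 and the sample strings are assumptions made for illustration, and the text Bing actually places in the <code>sb_count</code> element may differ.</p>

```python
import re


def parse_result_count(text: str) -> int:
    """Hypothetical helper mirroring the number_of_results logic in the listing."""
    if "-" in text:
        # Drop a leading "from-to" range such as "11-20 " so that its digits
        # are not merged into the total.
        text = text[text.find("-") * 2 + 2:]
    # Keep digits only; this removes thousands separators and words.
    digits = re.sub("[^0-9]", "", text)
    # Assumption: fall back to 0 when no digits remain.
    return int(digits) if digits else 0


# Sample strings are invented for this sketch.
print(parse_result_count("11-20 of 1,230 results"))  # 1230
print(parse_result_count("About 4,560 results"))     # 4560
```

<p>The slice offset <code>find("-") * 2 + 2</code> skips a range whose upper bound has the same number of digits as the lower bound, or one digit more. Any other shape of range would leave stray digits in the total.</p>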
<div class="clearer"></div>
</div>
</div>
</div>
<span id="sidebar-top"></span>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<p class="logo"><a href="../../../index.html">
<img class="logo" src="../../../_static/searxng-wordmark.svg" alt="Logo"/>
</a></p>
<h3><a href="../../../index.html">Table of Contents</a></h3>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../user/index.html">User information</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../own-instance.html">Why use a private instance?</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../admin/index.html">Administrator documentation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../dev/index.html">Developer documentation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../utils/index.html">DevOps tooling box</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../src/index.html">Source-Code</a></li>
</ul>
<h3>Project Links</h3>
<ul>
<li><a href="https://github.com/searxng/searxng/tree/master">Source</a>
<li><a href="https://github.com/searxng/searxng/wiki">Wiki</a>
<li><a href="https://searx.space">Public instances</a>
<li><a href="https://github.com/searxng/searxng/issues">Issue Tracker</a>
</ul><h3>Navigation</h3>
<ul>
<li><a href="../../../index.html">Overview</a>
<ul>
<li><a href="../../index.html">Module code</a>
<ul>
<li><a href="../engines.html">searx.engines</a>
</ul>
</li></ul>
</li>
</ul>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="../../../search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright SearXNG team.
</div>
<script src="../../../_static/version_warning_offset.js"></script>
</body>
</html>