Policy:User-Agent policy/ru: Difference between revisions

Latest revision as of 01:03, 29 March 2024

This page is purely informational and reflects the current state of affairs. To discuss this topic, please use the wikitech-l mailing list.

Since 15 February 2010, Wikimedia sites have required an HTTP User-Agent header in all requests. This was an operational decision by the technical staff, and it was announced and discussed on the technical mailing list^[1]^[2]. The rationale is that clients that do not send a User-Agent string are mostly misbehaving scripts that place a heavy load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as python-requests/x, may also be blocked on Wikimedia sites (or on parts of a site, such as api.php).

Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.

Requests from disallowed user agents may instead encounter a less helpful error message like this:

Our servers are currently experiencing a technical problem. Please try again in a few minutes.

This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.^[3] If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.
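The generic format can also be assembled programmatically. The following is a minimal Python sketch. The helper name `build_user_agent` and its parameters are hypothetical and are not part of any Wikimedia library. It simply omits the parts you do not supply.

```python
# Hypothetical helper: assembles a User-Agent string in the generic format
# "<client name>/<version> (<contact information>) <library>/<version> ..."
def build_user_agent(client, version, contact=None, libraries=()):
    """Return a User-Agent string; parts that are not applicable are omitted.

    `libraries` is a sequence of (name, version) pairs, in the order they
    should appear after the contact information.
    """
    parts = [f"{client}/{version}"]
    if contact:
        parts.append(f"({contact})")
    parts.extend(f"{name}/{ver}" for name, ver in libraries)
    return " ".join(parts)


ua = build_user_agent(
    "CoolBot", "0.0",
    contact="https://example.org/coolbot/; coolbot@example.org",
    libraries=[("generic-library", "0.0")],
)
print(ua)
```

With the placeholder values above, this reproduces the `CoolBot/0.0 (...) generic-library/0.0` example string.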

If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. This is recognized by Wikimedia's systems, and used to classify traffic and provide more accurate statistics.

Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.^[4] Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be too vague. Including details about the specific task or script is a good idea, even if those details are opaque to anyone besides the operator.^[5]
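To catch the mistakes described above before deploying a bot, you could run a simple self-check. The Python sketch below is hypothetical. The function name, the prefix list and the individual checks are assumptions drawn from the advice on this page, not the logic Wikimedia's servers apply. Passing the check therefore does not guarantee that a request will be accepted.

```python
# Hypothetical pre-flight check. These heuristics only mirror the advice on
# this page; they are NOT the rules Wikimedia's servers actually apply.
import re

# Default agents of common tools ("lwp" is Perl's LWP, whose default agent
# string starts with "libwww-perl").
GENERIC_PREFIXES = ("curl", "lwp", "libwww-perl", "python-urllib", "python-requests")


def user_agent_warnings(ua):
    """Return a list of human-readable warnings about a User-Agent string."""
    stripped = ua.strip()
    if not stripped:
        return ["empty User-Agent"]
    warnings = []
    lowered = stripped.lower()
    if lowered.startswith(GENERIC_PREFIXES):
        warnings.append("starts with a generic library default")
    if stripped.startswith("Mozilla/"):
        warnings.append("looks like a copied browser User-Agent")
    # Contact information is expected in a parenthesised part.
    if not re.search(r"\([^)]+\)", stripped):
        warnings.append("no contact information in parentheses")
    # The Internet-wide convention: include "bot" in any letter case.
    if "bot" not in lowered:
        warnings.append('consider including "bot" in the string')
    return warnings


print(user_agent_warnings("curl/8.5.0"))
```

The checks are deliberately loose. For example, any parenthesised text counts as contact information, because a local userpage or interwiki link is as acceptable as a URL or email address.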

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the Cover Your Tracks project.

Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the Api-User-Agent header to supply an appropriate agent.

As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.^[6]

Code examples

On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.

JavaScript

If you are calling the API from browser-based JavaScript, you won't be able to influence the User-Agent header: the browser will use its own. To work around this, use the Api-User-Agent header:

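// Using XMLHttpRequest (setRequestHeader() must be called after open() and before send())
var xhr = new XMLHttpRequest();
xhr.open( 'GET', 'https://example/...' );
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );
xhr.send();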

// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data ) {
    // ...
} );

// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
});

// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
});

PHP

In PHP, you can set your User-Agent with code such as this:

// Applies to PHP's HTTP stream wrapper, e.g. file_get_contents() on a URL
ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

cURL

Or, if you use cURL:

$curl = curl_init( 'https://example/...' );
curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

Python

In Python, you can use the Requests library to set a header:

import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)

Or, if you use SPARQLWrapper, as in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:

from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

sparql = SPARQLWrapper(url, agent=user_agent)
sparql.setReturnFormat(JSON)
results = sparql.query()
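If you prefer Python's standard library to Requests, `urllib.request` can set the header as well. By default, urllib sends `Python-urllib/<version>`, which is one of the generic agents this page warns against. The following is a minimal sketch that reuses the placeholder URL and agent from the examples above. The network call is left commented out so the snippet runs offline.

```python
import urllib.request

# Placeholder values from the examples above; replace them with your own.
URL = "https://example/..."
USER_AGENT = "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"

# Without an explicit header, urllib's default opener would send
# "Python-urllib/<version>", so set User-Agent on the request itself.
request = urllib.request.Request(URL, headers={"User-Agent": USER_AGENT})

# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

Setting the header on each `Request` keeps the agent explicit. If you make many calls, you could instead replace `addheaders` on an opener built with `urllib.request.build_opener()`.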

Notes

[1] The Wikitech-l February 2010 Archive by subject
[2] User-Agent: - Wikitech-l - lists.wikimedia.org
[3] API:FAQ - MediaWiki
[4] [Wikitech-l] User-Agent:
[5] Clarification on what is needed for "identifying the bot" in bot user-agent?
[6] gmane.science.linguistics.wikipedia.technical/83870 (deadlink)

See also

Policy for crawlers and bots that wish to operate on Wikimedia websites

@@ Line 1: / Line 1: @@
+<languages />{{DISPLAYTITLE:Политика использования User-Agent}}
-<languages />
-{{notice|Данная страница носит сугубо информативный характер, отражая текущее положение дел. Для обсуждения этой темы, пожалуйста, используйте [[Special:MyLanguage/Mailing lists|список рассылки]] wikitech-l.}}
+{{notice|1=Данная страница носит сугубо информативный характер, отражая текущее положение дел. Для обсуждения этой темы, пожалуйста, используйте [[:m:Special:MyLanguage/Mailing lists|список рассылки]] wikitech-l.}}
+{{policy-staff}}
+Начиная с 15 февраля 2010 года, сайты Викимедиа требуют наличия '''HTTP-заголовка [[{{lwp|User-Agent}}|User-Agent]]''' во всех запросах. Это оперативное решение, принятое техническим персоналом, которое анонсировалось и обсуждалось в технической рассылке<ref>[[mailarchive:wikitech-l/2010-February/thread.html#46764|The Wikitech-l February 2010 Archive by subject]]</ref><ref>[[listarchive:list/wikitech-l@lists.wikimedia.org/thread/R4RU7XTBM5J3BTS6GGQW77NYS2E4WGLI/|User-Agent: - Wikitech-l - lists.wikimedia.org]]</ref>. Обоснование заключается в том, что клиенты, которые не посылают строку User-agent, в основном являются некорректно работающими скриптами, которые создают большую нагрузку на серверы, не принося пользу проектам. Строки User-agent, которые начинаются с неописательных значений по умолчанию, например, <code>python-requests/x</code>, также могут быть заблокированы на сайтах Викимедиа (или разделах сайта, например, <code>api.php</code>).
-<div class="mw-translate-fuzzy">
-Начиная с 15 февраля 2010 года, сайты Викимедиа требуют наличия '''HTTP-заголовка [[w:ru:User agent|User-Agent]]''' во всех запросах. Это оперативное решение, принятое техническим персоналом, которое анонсировалось и обсуждалось в технической рассылке<ref>[https://lists.wikimedia.org/pipermail/wikitech-l/2010-February/thread.html#46764 The Wikitech-l February 2010 Archive by subject]</ref><ref>[https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/R4RU7XTBM5J3BTS6GGQW77NYS2E4WGLI/ User-Agent: | Wikipedia | Wikitech]</ref>. Обоснование заключается в том, что клиенты, которые не посылают строку User-agent, в основном являются некорректно работающими скриптами, которые создают большую нагрузку на серверы, не принося пользу проектам. Строки User-agent, которые начинаются с неописательных значений по умолчанию, например code>python-requests/x</code>, также могут быть заблокированы на сайтах Викимедиа (или разделах сайта, например, <code>api.php</code>).
-</div>
+Запросы (например, от браузеров или скриптов), которые не отправляют описательный заголовок User-Agent, могут столкнуться с сообщением об ошибке, подобным этому:
-<div lang="en" dir="ltr" class="mw-content-ltr">
-Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header, may encounter an error message like this:
-</div>
-<div lang="en" dir="ltr" class="mw-content-ltr">
+:<span lang="en" dir="ltr" class="mw-content-ltr">''Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.''</span>
-:''Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.''
-</div>
 <div lang="en" dir="ltr" class="mw-content-ltr">
@@ Line 18: / Line 13: @@
 </div>
-<div lang="en" dir="ltr" class="mw-content-ltr">
+:<span lang="en" dir="ltr" class="mw-content-ltr">''Our servers are currently experiencing a technical problem. Please try again in a few minutes.''</span>
-:''Our servers are currently experiencing a technical problem. Please try again in a few minutes.
-</div>
 <div lang="en" dir="ltr" class="mw-content-ltr">
-This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.<ref>[//www.mediawiki.org/w/index.php?title=API:FAQ#do_I_get_HTTP_403_errors.3F API:FAQ - MediaWiki]</ref> If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:
+This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.<ref>[[:mw:Special:MyLanguage/API:FAQ|API:FAQ - MediaWiki]]</ref> If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:
 </div>
 <pre>
@@ Line 38: / Line 31: @@
 <div lang="en" dir="ltr" class="mw-content-ltr">
-Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.<ref>[//lists.wikimedia.org/pipermail/wikitech-l/2010-February/046783.html [Wikitech-l&#93; User-Agent:]</ref> Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.<ref>{{cite web|url=https://lists.wikimedia.org/pipermail/mediawiki-api/2014-July/003308.html|title=Clarification on what is needed for "identifying the bot" in bot user-agent?|publisher=Mediawiki-api|author=Anomie|date=31 July 2014}}</ref>
+Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.<ref>[[mailarchive:wikitech-l/2010-February/046783.html|[Wikitech-l] User-Agent:]]</ref> Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.<ref>[[mailarchive:mediawiki-api/2014-July/003308.html|Clarification on what is needed for "identifying the bot" in bot user-agent?]]</ref>
 </div>
 <div lang="en" dir="ltr" class="mw-content-ltr">
-Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the [https://panopticlick.eff.org/ Panopticlick project].
+Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the [//coveryourtracks.eff.org/ Cover Your Tracks project].
 </div>
@@ Line 51: / Line 44: @@
 <div lang="en" dir="ltr" class="mw-content-ltr">
 As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.
-</div><ref>gmane.science.linguistics.wikipedia.technical/83870 ([http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/83870/ deadlink])</ref>
+</div><ref>gmane.science.linguistics.wikipedia.technical/83870 ([//thread.gmane.org/gmane.science.linguistics.wikipedia.technical/83870/ deadlink])</ref>
+<span id="Code_examples"></span>
-<div lang="en" dir="ltr" class="mw-content-ltr">
-== Code examples ==
+== Примеры кода ==
-</div>
 <div lang="en" dir="ltr" class="mw-content-ltr">
@@ Line 126: / Line 118: @@
 <div lang="en" dir="ltr" class="mw-content-ltr">
-Or if you use [[wikipedia:cURL|cURL]]:
+Or if you use [[{{lwp|cURL}}|cURL]]:
 </div>
@@ Line 138: / Line 130: @@
 <div lang="en" dir="ltr" class="mw-content-ltr">
-In Python, you can use the [[wikipedia:Requests_(software)|Requests]] library to set a header:
+In Python, you can use the [[{{lwp|Requests (software)}}|Requests]] library to set a header:
 </div>
@@ Line 150: / Line 142: @@
 </syntaxhighlight>
+<div lang="en" dir="ltr" class="mw-content-ltr">
-== Примечания ==
+Or, if you want to use [//sparqlwrapper.readthedocs.io SPARQLWrapper] like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:
+</div>
+<syntaxhighlight lang="python">
+from SPARQLWrapper import SPARQLWrapper, JSON
+url = 'https://example/...'
+user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'
+sparql = SPARQLWrapper(url, agent = user_agent )
+results = sparql.query()
+</syntaxhighlight>
+== {{int string|Notes}} ==
 <references />
-== См. также ==
+== {{int string|See also}} ==
 * <span lang="en" dir="ltr" class="mw-content-ltr">[[wikitech:Robot policy|Policy for crawlers and bots]] that wish to operate on Wikimedia websites</span>
-[[Category:Global policies{{#translation:}}]]
 [[Category:Bots{{#translation:}}]]
-[[Category:Policies maintained by the Wikimedia Foundation{{#translation:}}]]