Policy:User-Agent policy/zh: Difference between revisions

From Wikimedia Foundation Governance Wiki
m 58 revisions imported from meta:User-Agent_policy/zh
FuzzyBot (talk | contribs)
Updating to match new version of source page
<languages />
{{notice|1=<div lang="en" dir="ltr" class="mw-content-ltr">
This page is purely informative, reflecting the current state of affairs. To discuss this topic, please use the wikitech-l [[:m:Special:MyLanguage/Mailing lists|mailing list]].
</div>}}


<div lang="en" dir="ltr" class="mw-content-ltr">
As of February 15, 2010, Wikimedia sites require an '''HTTP [[{{lwp|User-Agent}}|User-Agent]] header''' for all requests. This was an operative decision made by the technical staff and was announced and discussed on the technical mailing list.<ref>[[mailarchive:wikitech-l/2010-February/thread.html#46764|The Wikitech-l February 2010 Archive by subject]]</ref><ref>[[listarchive:list/wikitech-l@lists.wikimedia.org/thread/R4RU7XTBM5J3BTS6GGQW77NYS2E4WGLI/|User-Agent: - Wikitech-l - lists.wikimedia.org]]</ref> The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that cause a lot of load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as <code>python-requests/x</code>, may also be blocked from Wikimedia sites (or parts of a website, e.g. <code>api.php</code>).
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
:''Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.''
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
Requests from disallowed user agents may instead encounter a less helpful error message like this:
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
:''Our servers are currently experiencing a technical problem. Please try again in a few minutes.''
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.<ref>[[:mw:Special:MyLanguage/API:FAQ|API:FAQ - MediaWiki]]</ref> If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:
</div>
<pre>
User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0
</pre>


<div lang="en" dir="ltr" class="mw-content-ltr">
The generic format is <code><client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]</code>. Parts that are not applicable can be omitted.
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. This is recognized by Wikimedia's systems, and used to classify traffic and provide more accurate statistics.
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.<ref>[//lists.wikimedia.org/pipermail/wikitech-l/2010-February/046783.html [Wikitech-l&#93; User-Agent:]</ref> Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc. would be a good idea, even if that detail is opaque to anyone besides the operator.<ref>[[mailarchive:mediawiki-api/2014-July/003308.html|Clarification on what is needed for "identifying the bot" in bot user-agent?]]</ref>
</div>


<div class="mw-translate-fuzzy">
<div lang="en" dir="ltr" class="mw-content-ltr">
Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the [//coveryourtracks.eff.org/ Cover Your Tracks project].
</div>
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the <code>Api-User-Agent</code> header to supply an appropriate agent.
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.<ref>gmane.science.linguistics.wikipedia.technical/83870 ([//thread.gmane.org/gmane.science.linguistics.wikipedia.technical/83870/ deadlink])</ref>
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
<span id="Code_examples"></span>
== Code examples ==
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
On Wikimedia wikis, if you don't supply a <code>User-Agent</code> header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.
</div>




<div lang="en" dir="ltr" class="mw-content-ltr">
If you are calling the API from browser-based JavaScript, you won't be able to influence the <code>User-Agent</code> header: the browser will use its own. To work around this, use the <code>Api-User-Agent</code> header:
</div>


<syntaxhighlight lang="javascript">
<syntaxhighlight lang="javascript">
Line 86: Line 115:
</div>
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
In PHP, you can identify your user-agent with code such as this:
</div>


<syntaxhighlight lang="php">
<syntaxhighlight lang="php">
Line 96: Line 127:
</div>
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
Or if you use [[wikipedia:cURL|cURL]]:
</div>


<syntaxhighlight lang="php">
curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );
</syntaxhighlight>


<div lang="en" dir="ltr" class="mw-content-ltr">
Or, if you want to use [https://sparqlwrapper.readthedocs.io SPARQLWrapper] like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:
</div>






<div lang="en" dir="ltr" class="mw-content-ltr">
<span id="Notes"></span>
== Notes ==
</div>
<references />


<div lang="en" dir="ltr" class="mw-content-ltr">
<span id="See_also"></span>
== See also ==
</div>
* <span lang="en" dir="ltr" class="mw-content-ltr">[[wikitech:Robot policy|Policy for crawlers and bots]] that wish to operate on Wikimedia websites</span>


[[Category:Global policies{{#translation:}}]]
[[Category:Bots{{#translation:}}]]
[[Category:Policies maintained by the Wikimedia Foundation{{#translation:}}]]

Revision as of 00:16, 29 March 2024

As of February 15, 2010, Wikimedia sites require an HTTP User-Agent header for all requests. This was an operative decision made by the technical staff and was announced and discussed on the technical mailing list.[1][2] The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that cause a lot of load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as python-requests/x, may also be blocked from Wikimedia sites (or parts of a website, e.g. api.php).

Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.

Requests from disallowed user agents may instead encounter a less helpful error message like this:

Our servers are currently experiencing a technical problem. Please try again in a few minutes.

This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.[3] If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.
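
For example, a script that does not go through a separate HTTP library could omit the trailing library token and send only the client name, version, and contact information (the name and addresses below are placeholders):

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)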

If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. This is recognized by Wikimedia's systems, and used to classify traffic and provide more accurate statistics.

Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.[4] Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc. would be a good idea, even if that detail is opaque to anyone besides the operator.[5]

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the Cover Your Tracks project.

Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the Api-User-Agent header to supply an appropriate agent.

As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.[6]

Code examples

On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.
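
As a rough illustration, here is a minimal sketch in Python using the requests library; the endpoint, query parameters, and User-Agent value are only examples, and the exact response depends on how the policy is being enforced at the time:

import requests

url = 'https://meta.wikimedia.org/w/api.php'
params = {'action': 'query', 'meta': 'siteinfo', 'format': 'json'}

# A default User-Agent such as "python-requests/x.y" may be rejected with HTTP 403.
response = requests.get(url, params=params)
print(response.status_code)

# A descriptive User-Agent with contact information should be accepted.
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}
response = requests.get(url, params=params, headers=headers)
print(response.status_code)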

JavaScript

If you are calling the API from browser-based JavaScript, you won't be able to influence the User-Agent header: the browser will use its own. To work around this, use the Api-User-Agent header:

// Using XMLHttpRequest
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );
// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data )  {
    // ..
} );
// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
});
// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
});

PHP

In PHP, you can identify your user-agent with code such as this:

ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

cURL

Or if you use cURL:

curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

Python

In Python, you can use the Requests library to set a header:

import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)

Or, if you want to use SPARQLWrapper like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:

from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

# A query must be set before calling query(); the query below is only a placeholder.
sparql = SPARQLWrapper(url, agent=user_agent)
sparql.setQuery('SELECT * WHERE { ?s ?p ?o } LIMIT 1')
sparql.setReturnFormat(JSON)
results = sparql.query().convert()


Notes

  1. The Wikitech-l February 2010 Archive by subject
  2. User-Agent: - Wikitech-l - lists.wikimedia.org
  3. API:FAQ - MediaWiki
  4. [Wikitech-l] User-Agent:
  5. Clarification on what is needed for "identifying the bot" in bot user-agent?
  6. gmane.science.linguistics.wikipedia.technical/83870 (deadlink)

See also