Policy:User-Agent policy/zh: Difference between revisions
<languages />{{DISPLAYTITLE:User-Agent policy}}

{{notice|1=<div lang="en" dir="ltr" class="mw-content-ltr">

{{policy-staff}}

This page is purely informative, reflecting the current state of affairs. To discuss this topic, please use the wikitech-l [[:m:Special:MyLanguage/Mailing lists|mailing list]].

</div>}}
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

As of February 15, 2010, Wikimedia sites require an '''HTTP [[{{lwp|User-Agent}}|User-Agent]] header''' for all requests. This was an operational decision made by the technical staff, and it was announced and discussed on the technical mailing list.<ref>[[mailarchive:wikitech-l/2010-February/thread.html#46764|The Wikitech-l February 2010 Archive by subject]]</ref><ref>[[listarchive:list/wikitech-l@lists.wikimedia.org/thread/R4RU7XTBM5J3BTS6GGQW77NYS2E4WGLI/|User-Agent: - Wikitech-l - lists.wikimedia.org]]</ref> The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that put a lot of load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as <code>python-requests/x</code>, may also be blocked from Wikimedia sites (or from parts of a site, e.g. <code>api.php</code>).

</div>
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

</div>
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

:''Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.''

</div>
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

Requests from disallowed user agents may instead encounter a less helpful error message like this:

</div>
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

:''Our servers are currently experiencing a technical problem. Please try again in a few minutes.''

</div>
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

This change is most likely to affect scripts (bots) that access Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, as well as command-line programs.<ref>[[:mw:Special:MyLanguage/API:FAQ|API:FAQ - MediaWiki]]</ref> If you run a bot, please send a User-Agent header that identifies the bot with an identifier that will not be confused with many other bots. The header should also supply some way of contacting you (e.g. a user page on the local wiki, a user page on a related wiki using interwiki link syntax, a URI for a relevant external website, or an email address), for example:

</div>
|||
<pre>
User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0
</pre>

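<div lang="en" dir="ltr" class="mw-content-ltr">
The generic format is <code><nowiki><client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]</nowiki></code>. Parts that are not applicable can be omitted.
</div>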
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. Wikimedia's systems recognize this string and use it to classify traffic and provide more accurate statistics.

</div>
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

Do not copy a browser's user agent for your bot, because bot-like behavior with a browser's user agent will be assumed to be malicious.<ref>[[mailarchive:wikitech-l/2010-February/046783.html|[Wikitech-l] User-Agent:]]</ref> Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. Large frameworks like pywikibot have so many users that just "pywikibot" is likely to be too vague. Including details about the specific task or script is a good idea, even if those details are opaque to anyone besides the operator.<ref>[[mailarchive:mediawiki-api/2014-July/003308.html|Clarification on what is needed for "identifying the bot" in bot user-agent?]]</ref>

</div>
|||
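<div lang="en" dir="ltr" class="mw-content-ltr">

Web browsers generally send a User-Agent string automatically. If you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Some privacy-enhancing plugins or proxies may suppress this header. For anonymous surfing, however, it is recommended to send a generic User-Agent string rather than suppressing it or sending an empty string. Other features are much more likely to identify you to a website. If you are interested in protecting your privacy, visit the Cover Your Tracks project.

</div>
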
<div lang="en" dir="ltr" class="mw-content-ltr">

Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the <code>Api-User-Agent</code> header to supply an appropriate agent.

</div>
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. The requirement is therefore not enforced automatically; instead, it may be enforced in specific cases as needed.

</div>

<span id="Code_examples"></span>
== Code examples ==
|||
<div lang="en" dir="ltr" class="mw-content-ltr">

On Wikimedia wikis, if you do not supply a <code>User-Agent</code> header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.

</div>
|||
=== JavaScript ===

<div lang="en" dir="ltr" class="mw-content-ltr">

If you are calling the API from browser-based JavaScript, you cannot change the <code>User-Agent</code> header, because the browser sets its own. To work around this, use the <code>Api-User-Agent</code> header:

</div>
|||
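<syntaxhighlight lang="javascript">
// Using XMLHttpRequest
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );

// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data ) {
    // ..
} );

// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
});

// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
});
</syntaxhighlight>

=== PHP ===
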
<div lang="en" dir="ltr" class="mw-content-ltr">

In PHP, you can identify your user agent with code such as this:

</div>
|||
<syntaxhighlight lang="php">
ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );
</syntaxhighlight>

=== cURL ===

<div lang="en" dir="ltr" class="mw-content-ltr">

Or, if you use [[{{lwp|cURL}}|cURL]]:

</div>
|||
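<syntaxhighlight lang="php">
curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );
</syntaxhighlight>

=== Python ===

<div lang="en" dir="ltr" class="mw-content-ltr">
In Python, you can use the [[{{lwp|Requests (software)}}|Requests]] library to set a header:
</div>

<syntaxhighlight lang="python">
import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)
</syntaxhighlight>

Or, if you want to use SPARQLWrapper as in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:

<syntaxhighlight lang="python">
from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

# The agent argument sets the User-Agent header sent with each query
sparql = SPARQLWrapper(url, agent=user_agent)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
</syntaxhighlight>
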
== Notes ==

<references />

== See also ==

* <span lang="en" dir="ltr" class="mw-content-ltr">[[wikitech:Robot policy|Policy for crawlers and bots]] that wish to operate on Wikimedia websites</span>
|||
[[Category:Bots{{#translation:}}]] |
Latest revision as of 01:04, 29 March 2024
This page is purely informative, reflecting the current state of affairs. To discuss this topic, please use the wikitech-l mailing list.
This policy or procedure is maintained by the Wikimedia Foundation. Please note that in the event of any differences in meaning or interpretation between the original English version of this content and a translation, the original English version takes precedence.
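As of February 15, 2010, Wikimedia sites require an HTTP User-Agent header for all requests. This was an operational decision made by the technical staff, and it was announced and discussed on the technical mailing list.[1][2] The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that put a lot of load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as python-requests/x, may also be blocked from Wikimedia sites (or from parts of a site, e.g. api.php).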
Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:
- Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.
Requests from disallowed user agents may instead encounter a less helpful error message like this:
- Our servers are currently experiencing a technical problem. Please try again in a few minutes.
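This change is most likely to affect scripts (bots) and command-line programs that access Wikimedia sites automatically, via api.php or otherwise.[3] If you run a bot, please send a User-Agent header that identifies the bot with an identifier that will not be confused with many other bots. The header should also include a way to contact you (e.g. a user page on the local wiki, a user page on a related wiki using interwiki link syntax, a URI for a relevant external website, or an email address), for example: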
User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0
The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.
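The Python sketch below illustrates the generic format. It assembles a User-Agent string from its parts and leaves out any part that is not supplied. The function name build_user_agent and its parameters are hypothetical; they are not part of any Wikimedia or library API. The CoolBot values are the placeholders from the example above.

```python
from typing import Optional, Sequence, Tuple


def build_user_agent(
    client: str,
    version: str,
    contact: Optional[str] = None,
    libraries: Sequence[Tuple[str, str]] = (),
) -> str:
    """Hypothetical helper: join the parts of the generic User-Agent format.

    <client>/<version> (<contact>) <library>/<version> [<library>/<version> ...]
    Parts that are not supplied (no contact, no libraries) are simply omitted.
    """
    parts = [f"{client}/{version}"]
    if contact:
        # Contact details go in a single parenthesized comment.
        parts.append(f"({contact})")
    # Each underlying library or framework is appended as name/version.
    parts.extend(f"{name}/{ver}" for name, ver in libraries)
    return " ".join(parts)


ua = build_user_agent(
    "CoolBot",
    "0.0",
    contact="https://example.org/coolbot/; coolbot@example.org",
    libraries=[("generic-library", "0.0")],
)
print(ua)  # CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0
```

Keeping the contact details in one parenthesized comment matches the example string above. Callers that have nothing to add beyond the tool name still get a valid, shorter string such as CoolBot/0.0.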
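If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" (in any combination of uppercase and lowercase letters) in the User-Agent string. Wikimedia's systems recognize this string and use it to classify traffic and provide more accurate statistics.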
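Operators can quickly confirm that their string follows the "bot" convention. The check below is a case-insensitive substring test, sketched in Python. The helper has_bot_token is hypothetical, and the check only mirrors the convention as described here. It is not how Wikimedia's systems actually classify traffic; this page does not specify that.

```python
def has_bot_token(user_agent: str) -> bool:
    """Hypothetical self-check: True if "bot" occurs anywhere, ignoring letter case."""
    # casefold() normalizes case (like lower(), but more thorough for non-ASCII text).
    return "bot" in user_agent.casefold()


print(has_bot_token("CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"))  # True
print(has_bot_token("ExampleTool/1.2 (tool@example.org)"))  # False
```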
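Do not copy a browser's user agent string for your bot, because bot-like behavior with a browser's user agent will be assumed to be malicious.[4] Do not use generic agents such as "curl", "lwp" or "Python-urllib" either. Large frameworks such as pywikibot have so many users that just "pywikibot" is likely to be too vague. It is a good idea to include details about the specific task or script, even if those details are opaque to anyone besides the operator.[5]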
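The advice above can be turned into a rough pre-deployment check. The following Python sketch flags the problems this page names: an empty string, a generic library default, a copied browser string, and missing contact details. The names user_agent_problems, GENERIC_PREFIXES and CONTACT_RE, and the heuristics themselves, are assumptions for illustration only. Wikimedia's actual blocking rules are not described here, and passing this check does not guarantee that a request will be accepted.

```python
import re
from typing import List

# Assumed default agents of curl, Perl's LWP, Python's urllib and Requests.
# Illustrative only; this is not an exhaustive or official blocklist.
GENERIC_PREFIXES = ("curl/", "libwww-perl/", "python-urllib/", "python-requests/")

# Heuristic (assumption): contact info is a URL, an email address or a wiki "User:" page.
CONTACT_RE = re.compile(r"https?://|[^\s@()]+@[^\s@()]+|User:", re.IGNORECASE)


def user_agent_problems(user_agent: str) -> List[str]:
    """Hypothetical lint: return warnings for a User-Agent string ([] means none found)."""
    ua = user_agent.strip()
    if not ua:
        return ["empty User-Agent"]
    problems = []
    if ua.lower().startswith(GENERIC_PREFIXES):
        problems.append("starts with a generic library default")
    if ua.startswith("Mozilla/"):
        # Browser User-Agent strings conventionally begin with "Mozilla/".
        problems.append("looks like a copied browser User-Agent")
    # Look for contact details inside the first parenthesized comment.
    comment = re.search(r"\(([^)]*)\)", ua)
    if comment is None or not CONTACT_RE.search(comment.group(1)):
        problems.append("no contact information in parentheses")
    return problems


print(user_agent_problems("python-requests/2.31.0"))
```

The prefixes include the trailing slash so that a descriptive name such as CurlyBot/1.0 is not mistaken for curl's default.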
Web browsers generally send a User-Agent string automatically. If you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Some privacy-enhancing plugins or proxies may suppress this header. For anonymous surfing, however, it is recommended to send a generic User-Agent string rather than suppressing it or sending an empty string. Other features are much more likely to identify you to a website. If you are interested in protecting your privacy, visit the Cover Your Tracks project.
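Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications should include the Api-User-Agent header to supply an appropriate agent.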
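As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. The requirement is therefore not enforced automatically, but it may be enforced in specific cases as needed.[6]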
Code examples
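On Wikimedia wikis, if you do not supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.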
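Scripts that use only Python's standard library face the same issue: urllib sends its own default agent (Python-urllib/x.y), one of the generic values this page warns against. Below is a minimal sketch that overrides that default and treats a 403 as a hint to check the header. It assumes a hypothetical CoolBot identity and hypothetical helper names (make_request, fetch). A 403 can have other causes, so the error message only suggests where to look.

```python
import urllib.error
import urllib.request

# Hypothetical identifier; replace with your own tool name, version and contact details.
USER_AGENT = "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"


def make_request(url: str) -> urllib.request.Request:
    """Build a request whose User-Agent replaces urllib's "Python-urllib/x.y" default."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str) -> bytes:
    """Fetch url, turning an HTTP 403 into a hint about the User-Agent header."""
    try:
        with urllib.request.urlopen(make_request(url), timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # A missing or generic User-Agent is one possible cause, not the only one.
            raise RuntimeError(f"HTTP 403 from {url}; check the User-Agent header") from err
        raise
```

Note that urllib stores header names via str.capitalize(), so the value is read back with req.get_header("User-agent").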
JavaScript
If you are calling the API from browser-based JavaScript, you cannot change the User-Agent header, because the browser sets its own. To work around this, use the Api-User-Agent header:
// Using XMLHttpRequest
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );

// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data ) {
    // ..
} );

// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
});

// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
});
PHP
In PHP, you can identify your user agent with code such as this:
ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );
cURL
Or, if you use cURL:
curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );
Python
In Python, you can use the Requests library to set a header:
import requests
url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}
response = requests.get(url, headers=headers)
Or, if you want to use SPARQLWrapper like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:
from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

# The agent argument sets the User-Agent header sent with each query
sparql = SPARQLWrapper(url, agent=user_agent)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
Notes
- ↑ The Wikitech-l February 2010 Archive by subject
- ↑ User-Agent: - Wikitech-l - lists.wikimedia.org
- ↑ API:FAQ - MediaWiki
- ↑ [Wikitech-l] User-Agent:
- ↑ Clarification on what is needed for "identifying the bot" in bot user-agent?
- ↑ gmane.science.linguistics.wikipedia.technical/83870 (deadlink)
See also
- Policy for crawlers and bots that wish to operate on Wikimedia websites