User-Agent policy

This page is purely informative and reflects the current state of affairs. To discuss this topic, please use the wikitech-l mailing list.

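As of February 15, 2010, Wikimedia sites require an HTTP User-Agent header on all requests. The decision was made by the technical staff, and was announced and discussed on the technical mailing list.^[1]^[2] The rationale is that clients which send no User-Agent string are mostly badly behaved scripts that put a heavy load on the servers without benefiting the projects. User-Agent strings that begin with a non-descriptive default value, such as python-requests/x, may also be blocked from Wikimedia sites (or from parts of a site, such as api.php).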

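Requests (for example, from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or their IP addresses may be blocked without notice.

Requests from disallowed user agents may instead encounter a less helpful error message like this:

Our servers are currently experiencing a technical problem. Please try again in a few minutes.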

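This change is most likely to affect scripts (bots) and command-line programs that access Wikimedia sites automatically, via api.php or otherwise.^[3] If you run a bot, please send a User-Agent header that identifies the bot with a name unlikely to be confused with many other bots. Also include a way to contact you (for example, a user page on the local wiki, a user page on a related wiki project using interwiki link syntax, a URI for a relevant external website, or an email address). For example: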

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that do not apply can be omitted.
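The format above can be assembled programmatically. The following is a minimal sketch; the helper name `build_user_agent` and its parameters are hypothetical, chosen only for illustration, and are not part of any Wikimedia tooling.

```python
def build_user_agent(client, version, contact=(), libraries=()):
    """Assemble '<client>/<version> (<contact>) <library>/<version> ...'.

    `contact` is a sequence of contact strings (URL, email address, ...).
    `libraries` is a sequence of (name, version) pairs.
    Parts that do not apply are simply left out.
    """
    parts = [f"{client}/{version}"]
    if contact:
        parts.append("(" + "; ".join(contact) + ")")
    parts.extend(f"{name}/{ver}" for name, ver in libraries)
    return " ".join(parts)


# Rebuilds the example header value shown earlier on this page.
user_agent = build_user_agent(
    "CoolBot", "0.0",
    contact=["https://example.org/coolbot/", "coolbot@example.org"],
    libraries=[("generic-library", "0.0")],
)
print(user_agent)
```

Keeping the contact details and library list as separate arguments makes it easy to drop whichever parts do not apply to a given script.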

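If you run an automated agent, please consider following the Internet convention of including the string "bot" in the User-Agent string, in any combination of uppercase and lowercase letters. Wikimedia's systems recognize this and use it to classify traffic and produce more accurate statistics.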

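Do not copy a browser's User-Agent string for your bot: bot-like behavior combined with a browser's user agent will be treated as malicious.^[4] Do not use generic agents such as "curl", "lwp", or "Python-urllib" either. Large frameworks such as pywikibot have so many users that "pywikibot" alone is likely to be too vague. Including details about the specific task or script is usually a good idea, even if those details are opaque to anyone other than the operator.^[5]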
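A script can check its own User-Agent string against this advice before sending any requests. The sketch below is illustrative only: the prefix list is taken from the examples named on this page and is an assumption, not the list Wikimedia actually uses to block requests, and the function names are hypothetical.

```python
# Illustrative prefixes only, taken from the examples named in this policy;
# this is NOT the list Wikimedia actually uses to block requests.
GENERIC_PREFIXES = ("curl", "lwp", "python-urllib", "python-requests")


def looks_generic(user_agent):
    """Return True if the string is empty or starts with a library default."""
    ua = user_agent.strip().lower()
    return not ua or ua.startswith(GENERIC_PREFIXES)


def mentions_bot(user_agent):
    """Return True if 'bot' appears anywhere, in any letter case."""
    return "bot" in user_agent.lower()


candidates = (
    "python-requests/x",
    "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)",
)
for ua in candidates:
    print(ua, "| generic:", looks_generic(ua), "| bot:", mentions_bot(ua))
```

A check like this only catches obvious defaults. It cannot tell whether the contact information is real or whether the name is specific enough to identify a single task.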

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the Cover Your Tracks project.

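Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a policy violation, but such applications should include the Api-User-Agent header to supply appropriate user-agent information.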

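As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. The requirement is therefore not enforced automatically, but it may be enforced in specific cases as needed.^[6]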

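Code examples

On Wikimedia wikis, if you do not supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.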

JavaScript

If you call the API from browser-based JavaScript, you cannot change the User-Agent header: the browser sets it. To work around this, use the Api-User-Agent header:

// Using XMLHttpRequest
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );

// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data )  {
    // ..
} );

// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
});

// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
});

PHP

In PHP, you can identify your user agent with code such as this:

ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

cURL

Or, if you use cURL:

curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

Python

In Python, you can use the Requests library to set a header:

import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)

Or, if you want to use SPARQLWrapper like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:

from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

sparql = SPARQLWrapper(url, agent=user_agent)
results = sparql.query()
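
If neither library is installed, the header can also be set with Python's standard library. This is a minimal sketch: the function names `build_request` and `fetch` are hypothetical, and the 403 message reflects only the behavior described on this page, not a guaranteed server response.

```python
import urllib.error
import urllib.request

USER_AGENT = "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"


def build_request(url):
    """Create a request object that carries the descriptive User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url):
    """Fetch `url`; explain a 403, which may indicate a rejected User-Agent."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            raise RuntimeError(
                "HTTP 403: the server may have rejected the User-Agent header"
            ) from err
        raise
```

Note that `urllib.request.Request` stores header names in capitalized form, so the value is read back with `request.get_header("User-agent")`. Header names are case-insensitive in HTTP, so this does not change what the server sees.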

Notes

[1] The Wikitech-l February 2010 Archive by subject
[2] User-Agent: - Wikitech-l - lists.wikimedia.org
[3] API:FAQ - MediaWiki
[4] [Wikitech-l] User-Agent:
[5] Clarification on what is needed for "identifying the bot" in bot user-agent?
[6] gmane.science.linguistics.wikipedia.technical/83870 (dead link)

See also

Policy on web crawlers and bots on Wikimedia sites
