Policy:User-Agent policy/zh: Difference between revisions


Revision as of 00:49, 29 March 2024

This page only describes the current state of affairs. To discuss this topic, please use the wikitech-l mailing list.

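As of February 15, 2010, Wikimedia sites require a User-Agent HTTP header on all requests. The decision was made by the technical staff, and was announced and discussed on the technical mailing list.^[1]^[2] The rationale is that clients that send no User-Agent string are mostly running faulty code that puts a heavy load on the servers without contributing anything to the projects. User-Agent strings that begin with a non-descriptive default value, such as python-requests/x, may also be blocked from Wikimedia sites (or from parts of a site, such as api.php).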

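Requests (for example from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.

Requests from disallowed user agents may instead encounter a less helpful error message like this:

Our servers are currently experiencing a technical problem. Please try again in a few minutes.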

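This change is most likely to affect scripts (bots) and command-line programs that access Wikimedia sites automatically, via api.php or otherwise.^[3] If you run a bot, please send a User-Agent header that identifies the bot with a name unlikely to be confused with many other bots, and that includes a way of contacting you (for example a user page on the local wiki, a user page on a related wiki using interwiki link syntax, a URI for a relevant external website, or an email address). For example: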

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that do not apply can be omitted.
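The generic format can be assembled programmatically. The following is a minimal sketch, not part of the policy: the helper name `build_user_agent` and its parameters are hypothetical, and the values are the placeholder ones from the example header.

```python
def build_user_agent(client, version, contact, libraries=()):
    """Assemble '<client>/<version> (<contact>) <lib>/<ver> ...'.

    `contact` is an iterable of contact strings and `libraries` an iterable
    of (name, version) pairs. This helper and its parameter names are
    illustrative assumptions, not an official API.
    """
    parts = [f"{client}/{version}"]
    contact = [c for c in contact if c]
    if contact:
        # Contact details go in one parenthesised group, separated by "; ".
        parts.append("(" + "; ".join(contact) + ")")
    # Library/framework tokens follow, each as name/version.
    parts.extend(f"{name}/{ver}" for name, ver in libraries)
    return " ".join(parts)


ua = build_user_agent(
    "CoolBot",
    "0.0",
    ["https://example.org/coolbot/", "coolbot@example.org"],
    [("generic-library", "0.0")],
)
print(ua)
```

Parts that do not apply are simply left out: with no contact details and no libraries, the helper returns only `<client name>/<version>`.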

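If you run an automated agent, please consider following the Internet convention of including the string "bot" (in any combination of upper and lower case) in the User-Agent. Wikimedia's systems recognize this and use it to classify traffic and to provide more accurate statistics.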

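Do not copy a browser's User-Agent string: bot-like behavior behind a browser's user agent will be treated as malicious.^[4] Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. Large frameworks such as pywikibot have so many users that "pywikibot" alone is likely to be too vague. Including details about the specific task or script is usually a good idea, even if that information is opaque to anyone other than the operator.^[5]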
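A script can check its own User-Agent against these recommendations before sending any request. The sketch below is only a local lint under stated assumptions: the function name `check_user_agent` is hypothetical, the generic prefixes are the ones named in this policy, and the "Mozilla/" test and the contact-information pattern are rough heuristics. It does not reproduce how Wikimedia's servers actually filter requests.

```python
import re

# Prefixes this policy names as generic or non-descriptive defaults.
GENERIC_PREFIXES = ("curl", "lwp", "python-urllib", "python-requests")


def check_user_agent(ua):
    """Return a list of problems found; an empty list means none were found."""
    stripped = ua.strip()
    if not stripped:
        return ["empty User-Agent"]
    problems = []
    if stripped.lower().startswith(GENERIC_PREFIXES):
        problems.append("starts with a generic library default")
    # Assumption: browser User-Agent strings typically begin with "Mozilla/".
    if stripped.startswith("Mozilla/"):
        problems.append("looks like a copied browser User-Agent")
    # Rough heuristic: contact information is a URL or an email address.
    if not re.search(r"https?://|[^\s@]+@[^\s@]+", stripped):
        problems.append("no contact information found")
    return problems


print(check_user_agent("python-requests/2.31"))
print(check_user_agent(
    "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"
))
```

An empty result only means that none of these heuristics fired; it is not a guarantee that the string is descriptive enough.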

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the Cover Your Tracks project.

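Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of the policy; however, such applications should send an Api-User-Agent header to supply appropriate user agent information.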

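As of 2015, Wikimedia sites do not block page views or API requests from clients that set no User-Agent header. The requirement is therefore not enforced automatically, but it may be enforced in specific cases as needed.^[6]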

Code examples

On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.

JavaScript

If you are calling the API from browser-based JavaScript, you won't be able to influence the User-Agent header: the browser will use its own. To work around this, use the Api-User-Agent header:

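// Using XMLHttpRequest (call setRequestHeader() after open() and before send())
var xhr = new XMLHttpRequest();
xhr.open( 'GET', 'https://example/...' );
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );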

// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data )  {
    // ..
} );

// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
});

// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
});

PHP

In PHP, you can identify your user-agent with code such as this:

ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

cURL

Or if you use cURL:

curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

Python

In Python, you can use the Requests library to set a header:

import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)

Or, if you want to use SPARQLWrapper like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:

from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

sparql = SPARQLWrapper(url, agent=user_agent)
results = sparql.query()
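If you would rather avoid a third-party dependency, the same header can be set with the standard library's `urllib.request`. This is a minimal sketch: the helper name `make_request` and the example.org URL are placeholders chosen for illustration, and the network call is left commented out.

```python
import urllib.request

USER_AGENT = "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"


def make_request(url):
    """Build a Request carrying the descriptive User-Agent (nothing is sent yet)."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Placeholder URL; substitute the endpoint you actually want to call.
req = make_request("https://example.org/w/api.php?action=query&format=json")

# urllib stores header names in capitalized form, hence "User-agent" here.
print(req.get_header("User-agent"))

# To actually send the request (requires network access):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     body = resp.read()
```

Without an explicit header, `urllib` sends its own default of the "Python-urllib" kind, which is one of the generic agents this policy advises against.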

Notes

[1] The Wikitech-l February 2010 Archive by subject
[2] User-Agent: - Wikitech-l - lists.wikimedia.org
[3] API:FAQ - MediaWiki
[4] [Wikitech-l] User-Agent:
[5] Clarification on what is needed for "identifying the bot" in bot user-agent?
[6] gmane.science.linguistics.wikipedia.technical/83870 (dead link)

See also

Policy for crawlers and bots that wish to operate on Wikimedia websites


@@ Line 30: / Line 30: @@
 自2015年始，维基媒体站点不屏蔽未设置用户代理头的页面访问和API请求。因此这些要求没有被自动强制执行，但是如有需要，在某些特定情況下可能被强制执行。<ref>gmane.science.linguistics.wikipedia.technical/83870 ([//thread.gmane.org/gmane.science.linguistics.wikipedia.technical/83870/ deadlink])</ref>
+<span id="Code_examples"></span>
-<div lang="en" dir="ltr" class="mw-content-ltr">
-== Code examples ==
+== 代码示例 ==
-</div>
 <div lang="en" dir="ltr" class="mw-content-ltr">