Policy:User-Agent policy/zh: Difference between revisions


Revision as of 00:46, 29 March 2024

This page is purely informative and reflects the current state of affairs. To discuss this topic, please use the wikitech-l mailing list.

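As of February 15, 2010, Wikimedia sites require an HTTP User-Agent header for all requests. The decision was made by the technical staff and was announced and discussed on the technical mailing list.^[1]^[2] The rationale is that clients that do not send a User-Agent string are mostly running ill-behaved code that puts a heavy load on the servers without benefiting the projects. User-Agent strings that begin with a non-descriptive default value, such as python-requests/x, may also be blocked from Wikimedia sites (or from parts of a site, such as api.php).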

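Requests (for example from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or their IP addresses may be blocked without notice.

Requests from disallowed user agents may instead encounter a less helpful error message like this:

Our servers are currently experiencing a technical problem. Please try again in a few minutes.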

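This change is most likely to affect scripts (bots) and command-line programs that access Wikimedia sites automatically, via api.php or otherwise.^[3] If you run a bot, please send a User-Agent header that identifies the bot and is unlikely to be confused with many other bots. Also include a way to contact you (for example a user page on the local wiki, a user page on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address). For example: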

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.
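The generic format can also be assembled programmatically. The following is a minimal sketch in Python; the function name `build_user_agent` and its parameters are hypothetical, chosen only for illustration, and are not part of any Wikimedia library.

```python
def build_user_agent(client, version, contact=(), libraries=()):
    """Assemble '<client>/<version> (<contact>) <library>/<version> ...'.

    `contact` is a sequence of contact strings (URL, email address, ...).
    `libraries` is a sequence of (name, version) pairs.
    Parts that do not apply are simply left out.
    """
    parts = [f"{client}/{version}"]
    if contact:
        parts.append("(" + "; ".join(contact) + ")")
    parts.extend(f"{name}/{ver}" for name, ver in libraries)
    return " ".join(parts)


ua = build_user_agent(
    "CoolBot", "0.0",
    contact=["https://example.org/coolbot/", "coolbot@example.org"],
    libraries=[("generic-library", "0.0")],
)
print(ua)
```

With these inputs the result matches the CoolBot example header value shown on this page.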

If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. This is recognized by Wikimedia's systems, and used to classify traffic and provide more accurate statistics.
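The "bot" convention amounts to a case-insensitive substring match. The sketch below only illustrates that convention; `looks_like_bot` is a hypothetical helper, and it is an assumption that a plain substring test is close to what any real traffic classifier does.

```python
def looks_like_bot(user_agent):
    """Return True if the User-Agent contains 'bot' in any letter case."""
    return "bot" in user_agent.lower()


print(looks_like_bot("CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"))
print(looks_like_bot("ExampleClient/1.0"))
```

A bot operator can use a check like this to confirm that the agent string will be recognizable as automated traffic.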

Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.^[4] Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.^[5]
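A script can guard against accidentally sending a library default before it makes any requests. The following sketch is an assumption-laden illustration: the set `GENERIC_PRODUCTS` contains only the example names mentioned on this page, not an actual blocklist, and `is_generic_user_agent` is a hypothetical helper.

```python
# Example names taken from this page; NOT the real list used by Wikimedia.
GENERIC_PRODUCTS = {"curl", "lwp", "python-urllib", "python-requests"}


def is_generic_user_agent(user_agent):
    """Return True if the header is empty or starts with a generic product name."""
    # The leading product token is everything before the first "/".
    product = user_agent.strip().split("/", 1)[0].strip().lower()
    return product == "" or product in GENERIC_PRODUCTS


print(is_generic_user_agent("python-requests/2.31.0"))
print(is_generic_user_agent("CoolBot/0.0 (coolbot@example.org) python-requests/2.31.0"))
```

Only the leading token is checked, so naming the library after the client name and contact information, as in the generic format, does not trigger the check.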

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the Cover Your Tracks project.

Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the Api-User-Agent header to supply an appropriate agent.

As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.^[6]

Code examples

On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.

JavaScript

If you are calling the API from browser-based JavaScript, you won't be able to influence the User-Agent header: the browser will use its own. To work around this, use the Api-User-Agent header:

// Using XMLHttpRequest
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );

// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data )  {
    // ..
} );

// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
});

// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
});

PHP

In PHP, you can identify your user-agent with code such as this:

ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

cURL

Or if you use cURL:

curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

Python

In Python, you can use the Requests library to set a header:

import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)

Or, if you want to use SPARQLWrapper like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:

from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

sparql = SPARQLWrapper(url, agent=user_agent)
sparql.setReturnFormat(JSON)
results = sparql.query()
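If neither library is available, the standard library's urllib.request can set the same header. This is a minimal sketch: `make_request` is a hypothetical helper and the URL is a placeholder, not a real endpoint. The request is only constructed here; the lines that would send it are commented out because they perform network I/O.

```python
import urllib.request

USER_AGENT = "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"


def make_request(url):
    """Build a request object that carries the descriptive User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = make_request("https://example.org/w/api.php")  # placeholder URL
# urllib stores header names capitalized, i.e. as "User-agent".
print(req.get_header("User-agent"))

# To actually send the request:
# with urllib.request.urlopen(req) as response:
#     body = response.read()
```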

Notes

[1] The Wikitech-l February 2010 Archive by subject
[2] User-Agent: - Wikitech-l - lists.wikimedia.org
[3] API:FAQ - MediaWiki
[4] [Wikitech-l] User-Agent:
[5] Clarification on what is needed for "identifying the bot" in bot user-agent?
[6] gmane.science.linguistics.wikipedia.technical/83870 (dead link)

See also

Policy for crawlers and bots that wish to operate on Wikimedia websites


@@ Line 17: / Line 17: @@
 </pre>
+通常的格式是<code><client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]</code>，可省略其中不适用的部分。
-<div lang="en" dir="ltr" class="mw-content-ltr">
-The generic format is <code><client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]</code>. Parts that are not applicable can be omitted.
-</div>
 <div lang="en" dir="ltr" class="mw-content-ltr">