Policy:User-Agent policy/zh

Revision as of 00:48, 29 March 2024

This page is purely informative, reflecting the current state of affairs. To discuss this topic, please use the wikitech-l mailing list.

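As of February 15, 2010, Wikimedia sites require all requests to include a User-Agent HTTP header. The technical staff who made this decision discussed and announced it on the technical mailing list.^[1]^[2] The rationale is that clients that do not send a User-Agent string are mostly running buggy code; they place a heavy load on the servers and contribute nothing to the projects. User-Agent strings that begin with a non-descriptive default value, such as python-requests/x, may also be blocked by Wikimedia sites (or by parts of a site, such as api.php).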
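To see why library defaults count as non-descriptive, consider Python's standard library: unless told otherwise, an opener built by urllib.request identifies itself only as "Python-urllib/<version>". The sketch below prints that default and then installs an opener that sends a descriptive string instead. "CoolBot" and the example.org addresses are placeholders borrowed from the sample header on this page, not real values.

```python
import sys
import urllib.request

# Build an opener the same way urllib.request.urlopen() does internally.
default_opener = urllib.request.build_opener()

# Its only default header is the generic agent, e.g. "Python-urllib/3.12".
default_agent = dict(default_opener.addheaders)["User-agent"]
print("Default:", default_agent)

# Descriptive replacement: client name/version, contact info, then the library.
# "CoolBot" and the example.org contact details are placeholders.
python_version = "%d.%d" % sys.version_info[:2]
descriptive_agent = (
    "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) "
    f"Python-urllib/{python_version}"
)

opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", descriptive_agent)]

# Make this opener the one used by later urllib.request.urlopen() calls.
urllib.request.install_opener(opener)
print("Now sending:", descriptive_agent)
```

Installing the opener globally is a convenience for small scripts; larger programs may prefer to keep the opener object and call its open() method directly.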

Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.

Requests from disallowed user agents may instead encounter a less helpful error message like this:

Our servers are currently experiencing a technical problem. Please try again in a few minutes.

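This change is most likely to affect scripts (bots) and command-line programs that access Wikimedia sites automatically, via api.php or otherwise.^[3] If you run a bot, please send a User-Agent header that identifies the bot with a name unlikely to be confused with many other bots, and that includes a way of contacting you (e.g. a user page on the local wiki, a user page on a related wiki using interwiki link syntax, a URI for a relevant external website, or an email address), for example: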

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.
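The format can be assembled mechanically. The helper below is a hypothetical sketch (build_user_agent is not part of any library); given the page's own example values it reproduces the sample header above, and it drops the contact or library sections when they are not supplied.

```python
from typing import Optional, Sequence, Tuple


def build_user_agent(
    client: str,
    version: str,
    contact: Optional[str] = None,
    libraries: Sequence[Tuple[str, str]] = (),
) -> str:
    """Hypothetical helper: '<client>/<version> (<contact>) <lib>/<ver> ...'.

    Sections that are not supplied are simply omitted.
    """
    parts = [f"{client}/{version}"]
    if contact:
        parts.append(f"({contact})")
    # Each (name, version) pair becomes "name/version", in the given order.
    parts.extend(f"{name}/{ver}" for name, ver in libraries)
    return " ".join(parts)


ua = build_user_agent(
    "CoolBot",
    "0.0",
    "https://example.org/coolbot/; coolbot@example.org",
    [("generic-library", "0.0")],
)
print(ua)
# CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0
```

Generating the string from the bot's own name and version constants keeps the header in step with releases instead of relying on a hand-edited literal.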

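If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. Wikimedia's systems recognize this and use it to classify traffic and provide more accurate statistics.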

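Do not copy a browser's User-Agent string for your bot: bot-like behavior under a browser's User-Agent will be treated as malicious.^[4] Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on either. Large frameworks like pywikibot have so many users that just "pywikibot" is likely to be ambiguous. Including details about the specific task, script, etc. is usually a good idea, even if those details are opaque to anyone besides the operator.^[5]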
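Before deploying a bot, its User-Agent can be checked against the guidance above. The function below is a hypothetical self-check whose heuristics are illustrations of this page's advice, not the rules Wikimedia's servers actually apply: it flags empty strings, generic library defaults (including "libwww-perl", the default agent of Perl's LWP), browser-like strings, missing contact details, and the absence of "bot".

```python
import re
from typing import List

# Generic agents named on this page, plus LWP's actual default prefix.
# Illustrative only: not Wikimedia's real blocklist.
GENERIC = re.compile(r"^(curl|lwp|libwww-perl|python-urllib|python-requests)(/|\s|$)")


def review_user_agent(ua: str) -> List[str]:
    """Return human-readable warnings about a User-Agent string (heuristic)."""
    lowered = ua.strip().lower()
    if not lowered:
        return ["empty User-Agent"]
    warnings = []
    if GENERIC.match(lowered):
        warnings.append("starts with a generic library default")
    if lowered.startswith("mozilla/"):
        warnings.append("looks like a copied browser User-Agent")
    # Contact info: a parenthesized section containing a URL or an email address.
    contact = re.search(r"\(([^)]*)\)", ua)
    if not contact or not re.search(r"https?://|@", contact.group(1)):
        warnings.append("no contact information in parentheses")
    if "bot" not in lowered:
        warnings.append("consider including 'bot' for automated agents")
    return warnings


print(review_user_agent("python-requests/2.31.0"))
print(review_user_agent(
    "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0"
))
```

The generic-prefix pattern requires a "/", whitespace, or the end of the string after the library name, so a bot that merely starts with the same letters (e.g. "curlybot/1.0") is not flagged.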

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Some privacy-enhancing plugins or proxies may suppress this header. For anonymous browsing, however, it is better to send a generic User-Agent string than to suppress it or send an empty one. Other features are much more likely to identify you to a website; if you are interested in protecting your privacy, visit the Cover Your Tracks project.

Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the Api-User-Agent header to supply an appropriate agent.

As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.^[6]

Code examples

On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.

JavaScript

If you are calling the API from browser-based JavaScript, you won't be able to influence the User-Agent header: the browser will use its own. To work around this, use the Api-User-Agent header:

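// Using XMLHttpRequest (setRequestHeader() must be called after open())
var xhr = new XMLHttpRequest();
xhr.open( 'GET', 'https://example/...' );
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );
xhr.send();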

// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data )  {
    // ..
} );

// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
});

// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
});

PHP

In PHP, you can identify your user-agent with code such as this:

ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

cURL

Or if you use cURL:

curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

Python

In Python, you can use the Requests library to set a header:

import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)
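If you would rather not depend on Requests, the standard library's urllib.request can send the same header. In the sketch below, build_request, fetch_json and USER_AGENT are hypothetical names; fetch_json also turns the HTTP 403 that Wikimedia wikis may return for a missing or generic User-Agent into a more explanatory error.

```python
import json
import urllib.error
import urllib.request

# Placeholder identity taken from this page's example; use your own.
USER_AGENT = "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"


def build_request(url: str) -> urllib.request.Request:
    # urllib normalizes header names with str.capitalize(), so the header
    # is stored (and looked up) as "User-agent".
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_json(url: str):
    """GET url and decode the JSON body, explaining a 403 more clearly."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # A 403 can have other causes; this message is only a hint.
            raise RuntimeError(
                "HTTP 403: check that the User-Agent header is descriptive "
                "and includes contact information"
            ) from err
        raise
```

Setting the header on each Request keeps the identity explicit at the call site; the opener-based approach shown earlier is the alternative when many calls should share it.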

Or, if you want to use SPARQLWrapper like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:

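from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

sparql = SPARQLWrapper(url, agent=user_agent)
# Placeholder query; replace with your own.
sparql.setQuery('SELECT * WHERE { ?s ?p ?o } LIMIT 5')
sparql.setReturnFormat(JSON)
results = sparql.query().convert()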

Notes

[1] The Wikitech-l February 2010 Archive by subject
[2] User-Agent: - Wikitech-l - lists.wikimedia.org
[3] API:FAQ - MediaWiki
[4] [Wikitech-l] User-Agent:
[5] Clarification on what is needed for "identifying the bot" in bot user-agent?
[6] gmane.science.linguistics.wikipedia.technical/83870 (deadlink)

See also

Policy for crawlers and bots that wish to operate on Wikimedia websites

 <div lang="en" dir="ltr" class="mw-content-ltr">