Policy:User-Agent policy/zh


Revision as of 00:50, 29 March 2024

This page is purely informative and reflects the current state of affairs. To discuss this topic, please use the wikitech-l mailing list.

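As of February 15, 2010, Wikimedia sites require a User-Agent HTTP header on all requests. This was an operational decision by the technical staff, announced and discussed on the technical mailing list.^[1]^[2] The rationale is that clients that send no User-Agent string are mostly misbehaving scripts that put a heavy load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as python-requests/x, may also be blocked from Wikimedia sites (or from parts of a site, such as api.php).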

Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.

Requests from disallowed user agents may instead encounter a less helpful error message like this:

Our servers are currently experiencing a technical problem. Please try again in a few minutes.

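This change is most likely to affect scripts (bots) and command-line programs that access Wikimedia sites automatically, whether via api.php or otherwise.^[3] If you run a bot, please send a User-Agent header that identifies the bot with a name unlikely to be confused with many other bots, and include some way of contacting you (e.g. a user page on the local wiki, a user page on a related wiki using interwiki link syntax, a URI for a relevant external website, or an email address), for example: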

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that do not apply can be omitted.
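The generic format described above can be assembled programmatically. The following is a minimal Python sketch using a hypothetical helper, build_user_agent (not part of any Wikimedia or library API). It joins the components in the documented order, puts multiple contact details inside one pair of parentheses separated by "; " as in the CoolBot example, and simply omits the optional parts that are not supplied.

```python
def build_user_agent(client_name, client_version, contacts=(), libraries=()):
    """Hypothetical helper: assemble a User-Agent in the generic format
    <client name>/<version> (<contact information>) <library>/<version> ...

    contacts and libraries are optional; when empty, that part is omitted.
    """
    parts = [f"{client_name}/{client_version}"]
    if contacts:
        # Several contact details share one pair of parentheses, separated by "; ".
        parts.append("(" + "; ".join(contacts) + ")")
    for lib_name, lib_version in libraries:
        # Each library or framework is appended as <name>/<version>.
        parts.append(f"{lib_name}/{lib_version}")
    return " ".join(parts)


ua = build_user_agent(
    "CoolBot",
    "0.0",
    contacts=["https://example.org/coolbot/", "coolbot@example.org"],
    libraries=[("generic-library", "0.0")],
)
print(ua)
# CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0
```

Keeping the string in one place like this makes it easy to bump the version number or contact details without hunting through every request in a script.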

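If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent, in any combination of uppercase and lowercase letters. Wikimedia's systems recognize this and use it to classify traffic and provide more accurate statistics.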

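Do not copy a browser's user agent for your bot: bot-like behavior combined with a browser's user agent will be assumed to be malicious.^[4] Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. Large frameworks like pywikibot have so many users that "pywikibot" alone is likely to be somewhat vague. It is usually a good idea to include details about the specific task or script, even if those details are opaque to anyone besides the operator.^[5]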
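The guidance above can be turned into a rough self-check before a bot goes live. The Python sketch below is an illustrative heuristic only, not Wikimedia's actual filtering logic: the GENERIC_PREFIXES list is an assumption assembled from the examples on this page plus common library defaults, and check_user_agent is a hypothetical helper. It cannot judge whether a name is specific enough (a bare "pywikibot" would pass), so it complements rather than replaces human judgment.

```python
import re

# Assumed list of generic default prefixes, based on the examples on this page
# and common library defaults. It is NOT an official Wikimedia blocklist.
GENERIC_PREFIXES = (
    "curl/",
    "lwp",
    "libwww-perl/",
    "python-urllib/",
    "python-requests/",
    "mozilla/",  # browser-style User-Agent strings
)

# Heuristic for contact info: a parenthesised group containing an email or URL.
CONTACT_RE = re.compile(r"\([^)]*(@|https?://)[^)]*\)")


def check_user_agent(ua):
    """Hypothetical helper: return advisory warnings for a bot's User-Agent."""
    value = ua.strip()
    if not value:
        return ["empty User-Agent"]
    warnings = []
    lowered = value.lower()
    if lowered.startswith(GENERIC_PREFIXES):
        warnings.append("starts with a generic or browser default")
    if "bot" not in lowered:
        # Case-insensitive check, following the "bot" convention described above.
        warnings.append("does not contain 'bot'")
    if not CONTACT_RE.search(value):
        warnings.append("no contact information in parentheses")
    return warnings


print(check_user_agent(
    "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0"
))  # []
print(check_user_agent("python-requests/2.31.0"))
# three warnings: generic prefix, no "bot", no contact info
```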

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the Cover Your Tracks project.

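Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to send an Api-User-Agent header to supply an appropriate agent identification.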

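As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. The requirement is therefore not enforced automatically; rather, it may be enforced in specific cases as needed.^[6]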

Code examples

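On Wikimedia sites, if you do not supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.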

JavaScript

If you call the API from browser-based JavaScript, you cannot change the User-Agent header: the browser sets it. Use the Api-User-Agent header instead to work around this.

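// Using XMLHttpRequest
var xhr = new XMLHttpRequest();
xhr.open( 'GET', 'https://example/...' );
// setRequestHeader() must be called after open() and before send()
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );
xhr.send();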

// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: { /* query parameters */ },
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data ) {
    // ...
} );

// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( { /* query parameters */ } ).then( function ( data ) {
    // ...
} );

// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
});

PHP

In PHP, you can identify your user agent with code such as this:

ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

cURL

Or, if you use cURL:

$curl = curl_init( 'https://example/...' );
curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

Python

In Python, you can use the Requests library to set a header:

import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)

Or, if you want to use SPARQLWrapper like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:

from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

sparql = SPARQLWrapper(url, agent=user_agent)
sparql.setQuery('...')  # your SPARQL query goes here
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
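Requests and SPARQLWrapper are third-party packages. If you prefer to stay within the standard library, a minimal sketch with urllib.request could look like the following. The URL is a placeholder and make_request is a hypothetical helper name. Without an explicit header, urllib sends its default "Python-urllib/x.y" agent, which is one of the generic agents this policy asks you not to use.

```python
import urllib.request

# Placeholder values; substitute your own bot name, version and contact details.
USER_AGENT = "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"
URL = "https://example.org/"  # placeholder endpoint


def make_request(url):
    """Hypothetical helper: build a Request carrying the descriptive User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = make_request(URL)
# urllib stores header names via str.capitalize(), so the key becomes "User-agent".
print(req.get_header("User-agent"))

# Sending the request (requires network access):
# with urllib.request.urlopen(req) as response:
#     body = response.read()
```

Building every request through one helper keeps the header from being forgotten on individual calls.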

Notes

[1] The Wikitech-l February 2010 Archive by subject
[2] User-Agent: - Wikitech-l - lists.wikimedia.org
[3] API:FAQ - MediaWiki
[4] [Wikitech-l] User-Agent:
[5] Clarification on what is needed for "identifying the bot" in bot user-agent?
[6] gmane.science.linguistics.wikipedia.technical/83870 (dead link)

See also

Policy for crawlers and bots that wish to operate on Wikimedia websites
