Policy:User-Agent policy: Difference between revisions

From Wikimedia Foundation Governance Wiki
Content deleted Content added
Krinkle (talk | contribs)
mNo edit summary
tvar update
Line 1: Line 1:
<languages />
<languages />
{{notice|<translate><!--T:1-->
{{notice|<translate><!--T:1-->
This page is purely informative, reflecting the current state of affairs. To discuss this topic, please use the wikitech-l [[<tvar|mail-lists>Special:MyLanguage/Mailing lists</>|mailing list]].</translate>}}
This page is purely informative, reflecting the current state of affairs. To discuss this topic, please use the wikitech-l [[<tvar name="mail-lists">Special:MyLanguage/Mailing lists</tvar>|mailing list]].</translate>}}


<translate>
<translate>
<!--T:2-->
<!--T:2-->
As of February 15, 2010, Wikimedia sites require a '''HTTP [[w:User-Agent|User-Agent]] header''' for all requests. This was an operative decision made by the technical staff and was announced and discussed on the technical mailing list.<ref>[<tvar|ref1url>https://lists.wikimedia.org/pipermail/wikitech-l/2010-February/thread.html#46764</> The Wikitech-l February 2010 Archive by subject]</ref><ref>[<tvar|ref2url>http://www.gossamer-threads.com/lists/wiki/wikitech/189275</> User-Agent: | Wikipedia | Wikitech]</ref> The rationale is, that clients that do not send a User-Agent string are mostly ill behaved scripts that cause a lot of load on the servers, without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such <code>python-requests/x</code>, may also be blocked from Wikimedia sites (or parts of a website, e.g. <code>api.php</code>).
As of February 15, 2010, Wikimedia sites require a '''HTTP [[w:User-Agent|User-Agent]] header''' for all requests. This was an operative decision made by the technical staff and was announced and discussed on the technical mailing list.<ref>[<tvar name="ref1url">https://lists.wikimedia.org/pipermail/wikitech-l/2010-February/thread.html#46764</tvar> The Wikitech-l February 2010 Archive by subject]</ref><ref>[<tvar name="ref2url">http://www.gossamer-threads.com/lists/wiki/wikitech/189275</tvar> User-Agent: | Wikipedia | Wikitech]</ref> The rationale is, that clients that do not send a User-Agent string are mostly ill behaved scripts that cause a lot of load on the servers, without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such <code>python-requests/x</code>, may also be blocked from Wikimedia sites (or parts of a website, e.g. <code>api.php</code>).


<!--T:3-->
<!--T:3-->
Line 20: Line 20:


<!--T:7-->
<!--T:7-->
This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.<ref>[<tvar|ref3url>//www.mediawiki.org/w/index.php?title=API:FAQ#do_I_get_HTTP_403_errors.3F</> API:FAQ - MediaWiki]</ref> If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:</translate>
This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.<ref>[<tvar name="ref3url">//www.mediawiki.org/w/index.php?title=API:FAQ#do_I_get_HTTP_403_errors.3F</tvar> API:FAQ - MediaWiki]</ref> If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:</translate>
<pre>
<pre>
User-Agent: CoolTool/0.0 (https://example.org/cool-tool/; cool-tool@example.org) generic-library/0.0
User-Agent: CoolTool/0.0 (https://example.org/cool-tool/; cool-tool@example.org) generic-library/0.0
Line 32: Line 32:


<!--T:8-->
<!--T:8-->
Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.<ref>[<tvar|ref4url>//lists.wikimedia.org/pipermail/wikitech-l/2010-February/046783.html [Wikitech-l&#93;</> User-Agent:]</ref> Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.<ref>{{cite web|url=<tvar|ref5url>http://lists.wikimedia.org/pipermail/mediawiki-api/2014-July/003308.html</>|title=Clarification on what is needed for "identifying the bot" in bot user-agent?|publisher=Mediawiki-api|author=Anomie|date=31 July 2014}}</ref>
Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.<ref>[<tvar name="ref4url">//lists.wikimedia.org/pipermail/wikitech-l/2010-February/046783.html [Wikitech-l&#93;</tvar> User-Agent:]</ref> Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.<ref>{{cite web|url=<tvar name="ref5url">http://lists.wikimedia.org/pipermail/mediawiki-api/2014-July/003308.html</tvar>|title=Clarification on what is needed for "identifying the bot" in bot user-agent?|publisher=Mediawiki-api|author=Anomie|date=31 July 2014}}</ref>


<!--T:9-->
<!--T:9-->
For more information, please refer to the [[mw:API:Main page#Identifying your client|MediaWiki API Documentation]].<ref>As an example (among [[<tvar|api-quick>mw:API:Quick_start_guide#Identifying_your_client</>|other examples]]) of how to set a user-agent, in PHP, one [<tvar|php-url>http://php.net/manual/en/function.curl-setopt.php</> might use] the following, if one's cURL handle is <tvar|handle-example><code>$ch</code>:<syntaxhighlight lang="php">curl_setopt($ch, CURLOPT_USERAGENT , 'CoolTool/0.0 (https://example.org/cool-tool/; cool-tool@example.org) generic-library/0.0');</syntaxhighlight></></ref>
For more information, please refer to the [[mw:API:Main page#Identifying your client|MediaWiki API Documentation]].<ref>As an example (among [[<tvar name="api-quick">mw:API:Quick_start_guide#Identifying_your_client</tvar>|other examples]]) of how to set a user-agent, in PHP, one [<tvar name="php-url">http://php.net/manual/en/function.curl-setopt.php</tvar> might use] the following, if one's cURL handle is <tvar name="handle-example"><code>$ch</code>:<syntaxhighlight lang="php">curl_setopt($ch, CURLOPT_USERAGENT , 'CoolTool/0.0 (https://example.org/cool-tool/; cool-tool@example.org) generic-library/0.0');</syntaxhighlight></tvar></ref>


<!--T:10-->
<!--T:10-->
Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the [<tvar|eff-url>https://panopticlick.eff.org/</> Panopticlick project].
Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the [<tvar name="eff-url">https://panopticlick.eff.org/</tvar> Panopticlick project].


<!--T:11-->
<!--T:11-->
Browser-based applications written in Flash or JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy, however such applications are encouraged to include the <tvar|header-code><code>Api-User-Agent</code></> header to supply an appropriate agent.
Browser-based applications written in Flash or JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy, however such applications are encouraged to include the <tvar name="header-code"><code>Api-User-Agent</code></tvar> header to supply an appropriate agent.


<!--T:13-->
<!--T:13-->
Line 53: Line 53:
== See also == <!--T:15-->
== See also == <!--T:15-->
</translate>
</translate>
* <translate><!--T:16--> [[<tvar|policy>wikitech:Robot policy</>|Policy for crawlers and bots]] that wish to operate on Wikimedia websites</translate>
* <translate><!--T:16--> [[<tvar name="policy">wikitech:Robot policy</tvar>|Policy for crawlers and bots]] that wish to operate on Wikimedia websites</translate>


[[Category:Global policies{{#translation:}}]]
[[Category:Global policies{{#translation:}}]]

Revision as of 11:58, 3 July 2021

As of February 15, 2010, Wikimedia sites require an HTTP User-Agent header for all requests. This was an operational decision made by the technical staff and was announced and discussed on the technical mailing list.[1][2] The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that cause a lot of load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as python-requests/x, may also be blocked from Wikimedia sites (or parts of a website, e.g. api.php).

Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.

Requests from disallowed user agents may instead encounter a less helpful error message like this:

Our servers are currently experiencing a technical problem. Please try again in a few minutes.

This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.[3] If you run a bot, please send a User-Agent header that identifies the bot with an identifier unlikely to be confused with many other bots, and that supplies some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address). For example:

User-Agent: CoolTool/0.0 (https://example.org/cool-tool/; cool-tool@example.org) generic-library/0.0

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.
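
As a minimal sketch of the generic format above, the following Python helper assembles a User-Agent string from its parts and leaves out the parts that are not applicable. The function name build_user_agent and its parameters are hypothetical, chosen only for illustration; they are not part of any Wikimedia or MediaWiki library.

```python
def build_user_agent(client, version, contact="", libraries=()):
    """Assemble '<client>/<version> (<contact>) <library>/<version> ...'.

    Hypothetical helper for illustration; parts that are not
    applicable (contact, libraries) are simply omitted.
    """
    parts = [f"{client}/{version}"]
    if contact:
        parts.append(f"({contact})")
    # Each library is a (name, version) pair, appended in the order given.
    parts.extend(f"{name}/{ver}" for name, ver in libraries)
    return " ".join(parts)


ua = build_user_agent(
    "CoolTool",
    "0.0",
    contact="https://example.org/cool-tool/; cool-tool@example.org",
    libraries=[("generic-library", "0.0")],
)
print(ua)
```

With the values shown, the result matches the example header value given earlier on this page.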

If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. This is recognized by Wikimedia's systems, and used to classify traffic and provide more accurate statistics.
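
The convention above can be checked locally before deploying a script. The sketch below assumes that a plain case-insensitive substring match is sufficient; how Wikimedia's systems actually classify traffic is not specified here, and the function name contains_bot_marker is hypothetical.

```python
def contains_bot_marker(user_agent):
    """Return True if 'bot' appears in the string, in any letter case.

    Assumption: a simple substring test; the real classification used
    by Wikimedia's systems may differ.
    """
    return "bot" in user_agent.lower()


print(contains_bot_marker("CoolToolBot/0.0 (https://example.org/cool-tool/)"))
print(contains_bot_marker("CoolTool/0.0 (https://example.org/cool-tool/)"))
```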

Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.[4] Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.[5]
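
A script can guard against accidentally sending a library default. The sketch below checks whether a User-Agent string begins with one of the generic names mentioned on this page. The tuple GENERIC_PREFIXES is an illustrative assumption built from those names only; it is not the list Wikimedia actually blocks, and the function name is hypothetical.

```python
# Illustrative only: names taken from this page, not an official blocklist.
GENERIC_PREFIXES = ("curl", "lwp", "python-urllib", "python-requests")


def starts_with_generic_agent(user_agent):
    """Return True if the string begins with a known generic default value."""
    return user_agent.strip().lower().startswith(GENERIC_PREFIXES)


print(starts_with_generic_agent("python-requests/2.0"))
print(starts_with_generic_agent("CoolTool/0.0 (cool-tool@example.org) python-requests/2.0"))
```

Naming the library at the end of a descriptive string, as in the second call, is consistent with the generic format; only a string that starts with the default value is flagged.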

For more information, please refer to the MediaWiki API Documentation.[6]
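
As a Python counterpart to the PHP example in note 6, the following sketch attaches the header using the standard library's urllib.request. The api.php URL and query parameters are illustrative assumptions, and no request is actually sent; passing the object to urllib.request.urlopen would do that.

```python
import urllib.request

USER_AGENT = (
    "CoolTool/0.0 (https://example.org/cool-tool/; cool-tool@example.org) "
    "generic-library/0.0"
)


def make_request(url):
    """Build a Request object carrying a descriptive User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Illustrative URL; nothing is sent until urllib.request.urlopen(req) is called.
req = make_request(
    "https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json"
)
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```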

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website; if you are interested in protecting your privacy, visit the Panopticlick project.

Browser-based applications written in Flash or JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the Api-User-Agent header to supply an appropriate agent.

As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.[7]

Notes

  1. The Wikitech-l February 2010 Archive by subject
  2. User-Agent: | Wikipedia | Wikitech
  3. API:FAQ - MediaWiki
  4. [Wikitech-l] User-Agent:
  5. Anomie (31 July 2014). "Clarification on what is needed for "identifying the bot" in bot user-agent?". Mediawiki-api. 
  6. As an example (among other examples) of how to set a user-agent, in PHP, one might use the following, if one's cURL handle is $ch:
    curl_setopt($ch, CURLOPT_USERAGENT, 'CoolTool/0.0 (https://example.org/cool-tool/; cool-tool@example.org) generic-library/0.0');
  7. gmane.science.linguistics.wikipedia.technical/83870 (dead link)

See also