{{#if:{{#translation:}}|[[Category:User-Agent policy/Translations|{{#translation:}}]]}}
<languages />
{{notice|<translate><!--T:1-->
This page is purely informative, reflecting the current state of affairs. To discuss this topic, please use the wikitech-l [[<tvar name="mail-lists">Special:MyLanguage/Mailing lists</tvar>|mailing list]].</translate>}}

<translate>
<!--T:2-->
As of February 15, 2010, Wikimedia sites require an '''HTTP [[w:User-Agent|User-Agent]] header''' for all requests. This was an operative decision made by the technical staff and was announced and discussed on the technical mailing list.<ref>[<tvar name="ref1url">https://lists.wikimedia.org/pipermail/wikitech-l/2010-February/thread.html#46764</tvar> The Wikitech-l February 2010 Archive by subject]</ref><ref>[<tvar name="ref2url">https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/R4RU7XTBM5J3BTS6GGQW77NYS2E4WGLI/</tvar> User-Agent: - Wikitech-l - lists.wikimedia.org]</ref> The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that cause a lot of load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as <code>python-requests/x</code>, may also be blocked from Wikimedia sites (or parts of a website, e.g. <code>api.php</code>).

<!--T:3-->
Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may encounter an error message like this:

<!--T:4-->
:''Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.''

<!--T:5-->
Requests from disallowed user agents may instead encounter a less helpful error message like this:

<!--T:6-->
:''Our servers are currently experiencing a technical problem. Please try again in a few minutes.''

<!--T:7-->
This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command-line programs.<ref>[<tvar name="ref3url">//www.mediawiki.org/w/index.php?title=API:FAQ#do_I_get_HTTP_403_errors.3F</tvar> API:FAQ - MediaWiki]</ref> If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:</translate>

<pre>
User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0
</pre>

<translate>
<!--T:22-->
The generic format is <tvar name="fmt"><code><client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]</code></tvar>. Parts that are not applicable can be omitted.

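As a sketch, the format above can be assembled programmatically; the helper name below is illustrative, not part of any API:
</translate>

<syntaxhighlight lang="python">
def build_user_agent(client, version, contact, libraries=()):
    """Assemble <client name>/<version> (<contact information>)
    <library name>/<version> ..., omitting parts that are not applicable."""
    ua = '{}/{}'.format(client, version)
    if contact:
        ua += ' ({})'.format(contact)
    for name, lib_version in libraries:
        ua += ' {}/{}'.format(name, lib_version)
    return ua

build_user_agent('CoolBot', '0.0',
                 'https://example.org/coolbot/; coolbot@example.org',
                 [('generic-library', '0.0')])
# 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0'
</syntaxhighlight>

<translate>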
<!--T:14-->
If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. This is recognized by Wikimedia's systems and is used to classify traffic and provide more accurate statistics.

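The convention can be sketched as a simple case-insensitive substring check; this illustrates the convention only, and is not Wikimedia's actual traffic classifier:
</translate>

<syntaxhighlight lang="python">
def is_bot_agent(user_agent):
    """Treat a request as bot traffic if its agent contains "bot" in any case."""
    return 'bot' in user_agent.lower()

is_bot_agent('CoolBot/0.0 (https://example.org/coolbot/)')  # True
is_bot_agent('Mozilla/5.0 (X11; Linux x86_64)')  # False
</syntaxhighlight>

<translate>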
<!--T:8-->
Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.<ref>[<tvar name="ref4url">//lists.wikimedia.org/pipermail/wikitech-l/2010-February/046783.html</tvar> Wikitech-l: User-Agent:]</ref> Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc. would be a good idea, even if that detail is opaque to anyone besides the operator.<ref>{{cite web|url=<tvar name="ref5url">https://lists.wikimedia.org/pipermail/mediawiki-api/2014-July/003308.html</tvar>|title=Clarification on what is needed for "identifying the bot" in bot user-agent?|publisher=Mediawiki-api|author=Anomie|date=31 July 2014}}</ref>

<!--T:10-->
Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website; if you are interested in protecting your privacy, visit the [<tvar name="eff-url">https://coveryourtracks.eff.org/</tvar> Cover Your Tracks project].

<!--T:11-->
Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the <tvar name="header-code"><code>Api-User-Agent</code></tvar> header to supply an appropriate agent.

<!--T:13-->
As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.</translate><ref>gmane.science.linguistics.wikipedia.technical/83870 ([http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/83870/ deadlink])</ref><translate>

== Code examples == <!--T:17-->

<!--T:18-->
On Wikimedia wikis, if you don't supply a <tvar name="1"><code>User-Agent</code></tvar> header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.

=== JavaScript === <!--T:23-->

<!--T:19-->
If you are calling the API from browser-based JavaScript, you won't be able to influence the <tvar name="1"><code>User-Agent</code></tvar> header: the browser will use its own. To work around this, use the <tvar name="header-code"><code>Api-User-Agent</code></tvar> header:

</translate>

<syntaxhighlight lang="javascript">
// Using XMLHttpRequest
var xhr = new XMLHttpRequest();
xhr.open( 'GET', 'https://example/...' );
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );
</syntaxhighlight>

<syntaxhighlight lang="javascript">
// Using jQuery
$.ajax( {
	url: 'https://example/...',
	data: ...,
	dataType: 'json',
	type: 'GET',
	headers: { 'Api-User-Agent': 'Example/1.0' }
} ).then( function ( data ) {
	// ...
} );
</syntaxhighlight>

<syntaxhighlight lang="javascript">
// Using mw.Api
var api = new mw.Api( {
	ajax: {
		headers: { 'Api-User-Agent': 'Example/1.0' }
	}
} );
api.get( ... ).then( function ( data ) {
	// ...
} );
</syntaxhighlight>

<syntaxhighlight lang="javascript">
// Using Fetch
fetch( 'https://example/...', {
	method: 'GET',
	headers: new Headers( {
		'Api-User-Agent': 'Example/1.0'
	} )
} ).then( function ( response ) {
	return response.json();
} ).then( function ( data ) {
	// ...
} );
</syntaxhighlight>

<translate>
=== PHP === <!--T:24-->

<!--T:20-->
In PHP, you can identify your user-agent with code such as this:
</translate>

<syntaxhighlight lang="php">
ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );
</syntaxhighlight>

<translate>
=== cURL === <!--T:25-->

<!--T:21-->
Or if you use [[wikipedia:cURL|cURL]] from PHP:
</translate>

<syntaxhighlight lang="php">
curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );
</syntaxhighlight>

<translate>
=== Python === <!--T:26-->

<!--T:27-->
In Python, you can use the [[wikipedia:Requests_(software)|Requests]] library to set a header:
</translate>

<syntaxhighlight lang="python">
import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}
response = requests.get(url, headers=headers)
</syntaxhighlight>

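If you prefer the standard library to Requests, <code>urllib.request</code> can send the same header; note that urllib's default agent (<code>Python-urllib/x.y</code>) is exactly the kind of generic value that may be blocked. A minimal sketch (the URL is a placeholder):

<syntaxhighlight lang="python">
import urllib.request

# A descriptive agent replaces urllib's generic default ("Python-urllib/x.y").
req = urllib.request.Request(
    'https://example.org/w/api.php',  # placeholder endpoint
    headers={'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}
)
# Pass req to urllib.request.urlopen() to perform the request.
</syntaxhighlight>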
<translate>
<!--T:28-->
Or, if you want to use [<tvar name="url">https://sparqlwrapper.readthedocs.io</tvar> SPARQLWrapper] as in <tvar name="url2">https://people.wikimedia.org/~bearloga/notes/wdqs-python.html</tvar>:
</translate>

<syntaxhighlight lang="python">
from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'
sparql = SPARQLWrapper(url, agent=user_agent)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
</syntaxhighlight>

<translate>
== Notes == <!--T:12-->
</translate>

<references />

<translate>
== See also == <!--T:15-->
</translate>

* <translate><!--T:16--> [[<tvar name="policy">wikitech:Robot policy</tvar>|Policy for crawlers and bots]] that wish to operate on Wikimedia websites</translate>

[[Category:Global policies{{#translation:}}]]
[[Category:Bots{{#translation:}}]]
[[Category:Policies maintained by the Wikimedia Foundation{{#translation:}}]]