Policy:User-Agent policy/ja: Difference between revisions

From Wikimedia Foundation Governance Wiki
FuzzyBot (talk | contribs)
Updating to match new version of source page
FuzzyBot (talk | contribs)
Updating to match new version of source page
 
(59 intermediate revisions by 6 users not shown)
Line 1: Line 1:
<languages />{{DISPLAYTITLE:<span lang="en" dir="ltr" class="mw-content-ltr">User-Agent policy</span>}}
<languages />
{{notice|This page is purely informative, reflecting the current state of affairs. To discuss this topic, please use the wikitech-l [[Special:MyLanguage/Mailing lists|mailing list]].}}
{{notice|1=This page is purely informative, reflecting the current state of affairs. To discuss this topic, [[:m:Special:MyLanguage/Mailing lists|please use the wikitech-l mailing list]].}}
{{policy-staff}}
As of 2015, no user agent requirement is technically enforced in general, though it may be enforced in specific cases as needed.
<ref>http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/83870/</ref>


As of February 15, 2010, Wikimedia sites require an '''HTTP [[{{lwp|User-Agent}}|User-Agent]] header''' for all requests. This was an operational decision made by the technical staff, and it was announced and discussed on the technical mailing list.<ref>[[mailarchive:wikitech-l/2010-February/thread.html#46764|The Wikitech-l February 2010 Archive by subject]]</ref><ref>[[listarchive:list/wikitech-l@lists.wikimedia.org/thread/R4RU7XTBM5J3BTS6GGQW77NYS2E4WGLI/|User-Agent: - Wikitech-l - lists.wikimedia.org]]</ref> The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that put a heavy load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as <code>python-requests/x</code>, may also be blocked from Wikimedia websites (or from parts of them, such as <code>api.php</code>).
As of February 15, 2010, Wikimedia sites require an '''HTTP [[w:User-Agent|User-Agent]] header''' for all requests. This was an operational decision made by the technical staff and was announced and discussed on the technical mailing list.<ref>[//lists.wikimedia.org/pipermail/wikitech-l/2010-February/subject.html#46777 The Wikitech-l February 2010 Archive by subject]</ref><ref>[http://www.gossamer-threads.com/lists/wiki/wikitech/189275 User-Agent: | Wikipedia | Wikitech]</ref> The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that cause a lot of load on the servers without benefiting the projects. Note that non-descriptive default values for the User-Agent string, such as those used by Perl's libwww, may also be blocked from using Wikimedia web sites (or parts of the web sites, such as api.php).


Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may now encounter an error message like this:
User agents (browsers or scripts) that do not send a User-Agent header may now encounter an error message like this:


:''Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.''
:''Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.''


Requests from disallowed user agents may encounter a less helpful error message like this:
User agents that send a User-Agent header that is blacklisted (for example, any User-Agent string that begins with "lwp", whether it is informative or not) may encounter a less helpful error message (lie) like this:


:''Our servers are currently experiencing a technical problem. Please try again in a few minutes.''
:''Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes.''


<div lang="en" dir="ltr" class="mw-content-ltr">
This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.<ref>[[:mw:Special:MyLanguage/API:FAQ|API:FAQ - MediaWiki]]</ref> If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:
</div>
<pre>
User-Agent: MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com) BasedOnSuperLib/1.4
User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0
</pre>
Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.<ref>[//lists.wikimedia.org/pipermail/wikitech-l/2010-February/046783.html [Wikitech-l&#93; User-Agent:]</ref> Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.<ref>{{cite web|url=http://lists.wikimedia.org/pipermail/mediawiki-api/2014-July/003308.html|title=Clarification on what is needed for "identifying the bot" in bot user-agent?|publisher=Mediawiki-api|author=Anomie|date=31 July 2014}}</ref>


<div lang="en" dir="ltr" class="mw-content-ltr">
For more information, please refer to the [[mw:API:Quick start guide#Identifying your client|MediaWiki API Documentation]].<ref>As an example (among [[mw:API:Quick_start_guide#Identifying_your_client|other examples]]) of how to set a user-agent, in PHP, one [http://php.net/manual/en/function.curl-setopt.php might use] the following, if one's cURL handle is <code>$ch</code>:<source lang="php">curl_setopt( $ch , CURLOPT_USERAGENT , 'MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com) BasedOnSuperLib/1.4' );</source></ref>
The generic format is <code><client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]</code>. Parts that are not applicable can be omitted.
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the [https://panopticlick.eff.org/ Panopticlick project].
If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. This is recognized by Wikimedia's systems, and used to classify traffic and provide more accurate statistics.
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
Browser-based applications written in Flash or JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the <code>Api-User-Agent</code> header to supply an appropriate agent.
Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.<ref>[[mailarchive:wikitech-l/2010-February/046783.html|[Wikitech-l] User-Agent:]]</ref> Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.<ref>[[mailarchive:mediawiki-api/2014-July/003308.html|Clarification on what is needed for "identifying the bot" in bot user-agent?]]</ref>
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
== Notes ==
Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the [//coveryourtracks.eff.org/ Cover Your Tracks project].
</div>

<div lang="en" dir="ltr" class="mw-content-ltr">
Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the <code>Api-User-Agent</code> header to supply an appropriate agent.
</div>

As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.<ref>gmane.science.linguistics.wikipedia.technical/83870 ([//thread.gmane.org/gmane.science.linguistics.wikipedia.technical/83870/ deadlink])</ref>
<span id="Code_examples"></span>
== Code examples ==

<div lang="en" dir="ltr" class="mw-content-ltr">
On Wikimedia wikis, if you don't supply a <code>User-Agent</code> header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.
</div>

<div lang="en" dir="ltr" class="mw-content-ltr">
=== JavaScript ===
</div>

<div lang="en" dir="ltr" class="mw-content-ltr">
If you are calling the API from browser-based JavaScript, you won't be able to influence the <code>User-Agent</code> header: the browser will use its own. To work around this, use the <code>Api-User-Agent</code> header:
</div>

<syntaxhighlight lang="javascript">
// Using XMLHttpRequest
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );
</syntaxhighlight>
<syntaxhighlight lang="javascript">
// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data ) {
    // ..
} );
</syntaxhighlight>
<syntaxhighlight lang="javascript">
// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
} );
</syntaxhighlight>
<syntaxhighlight lang="javascript">
// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
} );
</syntaxhighlight>

<div lang="en" dir="ltr" class="mw-content-ltr">
=== PHP ===
</div>

<div lang="en" dir="ltr" class="mw-content-ltr">
In PHP, you can identify your user-agent with code such as this:
</div>

<syntaxhighlight lang="php">
ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );
</syntaxhighlight>

<div lang="en" dir="ltr" class="mw-content-ltr">
=== cURL ===
</div>

<div lang="en" dir="ltr" class="mw-content-ltr">
Or if you use [[{{lwp|cURL}}|cURL]]:
</div>

<syntaxhighlight lang="php">
curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );
</syntaxhighlight>

<div lang="en" dir="ltr" class="mw-content-ltr">
=== Python ===
</div>

In Python, you can use the [[{{lwp|Requests (software)}}|Requests]] library to set a header:

<syntaxhighlight lang="python">
import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)
</syntaxhighlight>

<div lang="en" dir="ltr" class="mw-content-ltr">
Or, if you want to use [//sparqlwrapper.readthedocs.io SPARQLWrapper] like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:
</div>

<syntaxhighlight lang="python">
from SPARQLWrapper import SPARQLWrapper, JSON

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

sparql = SPARQLWrapper(url, agent=user_agent)
results = sparql.query()
</syntaxhighlight>

== {{int string|Notes}} ==
<references />


== {{int string|See also}} ==
[[Category:Global policies{{#translation:}}]]

* <span lang="en" dir="ltr" class="mw-content-ltr">[[wikitech:Robot policy|Policy for crawlers and bots]] that wish to operate on Wikimedia websites</span>

[[Category:Bots{{#translation:}}]]
[[Category:Policies maintained by the Wikimedia Foundation{{#translation:}}]]

Latest revision as of 01:03, 29 March 2024

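As of February 15, 2010, Wikimedia sites require an HTTP User-Agent header for all requests. This was an operational decision made by the technical staff, and it was announced and discussed on the technical mailing list.[1][2] The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that put a heavy load on the servers without benefiting the projects. User-Agent strings that begin with non-descriptive default values, such as python-requests/x, may also be blocked from Wikimedia websites (or from parts of them, such as api.php).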

Requests (e.g. from browsers or scripts) that do not send a descriptive User-Agent header may now encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be blocked without notice.

Requests from disallowed user agents may encounter a less helpful error message like this:

Our servers are currently experiencing a technical problem. Please try again in a few minutes.

This change is most likely to affect scripts (bots) that access Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command-line programs.[3] If you run a bot, please send a User-Agent header that identifies the bot with an identifier unlikely to be confused with many other bots, and that supplies some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address). For example:

User-Agent: CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org) generic-library/0.0

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.
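As a rough illustration of the generic format above, the following sketch assembles a User-Agent string from its parts. The helper name `build_user_agent` and its parameters are hypothetical and are not part of any Wikimedia library; the example values are the CoolBot placeholders used on this page.

```python
def build_user_agent(client, version, contact=None, libraries=()):
    """Assemble '<client>/<version> (<contact>) <library>/<version> ...'.

    `libraries` is an iterable of (name, version) pairs. Both `contact`
    and `libraries` are optional, since parts that are not applicable
    can be omitted.
    """
    parts = [f"{client}/{version}"]
    if contact:
        # Contact information goes in parentheses after the client token.
        parts.append(f"({contact})")
    # Each library or framework is appended as its own name/version token.
    parts.extend(f"{name}/{lib_version}" for name, lib_version in libraries)
    return " ".join(parts)


user_agent = build_user_agent(
    "CoolBot",
    "0.0",
    contact="https://example.org/coolbot/; coolbot@example.org",
    libraries=[("generic-library", "0.0")],
)
print(user_agent)
```

Building the string in one place makes it easier to keep the version and contact details current across every script that shares the helper.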

If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. This is recognized by Wikimedia's systems, and used to classify traffic and provide more accurate statistics.
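The convention can be checked with a simple case-insensitive substring test. The sketch below is only an illustration of that convention; the function name `declares_bot` is hypothetical, and this page does not describe how Wikimedia's own traffic classification is implemented.

```python
def declares_bot(user_agent):
    """Return True if the string contains "bot" in any letter case."""
    return "bot" in user_agent.lower()


# The CoolBot placeholder from this page follows the convention.
print(declares_bot("CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"))
# A name without "bot" in it does not.
print(declares_bot("MyCoolTool/1.1"))
```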

Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.[4] Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.[5]
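A script can guard against accidentally sending a library default before it makes any request. The following is a minimal sketch under stated assumptions: the prefix list contains only the generic names mentioned on this page, it is not the actual list used by Wikimedia's servers, and `looks_generic` is a hypothetical helper name.

```python
# Assumption: illustrative prefixes taken from the names mentioned on this
# page; the real server-side list is not documented here.
GENERIC_PREFIXES = ("curl", "lwp", "python-urllib", "python-requests")


def looks_generic(user_agent):
    """Return True for an empty string or one that starts with a generic default."""
    normalized = user_agent.strip().lower()
    # str.startswith accepts a tuple and matches any of its entries.
    return normalized == "" or normalized.startswith(GENERIC_PREFIXES)


print(looks_generic("python-requests/2.31.0"))
print(looks_generic("CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"))
```

Only the start of the string is checked, so naming a library after the client and contact tokens does not trigger the check.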

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the Cover Your Tracks project.

Browser-based applications written in JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy; however, such applications are encouraged to include the Api-User-Agent header to supply an appropriate agent.

As of 2015, Wikimedia sites do not reject all page views and API requests from clients that do not set a User-Agent header. As such, the requirement is not automatically enforced. Rather, it may be enforced in specific cases as needed.[6]

Code examples

On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. Other MediaWiki installations may have similar policies.

JavaScript

If you are calling the API from browser-based JavaScript, you won't be able to influence the User-Agent header: the browser will use its own. To work around this, use the Api-User-Agent header:

// Using XMLHttpRequest
xhr.setRequestHeader( 'Api-User-Agent', 'Example/1.0' );

// Using jQuery
$.ajax( {
    url: 'https://example/...',
    data: ...,
    dataType: 'json',
    type: 'GET',
    headers: { 'Api-User-Agent': 'Example/1.0' },
} ).then( function ( data ) {
    // ..
} );

// Using mw.Api
var api = new mw.Api( {
    ajax: {
        headers: { 'Api-User-Agent': 'Example/1.0' }
    }
} );
api.get( ... ).then( function ( data ) {
    // ...
} );

// Using Fetch
fetch( 'https://example/...', {
    method: 'GET',
    headers: new Headers( {
        'Api-User-Agent': 'Example/1.0'
    } )
} ).then( function ( response ) {
    return response.json();
} ).then( function ( data ) {
    // ...
} );

PHP

In PHP, you can identify your user-agent with code such as this:

ini_set( 'user_agent', 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

cURL

Or if you use cURL:

curl_setopt( $curl, CURLOPT_USERAGENT, 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)' );

Python

In Python, you can use the Requests library to set a header:

import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

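response = requests.get(url, headers=headers)

Or, if you want to use SPARQLWrapper like in https://people.wikimedia.org/~bearloga/notes/wdqs-python.html:

from SPARQLWrapper import SPARQLWrapper, JSON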

url = 'https://example/...'
user_agent = 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'

sparql = SPARQLWrapper(url, agent=user_agent)
results = sparql.query()
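
If no third-party library is available, the same header can be set with Python's standard library. This is a minimal sketch rather than an official recommendation: the `make_request` helper and the example URL are hypothetical, and the request is built but not sent.

```python
import urllib.request

USER_AGENT = "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"


def make_request(url):
    """Build (but do not send) a request that carries the descriptive header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Hypothetical URL, used for illustration only.
request = make_request("https://example.org/w/api.php?action=query&format=json")

# urllib stores header names capitalized, so the lookup key is "User-agent".
print(request.get_header("User-agent"))

# To actually send the request:
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```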

Notes

  1. The Wikitech-l February 2010 Archive by subject
  2. User-Agent: - Wikitech-l - lists.wikimedia.org
  3. API:FAQ - MediaWiki
  4. [Wikitech-l] User-Agent:
  5. Clarification on what is needed for "identifying the bot" in bot user-agent?
  6. gmane.science.linguistics.wikipedia.technical/83870 (deadlink)

See also