Policy talk:Privacy policy: Difference between revisions

From Wikimedia Foundation Governance Wiki
:We don't need to keep around timestamps down to a fraction of a second forever. [[User:PiRSquared17|PiRSquared17]] ([[User talk:PiRSquared17|talk]]) 21:13, 20 December 2013 (UTC)
::Not sure about that. I wonder if de.wiki also has agreed to a decrease of its own [[right to fork]], a right which they constantly use as a threat. Making dumps unusable would greatly reduce the contractual power of de.wiki, dunno if they really want it. --[[User:Nemo_bis|Nemo]] 21:43, 20 December 2013 (UTC)

While we believe this proposal is based on legitimate concerns, we want to highlight some of the practical considerations of such a proposal. Due to the holidays, we’ve addressed this only briefly, but we hope it serves to explain our perspective.

In short, public access to metadata around page creation and editing is critical to the health and well-being of the site and is used in numerous places and for numerous use cases:

* Protecting against vandalism and incorrect or inappropriate content: several bots patrol Wikipedia's articles and protect the site against these events. Without public access to metadata, the effectiveness of these bots would be much reduced, and it is impossible for humans to perform these tasks at scale.
* Community workflows: processes that contribute to the quality and governance of the project will also be affected: blocking users, assessing adminship nominations, and determining eligible participants in article deletions.
* Power tools: certain bulk processes will break without public access to this metadata.
* Research: researchers around the world use this public metadata for analysis that is useful to both the site and the movement. It is essential that they continue to have access.
* Forking: in order to have a full copy of our projects and their change histories, all metadata needs to be exposed alongside the content.

In summary, public and open-licensed revision metadata is vital to the technical and social functioning of Wikipedia, and any removal of this data would have serious impact on a number of processes and actions critical to the project. [[User:Tfinc|Tfinc]] ([[User talk:Tfinc|talk]]) 00:54, 21 December 2013 (UTC)

Revision as of 00:54, 21 December 2013




What is changing?

Several comments below ask about what’s new in this draft as compared to the current privacy policy. To help new folks just joining the conversation, we have outlined the main changes in this box. But feel free to join the discussion about these changes here.

As a general matter, because the current privacy policy was written in 2008, it did not anticipate many technologies that we are using today. Where the current policy is silent, the new draft spells out to users how their data is collected and used. Here are some specific examples:

  1. Cookies: The current policy mentions the use of temporary session cookies and broadly states some differences in the use of cookies between mere reading and logged-in reading or editing. The FAQ in the new draft lists specific cookies that we use and specifies what they are used for and when they expire. The draft policy further clarifies that we will never use third-party cookies without permission from users. It also outlines other technologies that we may consider using to collect data like tracking pixels or local storage.
  2. Location data: Whereas the current policy does not address collection and use of location data, the draft policy spells out how you may be communicating the location of your device through GPS and similar technologies, metadata from uploaded images, and IP addresses. It also explains how we may use that data.
  3. Information we receive automatically: The current policy does not clearly explain that we can receive certain data automatically. The new draft explains that when you make requests to our servers you submit certain information automatically. It also specifies how we use this information to administer the sites, provide greater security, fight vandalism, optimize mobile applications, and otherwise make it easier for you to use the sites.
  4. Limited data sharing: The current policy narrowly states that user passwords and cookies shouldn’t be disclosed except as required by law, but doesn’t specify how other data may be shared. The new draft expressly lists how all data may be shared, not just passwords and cookies. This includes discussing how we share some data with volunteer developers, whose work is essential for our open source projects. It also includes providing non-personal data to researchers who can share their findings with our community so that we can understand the projects and make them better.
  5. Never selling user data: The current policy doesn’t mention this. While long-term editors and community members understand that selling data is against our ethos, newcomers have no way of knowing how our projects are different from most other websites unless we expressly tell them. The new draft spells out that we would never sell or rent their data or use it to sell them anything.
  6. Notifications: We introduced notifications after the current policy was drafted. So, unsurprisingly, it doesn’t mention them. The new draft explains how notifications are used, that they can sometimes collect data through tracking pixels, and how you can opt out.
  7. Scope of the policy: The current policy states its scope in general terms, and we want to be clearer about when the policy applies. The new draft includes a section explaining what the policy does and doesn’t cover in more detail.
  8. Surveys and feedback: The current policy doesn’t specifically address surveys and feedback forms. The new draft explains when we may use surveys and how we will notify you what information we collect.
  9. Procedures for updating the policy: The new draft specifically indicates how we will notify you if the policy needs to be changed. This is consistent with our current practice, but we want to make our commitment clear: we will provide advance notice for substantial changes to the privacy policy, allow community comment, and provide those changes in multiple languages.

This is of course not a comprehensive list of changes. If you see other changes that you are curious about, feel free to raise them and we will clarify the intent.

The purpose of a privacy policy is to inform users about what information is collected, how it is used, and whom it is shared with. The current policy did this well back when it was written, but it is simply outdated. We hope that with your help the new policy will address all the relevant information about use of personal data on the projects. YWelinder (WMF) (talk) 01:07, 6 September 2013 (UTC)[reply]



NSA, FISC, NSL, FISAAA, PRISM...

The following discussion is closed: Closing the first couple of sections given staleness and apparent completeness; will archive these sections when the last two are done. Jalexander--WMF 22:23, 18 December 2013 (UTC) [reply]

The WMF and many people with access to nonpublic information (such as, for users with accounts, their IP addresses and possibly their email addresses) are subject to the contradictory laws of the USA. The WMF and many people with access to nonpublic information may be required to make such information available to unaccountable agencies while being legally restrained from telling users that the information was shared. Admitting to new information-sharing mechanisms, or even just to the requests, may result in imprisonment without trial, without access to the laws leading to imprisonment, or even to transcripts of the decisions, the evidence, or who the accusers were.

Until the WMF and people with access to nonpublic information remove themselves from such jurisdictions, the guarantees in the WMF's privacy policy, the access to nonpublic information policy, the data retention guidelines, the transparency report, and the requests for user information procedure, are untrue.

To service campaign contributors, your information may be given to third parties for marketing purposes.

Your data may be secretly retained by the WMF for as long as required by US agencies, and/or by those agencies themselves for as long as they want.

The WMF may be prevented from revealing their actual policies but forced to claim that they protect users' privacy per their public policies. -- Jeandré, 2013-09-04t12:47z

See also Talk:Privacy policy/Call for input (2013)#Technical and legal coercion aspects.

Hi Jeandré, while I'm someone who knows for a fact that we would strongly rebel against secret requests and unreasonable demands from the government (any government) I'm certainly sympathetic to these concerns (I think much of what the US government has done is illegal and immoral). That said I have yet to see where we could 'go' to remove everyone from jurisdictions where this (or other equally bad issues) would be a problem. Europe, for example, is generally not better, it has significant issues as well. Jalexander (talk) 20:07, 4 September 2013 (UTC)[reply]
As far as I know, the voters in New Zealand and Iceland care about doing the right thing, and don't have the same kinds of laws as the USA and UK. -- Jeandré, 2013-09-05t09:27z
European laws are infinitely more protective than American laws. Why do you think the big technology companies (Google, Micro$oft, Apple, etc.) are trying to impose, fortunately without much success (see the few recent cases, for example between Google and the European CNILs), that American law should apply to the detriment of European law? 78.251.243.204 20:18, 5 September 2013 (UTC)[reply]
And in any case it is not only a question of which law is more or less protective; it is a question of the laws of the different countries having to be respected. Each country is sovereign and establishes its laws democratically; we have no business imposing on it laws that have no legitimacy. Only Americans vote to elect their Congress. American laws therefore apply only to them. 78.251.243.204 20:21, 5 September 2013 (UTC)[reply]
We are sorry, 78.251.243.204, but we cannot escape the jurisdiction of the United States government at this time. The Foundation is sensitive to concerns about the government's broad ability to access user information. The Foundation firmly believes in users' right to participate in the projects anonymously and in protecting users' privacy. For these reasons, the Foundation collects far less information about users than other comparable sites, and retains that information for the shortest period consistent with the goals of maintaining, understanding, and improving the Wikimedia sites, and with our obligations under the law. Our data retention guidelines, which are coming soon, explain these practices in more detail. DRenaud (WMF) (talk) 21:48, 11 December 2013 (UTC)[reply]

PRISM etc

Not sure if this is completely on topic, please point me towards the discussion if not, this is not my area of knowledge.

  1. Is the Wikimedia Foundation subject to the same FISA laws that Microsoft, Google etc have had to comply with and give over information?
  2. If so does the Wikimedia Foundation record anything they may want?
  3. If so this privacy policy will need to reflect this.

--Mrjohncummings (talk) 16:06, 4 September 2013 (UTC)[reply]

The WMF has been very clear that we have not been contacted in relation to that. General Counsel Geoff Brigham said in a blog post that "The Wikimedia Foundation has not received requests or legal orders to participate in PRISM, to comply with the Foreign Intelligence Surveillance Act (FISA), or to participate in or facilitate any secret intelligence surveillance program. We also have not “changed” our systems to make government surveillance easier, as the New York Times has claimed is the case for some service providers." Philippe (WMF) (talk) 20:58, 4 September 2013 (UTC)[reply]
Just to add to what Philippe has said: it is our understanding of the law that we cannot be forced to lie (though they can force us not to comment or confirm, including while we fight for it to be released). While I can certainly understand people's concerns about "them not even being able to tell us if it's true", I really do stress that we haven't received anything and would fight like crazy if we did. Also, we're really, really bad liars; we are an incredibly leaky organization. Jalexander (talk) 08:03, 5 September 2013 (UTC)[reply]
This may be a crackpot idea, but given that you cannot be forced to lie, but can be forced to keep quiet, would it be possible for somebody - perhaps in the legal department - to report on a regular basis in a regular spot that "We haven't been contacted by the US Gov't this week to provide any information on users"? Smallbones (talk) 01:05, 7 September 2013 (UTC)[reply]
"Also, we're really really bad liars, we are an incredibly leaky organization." I assume that you're joking, but if you're not, why have a privacy policy at all? (Not joking.) -- Gyrofrog (talk) 03:59, 8 September 2013 (UTC)[reply]
Given the choice between believing Microsoft/Google/Facebook/US.gov or Snowden, I'd go with Snowden every time. I think the current evidence shows that the people at Google are lying by commission because they're being forced to. While I have orders of magnitude more trust in the people at the WMF than those at Google, I think Ladar Levison's decision to shut down Lavabit and his strong recommendation against trusting organizations "with physical ties to the United States" indicates that he didn't want to lie by commission. -- Jeandré, 2013-09-05t09:27z
Appreciate the discussion. Smallbones' suggestion is that we implement what is actually the well-known warrant canary scheme. Part of Jeandré's excellent point is that it seems either Google or Snowden is lying, and that if Google is lying, warrant canaries don't seem to work against the full might of the US Government. Was Lavabit publishing a warrant canary? More importantly, should the WMF be doing so on a more regular basis? (The comments from Philippe & Jalexander are great for today, but not regularly made.) --Elvey (talk) 22:21, 8 September 2013 (UTC)[reply]
Even if the 2013-06-08 released slide is wrong, and organizations are not currently forced to lie by commission but only by omission, a warrant canary still wouldn't help if a WMF developer is asked to contravene the privacy policy (and/or the access to nonpublic information policy, the data retention guidelines, the transparency report, or the requests for user information procedure) and forced not to tell the people who provide the canary.
Every possible person with the ability to contravene these policies who is subject to US law would then have to provide a daily canary. I'm not actually suggesting this, because I think the 2013-06-08 released slide is correct, and organizations like Google are being forced to lie by commission.
Until this is clarified, I don't think any privacy policy from any organization "with physical ties to the United States" can be truthful unless it clearly states that it can't currently protect anyone's privacy if the powers that be come knocking. -- Jeandré, 2013-09-23t10:09z
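The warrant canary scheme discussed in this thread can be illustrated with a minimal, hypothetical freshness check: an organization regularly publishes a dated statement that it has received no secret requests, and readers treat a missing or stale statement as the warning sign. The canary text and the 14-day window below are illustrative assumptions only; they do not correspond to anything the WMF publishes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical canary text, as an organization might publish it weekly.
CANARY = "As of 2013-09-07, we have not received any national security requests."

def canary_is_fresh(canary: str, now: datetime, max_age_days: int = 14) -> bool:
    """Extract the ISO date from the canary statement and check that it
    was published within the allowed window. A missing or stale canary
    is treated as a warning sign, per the warrant-canary scheme."""
    for token in canary.split():
        token = token.strip(",.")
        try:
            published = datetime.strptime(token, "%Y-%m-%d").replace(tzinfo=timezone.utc)
            return now - published <= timedelta(days=max_age_days)
        except ValueError:
            continue
    return False  # no date found: treat as stale

print(canary_is_fresh(CANARY, datetime(2013, 9, 14, tzinfo=timezone.utc)))   # True
print(canary_is_fresh(CANARY, datetime(2013, 12, 1, tzinfo=timezone.utc)))   # False
```

As the thread notes, this only detects a canary that stops being published; it cannot detect an organization compelled to keep publishing a false one.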
Is it possible for anyone to verify exactly what software the WMF's servers are running and how the software is configured? It is trivial to download Mediawiki and various extensions, but is it possible for anyone to verify that the version of Mediawiki as run by the WMF isn't modified to provide information to the NSA? --Stefan2 (talk) 12:57, 5 September 2013 (UTC)[reply]
We are very transparent about our servers, how they are configured, and what they run. For example, you can see our production code and deployment recipes on Gerrit and piles of additional information on Wikitech. So I don’t think we object to transparency like that in principle. But verification that source code matches specific binaries is an extremely difficult challenge, even under relatively small and controlled circumstances where you can control every part of the build, and where you’re simply asking about a binary at one point in time, rather than on a live, running system. To do the same thing for an entire network infrastructure (not just Mediawiki, but the web server, operating system, network switches, etc.) would be effectively impossible, both in terms of difficulty and in terms of making it secure (since it would require trusted access to the live system in order to perform monitoring). Even if it were achievable, it would also make management difficult in practice: for example, we sometimes have security patches deployed that are not yet public (for legitimate, genuine security reasons), and we also have to be able to change configurations quickly and fluidly in response to changes in traffic, performance, etc., and doing this would be difficult if configurations and binaries had to be checksummed, compared, verified, etc. - LVilla (WMF) (talk) 02:05, 6 September 2013 (UTC)[reply]
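For scale: the "one binary at one point in time" case that the reply above calls the comparatively easy end of the problem amounts to a checksum comparison, roughly like the following sketch (the file name and contents are hypothetical). Everything beyond it, verifying a live, changing system, is where the difficulty described above begins.

```python
import hashlib
import os
import tempfile

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 in chunks, as one would when
    comparing a deployed artifact against a published reference digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo: a temporary file stands in for a deployed binary, and the
# "published" digest is computed from the bytes we expect it to contain.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example artifact contents\n")
    artifact = tmp.name

published_digest = hashlib.sha256(b"example artifact contents\n").hexdigest()
print(sha256_of(artifact) == published_digest)  # True
os.unlink(artifact)
```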
Given everything that's happened, I'm not so sure I trust anyone anymore about what is and isn't watched/kept. I now assume everything is being watched/recorded/analyzed online. You can only hide in the bushes for so long; eventually you'll want to come out and play (online), so I guess you suck it up and move on. The government never tells you about it, one guy leaks it, then they move to make it more transparent and do an about-face. Makes you wonder what else they're hiding, and it's sad that they have to hide it from us... 99.251.24.168 02:35, 6 September 2013 (UTC)[reply]
I understand why you are finding it hard to trust anyone, and I am glad that Stefan2 was trying to be creative about ways to increase trust. I just don't think this particular idea solves the problem. If it helps, we're trying to work on this issue; most notably right now by pushing the US government to allow more transparency from targets of national security letters. Suggestions on how else we can do that are welcome. - LVilla (WMF) (talk) 17:09, 6 September 2013 (UTC)[reply]
Of course it would be a bad idea to give anyone unlimited read access to the live servers. For example, it would allow anyone to extract any information from any database table, including information normally only available to checkusers and oversighters. Thanks, your reply sounds reassuring. --Stefan2 (talk) 19:13, 6 September 2013 (UTC)[reply]
Although I do not have any questions at this time concerning this, I wanted to thank you for addressing it in advance as it would have come to mind as I do live in the United States. Koi Sekirei (talk) 00:50, 8 September 2013 (UTC)[reply]
Prisms may still be used for disco parties. —Preceding unsigned comment added by 180.216.68.185 (talkcontribs) 14:29, 11 Sep 2013 (UTC)

I'm probably about to be dismissed as a nut case, but I would favor simply having Wikipedia programmed to automatically post any government requests in an appropriately titled article. 24.168.74.11 19:33, 12 September 2013 (UTC)[reply]

It's not nutty to want more transparency on this issue, but it's impossible to do this in a fully automated fashion, and it's not clear that a semi-automated process is legal. We will shortly be publishing an overall transparency report, and we plan to do that regularly in the future. Hopefully that resolves some of the concerns. -LVilla (WMF) (talk) 16:29, 30 September 2013 (UTC)[reply]

Subject to US law

I think we should expand the section on the data being kept in the USA, and therefore subject to American laws. The PATRIOT Act comes to mind, where they can and will use any data you store in the US at any point in time against you at a later date. It doesn't matter where you live. So you might not want to post that nasty anti-American rant on a talk page; it might come back to bite you in the choo-choo later... Or the DMCA. I think of a certain Russian computer scientist (Dmitry Sklyarov) who could have been arrested had he come to the US to give a speech, as he had posted information on anti-circumvention measures... Oaktree b (talk) 22:09, 4 September 2013 (UTC)[reply]

While some of this may be true (though there are lots of laws in Europe and other countries which can also be problematic for what you post, and which the US allows), I'm not sure I understand your example. There is very little (if any) added risk in posting your anti-American rant on a talk page on an American server. There are certainly risks, but the PATRIOT Act does not necessarily make it more risky (especially given the legal system and our desire to fight against demands) than many other location options. Jalexander (talk) 00:29, 5 September 2013 (UTC)[reply]

This section both concerns and worries me. "to comply with the law, or to protect you and others" I think most of us are aware that our freedom in all areas is slowly but steadily eroding. In many countries, there is not even a pretense at giving freedom priority over other values, while in many others it is only a pretense. I wonder if there is a country left in the world that has not put that value at the bottom of a list of many other values like security and equality. Politicians and lawyers can and will find a way to abuse that which they can abuse for their own purposes. Laws were made to facilitate the sending of millions of people into concentration camps; why should they stop at keeping knowledge sacred? "to comply with the law, or to protect you and others" That is a mightily large back door.

Well I live in Canada, and even if I do my edits in Canada, should I do something distasteful to the Americans, they can hold me at the border for some stupid reason. We also have data privacy laws here in Canada (PIPEDA), but those don't apply to Canadian data stored on American servers. My point is you're essentially at their mercy, whether you like it or not. Just so people are made to understand that. You live in country XYZ, but American law applies to your edits and any data you divulge, so beware. 99.251.24.168 02:09, 6 September 2013 (UTC)[reply]
That is partially but not completely true, I think. There is a long-running legend that the law that applies is the law of the country where the servers are located. The case law is not yet settled, but for now that is false. Since the servers are located in the US, American laws apply in part. But since the producers and consumers of content are in other countries, other laws may apply as well. For example, for the French-language Wikipedia, a large share of the producers and consumers of content being in countries such as France, Canada, Belgium, etc., it is very likely that some of those countries' laws apply. For example, a company whose headquarters and servers are located in Luxembourg was ordered to apply French law; Twitter was sued for not applying French laws on freedom of expression, but the case did not go to trial because Twitter preferred to reach an agreement with the civil parties; Google is being challenged by the various European CNILs for failing to respect European data-protection laws, which are stricter than American laws. In both of those cases, Twitter and Google claim that they need apply only American law, but that is strongly contested, and one may doubt that the courts will side with them. That would be very convenient for multinational companies, but what a loss of sovereignty for the citizens and countries concerned! I don't believe it at all. 78.251.253.2 11:18, 6 September 2013 (UTC)[reply]
Thanks for your comment. Please see my response to a related discussion here. YWelinder (WMF) (talk) 19:42, 7 September 2013 (UTC)[reply]

Legal response

Thanks for raising this question. I’ll tackle it in two parts:

First, generally: as we say in more detail in the policy’s section on our legal obligations, we must comply with applicable law, but we will fight government requests when that is possible and appropriate. For example, unlike some websites, we already are pretty aggressive about not complying with subpoenas that are not legally enforceable. (We’ll have precise numbers on that in a transparency report soon.) We’d love to hear specific feedback on how we can improve that section, such as additional grounds that we should consider when fighting subpoenas.

In addition, we are currently working on a document that will explain our policy and procedure for subpoenas and other court orders concerning private data. We will publish the document publicly, follow it when responding to requests, and also provide it to law enforcement so that they know about our unusually strict policy on protecting user data.

Second, with regards to surveillance programs like PRISM and FISA court orders: We are subject to US law, including FISA. However, as we have previously publicly stated, we have not received any FISA orders, and we have not participated in or facilitated any government surveillance programs. In the unlikely instance that we ever receive an order, we are making plans to oppose it.

Beyond the legal realm, we continue to evaluate and pursue appropriate public advocacy options to oppose government surveillance when it is inconsistent with our mission. For example, the Wikimedia Foundation signed a letter with the Center for Democracy and Technology requesting transparency and accountability for PRISM. If you are interested in proposing or engaging in advocacy on this issue, please consider joining the advocacy advisory group. We also continue to implement technical measures that improve user privacy and make surveillance more difficult. For example, we enabled HTTPS on Wikimedia sites by default for logged in users. For more information, see our HTTPS roadmap.

As always, we greatly appreciate your input on this complex issue. Please note that if you have questions that are specific to surveillance, and not tied to the privacy policy itself, the best place to discuss those is the PRISM talk page on Meta, not here.

Best, Stephen LaPorte (WMF) (talk) 00:03, 6 September 2013 (UTC)[reply]

The question is not how best to resist the application of laws we disagree with: the laws are there, they were passed democratically, and we must apply them, full stop. We should not be doing politics! Let us instead get on with writing the encyclopedia, and apply the laws when they apply, whatever country they come from. 78.251.253.2 11:38, 6 September 2013 (UTC)[reply]
We should not be doing politics? That is a position I struggle to understand, for the following reason: what is the point of contributing to an encyclopedia if it too becomes an instrument of repression? On the contrary, I am persuaded that history teaches us to resist unjust laws as best we can... although one can speak of democratic votes in the case of the laws in question, I dispute that interpretation (on the surface they were democratic, modulo disinformation, corruption/lobbying, pressure from the secret services...); they were enacted by an electorate mostly illiterate in matters of technology, and therefore subject to every kind of manipulation; the opinions of independent experts no longer count for anything. It is fear that governs the pre-(techno)fascist society, not reason.
Summary: I strongly oppose unquestioning compliance with unjust laws, passed democratically or not. We cannot abstain from being political in this matter, because otherwise what we do becomes part of the unjust system. Ɯ (talk) 10:51, 10 September 2013 (UTC)[reply]

Location of the servers in the United States and applicable law

The explanations indicate that the servers are located in the United States and that we must accept that American data-protection law applies, even if it is less protective than our own, and that otherwise we should not use Wikipedia. Does that mean we must clear out right away? In any case, I do not believe it is legal. Since the French-language Wikipedia largely concerns French people (as well as Quebecers, Belgians, Africans, Swiss, etc.), I think the jurisdictions of the publics concerned have their say, and that their laws must apply. The case law is not yet well established, but some court decisions have already gone in that direction. In any case, personally, I do not at all agree to consent to American law applying. Far too dangerous! American law is not protective enough! Not to mention all those liberticidal laws passed in the wake of the September 11 attacks, with little oversight to check their implementation! 78.251.246.17 22:55, 4 September 2013 (UTC)[reply]

Why do you speak only of the French-language Wikipedia? There are several hundred projects in many languages, whose countries could also have their say. Put plainly, the Foundation cannot follow every law in the world, so it stops at those of its own country. Elfix 07:47, 5 September 2013 (UTC)[reply]
The problem is that we have several hundred projects in many languages, but also several hundred countries which, whether you like it or not, are sovereign, have their own laws, and have the right to have their own laws. That is a fact, whether we like it or not. And the question is not whether the Foundation can follow every law in the world; the point is that it MUST follow the laws of the world, because its activities do not stop at its country's borders but extend across the entire world. Not only MUST it follow the laws of the countries its activities extend to, but for a country like France or any European country, whose laws are much more protective of citizens' private lives than American law, that is even highly desirable. That is why this clause is bad. If the excuse the Foundation gives for adopting American law, even when it is less protective than our country's, is that the servers are in the United States, then let us repatriate the servers to Europe. In every case it is the most protective laws we should respect, for if we respect the most protective laws, then we respect all laws, including American law and the laws of every country. 78.251.243.204 18:26, 5 September 2013 (UTC)[reply]
I made the point in English above, but it is the same: any information you submit to the English/French/German etc. Wikipedia is kept in the USA, so your local law probably does not apply. In Canada, for example, we have PIPEDA (LPRPDE in French) for the protection of personal information and electronic documents; any information that is not on a Canadian computer is not protected. So if, for one reason or another, Obama or the American government decides to dig through your information, tough luck! All local protection stops at the border. Just look at the case of Edward Snowden or Julian Assange; they can very easily make your life very difficult if they decide you are an enemy of the USA... Watch out. Caveat emptor. 99.251.24.168 02:24, 6 September 2013 (UTC)[reply]
Hello 99.251.24.168 and thank you for your reply :-) I also replied above. I think, on the contrary, that the laws of sovereign countries are very likely to apply. But in the case you describe of Canadian data stored on American servers, American laws ALSO apply, and rightly so: the US is a sovereign country, like Canada. In matters of this kind, which involve several countries, the applicable law is always a compromise between the different legal systems concerned. Do not believe that only the laws of the country hosting the servers apply. Cave canem! ;-) 78.251.253.2 11:47, 6 September 2013 (UTC)[reply]

Thank you for your comments and my apologies for responding in English. Jurisdiction is a complex issue that is determined based on a case-by-case analysis. Generally, we apply U.S. law, but we are sensitive to European data protection laws. For example, a version of this privacy policy was reviewed by a privacy counsel in Europe to ensure consistency with general principles of data protection.

The important issue for our users' data is our commitment to privacy rather than the general privacy law in the country where the Wikimedia Foundation is based. Our privacy policy generally limits data collection and use to what is necessary to provide and improve the Wikimedia projects. For example, we commit to never selling user data or using it to sell them products. In other words, the commitments we make in this policy go beyond commitments made by many online sites, including those based in Europe. And we encourage users to focus on and provide feedback about those commitments, because the commitments are ultimately what matters for their privacy on the Wikimedia sites. YWelinder (WMF) (talk) 19:36, 7 September 2013 (UTC)[reply]

Granted, more than knowing which country's legislation applies, it is the details of Wikimedia's personal data protection rules or charter that matter to us. Nevertheless, the (American, European) legislations are common and practical reference points offering a reassuring baseline, because they are not completely unknown to us. In that spirit, and to help us better understand the policy, would it be possible for a competent person to give us a summary of how this policy differs from American or European legislation? Where does the policy stand in relation to those legislations? 85.170.120.230 10:43, 8 September 2013 (UTC)[reply]
Hi 85! There is currently no significant body of federal online privacy law in the United States. There are, however, some specific federal and state-by-state laws that mostly have to do with the treatment and disclosure of particularly sensitive materials, such as medical and criminal records. These kinds of privacy laws, for the most part, do not apply to us, as we do not collect such types of sensitive information.
We are, of our own volition, doing as much as we are capable of to meet the expectations of community members domestically and abroad, well above and beyond what United States law requires of us.
Regarding user information that we may be required to furnish pursuant to formal legal process, you will find this issue addressed here. For more information on how United States privacy law compares with privacy laws in Europe, you may wish to consult the writings of Professor Paul M. Schwartz. DRenaud (WMF) (talk) 23:30, 18 December 2013 (UTC)[reply]
Thanks for the link: it was an interesting read, though more focused on the dynamics of politics (with some scattered mention of the last EU directive drafts). The report by professor Chris Hoofnagle linked above, while definitely more boring, has a more "usable" list of differences and issues. --Nemo 12:49, 19 December 2013 (UTC)[reply]

Location of the servers in the United States and applicable law (continued)

I request the removal of the paragraph Où se trouve la Fondation et qu’est-ce que ceci implique pour moi ? ("Where is the Foundation located and what does this mean for me?") 78.251.243.204 19:05, 5 September 2013 (UTC)[reply]

My apologies for the response in English. If someone would be so kind as to translate this into French, I would be much obliged. Are there any particular reasons that you are requesting removal of that section? Is there any specific language that concerns you? If so, please specify. Mpaulson (WMF) (talk) 22:23, 5 September 2013 (UTC)[reply]
Traduction / translation: "Excuse me for replying in English. If someone would be kind enough to translate my message into French, I would be grateful. Are there any particular reasons why you are requesting the removal of this section? Is there any specific wording that concerns you? If so, please specify." Jules78120 (talk) 22:37, 5 September 2013 (UTC)[reply]
Thank you, Mpaulson, for your reply (and thanks to Jules78120 for his kind translation :-) ). The particular reasons that lead me to request the removal of this section are the same as those already developed above in the section Localisation des serveurs aux Etats-Unis et loi applicable and in several other sections, such as NSA, FISC, NSL, FISAAA, PRISM... I am simply allowing myself to be a little more insistent in my request, with your permission :-) 78.251.243.204 00:54, 6 September 2013 (UTC)[reply]
So, while we as an organization and I personally have some sizable objections to PRISM and many of the actions taken by the US government recently with regards to privacy, removing this section will not actually change the applicability of US law. The Foundation is located in the US, meaning that using our sites leads to the transfer of data to the US, where it is subject to US law. Mpaulson (WMF) (talk) 01:09, 6 September 2013 (UTC)[reply]
Of course the servers are located in the US and American laws apply (on that note, perhaps we should think about moving the servers back out of the US!). On the other hand, I do not agree with the sentence "You also consent to the transfer of your information by us from the United States to other countries, which may have different or less stringent data protection laws than your country, in connection with the services provided to you." I do not agree to my data being transferred just anywhere, including to companies located in countries whose laws would allow anyone to do anything with it. If our data is transferred, it must only be with the guarantee that it will be protected at least as well as in our own country, or in any case at least as well as in the US. Whatever the company or country our data is transferred to, we must ensure that the privacy policy is guaranteed. Otherwise, we do not transfer. In my view, the policy does not establish this point clearly enough (for example, the paragraphs "If the organization is transferred (very unlikely!)" and "To our service providers" lack precision, in my opinion). 78.251.253.2 12:36, 6 September 2013 (UTC)[reply]
P.S.: In my message, EU in French = Etats-Unis = United States = US in English; my apologies, I should have written Etats-Unis in full :-) 85.170.120.230 01:51, 7 September 2013 (UTC)[reply]
Unfortunately, US privacy law is still very much developing and the EU considers the US to have less stringent data protection laws than the US. So using a Wikimedia Site means that, if you are a resident of Europe, your data is being transferred to a country with less stringent data protection laws than your country. There isn't really a way for you to use the Wikimedia Sites without consenting to that kind of transfer, unfortunately. But differences in privacy regimes aside, the Wikimedia Foundation seeks to put into place contractual and technological protections with third parties (no matter what country they may be located in) if they are to receive nonpublic user information, to help ensure that their practices meet the standards of the Wikimedia Foundation's privacy policy. Mpaulson (WMF) (talk) 18:59, 6 September 2013 (UTC)[reply]
This is not quite correct. If I visit google.com from Italy, I'm asked whether I want to accept a cookie or not, though in the USA you are not. Moreover, Google managers were held criminally liable for privacy violation in a meritless case which nevertheless ruled that «the jurisdiction of the Italian Courts applies [...] regardless of where the Google servers with the uploaded content are located».[1] --Nemo 19:26, 6 September 2013 (UTC)[reply]
What does this mean: "the EU considers the US to have less stringent data protection laws than the US"? PiRSquared17 (talk) 19:27, 6 September 2013 (UTC)[reply]
«Special precautions need to be taken when personal data is transferred to countries outside the EEA that do not provide EU-standard data protection.»[2] «The Commission has so far recognized [...] the US Department of Commerce's Safe harbor Privacy Principles, and the transfer of Air Passenger Name Record to the United States' Bureau of Customs and Border Protection as providing adequate protection.»[3] «In many respects, the US is a data haven in comparison to international standards. Increasing globalization of US business, evidenced by the Safe Harbor agreement, is driving more thinking about data protection in other countries. Still, political and economic forces make a European style data protection law of general applicability highly unlikely in the near future».[4] WMF is also not in [5], FWIW. --Nemo 19:46, 6 September 2013 (UTC)[reply]
Note that we cannot be in the Safe Harbor program, because the Federal Trade Commission does not have jurisdiction over non-profit organizations. (See "Eligibility for Self-Certification" on the Safe Harbor page.) We would likely join if we could. -LVilla (WMF) (talk) 22:47, 17 September 2013 (UTC)[reply]
Interesting. I was merely answering PiRSquared17's question, but if the WMF would like to join the self-certification program if only it was possible, why not adhere to those obligations in the policy? It won't trigger the law obligations (and advantages), but WMF is free to voluntarily stick itself to higher standards. --Nemo 14:13, 27 September 2013 (UTC)[reply]
Indeed. This is another example of a response we have seen elsewhere on this page, where WMF has argued that as a non-profit it is not required to adhere to certain privacy-related standards. It would of course be possible to adhere to those standards voluntarily, and I think there should be an explicit statement of what consideration, if any, has been given to such voluntary adherence. Spectral sequence (talk) 17:15, 27 September 2013 (UTC)[reply]
@Mpaulson: I have the impression that you misunderstood my abbreviation EU, which meant Etats-Unis (the United States of America). Sorry. That said, even if American laws are indeed often considered less protective of personal data than European laws, Wikimedia's privacy policy can perfectly well guarantee a level of protection higher than American law. Guaranteeing a level of protection below American law would not be legal, but guaranteeing a level of protection above American law, and even above European or other laws, is entirely possible and compatible with American law. It suffices to adopt rules at least as protective as the various national legislations (a greatest common denominator of the different legislations, as it were). I do not see what prevents us from doing so. And of course all service providers must then commit to respecting this level of protection (as already stipulated in the paragraph "To our service providers"). 85.170.120.230 02:22, 7 September 2013 (UTC)[reply]
For the sake of better understanding, would it be possible for someone competent to explain to us how this privacy policy differs from European law? In what way would it be less protective? An explanation along the lines of the one given above in the section What is changing? would be very interesting! 85.170.120.230 02:32, 7 September 2013 (UTC)[reply]
In particular, as Nemo mentioned, where does the WMF stand with respect to the Safe Harbor legal framework? 85.170.120.230 12:10, 8 September 2013 (UTC)[reply]
Hi Anonymous. Without going into exhaustive detail, the United States as a whole largely has no explicit privacy framework. The Safe Harbor framework is not so much a United States privacy framework as a system where organizations in the United States can agree to maintain minimum levels of protection similar to those provided in the European Union. This is a particularly helpful system for large companies that tend to have a big physical presence in Europe (and therefore are definitely subject to European laws) and have the need to send massive amounts of personal information between the United States and the European Union. As LVilla mentioned earlier, even if we had the resources available to meet the exact standards required to participate in the Safe Harbor program, we are not eligible because the FTC (which enforces the program) does not have jurisdiction over WMF because it's a non-profit. In the United States, there are federal (i.e. national) laws that may touch on privacy, such as those protecting children, but even those may not apply to every organization or every situation. There are also state laws that address specific aspects of privacy, but those vary from state to state and also tend to only address specific scenarios. California is amongst the most protective, but still does not come anywhere near the regulatory framework that the European Union has.
One way organizations in the United States have attempted to provide higher standards is through their commitments to do so in their privacy policies. This is what we are doing here with our privacy policy. This draft is meant to explain the minimum levels of protections we can guarantee at this point in the organization's evolution. We are striving to provide greater protections as we learn and grow (and it should be noted that nothing in this or any privacy policy draft we will ever have will prevent us from providing greater protections than outlined in the policy). Mpaulson (WMF) (talk) 18:14, 27 September 2013 (UTC)[reply]

Closing off, stale. Will archive in 24-48 hours; a new section is probably best if there are further questions. Jalexander--WMF 22:15, 6 November 2013 (UTC)[reply]

Actually I think this is perfect. The comment by Spectral sequence 17:15, 27 September 2013 (UTC) has not been addressed (yes, we know this is legal in the USA; would it be legal in the EU? The question is not hard to understand). LVilla said above "We would likely join if we could", so let's pretend that you can: what would it entail? --Nemo 22:42, 6 November 2013 (UTC)[reply]
By the way, Restoring Trust in EU-US data flows - Frequently Asked Questions (European Commission - MEMO/13/1059 27/11/2013). --Nemo 09:13, 2 December 2013 (UTC)[reply]
Hello Nemo, thanks for this link. We are in the process of researching and preparing a response to address Spectral sequence's questions. Stephen LaPorte (WMF) (talk) 20:06, 9 December 2013 (UTC)[reply]
Nice, looking forward to it. --Nemo 12:01, 19 December 2013 (UTC)[reply]

Collection of "unique device identification numbers"

MOVED FROM WIKIPEDIA VILLAGE PUMP

Hi, at http://meta.wikimedia.org/wiki/Privacy_policy/BannerTestA, it says:

Because of how browsers work and similar to other major websites, we receive some information automatically when you visit the Wikimedia Sites. This information includes the type of device you are using (possibly including unique device identification numbers), the type and version of your browser, your browser’s language preference, the type and version of your device’s operating system, in some cases the name of your internet service provider or mobile carrier, the website that referred you to the Wikimedia Sites and the website you exited the Wikimedia Sites from, which pages you request and visit, and the date and time of each request you make to the Wikimedia Sites.

What sort of "unique device identification numbers" is it referring to? I thought browsers didn't provide that information. 86.169.185.183 (talk) 17:40, 4 September 2013 (UTC)[reply]
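For context on what "receive automatically" means in practice: most of the listed items arrive in standard HTTP request headers that every browser sends with every request, plus the connecting IP address. A rough sketch (the header values here are illustrative only, not anything Wikimedia actually logs):

```python
# Headers a typical browser sends automatically with each request
# (values are made up for illustration).
request_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0",
    "Accept-Language": "fr-FR,fr;q=0.8,en;q=0.5",  # browser language preference
    "Referer": "https://www.google.com/",           # the site that referred you
}

# Device type, OS, and browser version are parsed out of User-Agent;
# the ISP or mobile carrier is inferred from the connecting IP
# address, which is part of the TCP connection rather than a header.
for name, value in request_headers.items():
    print(f"{name}: {value}")
```

Note that none of these standard headers carries a hardware serial number, which is the distinction at issue in this thread.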

Looking at similar privacy policies, it looks like this may refer to mobile devices: "AFID, Android ID, IMEI, UDID". --  Gadget850 talk 17:45, 4 September 2013 (UTC)[reply]
You mean that when you access a website through a browser on an Android device the website can collect a unique device ID? Is that really correct? (I can believe it for general apps, where, presumably the app can do "anything" within permissions, but I didn't think there was any such browser-website mechanism). 86.169.185.183 (talk) 18:58, 4 September 2013 (UTC)[reply]
I think this question is more appropriate for the Talk page discussion on the privacy policy draft. Steven Walling (WMF) • talk 20:31, 4 September 2013 (UTC)[reply]

I see that this information is "receive[d] [...] automatically". That doesn't necessarily mean this information needs to be collected and stored. Personally I am fine with this information being temporarily handled in a volatile location in order to cater to the display needs of each individual device. I do not however, believe that this information should be stored or used for any other means. Participation in this data-mining should be off by default. WMF would of course be free to nag users into opting in. Because this is a _free_ encyclopedia, users should be _free_ to at least view it in the way they want, without having all their habits and device details harvested non-consensually. Contributions? Edits? Sure, take all you want. There's an implicit agreement to such data-mining when a user submits an edit. But there isn't one from just viewing a page. --129.107.225.212 16:59, 5 September 2013 (UTC)[reply]

Thanks, but that is not really relevant to my question (not sure if it was supposed to be). My question is whether it is technically possible for a website to obtain "unique device identification numbers" from a web browser. The text implies that it is; previously I believed it wasn't. I am hoping that someone will be able to answer the question. 86.167.19.217 17:27, 5 September 2013 (UTC)[reply]
You are correct in stating that browsers are sandboxed from retrieving this type of information. However, our mobile apps and our mobile app deployment infrastructure may utilize "unique device identification numbers" to identify mobile devices (such as device tokens, device-unique user agents, or potentially UDIDs). Our mobile apps may need this ID for certain functionality, such as sending push notifications or delivering test deployments. Thanks, Stephen LaPorte (WMF) (talk) 17:11, 6 September 2013 (UTC)[reply]
I think we have no intention of accessing or recording device UDID, IMEI number, or anything else like that. (It's also getting increasingly hard for apps to get access to those, as the OS vendors don't like creepy apps either.) In the cases where we do usage tracking and need identifiers, they'll be either based on something already in the system -- like your username/ID -- or a randomly-generated token. --brion (talk) 17:20, 6 September 2013 (UTC)[reply]
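A minimal sketch of the randomly-generated token Brion describes, as opposed to a hardware-derived identifier like a UDID or IMEI (hypothetical code, not the apps' actual implementation):

```python
import secrets

def new_install_token() -> str:
    """Generate an opaque, random per-install identifier.

    Unlike a UDID or IMEI, this token is not derived from the
    hardware, so it cannot be linked to the device outside the app,
    and it resets if the user reinstalls.
    """
    return secrets.token_hex(16)  # 128 bits of randomness, hex-encoded

token = new_install_token()
print(token)
```

The design point is that such a token identifies an app installation for features like push notifications without ever touching a permanent hardware serial.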
In that case, I think the wording needs adjusting since it currently says "Because of how browsers work [...] we receive some information automatically when you visit the Wikimedia Sites [...] possibly including unique device identification numbers". Mobile apps are not "browsers". 86.160.215.210 20:53, 9 September 2013 (UTC)[reply]
Thanks -- I made a small change to clarify that it applies to mobile applications. - Stephen LaPorte (WMF) (talk) 22:33, 6 November 2013 (UTC)[reply]
Thanks to the long-term Foundation policy of enabling widespread vandalism from IP addresses (because who cares how much time dedicated users spend reverting vandalism when they could be productively editing... far more important not to scare off someone who wants to add 'is a dick' to a biography), and the genius decision to enable vandalism from IPv6 addresses, Wikimedia is now actively enabling access to unique identifying data not just by Wikimedia admins, but by absolutely anyone in the world. Unless a Wikipedia user forced onto an IPv6 network takes extraordinary steps -- steps which they are highly unlikely to be aware of unless they are reasonably technically savvy and thus have a Wikipedia account anyway -- they will now be trackable to the household, if not the *device*, level. Genius! John Nevard (talk) 14:37, 14 September 2013 (UTC)[reply]
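For readers wondering how an IPv6 address can identify a device: older SLAAC configurations embed the network card's MAC address in the address via modified EUI-64, and anyone can reverse it. A sketch of that standard reversal (many operating systems mitigate this with RFC 4941 privacy extensions, which is the kind of countermeasure John alludes to):

```python
import ipaddress

def mac_from_eui64(addr: str):
    """Recover the MAC address from an EUI-64-based IPv6 address,
    or return None if the interface ID is not EUI-64-derived."""
    iid = ipaddress.IPv6Address(addr).packed[8:]   # last 64 bits
    if iid[3:5] != b"\xff\xfe":                    # EUI-64 inserts ff:fe
        return None
    # Undo the universal/local bit flip and drop the ff:fe filler.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:]
    return ":".join(f"{b:02x}" for b in mac)

# A SLAAC address formed from MAC 00:11:22:33:44:55:
print(mac_from_eui64("2001:db8::211:22ff:fe33:4455"))  # → 00:11:22:33:44:55
```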

Further clarification on unique identifiers for mobile applications?

Below, @Nemo bis: asked for clarification about why the policy still mentions unique device identification numbers after Brion's response. The intention of this sentence is to clarify that our applications could possibly collect unique device identification numbers, which may still be applicable for some applications, although not all of them. This sort of technical detail will depend on the precise operating system, device, and application. I would welcome an alternative phrasing if you think this could be clarified further in the policy. Thanks for everyone's attention to detail here. Stephen LaPorte (WMF) (talk) 20:49, 22 November 2013 (UTC)[reply]

Yes, add that said unique device identification numbers are neither accessed nor recorded, per Brion above. Covering them without explicitly excluding their usage is worse than not mentioning them at all. --Nemo 10:35, 25 November 2013 (UTC)[reply]

So, what is the purpose of all this?

The following discussion is closed: closing the top section given staleness, but leaving the unsampled logs area open; will archive when both sections are done. Jalexander--WMF 22:25, 18 December 2013 (UTC) [reply]

I've read the draft from beginning to end, and I have no idea what you wanted me as a user to get from it. What's the purpose, what does it improve compared to the much shorter and more concise current policy which provides very clear and straightforward protections such as the four (4) magic words «Sampled raw log data» (see also #Data retention above)? Is the purpose just adding tracking pixels and cookies for everyone, handwashing (see section above) and generally reducing privacy commitments for whatever reason? --Nemo 21:31, 4 September 2013 (UTC)[reply]
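For readers unfamiliar with the "sampled raw log data" wording Nemo cites: sampling means keeping only, say, one request in every thousand, so no complete browsing trail exists for any individual user. A minimal sketch of the idea (illustrative only, not the actual log-collection configuration):

```python
import random

SAMPLE_RATE = 1000  # keep roughly 1 request in 1000

def maybe_log(request_line: str, sink: list) -> None:
    """Record a request only if it falls in the random sample."""
    if random.randrange(SAMPLE_RATE) == 0:
        sink.append(request_line)

sampled = []
for i in range(100_000):
    maybe_log(f"GET /wiki/Page_{i}", sampled)

# Roughly 100 of the 100,000 requests survive; any single user's
# history is therefore recorded only in tiny, disconnected fragments.
print(len(sampled))
```

This is the protection the shift to unsampled logging would remove, which is why the four words carry weight in the current policy.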

Hi Nemo, Thanks for your comment. I outlined some specific reasons for why we needed an update above. YWelinder (WMF) (talk) 01:12, 6 September 2013 (UTC)[reply]
See here for Yana's summary. Geoffbrigham (talk) 02:12, 6 September 2013 (UTC)[reply]
The summary only says things I already knew, because I read the text. What's missing is the rationale for such changes, or why the changes are supposed to be an improvement. One hint: are there good things that we are not or will not be able to do due to the current policy and what changes are proposed in consequence?
Additionally, the summary doesn't even summarise that well IMHO, e.g. the language about cookies is not very clear and you didn't write anything about making request logs unsampled (which means having logs of all requests a user makes). --Nemo 06:47, 6 September 2013 (UTC)[reply]
I've forwarded your question to our tech team. Relevant members of the tech team are out for a conference and will respond to this shortly. YWelinder (WMF) (talk) 01:04, 12 September 2013 (UTC)[reply]

Unsampled request logs/tracking

Hey Nemo!
You have raised the question why we want the ability to store unsampled data and that’s a great question!
Two important use-cases come to mind. The first use case is funnel analysis for fundraising. As you know, we are 100% dependent on the donations by people like you -- people who care about the mission of the Wikimedia movement and who believe in a world in which every single human being can freely share in the sum of all knowledge.
We want to keep the fundraiser as short as possible without annoying people with banners. So it's crucial to understand the donation funnel: when people drop out, and why. We can only answer those kinds of questions if we store unsampled webrequest traffic.
The second use case is measuring the impact of Wikipedia Zero. Wikipedia Zero’s mission is to increase the number of people who can visit Wikipedia on their mobile phone without having to pay for the data charges: this is an important program that embodies our mission. Measuring the impact means knowing how many people (unique visitors) are benefiting from this program. If we can measure this then we can also be transparent to our donors in explaining how their money is used and how much impact their donations are making.
I hope this gives you a better understanding of why we need to store unsampled webrequest data. It is important to note that we will not build long historic reader profiles: the Data Retention Guidelines (soon to be released) will have clear limits on how long we will store this type of data.
Best regards,
(in my role as Product Manager Analytics @ WMF)
Drdee Drdee (talk) 23:03, 12 September 2013 (UTC)[reply]
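To make "funnel analysis" concrete for readers who don't know the term: the fundraising funnel is the sequence banner impression → banner click → donation form → completed donation, and the analysis asks at which step people drop out. A toy sketch with hypothetical step names (not the real fundraising pipeline):

```python
FUNNEL = ["banner_impression", "banner_click", "form_view", "donation_complete"]

def dropoff(events):
    """events: iterable of (user_id, step) pairs.
    Returns how many distinct users reached each funnel step."""
    reached = {step: set() for step in FUNNEL}
    for user, step in events:
        if step in reached:
            reached[step].add(user)
    return [(step, len(reached[step])) for step in FUNNEL]

events = [(1, "banner_impression"), (2, "banner_impression"),
          (1, "banner_click"), (1, "form_view"),
          (3, "banner_impression")]
print(dropoff(events))
# 3 users saw the banner, 1 clicked, 1 viewed the form, 0 donated
```

The connection to sampling: with 1:1000 sampled logs, the odds that the same user's impression and click both survive are tiny, so the per-step drop-off cannot be tied together, which is the stated reason unsampled data is wanted here.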
Thank you for your answer. Note that this is only one of the unexplained points of the policy, though probably the most controversial one (and for some reason very well hidden), so I'm making a subsection. I'll wait for answers on the rest; at some point we should add at the top a notice of the expected improvements users should like this policy for (this is the only one mentioned so far apart from longer login duration, if I remember correctly).
Frankly, your answer is worse than anything I could have expected: are you seriously going to tell our half billion users that you want them to allow you to track every visit to our websites in order to target them better for donations and for the sake of some visitors of other domains (the mobile and zero ones)? This just doesn't work. I'm however interested in knowing more.
  • Why does fundraising require unconditional tracking of all visits to Wikimedia projects? If the aim is understanding the "donation funnel" (note: the vast majority of readers of this talk page don't understand you when you talk like this), why can't they just use something like the ClickTracking done in 2009-2010 for the usability initiative, or the EventLogging which stores, or should store, only aggregate data (counts) of events like clicks on specific things?
  • I know that Wikipedia Zero has struggled to find metrics for impact measurement, but from what I understood we do have some metrics, and they were used to confirm that "we need some patience". If we need more statistics so desperately as to desire tracking all our visitors, I assume other less dramatic options have been considered as well? For instance, surely the mobile operators know how much traffic they're giving out for free that they would otherwise charge for; how hard can it be for them to provide this number? (Of course I know it's not easy to negotiate with them; but we need to consider the alternatives.) --Nemo 06:51, 13 September 2013 (UTC)[reply]
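The aggregate-counts alternative raised above would tally events at collection time and discard the per-user detail entirely, along these lines (an illustrative sketch; as noted in the replies, EventLogging itself actually stores raw unsampled rows, not aggregates):

```python
from collections import Counter

class AggregateOnlyLogger:
    """Counts events without storing who performed them."""

    def __init__(self):
        self.counts = Counter()

    def log(self, event_name: str, user_id=None) -> None:
        # user_id is accepted but deliberately ignored: only the
        # tally survives, so no per-user trail is ever written.
        self.counts[event_name] += 1

logger = AggregateOnlyLogger()
for user, event in [(1, "click_donate"), (2, "click_donate"), (2, "close_banner")]:
    logger.log(event, user)
print(dict(logger.counts))  # → {'click_donate': 2, 'close_banner': 1}
```

The trade-off is that such counters cannot answer per-user funnel questions, which is precisely the disagreement in this thread.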
Hi Nemo,
I think you are switching your arguments: first you asked why we would need to store unsampled webrequest data. You specifically asked "are there good things that we are not or will not be able to do due to the current policy and what changes are proposed in consequence?" I gave you two use cases, both a type of funnel analysis requiring unsampled data (the two use cases are, by the way, not an exhaustive list). Then you switch gears by setting up a straw man argument and saying that we will use it for better targeting of visitors. That's not what I said; if you read my response, I said we want to know when and why people drop out of a funnel.
The fact that you quote our half billion users indicates that we need unsampled data: we don't know for sure how many unique visitors we have :) We have to rely on third-party estimates. You see even you know of use-cases for unsampled data :)
Regarding Wikipedia Zero: the .zero. domain will soon be deprecated, which will leave us with only the .m. domain, so we cannot restrict unsampled storage to .zero. In addition, most Wikipedia Zero carriers do not charge for the .m. domains either.
Regarding the Fundraising: I am answering your question and I am sure you know what a donation funnel is; I was not addressing the general public. EventLogging does not store aggregate data but raw unsampled data.
I am not sure how I can counter your argument 'This just doesn't work'.
Drdee (talk) 19:08, 18 September 2013 (UTC)[reply]
I'm sorry that you feel that way, I didn't intend to switch arguments. What does "We want to run the fundraiser as short as possible" mean if not that you want to extract more money out of the banners? That's the argument usually used by the fundraising team, that the higher the "ROI" is the shorter the campaign will be. If you meant something else I'm sorry, but then could you please explain what you meant?
I'm also sorry for my unclear "This just doesn't work"; I meant that in this section I'm asking why the users, with whom we have a contract, should agree to revise it: what do they gain ("what is the purpose")? I still don't see an answer. For instance, knowing for sure how many unique users we have is not a gain for them; it's just the satisfaction of a curiosity the WMF or wikimedians like me can have.
As for Zero, I don't understand your reply. Are you saying that yes, other ways to get usage stats were considered but only unsampled tracking works? And that I'm wrong when I assume that operators would know how much traffic they're giving for free? --Nemo 14:55, 27 September 2013 (UTC)[reply]
Hi Nemo, I'll let other folks chime in to articulate the needs for the Fundraiser and Zero, I am with you on the fact that Wikimedia should collect as little data as possible but let me expand on the point you make about "curiosity regarding UVs". Measuring reach in terms of uniques is more than just a matter of "curiosity". We currently rely on third-party data (comScore) to estimate unique visitors but there are many reasons why we want to reliably monitor high-level traffic data based on uniques. We recently obtained data about the proportion of entries from Google properties as part of a review of how much of our readership depends on search engines. I cite this example because any significant drop in search engine-driven traffic is likely to affect Wikimedia's ability to reach individual donors, new contributors and potential new registered users. Similarly, we intervened in the past to opt out of projects such as Google QuickView based on evidence that they were impacting our ability to reach and engage visitors by creating intermediaries between the user and the content. Using UV data (particularly in combination with User Agents) also helps us determine whether decisions we make about browser support affect a substantial part of our visitor population. As Diederik pointed out, EventLogging does collect unsampled behavioral data about user interaction with our websites to help us run tests and improve site performance and user experience. The exact data collected by EventLogging is specified in these schemas and is subject to the data retention guidelines that the Legal team is in the process of sharing. DarTar (talk) 20:23, 9 December 2013 (UTC)[reply]
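One privacy-preserving way to count unique visitors of the kind DarTar describes, without building long-lived reader profiles, is to hash each client identifier with a salt that is discarded when the counting window ends, so hashes from different windows can never be linked. A sketch of the idea (an assumption on my part; not what WMF actually deployed):

```python
import hashlib
import secrets

class WindowedUniqueCounter:
    """Counts distinct visitors within one window; the salt is thrown
    away afterwards, so hashes cannot be joined across windows."""

    def __init__(self):
        self._salt = secrets.token_bytes(16)  # per-window secret
        self._seen = set()

    def observe(self, client_fingerprint: str) -> None:
        h = hashlib.sha256(self._salt + client_fingerprint.encode()).digest()
        self._seen.add(h)

    def uniques(self) -> int:
        return len(self._seen)

c = WindowedUniqueCounter()
for visitor in ["ua1|ip1", "ua2|ip2", "ua1|ip1"]:  # hypothetical UA+IP keys
    c.observe(visitor)
print(c.uniques())  # → 2
```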

Information We Collect: proposed disclosure is misleadingly incomplete.

The following discussion is closed: closing given lack of response and staleness, will archive in a couple days unless reopened. Jalexander--WMF 22:30, 18 December 2013 (UTC) [reply]

Paragraph 1 of Information We Collect:

"We actively collect some types of information with a variety of commonly used technologies. These may include tracking pixels, JavaScript, and a variety of “locally stored data” technologies, such as cookies and local storage. We realize that a couple of these terms do not have the best reputation in town and can be used for less-than-noble purposes. So we want to be as clear as we can about why we use these methods and the type of information we use them to collect."

I strongly object to this policy as proposed. Clear about what is collected? Not yet! There is no mention of screen / window resolution, plugin versions, fonts available, or much else. Let's not set a bad example by being deceitful about what we collect and justifying it (to ourselves) as necessary for security reasons.

Is it appropriate for users to edit the draft directly at this time? Is the last sentence even a sentence? The draft sure seems to be an early draft, and it's not edit-protected. I could swap in something like this:

(Newer suggestion below.)"We actively collect some types of information with a variety of commonly used technologies. These generally include WP:tracking pixels, JavaScript, cookies, and a variety of other “locally stored data” technologies, such as WP:local storage, and may include collected information regarding screen / window resolution, plugin versions, fonts available and more. We realize that a couple of these technologies have poor reputations and can be used for less-than-noble purposes. Therefore, we want to be as clear as we can about why we use these methods and the type of information we collect using them."

--Elvey (talk) 22:38, 8 September 2013 (UTC)[reply]

Hi Elvey, thanks for your comments. We are going to check with Tech on this and get back to you. Geoffbrigham (talk) 03:22, 9 September 2013 (UTC)[reply]
Dear Elvey,
Thank you for raising this issue. I believe you are asking why we have not included a comprehensive list of the information we are collecting or may collect in the future and you mention a couple of examples including: screen / window resolution, plugin versions and fonts available.
My first response would be that we are already transparent about the information we collect when assessing the efficacy of a new feature. I believe that a better place to disclose that information is not within the Privacy Policy, because it’s a policy which stipulates our principles and guidelines. Those principles and guidelines are embodied when we actually run experiments and collect data. For example, currently we use EventLogging to instrument our features. The mobile team created a schema to determine the number of upload attempts using the mobile Commons app, in order to measure whether new educational UI features were helping more people make their first upload. The schema will tell you exactly what information is collected and for what purpose and if you have a question you can interact with the developers through the talk page.
My second response is that it seems that you are alluding to the practice of browser sniffing to uniquely identify a reader by collecting as much information about the browser as possible, including plugins and fonts. The EFF has a website called Panopticlick that shows you how unique your browser is based on this technique.
This technique can be used to keep tracking people even when they clear their cookies after each session. Suffice it to say, we will never employ this technique because it would violate our principle of collecting as little data as possible.
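As a concrete illustration of why this matters (a sketch only, not anything Wikimedia runs): a Panopticlick-style fingerprint simply hashes together whatever attributes the browser reveals, so every extra attribute collected makes the result more uniquely identifying, even after cookies are cleared.

```python
import hashlib

def browser_fingerprint(attributes: dict) -> str:
    """Hash browser-reported attributes into one stable identifier.

    Illustrative only: the more attributes a site collects (plugins,
    fonts, screen size...), the more likely this hash singles out
    exactly one visitor, with no cookies involved at all.
    """
    # Canonical ordering so the same browser always yields the same hash.
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical visitor data, invented for illustration.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080x24",
    "fonts": "Arial,DejaVu Sans,Liberation Serif",
    "plugins": "PDF Viewer 1.0",
}
fingerprint = browser_fingerprint(visitor)
```

This is why "collect as little as possible" is the only reliable defense: the identifying power comes from the combination of attributes, not from any single one.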
You are right that you could edit the new Privacy Policy but it would complicate the discussion significantly as we would not refer to the same draft anymore. The Legal Team will make changes in response to feedback from the community after the discussion regarding such change has been fleshed out and they are also trying to track changes internally, both things that would not work very well if everyone was editing the draft.
I hope this addresses your concerns but please feel free to add a follow-up question.
Best regards,
(in my role as Product Manager Analytics @ WMF)
Drdee (talk) 21:21, 11 September 2013 (UTC)[reply]
NOTE: what follows is a back and forth with Elvey and Drdee; indentation indicates who said what.
Thanks so much for a thorough response!
I'm pleased to see that we 'are already transparent about the information we collect when assessing the efficacy of a new feature,' as your example shows. On the other hand, indeed, I strongly object to this policy as proposed, because I don't see that we 'are already transparent about the information we collect,' in general, yet. The place to disclose the latter is within the Privacy Policy, IMO.
I believe we are transparent about the information we collect: we clearly identify different types of information that we collect and for what purpose.
Re. your second response: Indeed, that is what concerns me. We disagree; I do not see browser sniffing as necessarily incompatible with the principle of collecting as little data as is consistent with maintenance, understanding, and improvement of the Wikimedia Sites; I can think of cases where it would aid security. However, I would be happy to see language in the policy that made it clear(er) that browser sniffing is incompatible with policy. What language do you suggest we add to do so, if you are amenable? How 'bout we swap in something like this?:
"We actively collect some types of information with a variety of commonly used technologies. These generally include EN:tracking pixels, JavaScript, cookies, and a variety of other “locally stored data” technologies, such as W:local storage, and may include collected information regarding screen / window resolution. We realize that a couple of these technologies have poor reputations and can be used for less-than-noble purposes. Therefore, we want to be as clear as we can about why we use these methods and the type of information we collect using them. Extensive browser sniffing is incompatible with this policy; we will not collect plugin versions, fonts available, HTTP_ACCEPT headers, or color depth information."
I cannot imagine how browser sniffing would ever be compatible with this Privacy Policy (see also my follow-up comment).
Umm, you don't have to imagine. I've already said I can think of cases where browser sniffing would aid security. So unless I'm imagining those cases (and I'm confident that they're not imaginary), does that not mean that the policy allows browser sniffing, because it allows collection to aid security, which is part of maintenance? If not, why not?
Any objections to s/seek to put requirements/put requirements/g ? I see no reason to be so wishy-washy. If there are to be exceptions, I feel the policy must state that any such exceptions will be specified, say, in the noted FAQ section. [Update: I see this is discussed already at #Seek_or_find.3F We already have the non-wishy-washy, "We will never use third-party cookies," so I see this "seek to" crud as unjustifiable.]
We are looking into this to see if it's feasible but it will require a bit of thought, and we will also look at whether we should do it in combination with the Data Retention Policy. Stay tuned.
As there were no objections, I did the substitution some days ago. If someone does s/put requirements/has not but plans to put requirements/g, at least we'll have clarity - it'll be clear that we don't have the requirements in place, and if not, it'll be clear that we do, when this becomes policy.
A desire to 'refer to the same draft' is reasonable, but already out the window; the draft is rapidly evolving due to many recent edits by both the legal team and others. (In future, when a draft is proposed, clarity around this could be created with a statement, perhaps enforced with technical measures, or perhaps just noted with a permalink to the version as proposed.)
In-line reply encouraged. --Elvey (talk) 21:47, 13 September 2013 (UTC)[reply]
In-line replies: Drdee (talk) 20:40, 18 September 2013 (UTC)[reply]
In-line replies: --Elvey (talk) 17:46, 30 October 2013 (UTC)[reply]

┌───────────────────┘
Hello @Elvey: Are there questions here that I can help resolve? Best, Stephen LaPorte (WMF) (talk) 19:36, 11 December 2013 (UTC)[reply]

Strip Wikimedia Data Collection to the Barest Minimum - Introduction

The following discussion is closed: Closed given lack of response and changes made by Michelle, will archive in a couple days unless reopened. Jalexander--WMF 22:31, 18 December 2013 (UTC) [reply]

Two suggestions for the privacy policy:

  1. Lose the cutesy language and cartoons being used to make Wikimedia's disturbingly extensive user tracking seem less threatening
  2. Eliminate Wikimedia's disturbingly extensive user tracking.

It is fundamentally misleading to tell users that Wikimedia does not require any personal information to create an account, and then to actually collect vastly more behavioral information on each user than could ever be requested in a sign-up form, under the guise of "understanding our users better" — exactly the creepy line of every Orwellian data-vacuuming Web site today.

And ironically what is all this "understanding" producing? A site with fairly gruesome usability that's barely changed years and years later. Yet Wikimedia wants to keep track of every piece of content read by every "anonymous" user — associated with information like IP address and detailed browser info, which today in malevolent hands can often easily be associated with real name, address, Kindergarten academic record, likelihood to support an opposition candidate, and favorite dessert topping.

It's just not Wikimedia's concern that someone is interested in both Pokemon and particle physics. That doesn't improve either article. That doesn't improve the interface. That doesn't improve the Byzantine and Kafkaesque bureaucracy of trying to find somewhere to report a gang of editors controlling and distorting an article.

To find the phrase "tracking pixels" here is jaw dropping. This is inherently a hacking-like technique to install a spyware file on a user's computer, to evade their express effort not to be tracked by clearing cookies. Web developers bringing these "normal" techniques used by "every other Web site" to Wikimedia apparently don't understand that "every other Web site" today is evil — and Wikimedia sites are supposed to be a radically different exception to this.

For readability this comment continues in "Strip Wikimedia Data Collection to the Barest Minimum - Privacy Specifics"

Privacycomment (talk)

Hi Privacycomment,
Sorry for the slow response -- I understand your concerns as follows:
1) Why are you misleading users when saying they do not need to provide personal information to create an account but meanwhile you collect a lot of behavioral data?
2) Can you demonstrate the benefits of understanding our users better?
3) Why is Wikimedia interested in creating an interest graph?
4) Why are we using tracking pixels?
Question 1: Interacting with our servers will provide us with some data: URL visited, timestamp, browser used, etc. It seems that you define this as behavioral data but in fact it is not -- it is non-analyzed webrequest data that we have to store, for a minimum amount of time, to be able to monitor server performance and provide key performance indicators about usage of all the Wikimedia projects (those are two very important use cases). Without that data we would be flying in the dark -- how could we even do capacity planning?
In theory, we could analyze data and infer behavior from that, such as you mention in your paragraph about reader behavior, but atm we are not doing such things.
It's also very important to note that we do not buy 3rd party databases to add demographic data to our data and obviously we would never disclose webrequest data containing Personally Identifiable Information in raw form nor sell it. So I do not agree that we are misleading the users; in fact we are really trying to be as transparent and clear as possible.
Question 2: Our efforts to understand our users in the context of how they use new features have only begun quite recently. The Product team was formed in February 2012 and the E2 / E3 teams (now renamed to Core Features and Growth) started in March 2012. I do not agree that there has been no progress: for example, the E3 team worked on simplifying the account creation process and those improvements were the result of data-inspired decision-making. Other new features that we have rolled out / are rolling out like mw:VisualEditor, mw:Echo and mw:Flow are all supported by data-informed decision-making. I am sure we will see the fruits of this approach soon.
Question 3: AFAICT, there are currently no plans to make an interest graph of the readers but your example is actually a great use case! It could help uncover articles that are being targeted by vandals and in that way it could alleviate the work pressure on patrollers, oversighters and admins.
Question 4: Regarding tracking pixels -- I think we need to unravel this concept a bit more clearly. There are three use cases of tracking pixels:
1) as a very light way to push data from the browser to the server
2) a specific technique of bypassing browser origin restrictions
3) a method to infer whether an email message was opened / read
I suspect that you have big concerns regarding 3), and Fabrice Florin's answer regarding the use of tracking pixels was in this context. On the other hand, mw:EventLogging constructs image requests to push data to the server, which is 1). I am not aware of an example of 2) in our context, but given that we have many domain names I would not be entirely surprised if we used 2) as well.
I hope this addresses your concerns,
Drdee (talk) 21:36, 1 October 2013 (UTC) (in my role as Product Manager Analytics @ WMF)[reply]
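To make use case 1) concrete, here is a rough sketch of how an event payload can ride along in a 1x1 image request; the endpoint and schema name below are invented for illustration, and the real EventLogging wire format is defined by its schemas, not by this code.

```python
import json
from urllib.parse import quote

def beacon_url(endpoint: str, schema: str, event: dict) -> str:
    """Build an image-request URL whose query string carries the event.

    The server logs the request and replies with a transparent pixel;
    no tracking of email opens or cross-site identity is involved.
    """
    payload = quote(json.dumps({"schema": schema, "event": event}))
    return f"{endpoint}?{payload}"

# Hypothetical example: reporting a first-upload attempt.
url = beacon_url("https://example.org/event.gif",
                 "MobileAppUploadAttempts", {"attempts": 1})
```

The design choice is pragmatic: an `<img>` request works even with JavaScript's cross-origin restrictions and requires no special client support, which is why it became a common transport for analytics data.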
Regarding tracking pixels, perhaps part of the problem is in terminology. The term "tracking pixels" heavily implies use case #3. I'm not sure there is a widely-recognized term for use cases #1 and #2; any such term would probably be quickly adopted as a euphemism by those using use case #3, leading to the euphemism treadmill. Regarding use case #2, things these days are more likely to use techniques such as CORS as these are less restrictive. Regarding use case #3, I note that many email clients will specifically block externally-loaded images to prevent this.
I suppose CentralAuth's use of such pixel-images might be considered an instance of use case #2: it loads a 1x1 transparent pixel from all the other domains to attempt to set the login cookies for those domains, because the current domain can't set cookies on all those other domains. This could be done (possibly better) in other ways, but it has the advantage of working even when the client has JavaScript disabled. BJorsch (WMF) (talk) 14:31, 2 October 2013 (UTC)[reply]
We have edited the tracking pixel language to reflect feedback received in this discussion thread and others like it. Please let us know if you have any further questions or concerns regarding the applicable tracking pixel language. Thanks! Mpaulson (WMF) (talk) 19:22, 22 November 2013 (UTC)[reply]

Strip Wikimedia Data Collection to the Barest Minimum - Privacy Specifics

The following discussion is closed: Closed given staleness and lack of response, will archive in a couple days unless reopened. Jalexander--WMF 22:31, 18 December 2013 (UTC) [reply]

This is what Wikimedia should know about its users —

For anonymous readers, the sole data collected should be IP address, URL visited, and basic user-agent data (as specifics can be quasi-identifying): platform, browser name, major version, screen size. And this data should be immediately split into three separate log files, each separately randomized in half-hour time blocks, with the default Web server log disabled or immediately obliterated. So that a secret governmental order to hand over every Wikipedia article read by a particular IP address simply can't be complied with. And so that that great new Wikimedia employee, who no one would suspect is working for a supragovernmental/governmental/corporate/mafia espionage operation, can't get at it either.
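The splitting-and-randomizing step proposed above could look roughly like this (an illustrative sketch only; the record layout and names are invented): each half-hour block's IP, URL, and user-agent columns are shuffled independently, so the three resulting logs can no longer be joined back into 'who read what'.

```python
import random
from collections import defaultdict

def split_and_shuffle(requests, block_seconds=1800):
    """Split (timestamp, ip, url, user_agent) records into three
    separate logs, each shuffled independently per half-hour block,
    so no single file links an IP to the pages it requested."""
    blocks = defaultdict(lambda: {"ip": [], "url": [], "ua": []})
    for ts, ip, url, ua in requests:
        block = blocks[int(ts) // block_seconds]
        block["ip"].append(ip)
        block["url"].append(url)
        block["ua"].append(ua)
    logs = {"ip": [], "url": [], "ua": []}
    for _, block in sorted(blocks.items()):
        for column in logs:
            random.shuffle(block[column])
            logs[column].extend(block[column])
    return logs
```

Aggregate statistics (views per page, rough traffic per country) survive this transformation; the per-person reading history does not, which is the point.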

For anonymous editors the sole data collected should be that of anonymous readers, plus:

  • the data of the actual edit of course
  • the IP address of the edit, stored for one week (without data backups) and then obliterated, and viewable only by administrators investigating potential spam, vandalism, or other violations of Wikimedia rules during that week.

Public-facing edit records, and administrator-facing edit records after one week, should associate only the phrase "Anonymous Edit" or "One-Time Edit by [ad hoc nickname]". Wikimedia should use automated systems to detect any administrator accessing the IP address data associated with edits which are not likely to be spam, vandalism, or other violations of Wikimedia rules.

For logged-in users the sole data collected should be that of anonymous editors, plus:

  • their username at sign-up and log-in
  • their email address at sign-up if given
  • a public-facing list of their edits (of all types) on their user page
  • the contents of a Wikimedia browser cookie, set when they log in to a Wikimedia site, and deleted if/when they log out, which contains solely their username and encrypted password
  • an administration-facing log of Wikimedia messaging and banners which they have already received
  • an optional administration-facing flag in their account, indicating that they have donated to Wikimedia in month/year, without further identifying data, so as to suppress fundraising banners (if they have elected to overtly identify themselves with a Wikimedia username when making a donation).

Email addresses should be accessible for use for bulk mailings only by Wikimedia employees, and the email list file should be encrypted to prevent theft by corrupt or disgruntled Wikimedia employees.

For basic-level administrators the sole data collected should be that of logged-in users, plus their (pseudonymously-signed) administrator contract.

And no Wikimedia server or office should be located in any country — whether admitting to be a dictatorship or still pretending to be a democracy — which overtly, or by secret order, requires Wikimedia to collect or retain any data other than that specified here for these non-commerce functions.

Thank you for your consideration of these points,

Privacycomment (talk)

Strip Wikimedia Data Collection to the Barest Minimum - Further Considerations

Thanks Privacycomment for this post. I just want to add my perspective with some ideas on how to look at data-relevant processes in general and how to use the artificial differences in national laws on an action done in the physical or digital world.

  • First and foremost Wikipedia is a labor of love of knowledge nerds worldwide. This means that it is from an outside view an "international organization" much like the Red Cross - only to battle information disasters. This could be used to get servers and employees special status and protections under international treaties (heritage, information/press etc)
  • History teaches that those protections might not be a sufficient deterrent in heated moments of national political/legal idiocy, so Wikimedia should enact technical as well as content procedures to minimize the damage.

Data Protection

  • Collect as little data as possible and purge it as fast as possible. Period. You cannot divulge what you do not have.
  • Compartmentalize the data so that a breach - let's say in the US - does not automatically give access to data of other countries' userbases.
  • Play with laws: as there are a lot of well-established protections for homes and private property, shape your installation and software to imitate those - no "official" central mail server that can be accessed under provider legislation, but a lot of private servers that are each protected and must be subpoenaed individually etc...
  • Offer a privacy version of Wikipedia that can only be accessed via Tor - and where nothing is stored (I know this might be too much to admin against spam pros)
  • Use perfect forward secrecy, hashes, etc. to create a situation where most of the necessary information can be blindly validated without you having any possibility of actually seeing the information exchanged. This also helps with legal problems due to deniability. Again - compartmentalize.
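One way to picture the hashing idea in the last bullet (purely a sketch under invented names, not anything Wikimedia runs): a keyed hash with a short-lived, regularly discarded key lets operators check whether two requests came from the same address, e.g. for abuse throttling, without ever retaining a reversible IP.

```python
import hashlib
import hmac
import secrets

# Per-period secret; rotated and discarded (e.g. daily) so that old
# pseudonyms can never again be linked back to raw IP addresses.
PERIOD_KEY = secrets.token_bytes(32)

def pseudonymize_ip(ip: str) -> str:
    """Keyed hash of an IP: equal inputs compare equal within one key
    period, but without the key the mapping cannot be enumerated."""
    return hmac.new(PERIOD_KEY, ip.encode("utf-8"), hashlib.sha256).hexdigest()
```

A keyed hash (rather than a plain one) matters here: the IPv4 space is small enough that an unkeyed hash could be reversed by simply hashing all ~4 billion addresses.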

Physical and digital infrastructure concerns

  • An internal organization along those lines and with the Red Cross as an example would offer a variety of possibilities when faced with legal threats: First and foremost, much like choosing where to pay taxes, one could quickly relocate the headquarters for a specific project to another legal system, so that one can prove that e.g. the US national chapter of Wikimedia has no possible way of influencing, let's say, the Icelandic chapter, which happens to host a national project called wikipedia.org
  • Another important step in being an international and truly independent organization is to finally use the power of interconnected networks and distribute the infrastructure, with liberal computer legislation in mind, much more than is now the case. Not to compare the content - just the legal possibilities - of the Megaupload case with those of Wikimedia: as long as US authorities have physical access to most of the servers, they do not need to do anything but be creative with domestic laws to hurt the organisation and millions of international users, too...
  • If this might be too difficult, let users choose between different mirrors that also conform to different IT legislation

Information Activism

  • Focus on a secure mediawiki with strong crypto, which can be deployed by information activists

So: paranoia off. But the problem really is that data collected now can and will be abused in the next 10, if not 50-100 years. If we limit the amount of data and purge data, those effects can be minimized. No one knows if something that is perfectly legal to write now might not bite one in the ass if legislation is changed in the future.

Cheers, --Gego (talk) 13:53, 9 September 2013 (UTC)[reply]

Hi Gego,
The idea of having a secure MediaWiki with strong crypto is a technical proposal and as such is best presented as an RFC on MediaWiki, but it's outside the scope of the new Privacy Policy.
Drdee (talk) 00:40, 7 November 2013 (UTC)[reply]

There's a lot of discussion about the data collected from those who edit pages, but what about those who passively read Wikipedia? I can't figure out what's collected, how long it's stored, and how it's used.

Frankly I don't see why ANY personally identifiable information should EVER be collected from a passive reader. In the good old days when I went to the library to read the paper encyclopaedia, no one stood next to me with a clipboard noting every page I read or even flipped past. So why should you do that now?

I don't object to real time statistics collection, e.g., counting the number of times a page is read, listing the countries from which each page is read from at least once, that sort of thing. But update the counters in real time and erase the HTTP GET log buffer without ever writing it to disk. If you decide to collect some other statistic, add it to the real-time code and start counting from that point forward.
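The commenter's counters-not-logs approach can be sketched as follows (illustrative only; a real implementation would need persistence and concurrency handling): aggregates are updated in memory and the raw request is simply dropped.

```python
from collections import Counter

page_views = Counter()   # page -> total view count
countries_seen = {}      # page -> set of countries seen at least once

def record_view(page: str, country: str) -> None:
    """Update aggregate statistics and discard the request itself;
    the raw GET (IP, timestamp, reader identity) never touches disk."""
    page_views[page] += 1
    countries_seen.setdefault(page, set()).add(country)
```

The trade-off the commenter accepts is real: any statistic not anticipated in advance can only be counted from the moment its counter is added, since no raw history exists to backfill from.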

Please resist the strong urge to log every single HTTP GET just because you can, just in case somebody might eventually think of something interesting to do with it someday. This is EXACTLY how the NSA thinks and it's why they store such a terrifying amount of stuff. 2602:304:B3CE:D590:0:0:0:1 14:54, 10 September 2013 (UTC)[reply]

2602, I will be linking to this comment from below but you may be interested in the section started at the bottom of the page at Tracking of visited pages. Jalexander (talk) 03:37, 11 September 2013 (UTC)[reply]

There are a number of use cases for collecting unsampled data, including generating detailed understandings of how readers interact with Wikipedia content and how this might change over time, finding and identifying very low-frequency (but important) events, and looking at interactions with long-tail content that may reveal new sources of editors. But it's important to understand that we are interested in the behavior of Wikipedia readers in aggregate, not as individuals. TNegrin (WMF) (talk) 01:49, 19 December 2013 (UTC)[reply]

Dear 2602,
We need to store webrequest data for a very limited time from a security point of view: in case of a DDoS, we need to be able to investigate where it originates and block some IP ranges. Sometimes we need to verify whether we are reachable from a certain country. And there are other use cases, so not storing webrequest data is not an option. The Data Retention guidelines, which will be published soon, will put clear timeframes on how long we can store webrequest data.
I hope this addresses your concern.
Best, Drdee (talk) 00:51, 7 November 2013 (UTC)[reply]
The current policy only allows sampled logs. Are you saying that the sysadmins are currently unable to protect the sites from DDoS? I never noticed.
Also, https://blog.archive.org/2013/10/25/reader-privacy-at-the-internet-archive/ , linked below, shows it definitely is an option. --Nemo 10:07, 8 November 2013 (UTC)[reply]

Text based page delivery (or 'how' we read)

The following discussion is closed: close given staleness and lack of response, will archive in a couple days unless reopened. Jalexander--WMF 22:33, 18 December 2013 (UTC) [reply]

No, I'm not going to harp on about using Rory as it appears that you're determined to use him whether he is a redundant gimmick or not.

Other than feeling that one instance of his use is sufficient, if he is to be used as the draft currently stands, serious consideration needs to be given to rules of thumb pertaining to desktop publishing and website development. Culturally, the English language is read from left to right, meaning that English readers are acclimatised to the left hand side of the page being the central focal point when dealing with anything text orientated. Not only is there no word-wrap around the Rory images in order to allow for a longer continuum of text (remembering that we read ahead by a minimum of several words at a time), but the entire left side of the document disturbs the reader's expectations by sandwiching the text (and tables!) to the right. Bear in mind that these rules of thumb were developed through experience and behavioural studies over many years, right down to serif being preferred for paper documents, while sans serif reads more comfortably online. It's foolhardy to disregard certain standards which have been proven in order to 'experiment' with other techniques.

I've spent over three decades involved with pedagogical issues surrounding visual teaching methods/delivery, from secondary education to Post Graduate research presentation (I'm speaking of delivery at tertiary MA and PhD level by 100% research), so I'm not just blowing smoke.

If Rory is to be used, the 'culturally logical' layout for any Latin script language is to about-face the layout of the current draft and have him on the right-hand side. I'd also suggest that he could be made a little smaller and that word-wrap be used. --Iryna Harpy (talk) 04:21, 10 September 2013 (UTC)[reply]

Hi Iryna Harpy! Thank you for bringing this point up. I know that some of the decision involved in placing Rory on the left-hand side of the page and not wrapping the text around him had to do with making the format easily adaptable to different scripts and different screen/window sizes. I'll have one of the people who helped with the layout address those issues in more detail on this thread.
On a related note, based on community feedback, we are going to experiment with how to make Rory more useful in explaining the major concepts of the privacy policy over the next week. Some of the ideas we are going to try are either providing Rory with a narrative or with bullet points about the big concepts. If you have other ideas, we'd love to hear them. We're going to try to get some prototypes out to the community to see if they think that adds value. I'm hoping once we have a better idea of what text would accompany Rory (if any and assuming Rory stays in the policy), we can experiment with the layout to see if there are ways to make it more readable as you suggested. Mpaulson (WMF) (talk) 23:47, 10 September 2013 (UTC)[reply]
Right. I'm seeing both support for and opposition to Rory, but I want to make clear we have not "determined to use him." As explained above, we are playing with the idea, which is why your feedback for or against is important. If, after taking into account community feedback, it doesn't make sense after some experimentation, we won't use him; if it does, we might. That said, IMHO, visuals are important, as I suggested above. So alternative ideas are also welcome. Many thanks. Geoffbrigham (talk) 07:44, 11 September 2013 (UTC)[reply]
Thank you both (Mpaulson (WMF) & Geoffbrigham) for your responses. I suspect I speak for quite a few people responding to the draft policy when I say that my main concern was that Rory had already been locked into the presentation and was going to be worked in regardless of whether his 'presence' was superfluous or not. As I'm now feeling a little more assured that he's not a given, I'll abstain from further critiques regarding that aspect of the updated policy until the proposed prototypes are up and will judge as objectively as is possible at that point. I'm certainly not going to approach the subject with prejudice and will reserve judgement bearing context in mind. Cheers! --Iryna Harpy (talk) 02:01, 12 September 2013 (UTC)[reply]
I agree one hundred percent with Iryna Harpy's concerns about text layout. As I understand the purpose is: "We want to make these documents as accessible as possible to as many people as possible." Congratulations, you have managed to do the opposite.
The big text boxes at the top, which are not part of the Privacy Policy, are not helping either. It's even hard to find out where the actual proposed privacy policy begins.
I believe you have successfully managed to prevent the majority of people from reading the proposed privacy policy.
Suggestions:
  1. If you do something special with the layout like text placement, illustrations and use of big icons, make sure it increases accessibility and not the opposite.
  2. Make the page look like a regular Wikipedia article page where people can start reading the proposed privacy policy immediately.
  3. Rename the page so it's clear from the name that this is a proposal and not the current privacy policy. For example "Proposed privacy policy" or "Privacy policy (draft)" or "Privacy policy (proposal)".
  4. Remove the side notes that are not part of the proposal. Instead, add a side bar at the right linking to side notes.
Cheers! --Aviertje (talk) 09:06, 16 September 2013 (UTC)[reply]
(Reply to second suggestion. Moved by Aviertje (talk) 18:50, 17 September 2013 (UTC))[reply]
I think we should provide a link at top to the main policy, as we did with the Terms of Use. See http://wikimediafoundation.org/wiki/Terms_of_Use Geoffbrigham (talk) 10:32, 17 September 2013 (UTC)[reply]
Geoffbrigham, I moved your in text replies down. I hope you approve. I also added numbering to my suggestions.
Providing a link at the top to the main policy would certainly help. But I don't understand putting in an obstacle and providing a link to move past it. There shouldn't be any obstacle accessing the terms of use or privacy policy. When people want to consult the terms of use or privacy policy, they want to read the real deal and not any unofficial comments. Any accompanying comments should not form an obstacle. --Aviertje (talk) 18:50, 17 September 2013 (UTC)[reply]
I understand your point, Aviertje, but, in the context of the terms of use, the user-friendly summary was in fact proposed by the community (not WMF), and we have received a number of positive comments about it since. In this discussion, people are saying that they want nutshell summaries of our privacy principles, and, as I see it, the user-friendly summary will satisfy that need. So, if you don't mind, I would like to monitor this issue and see if others feel strongly. In the meantime, I will have this link put above the user-friendly summary:
This is a summary of the [draft] Privacy Policy. To read the full terms, scroll down or click here.
Thanks. Geoffbrigham (talk) 11:54, 18 September 2013 (UTC)[reply]
The unofficial summary and the official privacy policy (or terms of use) serve completely different purposes and should not be mixed. I looked up the proposal for an informal summary and looked at the following edits to this proposal. It was suggested to create a separate informal summary containing a link to the official terms of use and managed by the community. Placing this unofficial summary above the official terms of use seems to be your own initiative. The fact that people value the unofficial summary does not mean it should be located here. It might be a good idea though to include such summary (officially) in the introduction of the privacy policy/terms of use. --Aviertje (talk) 22:03, 18 September 2013 (UTC)[reply]
Let's see if we hear additional objections. I know the user-friendly summary was posted at the top of the terms of use for some time during the consultation, and I don't recall any objection. Now that the issue has been raised, I will monitor and see if there is any other opposition to its placement vis-a-vis the privacy policy. Tx. Geoffbrigham (talk) 22:53, 18 September 2013 (UTC)[reply]
(Reply to fourth suggestion. Moved by Aviertje (talk) 18:50, 17 September 2013 (UTC))[reply]
I'm sorry. Could you explain this a bit more? Thanks. Geoffbrigham (talk) 10:32, 17 September 2013 (UTC)[reply]
With 'side bar' I meant a sidebar, a box at the right like on the page with the current privacy policy.
With side notes I meant all comments that are not part of the proposed privacy policy. Like the "This draft Privacy Policy needs your feedback..", "Want to help translate?..", "This is a user-friendly summary of the privacy policy..". Even the "This is a draft of a proposed privacy policy.." can be removed if the title is changed like I suggested in suggestion 3 above. --Aviertje (talk) 18:50, 17 September 2013 (UTC)[reply]
Thanks. I am monitoring to see if others feel the same way. Geoffbrigham (talk) 21:53, 15 November 2013 (UTC)[reply]

Comments by Shell

The following discussion is closed: From what I can tell all of the sections were responded to and/or fixed, with a couple spawning separate discussions outside of this thread. Closing and will archive in a couple days unless reopened. Pinging Shell just in case. Jalexander--WMF 22:37, 18 December 2013 (UTC)[reply]

Lots of small details.

  • Your Public Contributions: "Please do not contribute any information that you are uncomfortable making permanently public, like the picture of you in that terrible outfit your mom forced you to wear when you were eight." Such a picture is unlikely to be kept anyway, so it's not a good example. I'd either remove the example or change it into something like: ...permanently public. For instance, if you reveal your real name somewhere, it will be permanently linked to your other contributions. (A better example/phrasing would be appreciated)
  • Account Information & Registration:
Template:Blockquote
This is a subtle point, so I am not sure the best way to explain it. Stephen LaPorte (WMF) (talk) 01:08, 7 November 2013 (UTC)[reply]
Yeah, I think your new version is a little better. //Shell 06:19, 11 November 2013 (UTC)[reply]
Made the change (see here). Stephen LaPorte (WMF) (talk) 00:07, 16 November 2013 (UTC)[reply]
Good. You accidentally removed a period, so I added it back. //Shell 08:49, 17 November 2013 (UTC)[reply]
Good catch, thanks. Stephen LaPorte (WMF) (talk) 19:07, 22 November 2013 (UTC)[reply]
  • Information Related to Your Use of the Wikimedia Sites: "We also want this Policy and our practices to reflect our community’s values." This looks like a stray sentence - can it be removed completely?
  • Information We Collect:
    • "For example, by using local storage to store your most recently read articles directly on your device so it can be retrieved quickly; and by using cookies, we can learn about the topics searched so that we can optimize the search results we deliver to you." This is a really long sentence that should be split up. Also, I don't understand how using local storage to store read articles can optimize search results. To me they seem like separate things.
      • Is this clearer?
Template:Blockquote
Stephen LaPorte (WMF) (talk) 01:08, 7 November 2013 (UTC)[reply]
Yes, it's clearer. However, I'm always skeptical about delivering different search results for different people. I hope that such things would be explicitly marked and that there'd be an opt-out. //Shell 06:19, 11 November 2013 (UTC)[reply]
Updated. I appreciate the feedback on the feature. Technically, I am not sure if it would be used to deliver different results, or merely optimize the delivery time for the same results. I believe the feature is still under development, and @RobLa-WMF: may be able to point you to more information, if any is available yet. Thanks, Stephen LaPorte (WMF) (talk) 00:07, 16 November 2013 (UTC)[reply]
Ok. In general, it's enough if such things are documented by the Signpost when it's implemented. If there is an outline of how it would work out, I'm interested in reading about it in this case. //Shell 08:49, 17 November 2013 (UTC)[reply]
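As a side note for readers unfamiliar with the mechanism discussed above: "storing your most recently read articles directly on your device" is essentially a small most-recently-used cache. The sketch below is purely illustrative (it is not Wikimedia's code; the class name and capacity are invented) but shows the behaviour in miniature:

```python
from collections import OrderedDict

class RecentArticlesCache:
    """Illustrative most-recently-read cache (hypothetical, not WMF code)."""

    def __init__(self, capacity=5):
        self.capacity = capacity
        self._store = OrderedDict()  # title -> article text, oldest first

    def put(self, title, text):
        # Re-reading an article moves it to the most-recent position.
        if title in self._store:
            self._store.move_to_end(title)
        self._store[title] = text
        # Evict the least recently read article when over capacity.
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)

    def get(self, title):
        # A hit counts as a fresh read, so bump its recency.
        if title not in self._store:
            return None
        self._store.move_to_end(title)
        return self._store[title]
```

The point of the sketch is only that such a cache serves reads quickly from the device; it says nothing about search personalization, which (as noted above) is a separate mechanism.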

General notes:

//Shell 23:08, 10 September 2013 (UTC)[reply]

Hi Shell! Thank you for your detailed comments. We really appreciate you taking the time to help us on this. The legal team and I will go through your comments and suggestions in greater detail tomorrow and will respond in-line accordingly (probably with some questions for you). =) Thanks again! Mpaulson (WMF) (talk) 00:02, 11 September 2013 (UTC)[reply]
Apologies for the delay. We will be on this shortly. Thanks. Geoffbrigham (talk) 10:37, 17 September 2013 (UTC)[reply]
Bump. Do you have any comments? //Shell 10:35, 4 October 2013 (UTC)[reply]
Further apologies for the ongoing delay. We are juggling a couple of priorities right now, but intend to focus on your comments this week or next. Thanks. Geoffbrigham (talk) 19:21, 10 October 2013 (UTC)[reply]
Hello @Skalman: My apologies for taking so long to review your comments. I have left a few comments above, and suggested some alternative language in a few spots. I appreciate your detailed feedback on the policy -- it's helpful indeed. Stephen LaPorte (WMF) (talk) 01:08, 7 November 2013 (UTC)[reply]
I have responded to your comments inline. You didn't comment on all my points - do you intend to? //Shell 06:19, 11 November 2013 (UTC)[reply]
@Skalman: Yes, now I have followed up on your comments inline. Thanks again for spending time reviewing the policy so thoroughly. Your feedback has been helpful, and it has improved this draft. Cheers, Stephen LaPorte (WMF) (talk) 00:07, 16 November 2013 (UTC)[reply]
@Slaporte (WMF): I've responded again. I'm glad to help. //Shell 08:49, 17 November 2013 (UTC)[reply]

Structure of the document

The following discussion is closed: closing given the lack of additional response or support for the change, will archive in a couple days unless reopened. Jalexander--WMF 22:38, 18 December 2013 (UTC) [reply]

I find the current privacy policy much clearer than the new one. It's much easier to retrieve information from it. One reason, I think, is that there are redundant headings in the proposed new privacy policy. If you remove the headers "Welcome!", "Use of info", "Sharing", "Protection" and "Important info", the document suddenly makes much more sense.

It looks like meaningless headers were added only to provide short descriptions for the big icons. I suppose this is done to make the document look more attractive to a younger audience and to lure people into clicking the icons at the top. Also the numbering of chapters seems to have been removed to support the structure created by the big icons. It may look cooler and more attractive, and you perhaps get more clicks, but I'm sure the actual information is much harder to take in. --Aviertje (talk) 15:15, 19 September 2013 (UTC)[reply]

Hi Aviertje. Thank you for your suggestion. The purpose of the icons was to make it easy for people to skip to sections that they are looking for or are the most interested in. I see what you're saying about some being redundant, such as when the "Sharing" icon and heading is immediately followed by the subsection heading "When May We Share Your Information?". However, other times, the icon and accompanying heading help group together related subsections. For example, the "Important Info" icon and heading groups together "Where is the Foundation and What Does that Mean for Me?", "Changes to This Privacy Policy", "Contact Us", and "Thank You!". The Introduction icon and heading do something similar.
The hope was not to make the document look attractive (although that's not the worst thing to do if you are trying to encourage people to read something), but to make it more navigable. What do others think? Do the icons and section headings help? Mpaulson (WMF) (talk) 18:12, 19 September 2013 (UTC)[reply]
It's funny that you mention "Important info" as an example. "Important info" says absolutely nothing. Each chapter could be named that. Naming it "I don't know what to name this" would be even more informative.
BTW. "Welcome!" is a bad title, because the section is not about welcoming the reader. It can be a greeting if you make it normal text instead of a header. "A Little Background" is also a bad title. The inconsistent use of the informal word "info" looks strange. --Aviertje (talk) 16:34, 20 September 2013 (UTC)[reply]
Hi Aviertje. If you have suggestions as to what the titles should be renamed as, I'd like to hear them and hear what other community members think, both about the current titles and your proposed titles. Mpaulson (WMF) (talk) 22:31, 26 September 2013 (UTC)[reply]
Like I said at the beginning, lose the meaningless headers "Welcome!", "Use of info", "Sharing", "Protection" and "Important info". As to "A Little Background", what do you think of "Language used in this policy"? --Aviertje (talk) 22:15, 30 September 2013 (UTC)[reply]
I would like to hear if others feel the same way. Thanks. Geoffbrigham (talk) 20:31, 15 November 2013 (UTC)[reply]

Explicit agreement to cookie usage

The following discussion is closed: closing given the lack of additional response or support for the change, will archive in a couple days unless reopened. Jalexander--WMF 22:39, 18 December 2013 (UTC) [reply]
Many websites across the Internet now use a more explicit means of communicating the website's use of cookies and the requirement to agree to that usage. For example, one can visit https://www.google.co.uk/ where at the bottom one will notice:

Cookies help us deliver our services. By using our services, you agree to our use of cookies. [button]OK[/button] [link]Learn more[/link]

It is in the best interest of users and the Foundation to include a similar message indicating the use of cookies, with an explicit click of the "OK" button indicating the user's acceptance of the Privacy Policy. This is preferable to stating that the user automatically agrees to the Policy simply by visiting the Sites. 184.147.55.86 21:33, 2 October 2013 (UTC)[reply]

Great idea. A volunteer started to document cookies used by WMF sites at cookie jar, but it is not even close to being complete. By the way, I believe that google.co.uk asks the user about cookies because of the EU's E-Privacy Directive. The Wikimedia Foundation might legally need to do so because they use cookies that are not "strictly necessary", but I'm sure they would have done so by now if legally required. Let's leave that to Legal. Even if it's not legally required, it may be better to inform users, however, as you say. PiRSquared17 (talk) 21:55, 2 October 2013 (UTC)[reply]
A lot of European websites inform about cookies because of that directive. As Wikimedia Foundation projects are hosted in the United States, I would assume that the Wikimedia Foundation doesn't have to do this, but nothing prevents the Foundation from informing about cookies either. --Stefan2 (talk) 10:26, 3 October 2013 (UTC)[reply]
We have seen this in several places here recently: the rationale, explicit, implicit or presumed, that if WMF is not legally required to do something then it need not. While that may be correct legally, it suggests a somewhat limited commitment to users' privacy. There is no technical reason not to follow the European practice voluntarily, and I for one would suggest that WMF sites should do so as a matter of good practice. Could we hear an explicit reason why WMF has decided not to do so in these cases: a reason, that is, going beyond "we don't have to"? Spectral sequence (talk) 18:42, 4 October 2013 (UTC)[reply]
I agree that this is a good idea. Could someone make a mockup of this? PiRSquared17 (talk) 02:42, 5 October 2013 (UTC)[reply]
Hi All. Thank you for bringing this issue up. This possibility was actually discussed internally when we were formulating this draft of the privacy policy. We decided not to explore this option further mostly because we were concerned that such pop-ups would take away from the user experience -- people generally do not like pop-ups as part of their interactions with a site. We don't think that having such a pop-up would be wrong or inappropriate per se, but it's a trade-off. If there was a significant call from the community to implement such pop-ups, we would happily discuss this possibility with the tech team again. Mpaulson (WMF) (talk) 20:36, 1 November 2013 (UTC)[reply]
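For those wondering what a mockup of the proposed notice would involve: the banner itself is a front-end detail, but the gating logic is simple. Below is a rough sketch in Python of how a server might decide whether to show such a notice; the cookie name cookie_notice_ack is hypothetical, not an actual WMF cookie:

```python
def parse_cookie_header(header):
    """Parse an HTTP Cookie header string into a dict (simplified)."""
    cookies = {}
    for part in header.split(";"):
        if "=" in part:
            name, _, value = part.strip().partition("=")
            cookies[name] = value
    return cookies

def should_show_cookie_notice(cookie_header):
    """Show the notice until the (hypothetical) acknowledgement cookie
    has been set, e.g. by clicking the banner's OK button."""
    cookies = parse_cookie_header(cookie_header or "")
    return cookies.get("cookie_notice_ack") != "1"
```

Clicking "OK" would set cookie_notice_ack=1, after which the banner is suppressed on subsequent requests. This is only a sketch of the mechanism, not a statement about how WMF would implement it.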

The draft EU Data Protection Regulation

The following discussion is closed: close given staleness and lack of response, will archive in a couple days unless reopened. Jalexander--WMF 22:40, 18 December 2013 (UTC) [reply]

The draft EU Data Protection Regulation will probably come into force in 2016. It is proposed that it will apply to all non-EU companies processing the data of EU citizens. While of course we appreciate that the WMF is legally situated in the USA, its interactions with users situated in the EU will be affected by the Regulation. How does the proposed Privacy Policy sit with respect to the EU proposals? Spectral sequence (talk) 17:28, 6 October 2013 (UTC)[reply]

Hi Spectral sequence! We are aware of the upcoming draft and are tracking the regulation accordingly. The privacy policy draft was written with EU principles in mind and reviewed by EU counsel, but does not incorporate every EU regulation or proposal. Frankly, a lot can happen to the content of the proposed EU Data Protection Regulation between now and its actual implementation, and we don't think it's wise to speculate quite yet as to how it will impact the privacy policy draft. We will, of course, keep tracking it, and once the language of the regulation has been finalized and an adoption timeline is established, we will reevaluate the privacy policy to see if any changes are needed. Mpaulson (WMF) (talk) 19:22, 1 November 2013 (UTC)[reply]

Revision of "What This Privacy Policy Doesn’t Cover"

The following discussion is closed: close given lack of response/staleness, will archive in a couple days unless reopened. Jalexander--WMF 22:41, 18 December 2013 (UTC) [reply]

I've rewritten the discussion of what the policy doesn't cover, after an earlier discussion with Nemo about the clarity and organization of it. The substance is largely the same (or is intended to be), but hopefully it is easier to find relevant sections now. Please review and leave any comments here. Thanks! -LVilla (WMF) (talk) 19:52, 21 November 2013 (UTC)[reply]

Generation of editor profiles

I'd like once more to point out serious concerns about the generation and publication of detailed user profiles on Wikimedia websites or servers. This issue has been dealt with repeatedly, at least on the German Wikipedia (i.e. here, and actually again in the signpost equivalent at deWP). While the toolserver's policy accords with European standards concerning data privacy, wmflabs (which will completely replace the toolserver in 2014) does not meet these requirements. A contributors' poll at Meta clearly showed the community's preference for an opt-in solution for user data mining tools. Nevertheless WMF is giving the opportunity to run a detailed user profiling tool that does not allow an opt-in, not even an opt-out. We are aware that American data protection standards differ from European standards, and that such tools are considered to be legal in the USA. Yet they are not needed by anyone. Thus, we still hope that WMF does not impose US points of view on their global contributors whenever weak data policies are not required by US law, nor needed by contributors to improve the projects' contents. Looking forward to a WMF statement on this issue. --Martina Nolte (talk) 20:53, 24 November 2013 (UTC)[reply]

Can you expand on what you mean by a 'US point of view'? --Krenair (talkcontribs) 01:06, 25 November 2013 (UTC)[reply]
Sure. User contribution data are publicly available in the edit histories. According to US law, it is okay to aggregate these data and generate detailed user profiles; people tend to feel okay with such a tool. In European countries an aggregation of personal data and the publication of user profiles without consent are considered illegal; people feel offended by such a tool. The views on what is okay or not okay depend on local laws. Laws reflect a culture's values and points of views. --Martina Nolte (talk) 04:27, 25 November 2013 (UTC)[reply]
+1 - I would generally like to underline this. -jkb- 10:13, 25 November 2013 (UTC)[reply]
Other discussions: Kurier (2013-09), Kurier (2013-10), labs-l (2013-09), labs-l (2013-10). I regret bringing this up on dewiki a little, as I didn't realize it would start this much drama. On the other hand, I do think that this is something we really should be discussing. But all the data will be public as long as db dumps with detailed info are published. PiRSquared17 (talk) 17:39, 26 November 2013 (UTC)[reply]
No need to regret it, no drama. This is an important discussion and it has to be made: 5th most used website, 1.7 billion edits with user information, 14 years of data collecting, our data. NNW (talk) 18:23, 26 November 2013 (UTC)[reply]
You're right. It's good that this is being discussed at least. I was a bit surprised that almost nobody commented about it on enwiki though. PiRSquared17 (talk) 20:21, 26 November 2013 (UTC)[reply]
Perhaps the experience of the 20th century might explain why Germans are quite sensitive concerning these topics. NNW (talk) 09:07, 27 November 2013 (UTC)[reply]
Right, the raw data are available via dumps. But not yet aggregated into individual user profiles. WMF could even think about slimming down these dumps; a matter of data economy (as much personal data as needed, as little personal data as possible). Editors agreed to publish their content contributions under a free licence; they do not automatically agree to publish their editing behaviour, or even their individual profiles. As I said, the "drama" is due to quite different views on data privacy issues. --Martina Nolte (talk) 19:49, 26 November 2013 (UTC)[reply]
I'm another who feels that this is a really pertinent Privacy issue which requires careful consideration here. And not just from a purely legal perspective (after all, if the Foundation is adopting a "cuddly" approach to volunteers, legality is surely just one dimension in the picture). User profiling—with its abuses as well as uses—is one reason why I prefer to edit Wikipedia as an IP. —MistyMorn (talk) 11:20, 27 November 2013 (UTC)[reply]
I may have missed something, but the only comment I can see from a WMF member is this. The fact that user profiling—including provision of potentially sensitive personal information—may be done either with or (though rather more arduously for most) without tools made publicly available through Wikimedia doesn't mean that users cannot be informed of such possibilities in the present document. MistyMorn (talk) 20:37, 3 December 2013 (UTC)[reply]

To make clear that the above mentionned questions are not individual concerns of single Wikipedia/Wikimedia contributors, I'd like to point to this site (German language yet, a translation is planned). --Martina Nolte (talk) 19:40, 9 December 2013 (UTC)[reply]

Hi, Martina: thanks for notifying us about that discussion. We're discussing this issue and considering how best to handle it. -LVilla (WMF) (talk) 20:21, 9 December 2013 (UTC)[reply]

I've been a Toolserver user for 6 years, and the EU data protection directive, along with other TS oddities, has been a thorn in development. An example: my User activity tool, which lists (more or less) publicly available data to make it easier to prune membership lists. If data-mining were allowed we could partially generate and manage these lists automatically. Or email an inactive user who is familiar with a particular city if questions come up. See who's on IRC and likely up at this time of day.

Additionally, our cultural partners have requested in-depth analytics that cannot be done on the Toolserver because of the privacy policy. WikiProjects are also interested in seeing who reads their pages, how much they read, what links they follow, what search terms or forums brought them there, and more.

Finally, do not misrepresent the German/European view as some sort of "global" view. The US and many other countries will not adopt data-protection style legislation (despite what WM-DE has said to my face). Also, it's technically impossible to stop third parties from doing analysis on their own; the data is public, after all. You've had your chance and chose to continue decommissioning to (IIRC) free up 5-10% of WM-DE's budget. —Dispenser (talk) 18:53, 10 December 2013 (UTC)[reply]

You know that a majority of users voted for opt-in? That's what's usually called "wanted by the community". And you can check that not all of the opt-in voters come from Germany/Europe. NNW (talk) 20:29, 10 December 2013 (UTC)[reply]
A majority, but not an impressive one: 54% (possibly slightly higher if some people voted for multiple options) is not what I would call overwhelming consensus. More to the point, there seems to be a pretty strong split based on which projects people come from: I looked over the list of voters on the RfC, and I recognized a great many names from the English Wikipedia under "Remove opt-in completely" and almost none under "Keep opt-in". Not very scientific, I know, but I suspect a more methodical analysis would support the same conclusion. I'm not sure there's any middle ground to be reached on this in terms of the privacy policy; I expect the eventual solution will be to have Labs' policy be that you can't offer data-mining services, or you have to make them opt-in, for projects where the community has indicated they don't want them (or hasn't indicated that they do want them). Emufarmers (talk) 13:10, 13 December 2013 (UTC)[reply]

Standing in for Erik as he’s on vacation, my position is that we shouldn’t introduce a policy limitation on what can/can’t be created on WMF servers for public data. However, we can look into adopting a mechanism by which the community can disable specific tools on the basis of community consensus. Legal tells me the Labs Terms of Use already allows the Foundation to take something down if necessary, but a formal mechanism for disabling specific tools based on community consensus has not yet been developed.

This approach would allow the community both the ability to experiment and be bold with how we use our data, as well as provide a check on a tool if the tool is deemed inappropriate. I think this strikes the right balance of experimentation and user privacy protection.

Obfuscating user contributions data or limiting our existing export will not happen because we have a commitment to not only make all of Wikipedia available, but to allow anyone to fork and take the history, content, etc. with them. Removing that ability would be a disservice to the community and we currently have no plans to revisit it. Tfinc (talk) 21:38, 19 December 2013 (UTC)[reply]

One question that immediately comes to mind is "which community?". For example, consider the ongoing complaints by some on dewiki about wanting to prevent people from creating tools to analyze contributions. Is consensus on dewiki enough to take the tool down for everyone? Or a consensus on a metawiki discussion contributed to mainly by German editors? And then what if other wikis' communities who weren't notified of this discussion (or ignored it) are upset when a useful tool goes away? Or would we just force the tool author to "break" their tool so it doesn't work on dewiki but continues to function on all other wikis? Anomie (talk) 14:23, 20 December 2013 (UTC)[reply]
As the one who'd be tasked with enforcing this, I can tell you that I would require a very clear consensus, and that if the consultation seems to be dominated by a particular subgroup I'd make a serious effort to widen the discussion before any action is taken. Honestly, engineering should be very hesitant to step in and disable a tool or impose conditions on it beyond those of the terms of use; but it's also our responsibility to do so if the tool breaks something or if the community is overwhelmingly opposed to it: Labs isn't free hosting, it's a home for development work that benefits the projects.

I am hoping that if any (sub) community makes it clear that it would rather opt out of some tool, the tool maintainers would be considerate enough to heed the request without intervention by operations, though – and I believe most will without hesitation. MPelletier (WMF) (talk) 15:44, 20 December 2013 (UTC)[reply]

Contradiction

The following discussion is closed.

"We believe that you shouldn't have to provide personal information to participate in the free knowledge movement."/"If you want to create a standard account, we do not require you to submit any personal information to do so" -- According to your definition of "personal information," this term refers, among other things, to "address, phone number, email address, password, identification number on government-issued ID, IP address, credit card number". But clearly you provide your IP address when creating an account. — Pajz (talk) 06:31, 1 December 2013 (UTC)[reply]

Interesting point, Pajz; I think we intended to say that we don't force you to provide that information; whereas IP address must be provided by the nature of the architecture of the internet. So, yes, this is possibly contradictory, but I think only in a minor way. We're considering tweaking that definition for other reasons, so we'll try to take that into account when we revise it. -LVilla (WMF) (talk) 20:22, 3 December 2013 (UTC)[reply]
In discussing this comment after I posted it, I realized that I had misunderstood how we handle IPs when new user accounts are created. So I'd propose changing the "standard account" sentence to read: "If you want to create a standard account, we require only a username and a password. Your username will be publicly visible, so please be careful about using your real name as your username. Your password is only used to verify that the account is yours. Your IP address is also automatically submitted to us, and we record it temporarily to help fight spam. No other personal information is required: no name, no email address, no date of birth, no credit card information." -LVilla (WMF) (talk) 21:36, 3 December 2013 (UTC)[reply]
It's used for more than just fighting spam though. "prevent abuse" is probably more accurate, though a little more vague. Legoktm (talk) 20:23, 4 December 2013 (UTC)[reply]
Yes, that makes sense; will make that change. Anyone else have other comments/suggestions? -LVilla (WMF) (talk) 19:03, 9 December 2013 (UTC)[reply]
And I've made the change. Thanks for the comments, both of you. I'm closing this; if anyone else has comments on the new language, please open a new discussion section. -LVilla (WMF) (talk) 23:18, 18 December 2013 (UTC)[reply]
Err, that's not what I suggested. I said not to say "fight spam". Legoktm (talk) 16:24, 19 December 2013 (UTC)[reply]
Gah, copy-paste fail. Fixed. -LVilla (WMF) (talk) 19:38, 19 December 2013 (UTC)[reply]

When May We Share Your Information? Because You Made It Public

Privacy policy#Because_You_Made_It_Public: "Any information you post publicly on the Wikimedia Sites is just that – public."

Does this mean the WMF is allowed to share any of the information, by any means, in any form, for any purpose, to anyone? --Aviertje (talk) 13:03, 1 December 2013 (UTC)[reply]

It means that, for example, the WMF can distribute dumps with all your edits, etc. in them. I think this should be changed to exclude oversighted (or deleted?) info, though, even if it was originally public. PiRSquared17 (talk) 15:57, 1 December 2013 (UTC)[reply]
I doubt that going back to redact information from old dumps is really feasible, though. Anomie (talk) 14:15, 2 December 2013 (UTC)[reply]
We could, in theory, delete it from the dumps we provide. However, many other people mirror and distribute those dumps, and we can't (as a practical matter) reach out and take those down. So any promise here to exclude deleted information would be a false promise. We'd prefer to be up-front, and warn people that their public edits really are public – that's what this language attempts to do.
That said, I sort of see the original commenter's point about the language being perhaps somewhat confusing. We'd be happy to listen to any suggestions on how to improve it. -LVilla (WMF) (talk) 19:40, 9 December 2013 (UTC)[reply]
In theory, yes. But actually doing so would probably be technically prohibitive. Anomie (talk) 14:46, 10 December 2013 (UTC)[reply]
Oh, yes, absolutely. Don't worry, I highly doubt Legal (at least under my watch) will be in the business of forcing anybody to open up and edit dumps :) -LVilla (WMF) (talk)
@Aviertje: I should have said this earlier, but this is about information you post publicly, as opposed to information we record privately and then later make public. So, for example, if you put your real name in your user name, or post your mailing address on your talk page, that is public information; we can't reasonably know about it or treat it specially (though in some circumstances the community may help you delete it). Does that make sense?
If it would help, we could add something like the italic text: "Any information you post publicly on the Wikimedia Sites is just that – public. For example, if you put your mailing address on your talk page, that is public, and not protected by this policy. Please think carefully about your desired level of anonymity before you disclose personal information on your user page or elsewhere." If you have any other suggestions on how to make it more clear, please let us know. -LVilla (WMF) (talk) 23:41, 18 December 2013 (UTC)[reply]

Regarding site visiting logs

First question: is our every visit to Wikimedia sites logged (e.g. some IP, logged in or not, visited page https://meta.wikimedia.org/w/xxxx at some time) and stored? If yes, then how long will it be stored? The current Privacy policy says: "When a visitor requests or reads a page, or sends email to a Wikimedia server, no more information is collected than is typically collected by web sites. The Wikimedia Foundation may keep raw logs of such transactions, but these will not be published or used to track legitimate users.", in which the "may keep raw logs" is ambiguous. Also, regarding "these will not be published or used to track legitimate users", does that mean these data can be used to track illegitimate (for example, suspected vandalism) users?

Second question: recently I heard a user claim that though Checkusers' range of access excludes the user visit log, on certain necessary occasions they can apply to access those data. Is that true?--朝鲜的轮子 (talk) 06:57, 4 December 2013 (UTC)[reply]

CheckUser does not have access to a user's visit log. Legoktm (talk) 20:23, 4 December 2013 (UTC)[reply]
By "does not have access", do you mean "never ever, even when there is need", or "possible when checking such a log can help prove connections between users"?--朝鲜的轮子 (talk) 03:15, 5 December 2013 (UTC)[reply]
Checkusers only have access to what is stored in the checkuser table. A user's visits are not stored in that table. Hence, checkusers "never ever" have access to it via the CheckUser tool. Legoktm (talk) 03:17, 5 December 2013 (UTC)[reply]
And Checkusers will never ever use anything beyond the reach of the CheckUser tool?--朝鲜的轮子 (talk) 03:56, 5 December 2013 (UTC)[reply]
What User:Legoktm wrote is incomplete. Is there other information, stored on some hardware controlled by the Wikimedia Foundation, in addition to the information available to checkusers? If so, what information is available at that location, and who has access to it? --Stefan2 (talk) 21:56, 7 December 2013 (UTC)[reply]
Well, I would likely be the one they'd have to apply to - and I've never heard of such a thing. To my knowledge, there is no such application process or access to any other data. I don't want to categorically speak to what may or may not be on the servers - I'm not technical enough to know - but I can say that if it exists, it is not and has not been used that way. At least, not for the last several years that I've been around. Philippe (WMF) (talk) 00:41, 8 December 2013 (UTC)[reply]
wikitech:Logs has a summary of the sorts of raw access logs that are probably being referred to here (note this may not be a complete list). Access to this data is limited to people with access to the servers involved, and as far as I know getting access requires an NDA and is generally limited to WMF employees and contractors involved in maintaining the site and software. Also as far as I know, the sorts of illegitimate uses this data might be used to track are more along the lines of someone trying to break or break into the servers, not on-wiki vandalism. BJorsch (WMF) (talk) 14:37, 9 December 2013 (UTC)[reply]
The current privacy policy only allows sampled logs, which means it's hard to do any tracking/user profiling/fingerprinting/user behaviour analysis/however you may wish to call it. The proposed text, in short, proposes to allow unlimited tracking; see in particular #Unsampled request logs/tracking and #Reader Privacy at the Internet Archive for more information. --Nemo 14:17, 7 December 2013 (UTC)[reply]
I think the major concern about unsampled tracking is fundraising and research. What about anti-vandalism? Does the WMF think it is necessary and legitimate to use anything that helps to identify a vandal, in principle?--朝鲜的轮子 (talk) 22:52, 11 December 2013 (UTC)[reply]
Thank you for your questions 朝鲜的轮子! For the first one, as you say, we collect different types of information from users (either automatically or intentionally) when visiting Wikimedia Sites, logged in or not. We are currently working on data retention guidelines that will apply to all non-public data we collect from Wikimedia Sites. The guidelines will describe the different types of information that we collect (with examples), and will describe for how long each type of information would be retained. The data retention guidelines would work along with the Privacy Policy, being updated over time to reflect current retention practices, and will allow us to further fulfill our commitment in the Privacy Policy of keeping your private data “for the shortest possible time”.
Regarding your second question, I believe Philippe (WMF)’s comment covers exactly what you ask. To my knowledge, we have no way for Checkusers to access any type of raw server log in their Checkuser capacity. Furthermore, we have never given log access to the community and we have no intention of doing so. Please let us know where you heard this if you want us to dive deeper into it. Thanks again! --JVargas (WMF) (talk) 00:08, 19 December 2013 (UTC)[reply]
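As an aside, the practical difference between sampled and unsampled request logs, which much of this section turns on, can be illustrated with a small sketch. Everything here is hypothetical (the record schema, the 1:1000 rate); it is not the actual WMF logging pipeline:

```python
import random

SAMPLE_RATE = 1000  # keep roughly 1 request in 1000 (hypothetical rate)

def sample_log(requests, rate=SAMPLE_RATE, rng=None):
    """Return the subset of request records that a sampled log would retain.

    `requests` is an iterable of dicts like {"ip": ..., "url": ..., "ts": ...}
    (an illustrative schema, not the real log format).
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    return [r for r in requests if rng.randrange(rate) == 0]

# A hypothetical reader who views 50 pages: with 1:1000 sampling, in
# expectation almost none of those views survives in the retained log,
# so the reader's browsing session cannot be reconstructed from it.
reader = [{"ip": "198.51.100.7", "url": f"/wiki/Page_{i}", "ts": i} for i in range(50)]
kept = sample_log(reader)
print(len(kept))
```

With an unsampled log (`rate=1`), every one of the 50 requests is retained, which is exactly the per-user traceability the commenters above are worried about.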

Please add concerning user profiles

Sorry, my English is not good enough to write it directly in English, so I hope somebody will translate it.

  • Wir veröffentlichen ohne Deine ausdrückliche Zustimmung kein Nutzerprofil von Dir, also Daten, die Deine zeitlichen Editiergewohnheiten und Interessengebiete zusammenfassen. Wenn wir Daten an andere weitergeben, die das Erstellen solcher Profile ermöglichen (zum Beispiel WikiLabs), so verpflichten wir sie, ebenfalls keine in dieser Weise aggregierten Nutzerdaten ohne Deine Zustimmung zu veröffentlichen.

--Anka Friedrich (talk) 11:25, 7 December 2013 (UTC)[reply]

Rough translation: "Without your explicit consent, we do not publish user profiles about you, i.e. data summarizing your temporal editing habits and interest areas. If we release data to others who enable the generation of such profiles (e.g. WikiLabs), we require them to likewise not publish user data that have been aggregated in this way, except with your consent." Regards, Tbayer (WMF) (talk) 03:34, 10 December 2013 (UTC)[reply]
Considering that second sentence would require us to stop publicly releasing data dumps and to break history pages and the API, I would oppose such a change. Anomie (talk) 14:49, 10 December 2013 (UTC)[reply]
Tbayer, thank You for Translation. --Anka Friedrich (talk) 15:25, 15 December 2013 (UTC)[reply]
Anomie, no, but everybody who gets the dump or gets access to the API would have to agree not to aggregate data without consent. --Anka Friedrich (talk) 15:25, 15 December 2013 (UTC)[reply]
The dumps and access to the API are given to everyone in the world without restrictions. And I oppose requiring people to "sign up" so we can force them to agree to some pointless requirement before allowing them to access these things. You also overlooked history pages, Special:Contributions, and other on-wiki interfaces which would also have to be restricted or broken. Anomie (talk) 14:16, 16 December 2013 (UTC)[reply]

The ability to store unsampled log data (a.k.a. loss of privacy in exchange for money)

One of the changes between the existing privacy policy and the new draft is that the draft will now allow the Foundation to retain unsampled log data — in effect, this means that every single visit by every single visitor to each and every Wikimedia project (and perhaps other sites owned/run by the Foundation) will now be recorded and retained on WMF servers. It is shocking to me that the only reasons given for such a broad, controversial and hardly advertised change are (1) fundraising and (2) the ability to measure statistics in Wikipedia Zero, a project that is limited in terms of geography, scope and type of access (mobile devices).

Given that Wikipedia Zero is just one of many projects led by the Foundation, and that it applies to a limited number of visitors who are using a very specific medium to access the projects, I fail to see the need to sacrifice the privacy of everyone who will ever visit a Wikimedia project. Moreover, I am disappointed and terrified to learn that the Foundation thinks it is reasonable to sacrifice our privacy in exchange for more money — especially since our fundraising campaigns appear to have been quite effective, or at least to have enabled the WMF to reach their revenue goals without much trouble. odder (talk) 22:22, 7 December 2013 (UTC)[reply]

"will now be recorded and retained" is probably a bit strong. s/will/may/ would probably be more accurate. Personally, I can see the ability to record full logs when needed to be useful in debugging, performance analysis, and analysis of which features should be prioritized for improvement or development or even possible removal. Boring stuff to most people. BJorsch (WMF) (talk) 14:46, 9 December 2013 (UTC)[reply]
"May" is only a legalese euphemism for "will" (in this case). If there are no plans to store and use unsampled log data, for whatever purpose, then surely there will be no problem to revert to the wording of the current privacy policy, which only allows storing sampled data. odder (talk) 15:35, 9 December 2013 (UTC)[reply]
Believe whatever you want, I'm not about to engage in arguing over conspiracy theories. BJorsch (WMF) (talk) 15:25, 10 December 2013 (UTC)[reply]
Maybe I missed it somewhere, but it would be helpful to list all data types from the logs. I am especially interested in this question: Do you save every pageview incl. IP address and/or username? Do you have logs in which you can see which pages I have read (!), how long I read them, etc.? Raymond (talk) 16:55, 9 December 2013 (UTC)[reply]
Yes, I would also appreciate knowing whether you have, or plan, such visitor logs. --Martina Nolte (talk) 19:46, 9 December 2013 (UTC)[reply]
+1 --Steinsplitter (talk) 19:48, 9 December 2013 (UTC)[reply]
+1, by all means! Ca$e (talk) 09:36, 10 December 2013 (UTC)[reply]
+1 ...84.133.109.103 09:38, 10 December 2013 (UTC)[reply]
+1 -jkb- 09:41, 10 December 2013 (UTC)[reply]
+1 I told you so."Dance" Alexpl (talk) 09:57, 10 December 2013 (UTC)[reply]
+1 for showing an example of the currently log data. --Zhuyifei1999 (talk) 10:08, 10 December 2013 (UTC)[reply]
+1 ---<(kmk)>- (talk) 13:46, 10 December 2013 (UTC) there is no need to trade my privacy for (even more) funds.[reply]
+1 -- smial (talk) 14:33, 11 December 2013 (UTC)[reply]
I do not know of any logs that record all pageviews, or of any plans to start collecting such logs. The logs testwiki.log and test2wiki.log mentioned on wikitech:Logs do contain user information and URL (as part of a larger amount of debugging information) for requests that aren't served from the caches, but only for testwiki and test2wiki which the vast majority of people have no reason to ever visit. I also don't know of any logs or log analyses that show pages read by any user or IP or how long anyone might have spent reading any particular page. BJorsch (WMF) (talk) 15:25, 10 December 2013 (UTC)[reply]
I thought the fundraising people already do exactly that and call it "User site behavior collection". (I can't actually tell from that link if those proposals have already been implemented ?!?) Alexpl (talk) 18:04, 10 December 2013 (UTC)[reply]
I was not aware of that. Note though that's a proposal and not something that is currently being done. It seems like a useful study though, and it's far from tracking everyone all the time that some of the more paranoid here seem to be expecting. BJorsch (WMF) (talk) 00:52, 11 December 2013 (UTC)[reply]
I am pretty sure their intentions were good. But due to the nature of the Wikipedia project, it seems a bit "pre-Snowden" or just unworldly to believe that the WMF can limit access to those data once the mechanism to collect them has been installed. The first dude with access who seeks future employment at a hip company (...) can do irreversible damage and sell every WP contributor out. Alexpl (talk) 08:58, 12 December 2013 (CET)
@User:BJorsch (WMF): If there are no logs and no plans to start collecting them, why was the draft changed so that the Foundation would be allowed to do just that?---<(kmk)>- (talk) 18:52, 10 December 2013 (UTC)[reply]
For the reasons that have been officially stated, perhaps? But really, you'd probably want to ask one of the people involved in drafting this. I just commented here to add a few other potential uses for the ability to collect non-sampled logs when needed, since people seemed to be focusing overmuch on the two examples in the draft. BJorsch (WMF) (talk) 00:52, 11 December 2013 (UTC)[reply]
Does "I do not know of any logs that record all pageviews, or of any plans to start collecting such logs." mean that Wikimedia has no such logs or plans, or is it meant literally: you, BJorsch, do not know about it? ...Sicherlich Post 08:39, 11 December 2013 (UTC)[reply]
+1 -- smial (talk) 14:32, 11 December 2013 (UTC)[reply]
The latter, obviously. I'm certainly not aware of everything everyone associated with the Foundation does or plans, nor am I in any position to set policy. BJorsch (WMF) (talk) 15:19, 11 December 2013 (UTC)[reply]
I guess we just assumed that the "WMF" tag in your signature would grant you preferential access to all relevant information on this matter :) Alexpl (talk) 16:57, 11 December 2013 (UTC)[reply]

Okay, so now we know your private opinion and assumptions, BJorsch. Is it possible to get an official statement from the WMF? ...Sicherlich Post 17:20, 12 December 2013 (UTC)[reply]

BJorsch's opinion is that users asking these questions are "more paranoid". I would certainly prefer an official, and hopefully more sober, WMF statement on this logging issue. --Martina Nolte (talk) 17:36, 20 December 2013 (UTC)[reply]

Why do you need special logging for WP Zero? PiRSquared17 (talk) 20:27, 20 December 2013 (UTC)[reply]

Regarding some introductory remarks

Hi,

I would like to share some observations from reading the introductory remarks of this document. I apologize if anything has already been brought up.

  • "[1] Gathering, sharing, and understanding information is what built the Wikimedia Sites. [2] Continuing to do so in novel ways helps us learn how to make them better. [3] We believe that information-gathering and use should go hand-in-hand with transparency. [4] This Privacy Policy explains how the Wikimedia Foundation, the non-profit organization that hosts the Wikimedia Sites, like Wikipedia, collects, uses, and shares information we receive from you." — That sounds really strange to me. What built the Wikimedia Sites? Our contributions to the project (i.e. content contributed by volunteers), and that's pretty obvious to the reader of this document. Reading [1] in isolation, information is hence understood in the sense of information about a public figure or a historical event. However, between [1,2] and [2,3,4], the meaning of "information" gradually shifts. Suddenly, "information" is no longer what is contributed to the projects but, in fact, "personal information." If I weren't sure that you're writing this policy in good faith, I'd probably interpret this as a (pretty obvious) rhetorical trick.
I see your point, Pajz. What would be your recommended rewrite? One possibility:
The Wikimedia movement is founded on a simple, but powerful principle: we can do more together than any of us can do alone. We cannot work collectively without gathering, sharing and analyzing information about our users as we seek new ways to make our Wikimedia Sites more useable, effective, safer, and useful.
We believe that information-gathering and use should go hand-in-hand with transparency. This Privacy Policy explains
Geoffbrigham (talk) 22:41, 18 December 2013 (UTC)[reply]
Yep, that's fine IMO. — Pajz (talk) 16:50, 19 December 2013 (UTC)[reply]
I will ask James to make the change (after Michelle gives her thumbs up). Thanks for the suggestion. Geoffbrigham (talk) 17:58, 19 December 2013 (UTC)[reply]
 Done Jalexander--WMF 19:52, 19 December 2013 (UTC)[reply]
  • "The Wikimedia Sites were primarily created to help you share your knowledge with the world, and we share your contributions because you have asked us to do so." — Really? As far as I'm aware, the Wikimedia Sites were primarily created to help you be able to access all knowledge of the world ("Imagine a world ..."). The sentence sounds like the sites were primarily a platform for users to express themselves whereas, in fact, I think it's quite clear that contributors are the means, not the end.
To be honest, I'm OK with this formulation. The Wikimedia vision is: "Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment." Consistent with this vision, the sites were created to allow users (like you) to "share [their] knowledge of the world." Also, a person's contributions are shared only when the user requests that we do so as part of this overall vision. I'm open to an alternative proposal that captures the needs of this paragraph, but for now I would personally leave it as it is. :) Geoffbrigham (talk) 23:01, 18 December 2013 (UTC)[reply]
Ah, never heard of that mission statement. What I had in mind was "Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That's what we're doing." (https://en.wikiquote.org/wiki/Jimmy_Wales#Sourced) (Which makes sense.) Hmm. I don't like the mission statement either, but in this case it's outside the scope of this policy. — Pajz (talk) 16:50, 19 December 2013 (UTC)[reply]
Interesting. The nuances in the differences are meaningful. Here is the official vision statement (I think): http://wikimediafoundation.org/wiki/Vision I accordingly will leave this as it is. Geoffbrigham (talk) 17:58, 19 December 2013 (UTC)[reply]
  • "Because everyone (not just lawyers) should be able to easily understand how and why their information is collected and used, we use common language instead of more formal terms. Here is a table of translations" — I don't quite see the connection between the first conditional clause and the presentation of a "table of translations". Actually, I was quite amused reading this passage. It sounds like "Hey, we want to make things really simple here, that's why we replaced everything difficult with a word, and here's the dictionary you need to understand these words". Which, of course, would be simplification voodoo. I think you should separate these statements. Your first point is that you use common language; your second is that, in order to avoid redundancy (or whatever), you prepend some definitions.
How about this:
Because everyone (not just lawyers) should be able to easily understand how and why their information is collected and used, we use common language instead of more formal terms throughout this Policy. To help ensure your understanding of some particular key terms, here is a table of translations:
Geoffbrigham (talk) 23:01, 18 December 2013 (UTC)[reply]
That's better. — Pajz (talk) 16:50, 19 December 2013 (UTC)[reply]
Great. I will ask for the change (as above). Geoffbrigham (talk) 17:58, 19 December 2013 (UTC)[reply]
 Done Jalexander--WMF 19:57, 19 December 2013 (UTC)[reply]

Best, — Pajz (talk) 17:05, 13 December 2013 (UTC)[reply]

Short summaries of each section

Reading an entire privacy policy requires a lot of effort, even if it's well-written. Previously we had Rory to break up the sections and give you a pause while reading; now we only have the blue section icons, which I think are the bare minimum. I propose that we add back something to make reading more pleasant. One way to do that is 500px's short summaries of each section. What do you think? Can we do something similar? //Shell 09:59, 17 December 2013 (UTC)[reply]

Hi Shell! Thank you for the suggestion. We are working on drafting some summarizing bullet points to put into the left column and will put them up as soon as we have them ready. We would definitely appreciate your input (and input from others) on the bullet points once they are ready. Mpaulson (WMF) (talk) 20:39, 18 December 2013 (UTC)[reply]

Handling our user data - an appeal

Preface (Wikimedia Deutschland)

For several months, there have been regular discussions on data protection and the way Wikimedia deals with it, in the German-speaking community – one of the largest non-English-speaking communities in the Wikimedia movement. Of course, this particularly concerns people actively involved in Wikipedia, but also those active on other Wikimedia projects.

The German-speaking community has always been interested in data protection. However, this particular discussion was triggered when the Deep User Inspector tool on Tool Labs nullified a long-respected agreement on the Toolserver that aggregated, personalized data would only be made available after an opt-in by the user.

As the Wikimedia Foundation is currently reviewing its privacy policy and has requested feedback and discussion here by 15 January, Wikimedia Deutschland has asked the community to draft a statement. The text presented below was largely written by User:NordNordWest and signed by almost 120 people involved in German Wikimedia projects. It highlights the many concerns and worries of the German-speaking community, so we believe it can enhance the discussion on these issues. We would like to thank everyone involved.

This text was published in German simultaneously in the Wikimedia Deutschland blog and in the Kurier, an analogue to the English "Signpost". This translation has additionally been sent as a draft to the WMF movement blog.

(preface Denis Barthel (WMDE) (talk), 20.12.)

Starting position

The revelations by Edward Snowden and the migration of programs from the Toolserver to ToolLabs prompted discussions among the community on the subject of user data and how to deal with it. On the one hand, a diverse range of security features are available to registered users:

  • Users can register under a pseudonym.
  • The IP address of registered users is not shown. Only users with CheckUser permission can see IP addresses.
  • Users have a right to anonymity. This includes all types of personal data: names, age, background, gender, family status, occupation, level of education, religion, political views, sexual orientation, etc.
  • As a direct reaction to Snowden’s revelations, the HTTPS protocol has been used as standard since summer 2013 (see m:HTTPS), so that, among other things, it should no longer be visible from outside which pages are called up by which users and what information is sent by a user.

On the other hand, however, all of a user’s contributions are recorded with exact timestamps. Access to this data is available to everyone and allows the creation of user profiles. While the tools were running on the Toolserver, user profiles could only be created from aggregated data with the consent of the user concerned (opt-in procedure). This was because the Toolserver was operated by Wikimedia Deutschland and therefore subject to German data protection law, one of the strictest in the world. However, evaluation tools that were independent of the Foundation and any of its chapters already existed.

One example is Wikichecker, which, however, only covers English-language Wikipedia. The migration of programs to Tool Labs, which means that they no longer have to comply with German data protection law, prompted a survey on whether a voluntary opt-in should remain required for X!'s Edit Counter or whether opt-in should be abandoned altogether. The survey resulted in a majority of 259 votes for keeping opt-in, with 26 users voting to replace it with an opt-out solution and 195 in favor of removing it completely. As a direct reaction to these results, a new tool – Deep User Inspector – was programmed to provide aggregated user data across projects without giving users a chance to object. Alongside basic contribution counts, the tool also provides statistics on, for example, the times on weekdays when a user was active, lists of voting behavior, and a map showing the locations of subjects on which the user has edited articles. This aggregation of data allows simple inferences to be made about each individual user. A cluster of edits on articles relating to a certain region, for example, makes it possible to deduce where the user most probably lives.
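The coordinate-clustering inference described above can be sketched in a few lines of Python. The coordinates and the naive averaging are invented for illustration; this is not the Deep User Inspector's code:

```python
# Hypothetical geotagged edits by one user: (latitude, longitude) of each
# article edited. The clustering of edits around one region is what lets an
# observer guess where the user probably lives.
edits = [
    (52.52, 13.40),  # Berlin-area articles
    (52.50, 13.45),
    (52.48, 13.35),
    (48.14, 11.58),  # one outlier (Munich)
]

def centroid(points):
    """Naive location estimate: the mean of all edit coordinates."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return lat, lon

lat, lon = centroid(edits)
print(f"estimated home region: {lat:.2f}, {lon:.2f}")
```

Even this crude average lands near the cluster of edits; a real tool using proper clustering would be more precise still, which is the statement's point.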

Problems

Every user knows that user data is recorded every time something is edited. However, there is a significant difference between a single data set and the aggregated presentation of this data. Aggregated data means that the user’s right to anonymity can be reduced, or, in the worst case, lost altogether. Here are some examples:

  • A list of the times that a user edits often allows a deduction to be made as to the time zone where he or she lives.
  • From the coordinates of articles that a user has edited, it is generally possible to determine the user’s location even more precisely; it would be rare for someone to edit only articles about area X while actually living in area Y.
  • The most precise deductions can be made by analyzing the coordinates of a photo location, as it stands to reason that the user must have been physically present to take the photo.
  • Places of origin and photo locations can reveal information on the user’s means of transport (e.g. whether someone owns a car), as well as on his or her routes and times of travel. This makes it possible to create movement profiles on users who upload a large number of photos.
  • Time analyses of certain days of the year allow inferences to be drawn about a user’s family status. It is probable, for example, that those who tend not to edit during the school holidays are students, parents or teachers.
  • Assumptions on religious orientation can also be made if a user tends not to edit on particular religious holidays.
  • Foreign photo locations either reveal information about a user’s holiday destination, and therefore perhaps disclose something about his or her financial situation, or suggest that the user is a photographer.
  • If users work in a country or a company where editing is prohibited during working hours, they are particularly vulnerable if the recorded time reveals that they have been editing during these hours. In the worst-case scenario, somebody who wishes to harm the user and knows extra information about his or her life (which is not unusual if someone has been an editor for several years) could pass this information on to the user’s employer. Disputes within Wikipedia would thus be carried over into real life.
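Several of the deductions listed above need nothing more than the public edit timestamps. A minimal sketch of the time-of-day analysis, using invented timestamps rather than real user data:

```python
from collections import Counter
from datetime import datetime

# Hypothetical public edit timestamps (UTC) for one user.
timestamps = [
    "2013-12-01T18:05:00", "2013-12-01T19:12:00", "2013-12-02T18:47:00",
    "2013-12-03T20:01:00", "2013-12-04T19:30:00", "2013-12-05T18:55:00",
]

# Histogram of edits per UTC hour: a concentration in the evening hours of
# one timezone is exactly the signal the statement above warns about.
hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
peak_hour, _ = hours.most_common(1)[0]
print(f"most active UTC hour: {peak_hour}")
```

The same histogram, bucketed by weekday or by calendar date instead of by hour, yields the school-holiday and religious-holiday inferences mentioned in the list.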

Suggestions

Wikipedia is the fifth most visited website in the world. The way it treats its users therefore serves as an important example to others. It would be illogical and ridiculous to increase user protection on the one hand but, on the other hand, to allow users’ right to anonymity to be eroded. The most important asset that Wikipedia, Commons and other projects have is their users. They create the content that has ensured these projects’ success. But users are not content, and we should make sure that we protect them. The Wikimedia Foundation should commit to making the protection of its registered users a higher priority and should take the necessary steps to achieve this. Similarly to the regulations for the Toolserver, it should first require an opt-in for all the tools on its own servers that compile detailed aggregations of user data. Users could do this via their personal settings, for example. Since Wikipedia was founded in 2001, the project has grown without any urgent need for these kinds of tools, and at present there seems to be no reason why this should change in the future. By creating free content, the community enables Wikimedia to collect the donations needed to run WikiLabs. That this should lead to users losing their right to anonymity, although the majority opposes this, is absurd. To ensure that user data are not evaluated on non-Wikimedia servers, the Foundation is asked to take the following steps:

  • Wikipedia dumps should no longer contain any detailed user information. The license only requires the name of the author and not the time or the day when they edited.
  • There should only be limited access to user data on the API.
  • It might be worth considering whether it is necessary, or consistent with project goals, to store and display the IP addresses of registered users (if they are stored), as well as precise, to-the-minute timestamps of all their actions. The time limit here could be how long CheckUsers reasonably need to make a query. After all, data that are not available cannot be misused for other purposes.
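The third suggestion, reducing timestamp precision with age, could be prototyped roughly as follows. The granularity tiers are assumptions for illustration, not part of the statement's proposal:

```python
from datetime import datetime, timedelta

def coarsen(ts, now):
    """Reduce a revision timestamp's precision with age (illustrative tiers).

    Recent edits stay minute-accurate (e.g. within a CheckUser-style query
    window); older ones only expose day or month granularity. Edit ordering
    is preserved because truncation is monotonic.
    """
    age = now - ts
    if age <= timedelta(days=90):            # roughly a CheckUser retention window
        return ts.strftime("%Y-%m-%d %H:%M")
    if age <= timedelta(days=365):
        return ts.strftime("%Y-%m-%d")       # day-accurate
    return ts.strftime("%Y-%m")              # month-accurate

now = datetime(2013, 12, 20)
print(coarsen(datetime(2013, 12, 1, 14, 37), now))   # recent: to the minute
print(coarsen(datetime(2013, 6, 1, 14, 37), now))    # older: to the day
print(coarsen(datetime(2011, 6, 1, 14, 37), now))    # old: to the month
```

Such graded truncation would still satisfy attribution requirements (author name plus approximate date) while blunting the hour-of-day and holiday analyses described under "Problems".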

Original signatures

  1. Martina Disk. 21:28, 24. Nov. 2013 (CET)
  2. NNW 18:52, 26. Nov. 2013 (CET)
  3. ireas :disk: 19:23, 26. Nov. 2013 (CET)
  4. Henriette (Diskussion) 19:24, 26. Nov. 2013 (CET)
  5. Raymond Disk. 08:38, 27. Nov. 2013 (CET)
  6. Richard Zietz 22:18, 27. Nov. 2013 (CET)
  7. Alchemist-hp (Diskussion) 23:47, 27. Nov. 2013 (CET)
  8. Lencer (Diskussion) 11:54, 28. Nov. 2013 (CET)
  9. Smial (Diskussion) 00:09, 29. Nov. 2013 (CET)
  10. Charlez k (Diskussion) 11:55, 29. Nov. 2013 (CET)
  11. elya (Diskussion) 19:07, 29. Nov. 2013 (CET)
  12. Krib (Diskussion) 20:26, 29. Nov. 2013 (CET)
  13. Jbergner (Diskussion) 09:36, 30. Nov. 2013 (CET)
  14. TMg 12:55, 30. Nov. 2013 (CET)
  15. AFBorchertD/B 21:22, 30. Nov. 2013 (CET)
  16. Sargoth 22:06, 2. Dez. 2013 (CET)
  17. Hilarmont 09:27, 3. Dez. 2013 (CET)
  18. --Poldine - AHA 13:09, 3. Dez. 2013 (CET)
  19. XenonX3 – (RIP Lady Whistler) 13:11, 3. Dez. 2013 (CET)
  20. -- Ra'ike Disk. LKU WPMin 13:19, 3. Dez. 2013 (CET)
  21. --muns (Diskussion) 13:22, 3. Dez. 2013 (CET)
  22. --Hubertl (Diskussion) 13:24, 3. Dez. 2013 (CET)
  23. --Aschmidt (Diskussion) 13:28, 3. Dez. 2013 (CET)
  24. Anika (Diskussion) 13:32, 3. Dez. 2013 (CET)
  25. K@rl 13:34, 3. Dez. 2013 (CET)
  26. --DaB. (Diskussion) 13:55, 3. Dez. 2013 (CET) (Even if I find the point about the dumps somewhat exaggerated.)
  27. --AndreasPraefcke (Diskussion) 14:05, 3. Dez. 2013 (CET) The point about the dumps in particular is important, and this info should not be displayed on the Wikipedia websites either. Roughly like this (not thought through in detail, just a rough idea): today's edits: displayed to the second as before; this week's edits: to the minute; edits of the last six weeks: to the hour; edits of the last 12 months: to the day; edits before that: to the month – the order must of course be preserved; edits followed by plain reverts: removed from the database entirely)
    Despite legitimate data-protection interests, one should not forget that this kind of date/time truncation is a double-edged sword. History imports on the one hand and copyright-violation checks on the other would become considerably more difficult ;-) -- Ra'ike Disk. LKU WPMin 14:19, 3. Dez. 2013 (CET) (although for the latter a day-accurate display would suffice for comparison with web archives)
  28. --Mabschaaf 14:08, 3. Dez. 2013 (CET)
  29. --Itti 14:28, 3. Dez. 2013 (CET)
  30. ...Sicherlich Post 14:52, 3. Dez. 2013 (CET)
  31. --Odeesi talk to me rate me 16:29, 3. Dez. 2013 (CET)
  32. --gbeckmann Diskussion 17:23, 3. Dez. 2013 (CET)
  33. --Zinnmann d 17:24, 3. Dez. 2013 (CET)
  34. --Kolossos 17:41, 3. Dez. 2013 (CET)
  35. -- Andreas Werle (Diskussion) (today for once "without" a timestamp...)
  36. --Gleiberg (Diskussion) 18:03, 3. Dez. 2013 (CET)
  37. --Jakob Gottfried (Diskussion) 18:30, 3. Dez. 2013 (CET)
  38. --Wiegels „…“ 18:55, 3. Dez. 2013 (CET)
  39. --Pyfisch (Diskussion) 20:29, 3. Dez. 2013 (CET)
  40. -- NacowY Disk 23:01, 3. Dez. 2013 (CET)
  41. -- RE rillke fragen? 23:17, 3. Dez. 2013 (CET) Yes. Of course not only the API but also the "normal pages" (index.php) should have a (sensible) limit. I reject restricting end applications through guidelines, just as I reject rash action. Much will have to be weighed, and exceptions for certain user groups, or new ways of presenting data, may have to be created. To my knowledge, checkuser data are automatically deleted after 3 months: see User:Catfisheye/Fragen_zur_Checkusertätigkeit_auf_Commons#cite_ref-5
  42. --Christian1985 (Disk) 23:25, 3. Dez. 2013 (CET)
  43. --Jocian 04:45, 4. Dez. 2013 (CET)
  44. -- CC 04:50, 4. Dez. 2013 (CET)
  45. --Don-kun Diskussion 07:10, 4. Dez. 2013 (CET)
  46. --Zeitlupe (Diskussion) 09:09, 4. Dez. 2013 (CET)
  47. --Geitost 09:25, 4. Dez. 2013 (CET)
  48. Everywhere West (Diskussion) 09:29, 4. Dez. 2013 (CET)
  49. -jkb- 09:29, 4. Dez. 2013 (CET)
  50. -- Wurmkraut (Diskussion) 09:47, 4. Dez. 2013 (CET)
  51. Simplicius Hi… ho… Diderot! 09:53, 4. Dez. 2013 (CET)
  52. --Hosse Talk 12:49, 4. Dez. 2013 (CET)
  53. Port(u#o)s 12:57, 4. Dez. 2013 (CET)
  54. --Howwi (Diskussion) 14:26, 4. Dez. 2013 (CET)
  55.  — Felix Reimann 17:17, 4. Dez. 2013 (CET)
  56. --Bubo 18:30, 4. Dez. 2013 (CET)
  57. --Coffins (Diskussion) 19:22, 4. Dez. 2013 (CET)
  58. --Firefly05 (Diskussion) 20:09, 4. Dez. 2013 (CET)
  59. The point is to clarify the principle and the rule-and-exception scheme. --Björn 20:13, 4. Dez. 2013 (CET)
  60. --V ¿ 21:46, 4. Dez. 2013 (CET)
  61. --Merlissimo 21:59, 4. Dez. 2013 (CET)
  62. --Stefan »Στέφανος«  22:02, 4. Dez. 2013 (CET)
  63. -<)kmk(>- (Diskussion) 22:57, 4. Dez. 2013 (CET)
  64. --lutki (Diskussion) 23:06, 4. Dez. 2013 (CET)
  65. -- Ukko 23:22, 4. Dez. 2013 (CET)
  66. --Video2005 (Diskussion) 02:17, 5. Dez. 2013 (CET)
  67. --Baumfreund-FFM (Diskussion) 07:30, 5. Dez. 2013 (CET)
  68. --dealerofsalvation 07:35, 5. Dez. 2013 (CET)
  69. --Gripweed (Diskussion) 09:32, 5. Dez. 2013 (CET)
  70. --Sinuhe20 (Diskussion) 10:05, 5. Dez. 2013 (CET)
  71. --PerfektesChaos 10:22, 5. Dez. 2013 (CET)
  72. --Tkarcher (Diskussion) 13:51, 5. Dez. 2013 (CET)
  73. --BishkekRocks (Diskussion) 14:43, 5. Dez. 2013 (CET)
  74. --PG ein miesepetriger Badener 15:34, 5. Dez. 2013 (CET)
  75. --He3nry Disk. 16:32, 5. Dez. 2013 (CET)
  76. --Sjokolade (Diskussion) 18:15, 5. Dez. 2013 (CET)
  77. --Lienhard Schulz Post 18:43, 5. Dez. 2013 (CET)
  78. --Kein Einstein (Diskussion) 19:35, 5. Dez. 2013 (CET)
  79. --Stefan (Diskussion) 22:19, 5. Dez. 2013 (CET)
  80. --Rauenstein 22:58, 5. Dez. 2013 (CET)
  81. --Anka Wau! 23:45, 5. Dez. 2013 (CET)
  82. --es grüßt ein Fröhlicher DeutscherΛV¿? Diskussionsseite 06:42, 6. Dez. 2013 (CET)
  83. --Doc.Heintz 08:55, 6. Dez. 2013 (CET)
  84. --Shisha-Tom without a time of day, 6. Dez. 2013
  85. --BesondereUmstaende (Diskussion) 14:57, 6. Dez. 2013 (CET)
  86. --Varina (Diskussion) 16:37, 6. Dez. 2013 (CET)
  87. --Studmult (Diskussion) 17:30, 6. Dez. 2013 (CET)
  88. --GT1976 (Diskussion) 20:51, 6. Dez. 2013 (CET)
  89. --Wikifreund (Diskussion) 22:04, 6. Dez. 2013 (CET)
  90. --Wnme 23:07, 6. Dez. 2013 (CET)
  91. -- ST 00:47, 7. Dez. 2013 (CET)
  92. --Flo Beck (Diskussion) 13:45, 7. Dez. 2013 (CET)
  93. IW 16:34, 7. Dez. 2013 (CET)
  94. --Blech (Diskussion) 17:48, 7. Dez. 2013 (CET)
  95. --Falkmart (Diskussion) 18:21, 8. Dez. 2013 (CET)
  96. --Partynia RM 22:53, 8. Dez. 2013 (CET)
  97. --ElRaki 01:09, 9. Dez. 2013 (CET) delete as much user data as possible / keep only as little user data as absolutely necessary
  98. --Userin:MoSchle--MoSchle (Diskussion) 03:57, 9. Dez. 2013 (CET)
  99. --Daniel749 Disk. (STWPST) 16:32, 9. Dez. 2013 (CET)
  100. --Knopfkind 21:19, 9. Dez. 2013 (CET)
  101. --Saibot2 (Diskussion) 23:14, 9. Dez. 2013 (CET)
  102. --Atlasowa (Diskussion) 15:03, 10. Dez. 2013 (CET) The appeal is, however, equally directed at WMDE, which after all decided to shut down the Toolserver and thereby enabled the move to DUI. Being a mere mail carrier to the WMF is not enough. If WMDE can have expert reports written on donation culture in Germany in order to lobby the WMF for its own fundraising, then WMDE can surely also commission expert reports on German/European data protection.
  103. Conny 20:49, 10. Dez. 2013 (CET).
  104. ----Fussballmann Kontakt 21:38, 10. Dez. 2013 (CET)
  105. --Steinsplitter (Disk) 23:40, 10. Dez. 2013 (CET)
  106. --Gps-for-five (Diskussion) 03:03, 11. Dez. 2013 (CET)
  107. --Kolja21 (Diskussion) 03:55, 11. Dez. 2013 (CET)
  108. --Laibwächter (Diskussion) 09:50, 11. Dez. 2013 (CET)
  109. -- Achim Raschka (Diskussion) 15:18, 11. Dez. 2013 (CET)
  110. --Alabasterstein (Diskussion) 20:32, 13. Dez. 2013 (CET)
  111. --Grueslayer Diskussion 10:51, 14. Dez. 2013 (CET)
  112. Collect data only when it is absolutely necessary for operation (or legally required). Nothing else should be collected at all. I do not regard the inferences about time zones and place of residence (often stated by users themselves) as serious; the real issue is that everything in the wiki is logged. I do not consider that necessary. Who needs to know who edited exactly where ten years ago? After one year the retained data should be anonymized (it can stay in the article history, since it is needed there, but not in the user's contributions list).--Alberto568 (Diskussion) 21:51, 14. Dez. 2013 (CET)
  113. --Horgner (Diskussion) 15:48, 16. Dez. 2013 (CET)
  114. --Oursana (Diskussion) 21:52, 16. Dez. 2013 (CET)
  115. --Meslier (Diskussion) 23:53, 16. Dez. 2013 (CET)
  116. -- Martin Bahmann (Diskussion) 09:20, 18. Dez. 2013 (CET)
  117. DerHexer (Disk.Bew.) 15:24, 19. Dez. 2013 (CET)
  118. Neotarf (Diskussion) 01:58, 20. Dez. 2013 (CET)
  119. --Lutheraner (Diskussion) 13:17, 20. Dez. 2013 (CET)

Comments

Can WMDE get an EU lawyer to assess whether such analysis of data is lawful under the current or draft EU directive and what it would take to respect it? I see that the draft contains some provisions on "analytics"; if the WMF adhered to EU standards (see also #Localisation des serveurs aux Etats-Unis et loi applicable bis) we might automatically solve such [IMHO minor] problems too. --Nemo 16:12, 20 December 2013 (UTC)

See also #Please_add_concerning_user_profiles (permalink, s) and #Generation_of_editor_profiles (permalink, s). PiRSquared17 (talk) 20:36, 20 December 2013 (UTC)

On a more personal note than the official response below, I shall repeat here advice I have regularly given to editors on the English Wikipedia in my capacity as Arbitrator: "Editing a public wiki is an inherently public activity, akin to participating in a meeting in a public place. While we place no requirement that you identify yourself or give any details about yourself to participate – and indeed do our best to allow you to remain pseudonymous – we cannot prevent bystanders from recognizing you by other methods. If the possibility of being recognized places you in danger or is not acceptable to you, then you should not involve yourself in public activities – including editing Wikipedia." MPelletier (WMF) (talk) 21:10, 20 December 2013 (UTC)

We can prevent creating user profiles by aggregating data. It has been done at the toolserver. It can be done at WikiLabs. NNW (talk) 21:29, 20 December 2013 (UTC)

Additional signatures

  1. --Geolina163 (talk) 16:06, 20 December 2013 (UTC)
  2. --Density (talk) 16:35, 20 December 2013 (UTC)
  3. --Minihaa (talk) 16:57, 20 December 2013 (UTC) requesting data minimization.
  4. --Theaitetos (talk) 17:08, 20 December 2013 (UTC)
  5. -- Sir Gawain (talk) 17:17, 20 December 2013 (UTC)
  6. --1971markus (talk) 18:26, 20 December 2013 (UTC)
  7. --Goldzahn (talk) 19:22, 20 December 2013 (UTC)
  8. --Spischot (talk) 21:38, 20 December 2013 (UTC)
  9. --Bomzibar (talk) 22:43, 20 December 2013 (UTC)
    --Charlez k (talk) 22:51, 20 December 2013 (UTC) already signed, see above (Original signatures) --Krib (talk) 23:05, 20 December 2013 (UTC)

Response

Please note the response by Tfinc above in the Generation of editor profiles and my follow-up to it. Obfuscating user contributions data or limiting our existing export will not happen. The Wikipedia projects are wikis; edits to them are by nature public activities that have always been, and always must be, available for scrutiny. MPelletier (WMF) (talk) 21:10, 20 December 2013 (UTC)

We don't need to keep around timestamps down to a fraction of a second forever. PiRSquared17 (talk) 21:13, 20 December 2013 (UTC)
Not sure about that. I wonder if de.wiki also has agreed to a decrease of its own right to fork, a right which they constantly use as a threat. Making dumps unusable would greatly reduce the contractual power of de.wiki, dunno if they really want it. --Nemo 21:43, 20 December 2013 (UTC)

While we believe this proposal is based on legitimate concerns, we want to highlight some of its practical implications. Due to the holidays, we’ve addressed this only briefly, but we hope it serves to explain our perspective.

In short, public access to metadata about page creation and editing is critical to the health of the site and is used in numerous places for numerous use cases:

  • Protecting against vandalism and incorrect or inappropriate content: several bots patrol Wikipedia’s articles to protect the site against such edits. Without public access to metadata, the effectiveness of these bots would be greatly reduced, and it is impossible for humans to perform these tasks at scale.
  • Community workflows: processes that contribute to the quality and governance of the project will also be affected, including blocking users, assessing adminship nominations, and determining eligible participants in article deletion discussions.
  • Power tools: certain bulk processes would break without public access to this metadata.
  • Research: researchers around the world use this public metadata for analysis that is useful both to the site and to the movement. It is essential that they continue to have access.
  • Forking: in order to have a full copy of our projects and their change histories, all metadata needs to be exposed alongside the content.
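The anti-vandalism use case above can be illustrated with a minimal sketch of how such tools consume public revision metadata. The field names follow the MediaWiki API's `list=recentchanges` output (`oldlen`, `newlen`, `anon`, `comment`); the sample records and the scoring heuristic are hypothetical and do not reproduce any real bot's logic:

```python
# Illustrative sketch: scoring edits for likely vandalism using only
# public revision metadata (byte delta, anonymity, empty edit summary).
# Field names mirror the MediaWiki API's list=recentchanges results;
# the records and thresholds below are made-up examples.

SAMPLE_CHANGES = [
    {"title": "Example", "user": "203.0.113.7", "anon": True,
     "oldlen": 5400, "newlen": 120, "comment": ""},
    {"title": "Example", "user": "TrustedEditor", "anon": False,
     "oldlen": 5400, "newlen": 5480, "comment": "fix typo"},
]

def suspicion_score(change):
    """Score one edit from its metadata alone; higher means more suspect."""
    score = 0
    delta = change["newlen"] - change["oldlen"]
    if delta < -2000:          # large removal of content
        score += 2
    if change.get("anon"):     # edit by an anonymous user
        score += 1
    if not change["comment"]:  # no edit summary given
        score += 1
    return score

# Flag edits whose score crosses a review threshold.
flagged = [c["user"] for c in SAMPLE_CHANGES if suspicion_score(c) >= 3]
print(flagged)  # prints ['203.0.113.7']
```

Without public `oldlen`/`newlen` and user metadata, none of these signals would be available to patrolling tools.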

In summary, public and openly licensed revision metadata is vital to the technical and social functioning of Wikipedia, and any removal of this data would have a serious impact on a number of processes and actions critical to the project. Tfinc (talk) 00:54, 21 December 2013 (UTC)