Policy talk:Privacy policy

From Wikimedia Foundation Governance Wiki



What is changing?

Several comments below ask about what’s new in this draft as compared to the current privacy policy. To help new folks just joining the conversation, we have outlined the main changes in this box. But feel free to join the discussion about these changes here.

As a general matter, because the current privacy policy was written in 2008, it did not anticipate many technologies that we are using today. Where the current policy is silent, the new draft spells out to users how their data is collected and used. Here are some specific examples:

  1. Cookies: The current policy mentions the use of temporary session cookies and broadly states some differences in the use of cookies between mere reading and logged-in reading or editing. The FAQ in the new draft lists specific cookies that we use and specifies what they are used for and when they expire. The draft policy further clarifies that we will never use third-party cookies without permission from users. It also outlines other technologies that we may consider using to collect data like tracking pixels or local storage.
  2. Location data: Whereas the current policy does not address collection and use of location data, the draft policy spells out how you may be communicating the location of your device through GPS and similar technologies, metadata from uploaded images, and IP addresses. It also explains how we may use that data.
  3. Information we receive automatically: The current policy does not clearly explain that we can receive certain data automatically. The new draft explains that when you make requests to our servers you submit certain information automatically. It also specifies how we use this information to administer the sites, provide greater security, fight vandalism, optimize mobile applications, and otherwise make it easier for you to use the sites.
  4. Limited data sharing: The current policy narrowly states that user passwords and cookies shouldn’t be disclosed except as required by law, but doesn’t specify how other data may be shared. The new draft expressly lists how all data may be shared, not just passwords and cookies. This includes discussing how we share some data with volunteer developers, whose work is essential for our open source projects. It also includes providing non-personal data to researchers who can share their findings with our community so that we can understand the projects and make them better.
  5. Never selling user data: The current policy doesn’t mention this. While long-term editors and community members understand that selling data is against our ethos, newcomers have no way of knowing how our projects are different from most other websites unless we expressly tell them. The new draft spells out that we would never sell or rent their data or use it to sell them anything.
  6. Notifications: We introduced notifications after the current policy was drafted. So, unsurprisingly, it doesn’t mention them. The new draft explains how notifications are used, that they can sometimes collect data through tracking pixels, and how you can opt out.
  7. Scope of the policy: The current policy states its scope in general terms, and we want to be clearer about when the policy applies. The new draft includes a section explaining what the policy does and doesn’t cover in more detail.
  8. Surveys and feedback: The current policy doesn’t specifically address surveys and feedback forms. The new draft explains when we may use surveys and how we will notify you what information we collect.
  9. Procedures for updating the policy: The new draft specifically indicates how we will notify you if the policy needs to be changed. This is consistent with our current practice, but we want to make our commitment clear: we will provide advance notice for substantial changes to the privacy policy, allow community comment, and provide those changes in multiple languages.

This is of course not a comprehensive list of changes. If you see other changes that you are curious about, feel free to raise them and we will clarify the intent.

The purpose of a privacy policy is to inform users about what information is collected, how it is used, and whom it is shared with. The current policy did this well back when it was written, but it is simply outdated. We hope that with your help the new policy will address all the relevant information about use of personal data on the projects. YWelinder (WMF) (talk) 01:07, 6 September 2013 (UTC)


So, what is the purpose of all this?

The following discussion is closed: closing the top section given staleness, but leaving the unsampled logs area open; will archive when both sections are done. Jalexander--WMF 22:25, 18 December 2013 (UTC)

I've read the draft from beginning to end, and I have no idea what you wanted me as a user to get from it. What's the purpose? What does it improve compared to the much shorter and more concise current policy, which provides very clear and straightforward protections such as the four (4) magic words «Sampled raw log data» (see also #Data retention above)? Is the purpose just adding tracking pixels and cookies for everyone, handwashing (see section above) and generally reducing privacy commitments for whatever reason? --Nemo 21:31, 4 September 2013 (UTC)

Hi Nemo, thanks for your comment. I outlined some specific reasons why we needed an update above. YWelinder (WMF) (talk) 01:12, 6 September 2013 (UTC)
See here for Yana's summary. Geoffbrigham (talk) 02:12, 6 September 2013 (UTC)
The summary only says things I already knew, because I read the text. What's missing is the rationale for such changes, or why the changes are supposed to be an improvement. One hint: are there good things that we are not or will not be able to do due to the current policy and what changes are proposed in consequence?
Additionally, the summary doesn't even summarise that well IMHO, e.g. the language about cookies is not very clear and you didn't write anything about making request logs unsampled (which means having logs of all requests a user makes). --Nemo 06:47, 6 September 2013 (UTC)
I've forwarded your question to our tech team. Relevant members of the tech team are away at a conference and will respond to this shortly. YWelinder (WMF) (talk) 01:04, 12 September 2013 (UTC)

Unsampled request logs/tracking

Hey Nemo!
You have raised the question why we want the ability to store unsampled data and that’s a great question!
Two important use cases come to mind. The first use case is funnel analysis for fundraising. As you know, we are 100% dependent on donations from people like you -- people who care about the mission of the Wikimedia movement and who believe in a world in which every single human being can freely share in the sum of all knowledge.
We want to run the fundraiser as short as possible without annoying people with banners. So it’s crucial to understand the donation funnel: when people drop out and why. We can only answer those kinds of questions if we store unsampled webrequest traffic.
The second use case is measuring the impact of Wikipedia Zero. Wikipedia Zero’s mission is to increase the number of people who can visit Wikipedia on their mobile phone without having to pay for the data charges: this is an important program that embodies our mission. Measuring the impact means knowing how many people (unique visitors) are benefiting from this program. If we can measure this then we can also be transparent to our donors in explaining how their money is used and how much impact their donations are making.
I hope this gives you a better understanding of why we need to store unsampled webrequest data. It is important to note that we will not build long historic reader profiles: the Data Retention Guidelines (soon to be released) will have clear limits on how long we will store this type of data.
Best regards,
(in my role as Product Manager Analytics @ WMF)
Drdee (talk) 23:03, 12 September 2013 (UTC)
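The "donation funnel" mentioned above is a sequence of steps a reader may pass through; funnel analysis counts how many visitors reach each step and where they drop out. A minimal Python sketch follows. The step names and the `(visitor_id, step)` event format are hypothetical, since the discussion does not say what the fundraising funnel actually records. Because the analysis matches steps belonging to the same visitor, it relies on having all of that visitor's requests: if each request were instead kept independently with some small probability p, all four steps of one visitor would survive together only with probability p^4.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical funnel steps, in order; not the actual fundraising schema.
FUNNEL = ["banner_shown", "banner_clicked", "form_opened", "donation_completed"]


def funnel_dropoff(
    events: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int, Optional[float]]]:
    """Return (step, visitors reaching it, fraction kept from the previous step)."""
    reached = {step: set() for step in FUNNEL}
    for visitor, step in events:
        if step in reached:
            reached[step].add(visitor)

    report = []
    previous = None
    for step in FUNNEL:
        count = len(reached[step])
        # No rate for the first step, or when nobody reached the previous one.
        rate = count / previous if previous else None
        report.append((step, count, rate))
        previous = count
    return report
```

For example, three visitors who see a banner, two of whom click it and one of whom then opens the form and completes a donation, give retention fractions of 2/3, 1/2 and 1.0 for the later steps.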
Thank you for your answer. Note that this is only one of the unexplained points of the policy, though probably the most controversial one (and for some reason very well hidden), so I'm making a subsection. I'll wait for answers on the rest; at some point we should add at the top a notice of the expected improvements users should like this policy for (this is the only one mentioned so far apart from longer login duration, if I remember correctly).
Frankly, your answer is worse than anything I could have expected: are you seriously going to tell our half billion users that you want them to allow you to track every visit to our websites in order to target them better for donations and for the sake of some visitors of other domains (the mobile and zero ones)? This just doesn't work. I'm however interested in knowing more.
  • Why does fundraising require unconditional tracking of all visits to Wikimedia projects? If the aim is understanding the "donation funnel" (note: the vast majority of readers of this talk doesn't understand you when you talk like this), why can't they just use something like the ClickTracking done in 2009-2010 for the usability initiative, or the EventLogging which stores or should store only aggregate data (counts) of events like clicks of specific things?
  • I know that Wikipedia Zero has struggled to find metrics to measure impact, but from what I understood we do have some metrics and they were used to confirm that "we need some patience". If we need more statistics so desperately as to desire tracking all our visitors, I assume other less dramatic options have been considered as well? For instance, surely the mobile operators know how much traffic they're giving out for free that they would otherwise charge for; how hard can it be for them to provide this number? (Of course I know it's not easy to negotiate with them; but we need to consider the alternatives.) --Nemo 06:51, 13 September 2013 (UTC)
Hi Nemo,
I think you are switching your arguments: first you ask why we would need to store unsampled webrequest data. You specifically asked "are there good things that we are not or will not be able to do due to the current policy and what changes are proposed in consequence?". I gave you two use cases, both being a type of funnel analysis that requires unsampled data (the two use cases are, by the way, not an exhaustive list). Then you switch gears by setting up a straw man argument and saying that we will use it for better targeting of visitors. That's not what I said; if you read my response, I said we want to know when and why people drop out of a funnel.
The fact that you quote our half billion users indicates that we need unsampled data: we don't know for sure how many unique visitors we have :) We have to rely on third-party estimates. You see even you know of use-cases for unsampled data :)
Regarding Wikipedia Zero: the .zero. domain will soon be deprecated so that will leave us with only the .m. domain so we cannot restrict unsampled storage to .zero. In addition, most Wikipedia Zero carriers do not charge for .m. domains as well.
Regarding the Fundraising: I am answering your question and I am sure you know what a donation funnel is; I was not addressing the general public. EventLogging does not store aggregate data but raw unsampled data.
I am not sure how I can counter your argument 'This just doesn't work'.
Drdee (talk) 19:08, 18 September 2013 (UTC)
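The point about unique visitors can be made concrete: a 1:n sampled log can be scaled up by n to estimate pageviews, but a count of distinct visitors cannot simply be scaled the same way, because one heavy reader and many occasional readers are sampled very differently. The Python sketch below uses a made-up log of `(visitor, page)` pairs and keeps every n-th line. How visitors would actually be identified is not described in this discussion, so the `visitor` field is purely hypothetical.

```python
from typing import List, Tuple

Request = Tuple[str, str]  # (hypothetical visitor key, page title)


def sample_every_nth(requests: List[Request], n: int) -> List[Request]:
    """Keep one line in n, like a 1:n sampled request log."""
    return requests[::n]


def estimate_pageviews(sampled: List[Request], n: int) -> int:
    """Pageviews scale linearly, so multiplying by n is a fair estimate."""
    return len(sampled) * n


def distinct_visitors(requests: List[Request]) -> int:
    return len({visitor for visitor, _page in requests})


# One heavy reader (90 requests) followed by ten one-time readers.
log = [("heavy", f"Page{i}") for i in range(90)]
log += [(f"light{i}", "Main_Page") for i in range(10)]
sample = sample_every_nth(log, 10)

print(estimate_pageviews(sample, 10))  # 100, matches len(log)
print(distinct_visitors(log))          # 11 actual unique visitors
print(distinct_visitors(sample) * 10)  # 20, here naive scaling overestimates
```

The direction and size of the error depend on how requests are spread across visitors, so the naive estimate can just as easily come out too low on a different log.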
I'm sorry that you feel that way; I didn't intend to switch arguments. What does "We want to run the fundraiser as short as possible" mean if not that you want to extract more money out of the banners? That's the argument usually used by the fundraising team: the higher the "ROI", the shorter the campaign will be. If you meant something else I'm sorry, but then could you please explain what you meant?
I'm also sorry for my unclear "This just doesn't work"; I meant that in this section I'm asking why the users, with whom we have a contract, should agree to revise it: what do they gain ("what is the purpose")? I still don't see an answer. For instance, knowing for sure how many unique users we have is not a gain for them; it's just the satisfaction of a curiosity the WMF or wikimedians like me can have.
As for Zero, I don't understand your reply. Are you saying that yes, other ways to get usage stats were considered but only unsampled tracking works? And that I'm wrong when I assume that operators would know how much traffic they're giving for free? --Nemo 14:55, 27 September 2013 (UTC)
Hi Nemo, I'll let other folks chime in to articulate the needs for the Fundraiser and Zero. I am with you on the fact that Wikimedia should collect as little data as possible, but let me expand on the point you make about "curiosity regarding unique visitors (UVs)". Measuring reach in terms of uniques is more than just a matter of "curiosity". We currently rely on third-party data (comScore) to estimate unique visitors, but there are many reasons why we want to reliably monitor high-level traffic data based on uniques. We recently obtained data about the proportion of entries from Google properties as part of a review of how much of our readership depends on search engines. I cite this example because any significant drop in search engine-driven traffic is likely to affect Wikimedia's ability to reach individual donors, new contributors and potential new registered users. Similarly, we intervened in the past to opt out of projects such as Google QuickView based on evidence that they were impacting our ability to reach and engage visitors by creating intermediaries between the user and the content. Using UV data (particularly in combination with User Agents) also helps us determine whether decisions we make about browser support affect a substantial part of our visitor population. As Diederik pointed out, EventLogging does collect unsampled behavioral data about user interaction with our websites to help us run tests and improve site performance and user experience. The exact data collected by EventLogging is specified in these schemas and is subject to the data retention guidelines that the Legal team is in the process of sharing. DarTar (talk) 20:23, 9 December 2013 (UTC)

Strip Wikimedia Data Collection to the Barest Minimum - Further Considerations

Thanks Privacycomment for this post. I just want to add my perspective, with some ideas on how to look at data-related processes in general and how to make use of the artificial differences in how national laws treat the same action in the physical and the digital world.

  • First and foremost Wikipedia is a labor of love of knowledge nerds worldwide. This means that it is from an outside view an "international organization" much like the Red Cross - only to battle information disasters. This could be used to get servers and employees special status and protections under international treaties (heritage, information/press etc)
  • History teaches that those protections might not be a sufficient deterrent in heated moments of national political/legal idiocy, so Wikimedia should enact technical as well as content procedures to minimize the damage.

Data Protection

  • Collect as little data as possible and purge it as fast as possible. Period. You cannot divulge what you do not have.
  • Compartmentalize the data so that a breach - let's say in the US - does not automatically give access to data of other countries' userbases.
  • Play with laws: since many protections are well established for homes or private property, shape your installation and software to imitate those - no "official" central mail server that can be accessed under provider legislation, but many private servers that are each protected and must be subpoenaed individually, etc.
  • Offer a privacy-focused Wikipedia version that can only be accessed via Tor, and where nothing is stored (I know this might be too much to administer against spam pros).
  • Use perfect forward secrecy, hashes, etc. to create a situation where most of the necessary information can be blindly validated without you having any possibility to actually see the information exchanged. This also helps with legal problems due to deniability. Again - compartmentalize.

Physical and digital infrastructure concerns

  • An internal organization along those lines, with the Red Cross as an example, would offer a variety of possibilities when faced with legal threats: first and foremost, much like choosing where to pay taxes, one could quickly relocate the headquarters for a specific project to another legal system, so that one can prove that e.g. the US national chapter of Wikimedia has no possible way of influencing, let's say, the Icelandic chapter, which happens to have a national project called wikipedia.org.
  • Another important step in being an international and truly independent organization is to finally use the power of interconnected networks and distribute the infrastructure, with liberal computer legislation in mind, much more than is now the case. Not to compare the content - just the legal possibilities - of the Megaupload case with those of Wikimedia: as long as US authorities have physical access to most of the servers, they do not need to do anything but be creative with domestic laws to hurt the organisation and millions of international users, too...
  • If this might be too difficult, let users choose between different mirrors that also conform to different IT legislation

Information Activism

  • Focus on a secure mediawiki with strong crypto, which can be deployed by information activists

So: paranoia off. But the problem really is that data collected now can and will be abused in the next 10, if not 50-100 years. If we limit the amount of data and purge data, those effects can be minimized. No one knows if something that is perfectly legal to write now might not bite one in the ass if legislation is changed in the future.

Cheers, --Gego (talk) 13:53, 9 September 2013 (UTC)

Hi Gego,
The idea of having a secure MediaWiki with strong crypto is a technical proposal and as such is best presented as an RFC on MediaWiki, but it's outside the scope of the new Privacy Policy.
Drdee (talk) 00:40, 7 November 2013 (UTC)
Hello @Gego: I appreciate the vision that you have for a privacy-conscious Wikipedia, and I hope some of your points are consistent with our intention in the privacy policy's introduction. If you are reading Wikipedia, you can currently access it via Tor, and data dumps may enable more alternative opportunities to view Wikipedia. As Dario explained below, there are some practical benefits to collecting limited amounts of data, and we describe how that data is retained in the draft data retention guidelines. Building more distributed corporate infrastructure is a complex problem (for example, being present in multiple legal jurisdictions often increases the cost of compliance, and may require removing content for defamation, copyright, or other unexpected legal reasons), and it is probably beyond the scope of this draft update to the privacy policy. Thank you for raising these points, and I am glad you are thinking about these issues on a long-term basis. Best, Stephen LaPorte (WMF) (talk) 23:53, 9 January 2014 (UTC)

Information about passive readers

There's a lot of discussion about the data collected from those who edit pages, but what about those who passively read Wikipedia? I can't figure out what's collected, how long it's stored, and how it's used.

Frankly I don't see why ANY personally identifiable information should EVER be collected from a passive reader. In the good old days when I went to the library to read the paper encyclopaedia, no one stood next to me with a clipboard noting every page I read or even flipped past. So why should you do that now?

I don't object to real-time statistics collection, e.g., counting the number of times a page is read, listing the countries from which each page is read at least once, that sort of thing. But update the counters in real time and erase the HTTP GET log buffer without ever writing it to disk. If you decide to collect some other statistic, add it to the real-time code and start counting from that point forward.

Please resist the strong urge to log every single HTTP GET just because you can, just in case somebody might eventually think of something interesting to do with it someday. This is EXACTLY how the NSA thinks and it's why they store such a terrifying amount of stuff. 2602:304:B3CE:D590:0:0:0:1 14:54, 10 September 2013 (UTC)
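The suggestion above, real-time counters that never persist individual requests, can be sketched as follows. This is a minimal Python illustration, not a description of any Wikimedia system; the `RealTimeStats` class is hypothetical, and it assumes the reader's country has already been derived upstream (for example by a geolocation lookup) so that the IP address itself never reaches this code.

```python
from collections import Counter
from typing import Dict, Set


class RealTimeStats:
    """Aggregate-only page statistics: no per-request record is kept."""

    def __init__(self) -> None:
        self.views: Counter = Counter()           # page -> number of reads
        self.countries: Dict[str, Set[str]] = {}  # page -> countries seen at least once

    def record(self, page: str, country: str) -> None:
        # Update the counters, then let the request data go out of scope;
        # nothing about the individual request is written anywhere.
        self.views[page] += 1
        self.countries.setdefault(page, set()).add(country)


stats = RealTimeStats()
for page, country in [("Tor", "DE"), ("Tor", "US"), ("Tor", "DE"), ("Encyclopedia", "FR")]:
    stats.record(page, country)

print(stats.views["Tor"])              # 3
print(sorted(stats.countries["Tor"]))  # ['DE', 'US']
```

A statistic added later starts from zero, matching "start counting from that point forward"; the trade-off is that questions nobody anticipated cannot be answered retroactively.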

2602, I will be linking to this comment from below but you may be interested in the section started at the bottom of the page at Tracking of visited pages. Jalexander (talk) 03:37, 11 September 2013 (UTC)

There are a number of use cases for collecting unsampled data, including generating detailed understandings of how readers interact with Wikipedia content and how this might change over time, finding and identifying very low-frequency (but important) events, and looking at interactions with long-tail content that may reveal new sources of editors. But it's important to understand that we are interested in the behavior of Wikipedia readers in aggregate, not as individuals. TNegrin (WMF) (talk) 01:49, 19 December 2013 (UTC)

Dear 2602,
We need to store webrequest data for a very limited time from a security point of view: in case of a DDoS we need to be able to investigate where it originates and block some IP ranges. Sometimes we need to verify whether we are reachable from a certain country. And there are other use cases, so not storing webrequest data is not an option. The Data Retention guidelines, which will be published soon, will put clear timeframes on how long we can store webrequest data.
I hope this addresses your concern.
Best, Drdee (talk) 00:51, 7 November 2013 (UTC)
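One of the security uses mentioned above, finding which IP ranges a flood of requests comes from, amounts to grouping client addresses by network prefix. A minimal Python sketch using the standard `ipaddress` module follows. The /24 (IPv4) and /48 (IPv6) grouping sizes are illustrative assumptions, not Wikimedia's actual blocking granularity, and the sample addresses come from the ranges reserved for documentation.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def top_source_ranges(
    client_ips: Iterable[str], v4_prefix: int = 24, v6_prefix: int = 48, top: int = 3
) -> List[Tuple[Network, int]]:
    """Count requests per network prefix and return the busiest ranges."""
    counts: Counter = Counter()
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        prefix = v4_prefix if addr.version == 4 else v6_prefix
        # strict=False masks off host bits, e.g. 198.51.100.7/24 -> 198.51.100.0/24
        counts[ipaddress.ip_network(f"{addr}/{prefix}", strict=False)] += 1
    return counts.most_common(top)


sample_ips = ["198.51.100.7", "198.51.100.8", "198.51.100.9", "203.0.113.5", "2001:db8::1"]
for network, hits in top_source_ranges(sample_ips):
    print(network, hits)
```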
The current policy only allows sampled logs. Are you saying that the sysadmins are currently unable to protect the sites from DDoS? I never noticed.
Also, https://blog.archive.org/2013/10/25/reader-privacy-at-the-internet-archive/ , linked below, shows it definitely is an option. --Nemo 10:07, 8 November 2013 (UTC)

The ability to store unsampled log data (a.k.a. loss of privacy in exchange for money)

One of the changes between the existing privacy policy and the new draft is that the draft will now allow the Foundation to retain unsampled log data — in effect, this means that every single visit by every single visitor to each and every Wikimedia project (and perhaps other sites owned/run by the Foundation) will now be recorded and retained on WMF servers. It is shocking to me that the only reasons given for such a broad, controversial and hardly advertised change are (1) fundraising and (2) the ability to measure statistics in Wikipedia Zero, a project that is limited in terms of geography, scope and type of access (mobile devices).

Given that Wikipedia Zero is just one of many projects led by the Foundation, and that it applies to a limited number of visitors who are using a very specific medium to access the projects, I fail to see the need to sacrifice the privacy of everyone who will ever visit a Wikimedia project. Moreover, I am disappointed and terrified to learn that the Foundation thinks it is reasonable to sacrifice our privacy in exchange for more money, especially since our fundraising campaigns appear to have been quite effective, or at least to have enabled the WMF to reach their revenue goals without much trouble. odder (talk) 22:22, 7 December 2013 (UTC)

"will now be recorded and retained" is probably a bit strong. s/will/may/ would probably be more accurate. Personally, I can see the ability to record full logs when needed being useful in debugging, performance analysis, and analysis of which features should be prioritized for improvement or development or even possible removal. Boring stuff to most people. BJorsch (WMF) (talk) 14:46, 9 December 2013 (UTC)
"May" is only a legalese euphemism for "will" (in this case). If there are no plans to store and use unsampled log data, for whatever purpose, then surely there will be no problem reverting to the wording of the current privacy policy, which only allows storing sampled data. odder (talk) 15:35, 9 December 2013 (UTC)
Believe whatever you want, I'm not about to engage in arguing over conspiracy theories. BJorsch (WMF) (talk) 15:25, 10 December 2013 (UTC)
No, it isn't such a euphemism. You use language like this to ensure that the Foundation has the flexibility to run programs that may make use of this capability. LFaraone (talk) 23:18, 1 January 2014 (UTC)
That's my point precisely. odder (talk) 19:47, 7 January 2014 (UTC)
Maybe I missed it somewhere, but it would be helpful to list all data types from the logs. I am especially interested in this question: Do you save every pageview, incl. IP address and/or username? Do you have logs in which you can see what pages I have read (!), how long I have read them, etc.? Raymond (talk) 16:55, 9 December 2013 (UTC)
Yes, I would also like to know if you have, or plan, such visitor logs. --Martina Nolte (talk) 19:46, 9 December 2013 (UTC)
+1 --Steinsplitter (talk) 19:48, 9 December 2013 (UTC)
+1, by all means! Ca$e (talk) 09:36, 10 December 2013 (UTC)
+1 ...84.133.109.103 09:38, 10 December 2013 (UTC)
+1 -jkb- 09:41, 10 December 2013 (UTC)
+1 I told you so. "Dance" Alexpl (talk) 09:57, 10 December 2013 (UTC)
+1 for showing an example of the current log data. --Zhuyifei1999 (talk) 10:08, 10 December 2013 (UTC)
+1 ---<(kmk)>- (talk) 13:46, 10 December 2013 (UTC) There is no need to trade my privacy for (even more) funds.
+1 -- smial (talk) 14:33, 11 December 2013 (UTC)
See wikitech for the format of the raw logs we receive from the front end caches. This data is sent 1:1 to a log aggregation server where it gets downsampled in real time. See the filters.*.erb files for what HTTP paths we currently log data on and with what frequency. The format is
pipe <sample rate> filter-program <filter -d specifies project, -p specifies the page> >> <output location>
Mwalker (WMF) (talk) 01:19, 9 January 2014 (UTC)
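Assuming the `<sample rate>` field in the format above means that only one line in N is passed on to the filter program, that field is what makes the stored logs sampled, and a rate of 1 would keep every request. The Python generator below is a rough illustration of that sample-then-filter order, not the actual log aggregator code; the log lines and the `keep` predicate are made-up stand-ins for a real filter program.

```python
from typing import Callable, Iterable, Iterator


def sample_then_filter(
    lines: Iterable[str], rate: int, keep: Callable[[str], bool]
) -> Iterator[str]:
    """Pass every `rate`-th line to the filter; rate=1 means unsampled."""
    for i, line in enumerate(lines, start=1):
        if i % rate != 0:
            continue  # dropped by sampling before any filtering happens
        if keep(line):
            yield line


# Hypothetical log lines: ten from one project, then ten from another.
log = [f"enwiki /wiki/Page{i}" for i in range(1, 11)]
log += [f"dewiki /wiki/Seite{i}" for i in range(1, 11)]

print(list(sample_then_filter(log, 5, lambda line: line.startswith("enwiki"))))
# ['enwiki /wiki/Page5', 'enwiki /wiki/Page10']
print(len(list(sample_then_filter(log, 1, lambda line: True))))  # 20
```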
I do not know of any logs that record all pageviews, or of any plans to start collecting such logs. The logs testwiki.log and test2wiki.log mentioned on wikitech:Logs do contain user information and URL (as part of a larger amount of debugging information) for requests that aren't served from the caches, but only for testwiki and test2wiki, which the vast majority of people have no reason to ever visit. I also don't know of any logs or log analyses that show pages read by any user or IP or how long anyone might have spent reading any particular page. BJorsch (WMF) (talk) 15:25, 10 December 2013 (UTC)
I thought the fundraising people already do exactly that and call it "User site behavior collection". (I can't actually tell from that link if those proposals have already been implemented?!) Alexpl (talk) 18:04, 10 December 2013 (UTC)
I was not aware of that. Note, though, that it's a proposal and not something that is currently being done. It seems like a useful study, and it's far from the constant tracking of everyone that some of the more paranoid here seem to be expecting. BJorsch (WMF) (talk) 00:52, 11 December 2013 (UTC)
I am pretty sure their intentions were good. But due to the nature of the Wikipedia project, it seems a bit "pre-Snowden" or just unworldly to believe that the WMF can limit access to that data once the mechanism to collect it has been installed. The first dude with access who seeks future employment at a hip company (...) can do irreversible damage and sell every WP contributor out. Alexpl (talk) 08:58, 12 December 2013 (UTC)
Two things: first, I delayed that experiment until after we'd sorted out the new privacy policy. No code has been written, no code has been deployed. Second, the experiment was explicitly designed not to send any data to the server beyond averages, and only a single time (nor to store locally on the client anything beyond counts and times). I wasn't even going to use cookies, which could be sniffed from the wire. Data in the experiment that was stored locally would have been useful for statistical correlation if someone had access to your computer (or network connection); but if that was the case they wouldn't need to bother with my data, they would get it directly. I'll point out that the following places document what information would actually be collected: from the RfC, raw source Schema:EventCapsule and Schema:AnonymousUserSiteUsage. If you have concerns specifically about these data, I encourage you to put them on the talk page of the RfC. Mwalker (WMF) (talk) 01:05, 9 January 2014 (UTC)
@User:BJorsch (WMF): If there are no logs and no plans to start collecting them, why was the draft changed so that the Foundation would be allowed to do just that? ---<(kmk)>- (talk) 18:52, 10 December 2013 (UTC)
For the reasons that have been officially stated, perhaps? But really, you'd probably want to ask one of the people involved in drafting this. I just commented here to add a few other potential uses for the ability to collect non-sampled logs when needed, since people seemed to be focusing overmuch on the two examples in the draft. BJorsch (WMF) (talk) 00:52, 11 December 2013 (UTC)
Does "I do not know of any logs that record all pageviews, or of any plans to start collecting such logs." mean that Wikimedia does not have such logs and plans, or is it meant literally, that you, BJorsch, do not know about them? ...Sicherlich Post 08:39, 11 December 2013 (UTC)
+1 -- smial (talk) 14:32, 11 December 2013 (UTC)
The latter, obviously. I'm certainly not aware of everything everyone associated with the Foundation does or plans, nor am I in any position to set policy. BJorsch (WMF) (talk) 15:19, 11 December 2013 (UTC)
I guess we just assumed that the "WMF" tag in your signature would grant you preferential access to all relevant information on this matter :) Alexpl (talk) 16:57, 11 December 2013 (UTC)

Okay, so now we know your private opinion and assumptions, BJorsch. Is it possible to get an official statement from the WMF? ...Sicherlich Post 17:20, 12 December 2013 (UTC)

BJorsch's opinion is that users asking these questions are more paranoid. I would certainly prefer an official, and hopefully more sober, WMF statement on this logging issue. --Martina Nolte (talk) 17:36, 20 December 2013 (UTC)

Why do you need special logging for WP Zero? PiRSquared17 (talk) 20:27, 20 December 2013 (UTC)

No official reply since December 7

The future?

The Wikimedia Foundation did not manage to post an official reply to my (and other people's) concerns, even though I started this section on December 7, way before the end–of–year holiday period started. I am very disappointed that it seems to always take such a long time to get a reply from the WMF (the same thing happened for the draft access to nonpublic data policy); it has a very paralyzing effect on any discussion, introduces further concerns and worries, and effectively lengthens any consultation period to a state when no one cares anymore. odder (talk) 10:07, 21 January 2014 (UTC)[reply]

Maybe they're giving up on this update. It would not be unwise to rewrite everything from scratch. --Nemo 10:09, 21 January 2014 (UTC)[reply]
Is the silence an indication that we might need to add WMF to File:Prism slide 5.jpg and that the Foundation therefore is unable to answer? --Stefan2 (talk) 12:07, 21 January 2014 (UTC)[reply]

We are working on a response to the question of unsampled log data. It's taking some time to verify things internally, and we were previously focused on posting the data retention guidelines (which hopefully answered some of your other questions, in addition to Matt and Brad's other responses above). Thanks for your patience. Stephen LaPorte (WMF) (talk) 21:22, 22 January 2014 (UTC)

Short summaries of each section

Reading an entire privacy policy requires a lot of effort, even if it's well-written. Previously we had Rory to break up the sections and give you a pause while reading; now we only have the blue section icons, which I think are the bare minimum. I propose that we add back something to make reading more pleasant. One way to do that is 500px's short summaries of each section. What do you think? Can we do something similar? //Shell 09:59, 17 December 2013 (UTC)

Hi Shell! Thank you for the suggestion. We are working on drafting some summarizing bullet points to put into the left column and will put them up as soon as we have them ready. We would definitely appreciate your input (and input from others) on the bullet points once they are ready. Mpaulson (WMF) (talk) 20:39, 18 December 2013 (UTC)
@Skalman: Some draft summaries were just put up and it would be great to get some feedback. We're still thinking about the right formatting for them as well. Jalexander--WMF 01:48, 10 January 2014 (UTC)
Overall, great! Comments added below:
Shell, I changed the wording to: You are consenting to the use of your information in the U.S. and to the transfer of that information to other countries in connection to providing our services to you and others. Does that make more sense? Thanks! RPatel (WMF) (talk) 00:41, 15 January 2014 (UTC)
Yes, it sounds clear. //Shell 09:50, 16 January 2014 (UTC)

Handling our user data - an appeal

Preface (Wikimedia Deutschland)

For several months, there have been regular discussions on data protection and the way Wikimedia deals with it, in the German-speaking community – one of the largest non-English-speaking communities in the Wikimedia movement. Of course, this particularly concerns people actively involved in Wikipedia, but also those active on other Wikimedia projects.

The German-speaking community has always been interested in data protection. However, this particular discussion was triggered when the Deep User Inspector tool on Tool Labs overrode a long-respected agreement on the Toolserver that aggregated personalized data would only be available after the user opted in.

As the Wikimedia Foundation is currently reviewing its privacy policy and has requested feedback and discussion here by 15 January, Wikimedia Deutschland has asked the community to draft a statement. The text presented below was largely written by User:NordNordWest and signed by almost 120 people involved in German Wikimedia projects. It highlights the many concerns and worries of the German-speaking community, so we believe it can enhance the discussion on these issues. We would like to thank everyone involved.

This text was published in German simultaneously on the Wikimedia Deutschland blog and in the Kurier, the German-language counterpart of the English "Signpost". This translation has also been sent as a draft to the WMF movement blog.

(preface Denis Barthel (WMDE) (talk), 20.12.)

Starting position

The revelations by Edward Snowden and the migration of programs from the Toolserver to ToolLabs prompted discussions among the community on the subject of user data and how to deal with it. On the one hand, a diverse range of security features are available to registered users:

  • Users can register under a pseudonym.
  • The IP address of registered users is not shown. Only users with CheckUser permission can see IP addresses.
  • Users have a right to anonymity. This includes all types of personal data: names, age, background, gender, family status, occupation, level of education, religion, political views, sexual orientation, etc.
  • As a direct reaction to Snowden’s revelations, the HTTPS protocol has been used as standard since summer 2013 (see m:HTTPS), so that, among other things, it should no longer be visible from outside which pages are called up by which users and what information is sent by a user.

On the other hand, however, all of a user’s contributions are recorded with exact timestamps. Access to this data is available to everyone and allows the creation of user profiles. While the tools were running on the Toolserver, user profiles could only be created from aggregated data with the consent of the user concerned (opt-in procedure). This was because the Toolserver was operated by Wikimedia Deutschland and therefore subject to German data protection law, one of the strictest in the world. However, evaluation tools that were independent of the Foundation and any of its chapters already existed.

One example is Wikichecker, which, however, only covers the English-language Wikipedia. The migration of programs to ToolLabs, where they no longer have to comply with German data protection law, prompted a survey on whether X!’s Edit Counter should keep its opt-in requirement, switch to opt-out, or drop the requirement altogether. The survey resulted in a majority of 259 votes for keeping opt-in, with 26 users voting to replace it with an opt-out solution and 195 in favor of removing it completely. As a direct reaction to these results, a new tool, Deep User Inspector, was programmed to provide aggregated user data across projects without giving users a chance to object. Alongside basic numbers of contributions, the tool also provides statistics on, for example, the times on weekdays when a user was active, lists of voting behavior, or a map showing the location of subjects on which the user has edited articles. This aggregation of data allows simple inferences to be made about each individual user. A cluster of edits on articles relating to a certain region, for example, makes it possible to deduce where the user most probably lives.

Problems

Every user knows that user data is recorded every time something is edited. However, there is a significant difference between a single data set and the aggregated presentation of this data. Aggregated data means that the user’s right to anonymity can be reduced, or, in the worst case, lost altogether. Here are some examples:

  • A list of the times that a user edits often allows a deduction to be made as to the time zone where he or she lives.
  • From the coordinates of articles that a user has edited, it is generally possible to determine the user’s location even more precisely. It is rare for people to edit only articles about area X when they actually come from area Y.
  • The most precise deductions can be made by analyzing the coordinates of a photo location, as it stands to reason that the user must have been physically present to take the photo.
  • Places of origin and photo locations can reveal information on the user’s means of transport (e.g. whether someone owns a car), as well as on his or her routes and times of travel. This makes it possible to create movement profiles on users who upload a large number of photos.
  • Time analyses of certain days of the year allow inferences to be drawn about a user’s family status. It is probable, for example, that those who tend not to edit during the school holidays are students, parents or teachers.
  • Assumptions on religious orientation can also be made if a user tends not to edit on particular religious holidays.
  • Foreign photo locations either reveal information about a user’s holiday destination, and therefore perhaps disclose something about his or her financial situation, or suggest that the user is a photographer.
  • If users work in a country or a company where editing is prohibited during working hours, they are particularly vulnerable if the recorded time reveals that they have been editing during these hours. In the worst-case scenario, somebody who wishes to harm the user and knows extra information about his or her life (which is not unusual if someone has been an editor for several years) could pass this information on to the user’s employer. Disputes within Wikipedia would thus be carried over into real life.
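The first bullet can be made concrete with a few lines of code. The following is a minimal Python sketch, not the method of Deep User Inspector or any other existing tool: it counts a user's edits per UTC hour, finds the six-hour window with the fewest edits, and, under the assumption (hypothetical, and often wrong for shift workers or night owls) that this window is local night starting around 01:00, turns its start into a guessed UTC offset. The function names `quietest_window_start` and `guess_utc_offset` are illustrative only.

```python
from collections import Counter
from datetime import datetime, timezone


def quietest_window_start(edit_times: list[datetime], window: int = 6) -> int:
    """Return the UTC hour at which the `window`-hour span with the fewest edits begins."""
    # Expects timezone-aware datetimes; count edits per hour of the day in UTC.
    per_hour = Counter(t.astimezone(timezone.utc).hour for t in edit_times)

    def load(start: int) -> int:
        # Total edits in the window starting at `start`, wrapping past midnight.
        return sum(per_hour[(start + k) % 24] for k in range(window))

    # Ties go to the earliest start hour.
    return min(range(24), key=load)


def guess_utc_offset(edit_times: list[datetime], assumed_night_start: int = 1, window: int = 6) -> int:
    """Guess a UTC offset, ASSUMING the quietest window is local night starting at `assumed_night_start`."""
    start_utc = quietest_window_start(edit_times, window)
    offset = (assumed_night_start - start_utc) % 24
    # Map 0..23 onto the usual range of offsets, -11..+12.
    return offset - 24 if offset > 12 else offset


if __name__ == "__main__":
    # Hypothetical user: edits every day from 05:30 to 22:30 UTC, never at night.
    sample = [datetime(2013, 12, day, hour, 30, tzinfo=timezone.utc)
              for day in range(1, 8) for hour in range(5, 23)]
    print(quietest_window_start(sample), guess_utc_offset(sample))  # prints: 23 2
```

For this made-up user the quietest window starts at 23:00 UTC and the guess is UTC+2. The point is not accuracy but how little it takes: nothing beyond the public edit timestamps is used.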

Suggestions

Wikipedia is the fifth most visited website in the world. The way it treats its users therefore serves as an important example to others. It would be illogical and ridiculous to increase user protection on the one hand but, on the other hand, to allow users’ right to anonymity to be eroded. The most important asset that Wikipedia, Commons and other projects have is their users. They create the content that has ensured these projects’ success. But users are not content, and we should make sure that we protect them. The Wikimedia Foundation should commit to making the protection of its registered users a higher priority and should take the necessary steps to achieve this. Similarly to the regulations for the Toolserver, it should first require an opt-in for all the tools on its own servers that compile detailed aggregations of user data. Users could do this via their personal settings, for example. Since Wikipedia was founded in 2001, the project has grown without any urgent need for these kinds of tools, and at present there seems to be no reason why this should change in the future. By creating free content, the community enables Wikimedia to collect the donations needed to run WikiLabs. That this should lead to users losing their right to anonymity, even though the majority opposes it, is absurd. To ensure that user data are not evaluated on non-Wikimedia servers, the Foundation is asked to take the following steps:

  • Wikipedia dumps should no longer contain any detailed user information. The license only requires the name of the author and not the time or the day when they edited.
  • There should only be limited access to user data on the API.
  • It might be worth considering whether it is necessary, or consistent with project goals, to store and display the IP addresses of registered users (if they are stored), as well as minute-accurate timestamps of all their actions. The time limit here could be how long it reasonably takes CheckUsers to make a query. After all, data that are not available cannot be misused for other purposes.

Original signatures

  1. Martina Disk. 21:28, 24. Nov. 2013 (CET)
  2. NNW 18:52, 26. Nov. 2013 (CET)
  3. ireas :disk: 19:23, 26. Nov. 2013 (CET)
  4. Henriette (Diskussion) 19:24, 26. Nov. 2013 (CET)
  5. Raymond Disk. 08:38, 27. Nov. 2013 (CET)
  6. Richard Zietz 22:18, 27. Nov. 2013 (CET)
  7. Alchemist-hp (Diskussion) 23:47, 27. Nov. 2013 (CET)
  8. Lencer (Diskussion) 11:54, 28. Nov. 2013 (CET)
  9. Smial (Diskussion) 00:09, 29. Nov. 2013 (CET)
  10. Charlez k (Diskussion) 11:55, 29. Nov. 2013 (CET)
  11. elya (Diskussion) 19:07, 29. Nov. 2013 (CET)
  12. Krib (Diskussion) 20:26, 29. Nov. 2013 (CET)
  13. Jbergner (Diskussion) 09:36, 30. Nov. 2013 (CET)
  14. TMg 12:55, 30. Nov. 2013 (CET)
  15. AFBorchertD/B 21:22, 30. Nov. 2013 (CET)
  16. Sargoth 22:06, 2. Dez. 2013 (CET)
  17. Hilarmont 09:27, 3. Dez. 2013 (CET)
  18. --Poldine - AHA 13:09, 3. Dez. 2013 (CET)
  19. XenonX3 – (RIP Lady Whistler) 13:11, 3. Dez. 2013 (CET)
  20. -- Ra'ike Disk. LKU WPMin 13:19, 3. Dez. 2013 (CET)
  21. --muns (Diskussion) 13:22, 3. Dez. 2013 (CET)
  22. --Hubertl (Diskussion) 13:24, 3. Dez. 2013 (CET)
  23. --Aschmidt (Diskussion) 13:28, 3. Dez. 2013 (CET)
  24. Anika (Diskussion) 13:32, 3. Dez. 2013 (CET)
  25. K@rl 13:34, 3. Dez. 2013 (CET)
  26. --DaB. (Diskussion) 13:55, 3. Dez. 2013 (CET) (Even though I find the part about the dumps somewhat exaggerated.)
  27. --AndreasPraefcke (Diskussion) 14:05, 3. Dez. 2013 (CET) The part about the dumps is especially important, and this information should not be shown on the Wikipedia websites either. Roughly like this (not thought through in detail, just a rough idea): edits from today: shown to the second as now; edits from this week: to the minute; edits from the last six weeks: to the hour; edits from the last 12 months: to the day; older edits: to the month. The order must of course be preserved; edits followed by pure reverts: removed from the database entirely.
    Despite legitimate interests in data protection, one should not forget that this kind of date/time truncation is a double-edged sword. Importing version histories on the one hand and checking for copyright violations on the other would become considerably harder ;-) -- Ra'ike Disk. LKU WPMin 14:19, 3. Dez. 2013 (CET) (although for the latter, day-level precision would suffice for comparison with the web archive)
  28. --Mabschaaf 14:08, 3. Dez. 2013 (CET)
  29. --Itti 14:28, 3. Dez. 2013 (CET)
  30. ...Sicherlich Post 14:52, 3. Dez. 2013 (CET)
  31. --Odeesi talk to me rate me 16:29, 3. Dez. 2013 (CET)
  32. --gbeckmann Diskussion 17:23, 3. Dez. 2013 (CET)
  33. --Zinnmann d 17:24, 3. Dez. 2013 (CET)
  34. --Kolossos 17:41, 3. Dez. 2013 (CET)
  35. -- Andreas Werle (Diskussion) (without a timestamp today...)
  36. --Gleiberg (Diskussion) 18:03, 3. Dez. 2013 (CET)
  37. --Jakob Gottfried (Diskussion) 18:30, 3. Dez. 2013 (CET)
  38. --Wiegels „…“ 18:55, 3. Dez. 2013 (CET)
  39. --Pyfisch (Diskussion) 20:29, 3. Dez. 2013 (CET)
  40. -- NacowY Disk 23:01, 3. Dez. 2013 (CET)
  41. -- RE rillke fragen? 23:17, 3. Dez. 2013 (CET) Yes. Of course, not only the API but also the "normal pages" (index.php) should have a (sensible) limit. I reject restricting end-user applications through policies, just as I reject hasty action. A lot will have to be weighed, and it may be necessary to create exceptions for certain user groups, or new ways of presenting data. As far as I know, CheckUser data is deleted automatically after 3 months: see User:Catfisheye/Fragen_zur_Checkusertätigkeit_auf_Commons#cite_ref-5
  42. --Christian1985 (Disk) 23:25, 3. Dez. 2013 (CET)
  43. --Jocian 04:45, 4. Dez. 2013 (CET)
  44. -- CC 04:50, 4. Dez. 2013 (CET)
  45. --Don-kun Diskussion 07:10, 4. Dez. 2013 (CET)
  46. --Zeitlupe (Diskussion) 09:09, 4. Dez. 2013 (CET)
  47. --Geitost 09:25, 4. Dez. 2013 (CET)
  48. Everywhere West (Diskussion) 09:29, 4. Dez. 2013 (CET)
  49. -jkb- 09:29, 4. Dez. 2013 (CET)
  50. -- Wurmkraut (Diskussion) 09:47, 4. Dez. 2013 (CET)
  51. Simplicius Hi… ho… Diderot! 09:53, 4. Dez. 2013 (CET)
  52. --Hosse Talk 12:49, 4. Dez. 2013 (CET)
  53. Port(u#o)s 12:57, 4. Dez. 2013 (CET)
  54. --Howwi (Diskussion) 14:26, 4. Dez. 2013 (CET)
  55.  — Felix Reimann 17:17, 4. Dez. 2013 (CET)
  56. --Bubo 18:30, 4. Dez. 2013 (CET)
  57. --Coffins (Diskussion) 19:22, 4. Dez. 2013 (CET)
  58. --Firefly05 (Diskussion) 20:09, 4. Dez. 2013 (CET)
  59. The point is to make the principle and the rule/exception scheme clear. --Björn 20:13, 4. Dez. 2013 (CET)
  60. --V ¿ 21:46, 4. Dez. 2013 (CET)
  61. --Merlissimo 21:59, 4. Dez. 2013 (CET)
  62. --Stefan »Στέφανος«  22:02, 4. Dez. 2013 (CET)
  63. -<)kmk(>- (Diskussion) 22:57, 4. Dez. 2013 (CET)
  64. --lutki (Diskussion) 23:06, 4. Dez. 2013 (CET)
  65. -- Ukko 23:22, 4. Dez. 2013 (CET)
  66. --Video2005 (Diskussion) 02:17, 5. Dez. 2013 (CET)
  67. --Baumfreund-FFM (Diskussion) 07:30, 5. Dez. 2013 (CET)
  68. --dealerofsalvation 07:35, 5. Dez. 2013 (CET)
  69. --Gripweed (Diskussion) 09:32, 5. Dez. 2013 (CET)
  70. --Sinuhe20 (Diskussion) 10:05, 5. Dez. 2013 (CET)
  71. --PerfektesChaos 10:22, 5. Dez. 2013 (CET)
  72. --Tkarcher (Diskussion) 13:51, 5. Dez. 2013 (CET)
  73. --BishkekRocks (Diskussion) 14:43, 5. Dez. 2013 (CET)
  74. --PG ein miesepetriger Badener 15:34, 5. Dez. 2013 (CET)
  75. --He3nry Disk. 16:32, 5. Dez. 2013 (CET)
  76. --Sjokolade (Diskussion) 18:15, 5. Dez. 2013 (CET)
  77. --Lienhard Schulz Post 18:43, 5. Dez. 2013 (CET)
  78. --Kein Einstein (Diskussion) 19:35, 5. Dez. 2013 (CET)
  79. --Stefan (Diskussion) 22:19, 5. Dez. 2013 (CET)
  80. --Rauenstein 22:58, 5. Dez. 2013 (CET)
  81. --Anka Wau! 23:45, 5. Dez. 2013 (CET)
  82. --es grüßt ein Fröhlicher DeutscherΛV¿? Diskussionsseite 06:42, 6. Dez. 2013 (CET)
  83. --Doc.Heintz 08:55, 6. Dez. 2013 (CET)
  84. --Shisha-Tom without time of day, 6. Dez. 2013
  85. --BesondereUmstaende (Diskussion) 14:57, 6. Dez. 2013 (CET)
  86. --Varina (Diskussion) 16:37, 6. Dez. 2013 (CET)
  87. --Studmult (Diskussion) 17:30, 6. Dez. 2013 (CET)
  88. --GT1976 (Diskussion) 20:51, 6. Dez. 2013 (CET)
  89. --Wikifreund (Diskussion) 22:04, 6. Dez. 2013 (CET)
  90. --Wnme 23:07, 6. Dez. 2013 (CET)
  91. -- ST 00:47, 7. Dez. 2013 (CET)
  92. --Flo Beck (Diskussion) 13:45, 7. Dez. 2013 (CET)
  93. IW 16:34, 7. Dez. 2013 (CET)
  94. --Blech (Diskussion) 17:48, 7. Dez. 2013 (CET)
  95. --Falkmart (Diskussion) 18:21, 8. Dez. 2013 (CET)
  96. --Partynia RM 22:53, 8. Dez. 2013 (CET)
  97. --ElRaki 01:09, 9. Dez. 2013 (CET) Delete as much user data as possible / keep only as much user data as strictly necessary.
  98. --user:MoSchle--MoSchle (Diskussion) 03:57, 9. Dez. 2013 (CET)
  99. --Daniel749 Disk. (STWPST) 16:32, 9. Dez. 2013 (CET)
  100. --Knopfkind 21:19, 9. Dez. 2013 (CET)
  101. --Saibot2 (Diskussion) 23:14, 9. Dez. 2013 (CET)
  102. --Atlasowa (Diskussion) 15:03, 10. Dez. 2013 (CET) But the appeal is also addressed to WMDE, which decided to shut down the Toolserver and thereby made the development of the DUI possible. Merely acting as a mail carrier to the WMF is not enough. If WMDE can commission expert reports on the culture of giving in Germany in order to lobby the WMF for its own fundraising, then WMDE can surely also commission expert reports on German/European data protection law.
  103. ----Fussballmann Kontakt 21:38, 10. Dez. 2013 (CET)
  104. --Steinsplitter (Disk) 23:40, 10. Dez. 2013 (CET)
  105. --Gps-for-five (Diskussion) 03:03, 11. Dez. 2013 (CET)
  106. --Kolja21 (Diskussion) 03:55, 11. Dez. 2013 (CET)
  107. --Laibwächter (Diskussion) 09:50, 11. Dez. 2013 (CET)
  108. -- Achim Raschka (Diskussion) 15:18, 11. Dez. 2013 (CET)
  109. --Alabasterstein (Diskussion) 20:32, 13. Dez. 2013 (CET)
  110. --Grueslayer Diskussion 10:51, 14. Dez. 2013 (CET)
  111. Collect data only when strictly necessary for operation (or legally required). Nothing else should be collected at all. I don't consider the inferences about time zones and place of residence (which users often state themselves) to be serious. The problem is rather that everything in the wiki is logged, which I don't consider necessary. Who needs to know who edited exactly where 10 years ago? After one year, the retained data should be anonymized (i.e. it can stay in the article history, since that is necessary, but not in the user's contributions list).--Alberto568 (Diskussion) 21:51, 14. Dez. 2013 (CET)
  112. --Horgner (Diskussion) 15:48, 16. Dez. 2013 (CET)
  113. --Oursana (Diskussion) 21:52, 16. Dez. 2013 (CET)
  114. --Meslier (Diskussion) 23:53, 16. Dez. 2013 (CET)
  115. -- Martin Bahmann (Diskussion) 09:20, 18. Dez. 2013 (CET)
  116. DerHexer (Disk.Bew.) 15:24, 19. Dez. 2013 (CET)
  117. Neotarf (Diskussion) 01:58, 20. Dez. 2013 (CET)
  118. --Lutheraner (Diskussion) 13:17, 20. Dez. 2013 (CET)
  119. --Lienhard Schulz (talk) 07:53, 21 December 2013 (UTC)
  120. --Brainswiffer (talk) 16:33, 1 January 2014 (UTC)
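AndreasPraefcke's suggestion in signature 27 (timestamp precision that decreases with the age of an edit, while the order of edits is preserved) can be sketched as follows. This is a hypothetical Python illustration, not anything MediaWiki implements; "this week" is read here as "the last seven days", and order is kept by sorting on revision ID, which is assumed to increase with each edit, rather than on the coarsened timestamps.

```python
from datetime import datetime, timedelta, timezone


def coarsen(ts: datetime, now: datetime) -> datetime:
    """Truncate an edit timestamp according to its age (tiers from signature 27)."""
    age = now - ts
    if ts.date() == now.date():
        return ts.replace(microsecond=0)  # today: to the second
    if age <= timedelta(days=7):
        return ts.replace(second=0, microsecond=0)  # "this week" (last 7 days): to the minute
    if age <= timedelta(weeks=6):
        return ts.replace(minute=0, second=0, microsecond=0)  # last six weeks: to the hour
    if age <= timedelta(days=365):
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)  # last 12 months: to the day
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)  # older: to the month


def public_history(revisions: list[tuple[int, datetime]], now: datetime) -> list[tuple[int, datetime]]:
    """Order by revision ID (assumed to increase per edit) so coarsening never reorders edits."""
    return [(rev_id, coarsen(ts, now)) for rev_id, ts in sorted(revisions)]


if __name__ == "__main__":
    now = datetime(2014, 1, 20, 12, 0, tzinfo=timezone.utc)
    history = [
        (2, datetime(2013, 12, 20, 8, 15, 42, tzinfo=timezone.utc)),
        (1, datetime(2011, 3, 17, 8, 15, 42, tzinfo=timezone.utc)),
    ]
    for rev_id, shown in public_history(history, now):
        print(rev_id, shown.isoformat())
```

Ra'ike's objection in the same entry still applies: history imports and copyright checks against external archives would have to work with the coarser values.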

Comments

Can WMDE get an EU lawyer to assess whether such analysis of data is lawful under the current or draft EU directive and what it would take to respect it? I see that the draft contains some provisions on "analytics"; if the WMF adhered to EU standards (see also #Localisation des serveurs aux Etats-Unis et loi applicable bis) we might automatically solve such [IMHO minor] problems too. --Nemo 16:12, 20 December 2013 (UTC)

See also #Please_add_concerning_user_profiles (permalink, s) and #Generation_of_editor_profiles (permalink, s). PiRSquared17 (talk) 20:36, 20 December 2013 (UTC)

On a more personal note than the official response below, I shall repeat here advice I have regularly given to editors on the English Wikipedia in my capacity as Arbitrator: "Editing a public wiki is an inherently public activity, akin to participating in a meeting in a public place. While we place no requirement that you identify yourself or give any details about yourself to participate – and indeed do our best to allow you to remain pseudonymous – we cannot prevent bystanders from recognizing you by other methods. If the possibility of being recognized places you in danger or is not acceptable to you, then you should not involve yourself in public activities – including editing Wikipedia." MPelletier (WMF) (talk) 21:10, 20 December 2013 (UTC)

We can prevent the creation of user profiles by aggregating data. It has been done on the Toolserver. It can be done on WikiLabs. NNW (talk) 21:29, 20 December 2013 (UTC)
No, you cannot. Those tools existed anyways, just elsewhere. You cannot prevent aggregation of public data without making that data not public anymore; including on the website itself (remove it from the API and people will just screen scrape for it) and in the dumps. Transparency isn't an accident, it's one of the basic principles of wikis in general and of the projects in particular. MPelletier (WMF) (talk) 18:12, 21 December 2013 (UTC)
Laws can prevent it though :) (looks like it may happen rather soon in the EU). If everyone here takes extreme stances and conflates everything, as if there were no difference between publishing data and using it, or between querying a database and making someone else query it, then it will be very hard to have any dialogue. To reiterate a point made above: if a Wikimedia project includes Google Analytics and sends all private data to Google, our users don't care whether it was put there by the WMF or by a sysop; they just want it removed. --Nemo 18:23, 21 December 2013 (UTC)
No, actually, laws do not. The directive everyone refers to says nothing about what people may do with publicly available information; it concerns private information, which edit times most definitely are not.

Conversely, whether someone accesses a tool (or project page) is private information, and this is why the rules already do forbid disclosing it; so your Google Analytics example is a good illustration of what we do already forbid. MPelletier (WMF) (talk) 20:55, 21 December 2013 (UTC)

I'm glad you have such legal certainty; I do not, and I have asked lawyers to comment. In the meantime, I only said that laws can forbid something if lawmakers wish to (this seems rather obvious to me). As for Google Analytics, of course it's not the same thing, but it was just an example where it's easier to agree that it doesn't matter whether it's the WMF or a user who places it on our servers (though the proposed draft explicitly does not cover the case of a sysop adding Google Analytics to a project). --Nemo 22:33, 21 December 2013 (UTC)
"your Google Analytics example is a good illustration of what we do already forbid." Oh, really? Just a short while ago a Software Engineer on the Wikimedia Foundation's Analytics team wrote about Analytics for tools hosted on labs?: "I don't think there are any technical reasons people can't use Google Analytics on a Labs instance. The only thing I can think of is that it'd be nice if people used something Open Source like PiWik. But I'll ask and report back in a bit." > later > "Google Analytics or any other analytics solution is strictly forbidden by Labs rules *unless* there's a landing page with a disclaimer that if the user continues, their behavior will be tracked." So that's the "good illustration of what we do already forbid": just put up a disclaimer. --Atlasowa (talk) 00:58, 22 December 2013 (UTC)
"Those tools existed anyways, just elsewhere.": This is said so often, and it is still not a good point. There are so many bridges, and so many people crash their cars into them. Does that mean we have to do it, too? A first step could simply be to stop creating user profiles on WMF servers. It was the end of the Toolserver limitations that started this whole discussion. Of course there will always be someone who can and will do it somewhere, but that is no reason to invite people to do it here, on servers that are paid for with donations for our work. I want to create an encyclopedia, not to collect money for spying on me. NNW (talk) 12:15, 22 December 2013 (UTC)

Additional signatures

  1. --Geolina163 (talk) 16:06, 20 December 2013 (UTC)
  2. --Density (talk) 16:35, 20 December 2013 (UTC)
  3. --Minihaa (talk) 16:57, 20 December 2013 (UTC) Please practice data minimization.
  4. --Theaitetos (talk) 17:08, 20 December 2013 (UTC)
  5. -- Sir Gawain (talk) 17:17, 20 December 2013 (UTC)
  6. --1971markus (talk) 18:26, 20 December 2013 (UTC)
  7. --Goldzahn (talk) 19:22, 20 December 2013 (UTC)
  8. --Spischot (talk) 21:38, 20 December 2013 (UTC)
  9. --Bomzibar (talk) 22:43, 20 December 2013 (UTC)
    --Charlez k (talk) 22:51, 20 December 2013 (UTC) already signed, see above (Original signatures) --Krib (talk) 23:05, 20 December 2013 (UTC)
  10. --J. Patrick Fischer (talk) 09:14, 21 December 2013 (UTC)
  11. --Túrelio (talk) 15:07, 21 December 2013 (UTC)
  12. --Poupou l'quourouce (talk) 17:46, 21 December 2013 (UTC)
  13. --Nordlicht8 (talk) 21:54, 21 December 2013 (UTC)
  14. -- FelixReimann (talk) 11:16, 22 December 2013 (UTC)
  15. --Asio otus (talk) 11:54, 22 December 2013 (UTC)
  16. --Rosenzweig (talk) 12:26, 22 December 2013 (UTC)
  17. --Mellebga (talk) 13:47, 25 December 2013 (UTC)
  18. --Pasleim (talk) 15:24, 26 December 2013 (UTC)
  19. Elvaube ?! 13:32, 29 December 2013 (UTC)
  20. --Zipferlak (talk) 13:18, 2 January 2014 (UTC)
  21. --Gerbil (talk) 15:04, 5 January 2014 (UTC)
  22. --Sebastian.Dietrich (talk) 22:41, 9 January 2014 (UTC)
  23. --Stefan Bellini (talk) 18:57, 12 January 2014 (UTC)
  24. --SteKrueBe (talk) 23:48, 12 January 2014 (UTC)
  25. --Wilhelm-Conrad (talk) 23:02, 14 January 2014 (UTC)
  26. --Cubefox (talk) 20:37, 15 January 2014 (UTC)
  27. --Yellowcard (talk) 22:47, 16 January 2014 (UTC)
  28. --Ghilt (talk) 23:55, 19 January 2014 (UTC)

Response

Please note the response by Tfinc above in the Generation of editor profiles section and my follow-up to it. Obfuscating user contributions data or limiting our existing exports will not happen. The Wikipedia projects are wikis; edits to them are by nature public activities that have always been, and must always be, available for scrutiny. MPelletier (WMF) (talk) 21:10, 20 December 2013 (UTC)

We don't need to keep around timestamps down to a fraction of a second forever. PiRSquared17 (talk) 21:13, 20 December 2013 (UTC)
Not sure about that. I wonder whether de.wiki has also agreed to curtail its own right to fork, a right it constantly uses as a threat. Making dumps unusable would greatly reduce de.wiki's bargaining power; I don't know whether they really want that. --Nemo 21:43, 20 December 2013 (UTC)

While we believe this proposal is based on legitimate concerns, we want to highlight some of the practical considerations of such a proposal. Due to the holidays, we’ve addressed this only briefly, but we hope it serves to explain our perspective.

In summary, public access to metadata around page creation and editing is critical to the health and well-being of the site and is used in numerous places and for numerous use cases:

  • Protecting against vandalism, incorrect and inappropriate content: there are several bots that patrol Wikipedia’s articles that protect the site against these events. Without public access to metadata, the effectiveness of these bots will be much reduced, and it is impossible for humans to perform these tasks at scale.
  • Community workflows: Processes that contribute to the quality and governance of the project will also be affected: blocking users, assessing adminship nominations, determining eligible participants in article deletion discussions.
  • Powertools: certain bulk processes will be broken without public access to this metadata.
  • Research: researchers around the world use this public metadata for analyses that are useful both to the site and to the movement. It is essential that they continue to have access.
  • Forking: In order to have a full copy of our projects and their change histories all metadata needs to be exposed alongside content.

In short, public and openly licensed revision metadata is vital to the technical and social functioning of Wikipedia, and any removal of this data would have a serious impact on a number of processes and actions critical to the project. Tfinc (talk) 00:54, 21 December 2013 (UTC)

How was it possible for Wikipedia to grow for 13 years without aggregating user data? What has changed since the start of WikiLabs that makes this necessary? Why is it necessary for creating an encyclopedia to know the exact second of my edit 5 years ago? Where do the licenses say that the exact second of my edit has to be part of a fork? NNW (talk) 10:38, 21 December 2013 (UTC)
I understand the part on aggregation and analytics, but the point about seconds is quite silly: sure, seconds might not be necessary in some ideal version of MediaWiki where they don't matter; but they also don't matter at all for your privacy. To avoid inferences about time zones we should remove hours of the day, not seconds. --Nemo 18:12, 21 December 2013 (UTC)
If you read the appeal above, you will see that I do know that talking about seconds is silly. But it is pointless to start with hours when some people don't understand the basic problem with that data. Seconds just take the topic to an extreme, so that it may be understood that no one needs five-year-old timestamps to prevent vandalism or whatever. NNW (talk) 12:02, 22 December 2013 (UTC)
Actually, I read it but I don't see that. The text does not specify what level of precision in timestamps you want to achieve. --Nemo 10:19, 29 December 2013 (UTC)
I cannot offer a complete solution to this problem. The appeal in a nutshell is: as much transparency as necessary, as much privacy as possible. I am not that much into technical questions. Perhaps some of the suggestions cannot be implemented for technical reasons I don't know about. Perhaps there are better ways to preserve users' anonymity. All I did was to centralize a growing dissatisfaction with the way our data is handled and to start a discussion about it. NNW (talk) 11:56, 29 December 2013 (UTC)
Thanks. This is a frank and reasonable way to frame it. --Nemo 12:03, 29 December 2013 (UTC)
It's true that most actions of plain vandalism can be efficiently performed if we know the exact order of events, in order to revert edits correctly.
But the precision of timestamps is needed for things where there are battles related to the order of events in the history, for example battles of licences: we need to be able to prove the anteriority of a work. Precise timestamps are then needed, but we could hide this info by replacing these exact timestamps by digital signatures generated by the server, and making an API reserved to CheckUser admins, that would be able to assert which event occured before another one. IT could also be used for anonimizing contributions made by users that asked their account to be deleted and their past contributions to be fully anonymized (while maintaining the validity of their past work and provability and permanence/irrevocability of their agreed licences).
Other open projects have experienced this issue when it was difficult to establish the licensing terms. For example, when OpenStreetMap changed its licence from CC-BY-SA to ODbL for new contributions, it needed to check its data against the time each user actually accepted the new Contributor Terms and agreed (or not) to relicense their past contributions, in order to clean up the active online database, which was then published exclusively under the new licence. This did not mean that the old database was illegal, but that it was frozen at a precise timestamp, and all further edits were made exclusively under the new licence, which users had to accept before continuing to edit.
Precise timestamps are therefore needed over the long term, and not just to fight active abuse and spam. Bots doing that work are interested only in a short period not exceeding one month; beyond that, a bot alone cannot work reliably without human review to restrict its searches if something must be reverted, or in case of doubt, or when all user rights have been transferred to a special aggregated/anonymized user account detached from the original user.
Note that timestamps and geolocation data stored in media files are a problem. Users should have a way to clean these data out of a media file by reducing their precision (for example keeping only the date, or just the year, and a coarser geolocation), or by deleting unnecessary metadata such as the hardware IDs of digital cameras or the version of the tool used to modify the photos (possibly online, using external services like Google Picasa). The same applies to other information that may be embedded using stealth techniques such as steganography (techniques that may only be discovered years later). Commons should have a tool to inspect these metadata and allow the original uploader to clean up these hidden details, which would then be dropped permanently, along with the stored old versions of the media files.
Fully anonymizing photos and videos is a really difficult challenge (it is far easier for graphics with reduced color spaces, or for vector graphics where some random alteration of unnecessary geometric precision is acceptable), because things that are initially invisible may be revealed later by new processing algorithms, like those already used by Google, which can precisely identify places and people by looking at small portions of photos, or by assembling multiple photos from the same "exposed public user account" taken in the same time period, or photos/videos on the same topic elsewhere.
Note that these media analysis tools may also be used to "assert" the licensing terms and the legitimate author of a work that has been reused elsewhere without permission (there are already examples where legitimate Wikimedia content was later attacked by abusers trying to claim authorship and build a fake anteriority). This has already caused severe damage to Wikimedia projects (for example, a few years ago several editions of Wikiquote had to be fully erased and restarted from zero when we could no longer prove the origin or anteriority of a work). verdy_p (talk) 13:33, 22 December 2013 (UTC)
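The idea above of hiding exact edit times behind server-generated values, with ordering queries reserved to CheckUsers, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing MediaWiki feature: the class `OrderingNotary` and its methods are hypothetical, and it uses a keyed HMAC from Python's standard `hmac` module in place of the public-key "digital signatures" mentioned, since only the server ever needs to verify the tokens. The exact timestamps stay in a private server-side store; the public only sees the opaque token.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass
class OrderingNotary:
    """Hypothetical server component: publishes opaque tokens instead of exact
    edit timestamps and answers ordering questions only for CheckUsers."""

    # Secret key known only to the server (never published).
    key: bytes = field(default_factory=lambda: secrets.token_bytes(32), repr=False)
    # Private store: token -> (revision id, exact UTC timestamp).
    _records: dict = field(default_factory=dict, init=False, repr=False)

    def _mac(self, rev_id: int, timestamp: str) -> str:
        message = f"{rev_id}|{timestamp}".encode()
        return hmac.new(self.key, message, hashlib.sha256).hexdigest()

    def issue(self, rev_id: int, timestamp: str) -> str:
        """Record the exact time privately and return the public token."""
        token = self._mac(rev_id, timestamp)
        self._records[token] = (rev_id, timestamp)
        return token

    def verify(self, token: str) -> bool:
        """True only if the token was issued by this server."""
        record = self._records.get(token)
        if record is None:
            return False
        return hmac.compare_digest(self._mac(*record), token)

    def occurred_before(self, token_a: str, token_b: str, *, is_checkuser: bool) -> bool:
        """Privileged query: did event A happen strictly before event B?"""
        if not is_checkuser:
            raise PermissionError("ordering queries are restricted to CheckUsers")
        if not (self.verify(token_a) and self.verify(token_b)):
            raise ValueError("unknown or forged token")
        # ISO 8601 UTC strings in one fixed format sort chronologically as text.
        return self._records[token_a][1] < self._records[token_b][1]
```

Because each token is derived from a secret key, it reveals nothing about the edit time to outside observers, yet the server can still show which of two edits came first if anteriority is ever disputed.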
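The metadata cleanup suggested above (keeping only the date or the year, coarsening the geolocation, and dropping camera serial numbers and software versions) might look something like this. It is a sketch assuming the metadata has already been read into a plain dictionary with GPS values in signed decimal degrees; the tag names are modeled on common EXIF fields, but `coarsen_metadata` is a hypothetical helper, and it does not address steganography or the deletion of old file versions.

```python
# Hypothetical tag names modeled on common EXIF fields. Real EXIF stores GPS
# coordinates as degree/minute/second rationals; this sketch assumes they have
# already been converted to signed decimal degrees.
IDENTIFYING_TAGS = {"BodySerialNumber", "LensSerialNumber", "Software", "HostComputer"}


def coarsen_metadata(meta: dict, date_precision: str = "date", gps_decimals: int = 1) -> dict:
    """Return a reduced copy of `meta`; the input dictionary is not modified."""
    # Drop tags that can identify the device or the editing software.
    out = {k: v for k, v in meta.items() if k not in IDENTIFYING_TAGS}

    # EXIF timestamps look like "YYYY:MM:DD HH:MM:SS".
    stamp = out.get("DateTimeOriginal")
    if stamp is not None:
        if date_precision == "date":
            out["DateTimeOriginal"] = stamp.split(" ")[0]
        elif date_precision == "year":
            out["DateTimeOriginal"] = stamp[:4]
        else:
            raise ValueError(f"unsupported date precision: {date_precision!r}")

    # One decimal place of latitude corresponds to roughly 11 km.
    for key in ("GPSLatitude", "GPSLongitude"):
        if key in out:
            out[key] = round(out[key], gps_decimals)
    return out
```

Keeping the precision as parameters would let each uploader decide how much detail to give up; a degree of longitude shrinks toward the poles, so the 11 km figure applies to latitude only.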

Question of context

AxelBoldt, NNW, and everyone else...

I regret to admit that, because of the language barrier, the context in which the supporters of the appeal came up with this feature request is unclear to me. Please provide links to where the opt-out idea originated; even if they're in German, I will be grateful, as I would not have to search for the discussion myself. Gryllida (talk) 07:20, 31 December 2013 (UTC)

As far as I know, the opt-out idea was first proposed by Cyberpower678 when he started the RFC for X!'s Edit Counter [1]. Such tools on the Toolserver always had an opt-in (also as far as I know). NNW (talk) 13:08, 31 December 2013 (UTC)
NNW, is there a place where the lack of an opt-in feature was first discussed for the DUI tool specifically? Gryllida (talk) 15:13, 31 December 2013 (UTC)
Gryllida, the DUI was the direct result of the RFC for X!'s Edit Counter. Any opt-in/opt-out/nothing-at-all discussions were held there. As Ricordisamoa refused to change anything (see link in the thread below), there was nothing left to discuss. Some reactions to his tool can be found at User talk:Ricordisamoa#Deep user inspector. NNW (talk) 15:40, 31 December 2013 (UTC)
NNW, «the DUI was the direct result of the RFC for X!'s Edit Counter» is a useful observation. ☺ Where can I see evidence for that, for reference, as it appears to be of relevance to this thread? Gryllida (talk) 15:56, 31 December 2013 (UTC)
[2]. NNW (talk) 16:05, 31 December 2013 (UTC)
NNW, you have linked me to the RFC text at an early stage, when its discussion section was still empty. Community views could be of interest in this discussion, though. ☺ So that I don't have to go through the history manually, could you please locate the RFC in an archive and link me to that? Gryllida (talk) 15:56, 31 December 2013 (UTC)
Ah, the latest revision appears to contain the archive. Thanks! ☺ Gryllida (talk) 15:58, 31 December 2013 (UTC)
Even though a translated message about the RfC was spammed to all wikis (by me), most commenters seem to be from enwiki or dewiki. I'd say dewiki mainly wanted to keep opt-in, while enwiki wanted to remove it or use opt-out, which is not surprising. PiRSquared17 (talk) 23:55, 1 January 2014 (UTC)

NNW, thanks for the context. It appears that the tool functions as a proxy for already available information, and the WMF would lack the authority to eliminate it entirely, for example if it were hosted externally. Hence it appears pointless for them to add actionable clauses about it to their privacy policy.

I see work on an Extension only as a last resort, to make the DUI tool stop functioning on wikis that request such an extension with community consensus. If the community is willing to experiment, WMF Labs resources are available for collaborative community work on it. Gryllida (talk) 09:22, 3 January 2014 (UTC)

Response

Thank you to all the users who contributed to this discussion, and who signed on to this appeal. We take these concerns seriously, and understand why you are concerned, even when we disagree with some of your analysis (as we first discussed in our blog).

As I understand the appeal, there are really four main requests. I’d like to summarize and respond to each of these requests here.

Protecting users

At the highest level, the appeal asks that the Foundation "commit to making the protection of its registered users a higher priority and should take the necessary steps to achieve this". We believe strongly that we have already made protection of all of our users a high priority. This can be seen in our longstanding policies — like the relatively small amount of data that we require to participate and the steps we take to ensure that nonpublic information is not shared with third parties — and in our new policies, like the steps we've taken to add https and filter IP addresses at Labs. We will of course always have to balance many priorities while running the sites, but privacy has been and will remain one of the most important ones.

Reducing available information

More concretely, the appeal expresses concern that the publication of certain information about edits in the dumps, on the API, and on the sites, allows users to deduce information about editors. It therefore requests that we remove that information from dumps and the API.

This information has been public since the beginning of the projects almost 13 years ago. As Tfinc and others have discussed extensively above, the availability of this information has led to the creation of a broad set of tools and practices that are central to the functioning of the projects. We understand that this can lead to the creation of profiles that are in some cases uncomfortably suggestive of certain information about the respective editor. However, we do not think this possibility can justify making a radical change to how the projects have always operated, so we do not plan to act on this request.

Aggregation on Labs

The second major concern presented was that the Wikimedia Labs policy, unlike the Toolserver policy, does not explicitly prohibit volunteer-developed software that aggregates certain types of account information without opt-in consent. Because of this, the appeal requested a ban on such software on servers (like Labs) that are hosted by the Foundation.

To address this concern, I proposed a clarification to the Labs terms of use. Several users have expressed the opinion that this is insufficient, so the discussion is still ongoing about what approach (if any) should be taken on Labs. Anyone interested in this request is urged to contribute to the discussion in that section.

Collection of IP addresses

The final request in the appeal was to not "store and display the IP addresses of registered users". We currently store those addresses, but only for 90 days, as part of our work to fight abuse. This will continue under the new Data Retention Guidelines. We do not display the IP addresses of registered users, except to those volunteers who are involved in our abuse-fighting process, and then only under the terms described in this Privacy Policy and the Access to Nonpublic Information Policy. So we think we are reasonably compliant with this request.

Conclusion

As NNW put it in a comment above, the appeal seeks “as much transparency as necessary, as much privacy as possible.” The WMF strongly agrees with this goal, which is why we have always collected very little personal data, why we do not share that data except in very specific circumstances, and why we have written a very detailed, transparent privacy policy that explains in great detail what we do with the data we have. At the same time, we also recognize that providing information about edits has been part of how we have enabled innovation, flexibility, and growth. After weighing those factors, we have reached the conclusions described above. We hope that the users who signed the appeal will accept this conclusion, and continue to participate and support our shared mission. —LVilla (WMF) (talk) 00:37, 10 January 2014 (UTC)
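The rule described in the response for registered users' IP addresses (stored for 90 days for abuse fighting, shown only to authorized volunteers) can be illustrated with a small sketch. This is only an illustration of the rule as stated, not how MediaWiki or the Data Retention Guidelines actually implement it; `IpRecord`, `purge_expired` and `ip_for_viewer` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window taken from the response: 90 days.
RETENTION = timedelta(days=90)


@dataclass
class IpRecord:
    """Hypothetical log entry linking a registered account to an IP address."""
    user: str
    ip: str
    seen_at: datetime  # timezone-aware, UTC


def purge_expired(records: list, now: Optional[datetime] = None) -> list:
    """Keep only records that are still inside the retention window."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return [r for r in records if r.seen_at >= cutoff]


def ip_for_viewer(record: IpRecord, viewer_is_authorized: bool) -> Optional[str]:
    """Reveal the IP only to volunteers authorized for abuse fighting
    (e.g. under the Access to Nonpublic Information Policy); hide it otherwise."""
    return record.ip if viewer_is_authorized else None
```

Passing `now` explicitly keeps the purge deterministic and easy to test; a real job would run it on a schedule.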

Note on Labs Terms / Response to NNW

Hi, NNW: If you are asking here about the change from Toolserver to Labs about when “profiling tools” are allowed, we made the change because the edit information has always been transparently available, so the Toolserver policy was not effective in preventing “profiling”: tools like X!'s Edit Counter could be (and were) built on other servers. As has been suggested above, since the policy was ineffective, we removed it.
However, this change was never intended to allow anarchy. The current Labs terms of use allow WMF to take down tools, including in response to a community process like the one that occurred for X!'s Edit Counter. Would it resolve some of your concerns if the Labs terms made that more obvious? For example, we could change the last sentence of this section from:
If you violate this policy ... any projects you run on Labs, can be suspended or terminated. If necessary, the Wikimedia Foundation can also do this in its sole discretion.
to:
If you violate this policy ... any projects you run on Labs, can be suspended or terminated. The Wikimedia Foundation can also suspend or terminate a tool or account at its discretion, such as in response to a community discussion on meta.wikimedia.org.
I think this approach is better than a blanket ban. First, where there is a legitimate and widely-felt community concern that a particular tool is unacceptable, it allows that tool to be dealt with appropriately. Second, it encourages development to happen on Labs, which ultimately gives the community more leverage and control than when tools are built on third-party servers. (For example, tools built on Labs have default filtering of IP addresses to protect users - something that doesn’t automatically happen for tools built elsewhere. So we should encourage use of Labs.) Third, it encourages tool developers to be bold - which is important when encouraging experimentation and innovation. Finally, it allows us to discuss the advantages and disadvantages of specific, actual tools, and allows people to test the features before discussing them, which makes for a more constructive and efficient discussion.
Curious to hear what you (and others) think of this idea. Thanks. -LVilla (WMF) (talk) 00:02, 24 December 2013 (UTC)
Is there a need to distinguish WMF's role in administering Labs tools? I would only stress here the requirement that Labs tools obey this policy, and link to a Labs policy on gradual escalation (ask the tool author; discuss in the community; ask Labs admins; ask WMF). Gryllida (talk) 05:14, 24 December 2013 (UTC)
WMF is called out separately in the policy because WMF employees ultimately have control (root access, physical control) of the Labs servers, and so ultimately have more power than others. (I think Coren has been recruiting volunteer roots, which changes things a bit, but ultimately WMF still owns the machines, pays for the network services, etc.) I agree that the right order for conversation is probably tool author -> community -> admins, and that the right place for that is not in the terms of use but in an informal policy/guideline on wikitech. -LVilla (WMF) (talk) 17:15, 24 December 2013 (UTC)
Yeah, I just wanted to propose that the policy reference both concepts (WMF's ultimate control, and the gradual escalation process) so that users don't assume appealing to WMF is the only way. Gryllida (talk) 08:38, 25 December 2013 (UTC)
As I mentioned elsewhere on this page, the talk about "community consensus" raises questions such as "which community?" and "what happens when different communities disagree?" Anomie (talk) 14:30, 24 December 2013 (UTC)
Right, which is why I didn't propose anything specific about that for the ToU; Meta is just an example. Ultimately it'll have to be a case-by-case judgment. -LVilla (WMF) (talk) 17:15, 24 December 2013 (UTC)
I would perhaps remove the "on Meta" bit then, since it adds no useful meaning. «... such as in response to a community discussion.» looks complete to me. In my view there doesn't even have to be a discussion: a single user privately contacting WMF could be enough, provided his report of abuse is accurate. «... such as in response to community feedback.» could be more meaningful. Gryllida (talk) 08:38, 25 December 2013 (UTC)
This is meant as an example ("such as"), so I think leaving the reference to Meta in is OK. Also, this is in addition to the normal reasons for suspension. For the normal reasons, a report by a single person would be fine, but I think in most cases this sort of discretion will be exercised only after community discussion and consultation, so I think the reference to discussion is a better example than "feedback". -LVilla (WMF) (talk) 22:28, 31 December 2013 (UTC)
I am referring to this argument from above: we made the change because the edit information has always been transparently available, so the Toolserver policy was not effective. The position that any analysis that can be performed by a third party should also be allowable on WMF servers with WMF resources is not convincing. It is clearly possible for a third party to perform comprehensive and intrusive user profiling by collating edit data without the user's prior consent. We could (and should!) still prohibit it on our servers and by our terms-of-use policy. (A different example: it's clearly possible for a third party running a screen scraper to construct a conveniently browsable database of all edits that have ever been oversighted; this doesn't mean WMF should allow it and finance it.) Now, why should this kind of user profiling be prohibited by WMF? Because WMF lives on the goodwill of its editors, and editor NNW above put it best: "I want to create an encyclopedia, not to collect money for spying on me." AxelBoldt (talk) 18:15, 24 December 2013 (UTC)
You're right, but I think removed (oversighted) edits are out of the question here. Whatever else is available is available, and allowing freely available information to be collected programmatically sounds reasonable to me. Gryllida (talk) 08:38, 25 December 2013 (UTC)
It's not reasonable if the editors don't want it and if it doesn't further any identifiable objective of the foundation. In fact it is not only unreasonable but it's a misuse of donor funds. AxelBoldt (talk) 22:28, 25 December 2013 (UTC)
You may be interested in contributing to the #Tool_settings section below. Gryllida (talk) 01:56, 28 December 2013 (UTC)
Hello LVilla (WMF)! Your suggestion means that any tool programmed in the future has to be checked and, if someone thinks it is necessary, discussed individually. My experience so far: "the community should not have any say in the matter", and a quite short discussion, "Technically feasible, legally okay... but want tools do we want?", started at lists.wikimedia.org. If we want it that way, we will have to define who the "community" is. Is it the sum of all users of all WMF projects? Can single projects or single users decide to keep a tool (e.g. en:WP voted for neither opt-out nor opt-in for X!'s Edit Counter, but that would mean that my edits there will be used in that tool although I refuse this completely for my account)? How will we come to a decision: simple majority or best arguments (and who will decide then)? Does a community vote on tool X mean that there is no chance for a tool Y to try it a second time, or do we have to discuss it again and again?
We have to be aware of our different cultures of handling private data, or even of defining what's private and what's not. Labs "doesn't like" (nice term!) "harmful activity" and "misuse of private information". US law obviously doesn't regard aggregating data as misuse; I do. We discuss necessary "transparency" but do not have a definition for it. The time logs of my edits five years ago seem to be important, but you don't want to know my name, my address, my sex, my age, how I earn my money… which would make my edits, my intentions and my possible vandalism much more transparent than any time log. Some say "the more transparency the better", but this is a discussion among the happy few (but dominant) who live in North America and Western Europe. I think we should also think of those users who live in the Global South and want to edit problematic topics (religion, sexuality…). For them, aggregated user profiles may become a real problem, and they will always be a minority in any discussion. NNW (talk) 17:56, 28 December 2013 (UTC)
Everyone involved is aware that privacy values vary a great deal from community to community, but it seems very ill-advised to give the most restrictive standards a veto over the discussion, in practice and in principle. A clear parallel can be drawn with the discussion over images: while it would have been possible to restrict our standards to the subset deemed acceptable by all possible visitors, doing so would have greatly impoverished us. The same goes for usage of public data: we should foster and encourage new creative uses, not attempt (and fail) to preemptively restrict new tools to the minuscule subset nobody could object to. This does not preclude acting to discourage or disable a tool the community at large objects to – and the Foundation will be responsive to such concerns – but it does mean that this is not something that can be done with blanket bans.

To answer your more explicit questions, the answer will generally be "it depends" (unsatisfying as this sounds). Ultimately, yes, the final arbiter will be the Foundation; but whether or not we intervene depends entirely on the context as a whole: who objects, why, and what could be done to address those concerns. MPelletier (WMF) (talk) 00:48, 1 January 2014 (UTC)

So for programmers the sky's the limit, and it's up to the community to find out which tools might violate their rights and to discuss this again and again, because every tool has to be dealt with anew. The community has to accept that in the end an RFC like the one for X!’s Edit Counter is just a waste of time, and that programmers, of course, are not interested in any discussion or compromise, because it might limit their tools. WMF is in the comfortable position that Meta is the focus of only very few users and that the privacy policy does not apply to Labs. It would be fair to admit that under these circumstances WP:ANON becomes absurd and, in the near future, with more powerful tools, a lie. I understood "The Wikimedia Foundation, Inc. is a nonprofit charitable organization dedicated to encouraging the growth, development and distribution of free, multilingual, educational content" as "free and multilingual and educational content", but a user profile generated from my editing behaviour isn't educational. NNW (talk) 13:50, 4 January 2014 (UTC)

Questions from Gryllida

Implementation as Extension

This request is to conceal the time of an edit. Would any of the supporters of the appeal be willing to demonstrate a working wiki with the requested change implemented as an Extension which discards edit times where needed? If it is sufficiently safe and secure, it could be added to a local German wiki at the request of the community, and considered by other wiki communities later on. Many thanks. Gryllida (talk) 04:43, 24 December 2013 (UTC)

Tool settings

Have you considered asking the tool author to add an opt-out (or opt-in, as desired) option at a suitable scope? Gryllida (talk) 04:45, 24 December 2013 (UTC)

Example: editor stats:
«Note, if you don't want your name on this list, please add your name to [[User:Bawolff/edit-stat-opt-out]]».
--Gryllida (talk) 02:14, 28 December 2013 (UTC)
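The opt-out approach quoted above (a wiki page listing users who don't want to appear in a tool's output) and the opt-in alternative from Toolserver times differ only in how the list is applied. A minimal sketch, assuming the list page's wikitext has already been fetched and contains one user per line, either as a plain name or as a `[[User:Name]]` link; the functions are hypothetical, this is not how the editor stats tool actually works, and the name normalization is simplified compared with MediaWiki's real rules.

```python
def normalize(name: str) -> str:
    """Simplified user-name normalization: underscores become spaces and the
    first letter is upper-cased (MediaWiki's real rules are more involved)."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def parse_user_list(wikitext: str) -> set:
    """Extract user names from a list page with one user per line, written
    either as plain names or as [[User:Name]] / [[User:Name|label]] links."""
    users = set()
    for raw in wikitext.splitlines():
        line = raw.strip().lstrip("*").strip()
        if not line:
            continue
        if line.startswith("[[") and line.endswith("]]"):
            target = line[2:-2].split("|", 1)[0]
            prefix, sep, rest = target.partition(":")
            if not (sep and prefix.strip().lower() == "user"):
                continue  # a link, but not to a user page: ignore it
            line = rest
        users.add(normalize(line))
    return users


def visible_users(stats: dict, listed: set, mode: str = "opt-out") -> dict:
    """Filter per-user statistics according to the list.

    mode="opt-out": show everyone except the listed users.
    mode="opt-in":  show only the listed users.
    """
    if mode == "opt-out":
        return {u: n for u, n in stats.items() if normalize(u) not in listed}
    if mode == "opt-in":
        return {u: n for u, n in stats.items() if normalize(u) in listed}
    raise ValueError(f"unknown mode: {mode!r}")
```

The only real difference between the two policies is the default: with opt-out, a user who never hears about the tool appears in its output; with opt-in, that user does not.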

FYI: The tool address is here. It is not mentioned in the appeal text. (I have notified the tool author, Ricordisamoa, of this discussion and the potentially desired feature.) Gryllida (talk) 02:20, 28 December 2013 (UTC)

User:Ricordisamoa deliberately ignored the idea of an opt-in or opt-out, and there is no chance to discuss anything: There's no private data collection, and only WMF could prevent such tools from being hosted on their servers: the community should not have any say in the matter. For the complete discussion, read Talk:Requests for comment/X!'s Edit Counter#Few questions. NNW (talk) 16:29, 28 December 2013 (UTC)
@Gryllida and NordNordWest: of course I accept community suggestions (e.g. for improvements to the tool), but only the WMF is competent in legal matters concerning Wikimedia Tool Labs. If there are to be any actions, they will have to be taken by the WMF itself. See also [3]. --Ricordisamoa 03:04, 29 December 2013 (UTC)
Ricordisamoa, would you be willing to add an opt-out? I would like this to be solved without legal action or escalation, as it appears to be within your power and ability, and many users want it. (It seems OK to decline the opt-in feature request.) Gryllida (talk) 09:07, 29 December 2013 (UTC)
@Gryllida: No. --Ricordisamoa 16:44, 30 December 2013 (UTC)
Ricordisamoa, I understand your view. It might make sense to document that in the FAQ, if it isn't already, at your leisure. I appreciate your being responsive. Gryllida (talk) 07:17, 31 December 2013 (UTC)
As long as WMF wants to encourage programmers to do anything that is legal, there is no reason for programmers to limit the capabilities of their tools. "Community" is just a word which can be ignored very easily when the "community" wants to cut capabilities. Only "improvements" will be accepted, and "improvements" mean "more, more, more". NNW (talk) 14:00, 4 January 2014 (UTC)

Discussion on same topic in other locations

Note that this issue has also been discussed in #Generation_of_editor_profiles and #Please_add_concerning_user_profiles. For a full history of this topic, please make sure to read those sections as well. —LVilla (WMF) (talk) 00:36, 8 January 2014 (UTC)

Opt-in

There is the possibility of a compulsory opt-in for generating user profiles on Labs. This would return us to the Toolserver policy, which worked fine for years. No information would be reduced, fighting vandalism would still be possible, programmers could still write new tools, and of course there would be lots of users willing to opt in (as in Toolserver times). On the other hand, all other users who prefer more protection against aggregated user profiles could get it if they want it. I see no reason why this minimal solution of the appeal couldn't be realized. NNW (talk) 13:43, 13 January 2014 (UTC)

As has been stated elsewhere, this only gives a false sense of security. There are other websites that allow profiling anyway, and there's no way to stop them, so there's no clear reason to pretend that you have a choice. //Shell 20:56, 13 January 2014 (UTC)
As has been stated elsewhere, the fact that something is done somewhere doesn't mean we have to do it, too. NNW (talk) 21:32, 13 January 2014 (UTC)
The Toolserver policy was only enforced upon user request. There's a lingering worry that some upset user will slap a tool author with a take-down request; this is demoralizing for authors after they have spent many hours developing the software. This discouraging effect is why we don't see many community tracking tools like the Monthly DAB Challenge. I've got cool and interesting ideas, but won't waste my time. Dispenser (talk) 19:04, 21 January 2014 (UTC)
With an opt-in there would be no reason for any complaint. Everybody can decide whether her/his data gets used for whatever or not, and there will still be lots of users who will like and use whatever you are programming. Please think of those authors who spent many hours creating an encyclopedia and afterwards find themselves the object of spying tools. Believe me: that's demoralizing. NNW (talk) 23:19, 21 January 2014 (UTC)

Community comments acceptance deadline

Hi. According to the top of the page, the community comments acceptance deadline is 15 January 2014. I'm not sure this is a good idea. Discussion is ongoing and there appear to be a number of unresolved issues on this talk page and at Talk:Access to nonpublic information policy. Discussion should continue until there's consensus to move forward. --MZMcBride (talk) 07:32, 5 January 2014 (UTC)

Given the significant changes still expected, and the few, outdated translations, I agree. How far should we push it out? A few weeks? Let's aim to get the expected changes ironed out in English by mid-month? --Elvey (talk) 06:28, 6 January 2014 (UTC)
I just made several edits. The last two are labeled Option A and Option B. They were prompted by this problematic edit: the new language "are kept confidential" was added, referring to 'email this user' emails. That implies they are kept (!). Should they be? Arguably not. Thus I suggest we go with Option A, or, if that's not accurate, Option B. I hope we can go with Option A. --Elvey (talk) 06:28, 6 January 2014 (UTC)
@Elvey: As I noted in the edit page, please open a new section on this page for substantive edits rather than making them directly in the doc. Thanks! —LVilla (WMF) (talk) 02:53, 7 January 2014 (UTC)
I think extending the nonpublic information policy discussion makes sense, and the data retention policy is still unpublished, so it will obviously need to go past the 15th. But on this doc, I'd lean towards bearing down on the remaining open issues and still aiming to close on the 15th (with the obvious exception that there might be changes to it resulting from changes to the other two documents). Michelle is on vacation, and we can't confirm that until she returns, but that is my sense of the right plan. —LVilla (WMF) (talk) 02:53, 7 January 2014 (UTC)
Hi! Just to confirm, we've extended the discussions for the privacy policy and access policy drafts until 14 February 2014 (the same period as for the data retention guidelines). Thanks! Mpaulson (WMF) (talk) 20:53, 22 January 2014 (UTC)

Data Retention Guidelines posted

We're happy to announce that the first draft of the new data retention guidelines is now available for your review, feedback, and translation. This draft is the result of a collaboration between many teams within the Foundation, including Analytics, Operations, Platform, Product, and Legal.

As with the other privacy documents, this draft is just that: a draft. We want to hear from you about how we can make it better. As suggested in the discussion about timelines above, we plan to hold the community consultation period for this draft open until 14 February 2014.

Thanks - looking forward to the discussion. —LVilla (WMF) (talk) 21:30, 9 January 2014 (UTC)

Great to see. I've commented on Talk:Data retention guidelines. //Shell 00:30, 10 January 2014 (UTC)


Section summaries: Need an explanation

I like the summaries, but there is one which I find hard to translate: "You are consenting to the use of your information in the U.S. and to the transfer of that information to other countries necessary to provide our services to you and others." I'm not sure what the literal meaning is. Does it mean that information is used or transferred only when it is necessary to provide our services, or that only the information which is necessary to provide our services is used or transferred? Alice Wiegand (talk) 19:45, 12 January 2014 (UTC)

I understand it as "it is technically necessary (as chosen by the internet service providers involved) to send all data, produced by a person reading or interacting with WP, over various national borders" - and the U.S. of course. Something which happens with almost everything you do online. Alexpl (talk) 08:17, 14 January 2014 (UTC)
Thank you both for your comments! I changed the wording to: You are consenting to the use of your information in the U.S. and to the transfer of that information to other countries in connection to providing our services to you and others. Does that make more sense? RPatel (WMF) (talk) 00:39, 15 January 2014 (UTC)

I'd like

to append another question. Are these summaries (and I didn't read all of that above) given to another commercial company, like Google perhaps, or to someone else who deals with information? Who ensures that this is not the case, and how? --Angel54 5 (talk) 22:29, 18 January 2014 (UTC) I mean this one: "Despite our best efforts in designing and deploying new systems, we may occasionally record personal information in a way that does not comply with these guidelines. When we discover such an oversight, we will promptly comply with the guidelines by deleting, aggregating, or anonymizing the information as appropriate." How should I understand that? Allowed, forbidden, occasionally forbidden? That's too vague for my taste. --Angel54 5 (talk) 22:57, 18 January 2014 (UTC)

Some volunteer or employee may put a program on a server which collects personal data. That is forbidden and WMF will do something about it as soon as such an event is discovered or reported. Since this is a wiki, such things can never be fully ruled out. Alexpl (talk) 07:42, 20 January 2014 (UTC)
Hi Angel54. Just wanted to let you know that Alexpl's interpretation is accurate. We try to ensure that data is collected, used, and retained in compliance with our policies and guidelines, but there is always a chance that someone does something that doesn't comply and when that does happen, we will try to correct it as soon as we can. Mpaulson (WMF) (talk) 21:01, 22 January 2014 (UTC)

English Wikipedia account creation procedures violate the WMF Privacy Policy

http://wikimediafoundation.org/wiki/Privacy_policy#Discussions says, "Users are not required to list an email address when registering." I would like to inquire about the English Wikipedia's requirement to provide an email address when registering for an account, as indicated by the instruction "The first is a username, and secondly, a valid email address that we can send your password to (please don't use temporary inboxes, or email aliasing, as this may cause your request to be rejected)." (Emphasis not added, it is originally in bold.) There is no option to register without giving an email address, and this appears to be a prima facie violation of the WMF Privacy Policy. Thank you. 134.241.58.251 20:57, 21 January 2014 (UTC)

Hi, there is no violation of the privacy policy. An email address is required to create an account on your behalf and for us to send the password to you. Also, the email address is not listed publicly at all. It is kept on record for ~90 days per Wikimedia's Data retention policy and is only displayed to the user who is creating the account plus other users who have a valid reason to view it. So there is no violation, just a technical limitation. John F. Lewis (talk) 21:04, 21 January 2014 (UTC)
The labs interface for creating an account (an unofficial method) is separate from MediaWiki's actual account creation (the official method): the former is used to request that someone other than yourself use the latter to create the account for you by proxy and then turn it over to you. This may be done for numerous reasons, including being affected by account-creation-blocked blocks (particularly from abusive schools with a shared IP), being unable to read CAPTCHAs, getting caught by the automated similar-account-name extension, and so forth. From the account creator's standpoint, they would ask the same questions and require the same information as the labs interface does. For example, as John F. Lewis said, an email address to send the newly created account's password to is one obvious reason. However, when it comes to abusive shared IPs, an email address, especially one from the blocked school or other organization, may be the only way to distinguish one user from another. In that instance, providing an email address could be the difference between someone actually being willing to create the account for you and your request being summarily declined as just another abusive user from that institution. --slakr 21:45, 21 January 2014 (UTC)
This does present a very clear and unambiguous violation of the WMF Privacy Policy. It is unclear how anyone could 1) defend it and 2) allow it to continue. 24.61.9.111 00:03, 22 January 2014 (UTC)

Thank you for your comments! John F. Lewis and Slakr are correct that there are valid reasons for Labs to request an email address when you create an account (sending a password to you, etc.). This is not a violation of the Privacy Policy because when the Policy states that you are only required to provide a user name and password to create an account, it is referring to a standard account. Labs accounts are non-standard accounts, so it is not a violation of the Policy to require users to provide more information. RPatel (WMF) (talk) 20:36, 22 January 2014 (UTC)