Peter Li from the Global Internet Freedom Consortium has responded in the comments to my post about snooping by Chinese circumvention tools:
We apologize for the confusion here. The anti-censorship ranking service is provided by one of the GIFC partners. It only publishes the popularity ranks of destination websites users visit through our anti-censorship tools. It is similar to alexa.com but is only limited to anti-censorship web traffic.
The ranking service is not authorized to access, nor can it access, the data users transmit on the wire. It is not authorized to release logs containing information on the websites any individual user visits either.
The FAQ for the ranking service was not written properly, as originally “user” there meant website owners who may be interested in getting detailed statistics on how their websites are visited through our anti-censorship tools. We apologize that we have overlooked the wording.
The GIFC partner who runs the ranking service, the World Gates’ Inc, has been notified, and that FAQ entry has been removed. Thank you for discovering the problem.
Global Information Freedom Consortium
Also, Rebecca MacKinnon has written an excellent follow-up to the post that includes a response from Bill Xia of Dynamic Internet Technology / Dynaweb that ‘DIT never gives out “personal-identifying user data”’ and the following quote from Peter Li:
Yes, in some cases FBI asked us to provide logs for certain websites or destination IPs in some particular time periods, for example, they would request something like the original IPs who visited xyz.com at Jan 12, 2007, 12:20-30 EST, and the visited web pages. We provided such information as we feel we are obligated to work with law enforcement agencies in the free world.
Note that the above quote does not imply any sort of quid pro quo for FBI access to data. If Dynaweb is storing the data about individual users, they are required by U.S. law to give access to that data in response to government warrants and subpoenas.
Rebecca also gets the issue of the trust invested in circumvention tools precisely right:
The moral of this long story is important: when using circumvention tools, make sure you understand enough about how they work, what they’re meant to be used for, and who runs them, so that you’re not taking a leap of faith with people you would rather not trust.
The decision about who to trust is a personal one: I am more inclined to trust a VPN operating in the U.S. which is subject to FBI requests than a Beijing Telecom connection subject to Beijing public security bureau requests, but that’s just me. Other people might feel very differently and make different choices. Some people may feel very comfortable trusting the Falun Gong… others, well, might not… It appears that the VOA, RFA, and HRIC have decided to trust them and to recommend these services to their users.
Where does this leave the issue?
I’m happy that the data is no longer for sale on the website, but given all of these factors, I’m still concerned about the amount and sensitivity of the data being stored, the lack of disclosure to users about what data is being stored and how it is being used, and the care with which the data is being protected.
First, I want to make clear that I am not attacking the motives of the developers of these tools. I have every reason to believe that the people building, distributing, and running these tools are doing so in honest resistance to the restrictive Internet policies of the Chinese government, and I should have made that clear in my original post. I don’t think anyone was selling data to make a quick buck; I think any money made from a hypothetical sale of personal data would have been plowed back into the circumvention projects.
Still, I am somewhat skeptical of Peter’s explanation that the issue was merely confusion arising from a misunderstanding of the word “user.” The key sentence seems pretty clear to me: “But data that can be used to identify a specific user are considered confidential and not shared with third parties unless you pass our strict screening test.” To the degree that websites are “identified,” they are already identified in the public aggregate data (google.com, live.com, etc.). What additional, confidential data would be published about a website? I think it more likely that the confusion here is between the various projects contributing data and the ranking.edoors.com site displaying the data. In any case, the FAQ entry in question has now been removed from the site, so if they were offering to sell data, they are not anymore.
But Peter’s further comment about sharing data with the FBI indicates that, whether or not they are actively selling individual user data, they are definitely storing data at the level of individual users. This fact alone is cause for concern, or at least for disclosure. There is no law in the U.S. that requires storage of web browsing histories, though the EU data retention directive does require that EU ISPs store the source and destination IP addresses of every Internet communication. The data flowing over the networks of these circumvention tools is particularly sensitive, since most of their users are breaking the laws of their own countries merely by using them. Any data that is stored can be shared, stolen, subpoenaed, seized under warrant, and otherwise distributed. The fact that some or all of the GIFC circumvention tools are storing the browsing histories of individual users vastly increases the trust those users must invest in the tools: trust not only that the operators will not intentionally misuse the data, but also that they will safeguard it from attack and from misuse by the partners entrusted with it. The current confusion over what data is being shared with the ranking service, and what the ranking service is doing with that data, demonstrates the inherent dangers of storing and sharing the data, even with trusted partners.
A user should be able to make an informed decision between using a tool that tracks her activity (like Dynaweb, GPass, and FirePhoenix) and a tool that does not (like Anonymizer). Note that this is not a personal recommendation on my part to use any tool over any other. Lots of folks have responded to my original post by saying, “See, you should use Tor!” I think Tor is a great project and, without going into depth, it is very open about the ways it does and does not protect the privacy of its users. As Rebecca says, before using a tool, you should understand how it works and what it is doing with your data, and then decide what and whom to trust. But projects have to disclose what they are doing with their users’ data for users to be able to make this choice.
Update: Edited to remove sloppy wording that wrongly implied a connection between State Department funding and access to data.