Popular Chinese Filtering Circumvention Tools DynaWeb FreeGate, GPass, and FirePhoenix Sell User Data

09-Jan-09

Update: The site hosting the data for these tools has now removed the faq entry offering to sell the data. Please read my subsequent update for responses from the tool developers and further thoughts.

Three of the circumvention tools used most widely to get around China’s Great Firewall (DynaWeb FreeGate, GPass, and FirePhoenix) are tracking and selling the individual web browsing histories of their users. Aggregate usage data for the tools is published freely. You can see, for example, that the three sites most visited by users of these circumvention tools are live.com, google.com, and secretchina.com. Aggregate data like this is a terrific resource for those of us interested in researching circumvention tool usage, and not much of a privacy risk for the circumventing users if it is only stored (as well as displayed) in the aggregate.

But the ranking site also advertises a pay service through which you can get not only much more data, but data about individual users. The site’s FAQ states:

Q: I am interested in more detailed and in-depth visit data. Are they available?

A: Yes, we can generate custom reports that cover different levels of details for your purposes, based on a fee. But data that can be used to identify a specific user are considered confidential and not shared with third parties unless you pass our strict screening test. Please contact us if you have such a need.

So they are happy to provide you with specific user data, but only if you double super promise not to share it and only if they really like you.

It’s hard to state how dangerous this practice is. These tools are acting as virtual ISPs for millions of users. All circumvention tools work by proxying the data of their users through some third machine, so all circumventing traffic is going through that third party machine. Selling the browsing histories of those users is like an ISP selling the browsing histories of its users, which is a big step beyond what companies like NebuAd and Phorm were / are trying to do. NebuAd and Phorm are at least adding a variety of pseudonymity and privacy layers to their tracking, whereas dynaweb et al. are evidently directly storing (and selling) the full, individually identifiable browsing histories of their users.

And the data about circumventing users is much more sensitive than the data about most ISP users. These are the histories of users browsing sites that are not only blocked (and therefore mostly sensitive in one way or another) but blocked by an authoritarian country with an active policy and practice of persecuting dissidents. The mere act of anyone, let alone projects proclaiming themselves for internet freedom, storing this data is very bad practice. Any data that is stored can potentially be shared or stolen. The best way to make sure that dangerous data like this does not get into the wrong hands is not to store it in the first place.
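To make the difference between aggregate and individual storage concrete, here is a minimal sketch of aggregate-only logging. It is an illustration under assumptions, not a description of how DynaWeb or the other tools actually work: the `AggregateVisitLog` class, its methods, and the sample addresses are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


class AggregateVisitLog:
    """Hypothetical log that counts visits per site and keeps no user data."""

    def __init__(self):
        self._counts = Counter()

    def record(self, client_ip, url):
        # A proxy always sees client_ip, but this sketch deliberately never
        # stores it, so there is no per-user history to sell or steal.
        host = urlsplit(url).hostname
        if host:
            self._counts[host] += 1

    def top_sites(self, n=3):
        # Only site-level totals can ever be reported.
        return self._counts.most_common(n)


log = AggregateVisitLog()
log.record("10.0.0.1", "http://google.com/search?q=news")
log.record("10.0.0.2", "http://google.com/")
log.record("10.0.0.1", "http://secretchina.com/")
print(log.top_sites())
```

Even a log like this is not risk free: a site visited by only one user is effectively individual data, so a careful operator would probably also suppress very small counts.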

But these projects are not only storing the data. They are actively offering to sell it. None of the projects has anything like a privacy policy that I can find, and none of them provides any notice anywhere on the site or during the installation process that the project will be tracking and selling user browsing activity.* But all of the sites have deceptive language like this from the FirePhoenix home page:

Secure

FP encrypts all your network traffic. No third-party can recognize what Internet information is flowing in/out of your computer, even if they are monitoring your traffic.

In fact, third parties can recognize the data flowing in/out of a computer running FirePhoenix by buying that data and promising not to share it with anyone else.

This sort of thing demonstrates that there is no way to eliminate points of control from a network. You can only move them around so that you trust different people. In this case, Chinese users are replacing some of the trust in their local Chinese ISPs with trust in the circumvention projects through which they are proxying their traffic. But those tools are acting as virtual ISPs themselves and so have all the potential for control (and abuse) that the local ISPs have. They can snoop on user activity; they can filter and otherwise tamper with connections; they can block P2P traffic.

These particular virtual ISPs have chosen to support themselves by selling user data. Lots of folks rely on personal VPNs to circumvent or otherwise secure their connections, but those VPNs are not inherently any safer than the local ISPs through which they are tunneling. The popular VPN Relakks, for example, is hosted in Sweden, where a law passed last year requires that the federal government monitor all data entering and leaving the country, including foreign users of the Relakks VPN. Some circumvention projects like Psiphon use a peer to peer model in which volunteers host proxies (ideally a volunteer known by the circumventing user) and others like Tor use algorithms to try to ensure trust of the proxies, but all of them require that the user trust some other person or some code with all of her circumventing traffic.

*: installation language not verified for FirePhoenix, which has only a Chinese interface.

Viral Conversations: Community Based Production of Biased Reviews

26-Nov-08

Update: Viral Conversations has changed the language in their FAQ not to encourage gifting of reviewed items and to encourage reviewers to disclose whether they are keeping reviewed items.

I just ran across a new site called Viral Conversations. The basic idea is to serve as a brokerage between companies with products they want reviewed and bloggers who want to review them. Sony submits an offer for a blogger to review a camera, some bloggers submit applications to review the camera, Sony chooses one or more to write reviews, sends them a camera, and the blogger writes a review.

There’s a customer driven version of this basic idea that could be community empowering and in the long term best interests of the company. Maybe the company offers to lend two cameras to qualifying bloggers, and the users of the site vote on which bloggers get to review the cameras. This way, there would be no chance for Sony to pick only friendly reviewers, and reviewers would not get paid for reviews with merchandise. I’m sure there would be problems with this approach, but it’s possible at least to think hard about how to create such a site in a way that would produce honest, community driven reviews of the products. In fact, such a system could attack existing problems with generating honest reviews through advertiser driven media.

Unfortunately, Viral Conversations is not such an honest attempt. In fact, it not only ignores the problem of biased selection of reviewers, but it is breathtakingly bold in the corruption of its system for generating reviews. Consider the following from the FAQ:

[for advertisers]

What Kind of Reviews Can I Expect?

We encourage all of our bloggers to be as honest as possible. Sometimes there will be negative aspects or criticisms, as this is to be expected. This not only makes the review more believable but gives you suggestions on how you can improve your product.

What if the Review is Negative?

We strongly suggest that all bloggers contact you beforehand if the review is more negative than positive. Hopefully this gives you the opportunity to fix the problem. If a resolution can’t be reached we suggest that the review not be published. We can’t force anyone to not publish or take down a negative review, but we will try to help.

Do I Have to Let the Bloggers Keep The Item?

No, you don’t have to let the bloggers keep the item, but we do think it’s a good idea and really nice thing to do. It’s going to depend on a number of factors such as cost and shipping difficulty. Letting the bloggers keep a $50 coffee maker is probably a no brainer, but you may feel a little differently about an $1500 espresso machine. Be as clear as possible in the beginning to avoid any confusion.

[for bloggers]

Does the Review have to be Good?

No, the review should be honest. Most would agree that the IPhone is a great product, although not everyone likes the touch screen, and it’s safe to say everyone wishes the battery would last longer. These do not make the IPhone a bad product. Talk about the product’s good points, and mention areas where it needs improvement. If you find that your review is more negative than positive or almost all negative, please put on the brakes before you publish. Send an email or pick up the phone and let someone know first.

Do I Get to Keep The Product I am Reviewing?

That’s going to vary from offer to offer. While we recommended that merchants who use our service let you keep the product or item, it’s not always possible. Sometimes it’s a monetary issue, other times it’s a limited availability issue. That information should be communicated to you before hand. If you do keep the item you are responsible for any tax liabilities that are incurred.

So while Viral Conversations can’t absolutely guarantee good reviews or that reviewers get paid by companies for good reviews, they strongly suggest 1) that the companies give the reviewed products to the reviewers and 2) that the bloggers only publish positive reviews. Okay.

And what about a disclaimer from the blogger about the fact that she is basically being paid to write a good review?

Do I need a Disclaimer on My Post?

You don’t need a disclaimer but we very strongly recommend you do it to be upfront and honest with your readers. It could be something as simple as “The John Smith Camera Company sent me their new ABC-123 DLSR camera to review”. If you do a lot of reviews on your website a more formal review policy should be something you should look into.

Assuming the advertisers and bloggers follow the suggested practices, the formal review policy should presumably say something to the effect of “You should trust nothing I write in this blog because I’m being paid with in kind merchandise to write only positive reviews”?

Postscript:

I enjoy reading outrageous terms of service. Viral Conversations has a great bit in their terms of service:

Viral Conversations website disclaims any and all responsibility or liability for the accuracy, content, completeness, legality, reliability, or operability or availability of information or material displayed in the ViralConversations.com website pages. [emphasis mine]

Even though I live and work with lawyers, I am not one myself. Still, I’m pretty sure it’s not possible to disclaim all responsibility for the legality of my actions. If it is, I hereby disclaim all liability for the legality of any and all actions committed by me, including swiping your shiny new iphone. …

Google Ad Planner: Advertising Surveillance of the Internet

25-Nov-08

For a long time, the only free source of data about site traffic online has been the Alexa Top Sites list, but the data for the Alexa list is based on the very skewed sample of folks who run the Alexa toolbar, and who the heck runs the Alexa toolbar these days? When I’ve needed data about the most popular sites in a country, I’ve had to use the Alexa data, but only holding my nose with knowledge that the data at best represents a wild guess. There have been better sources of data, but they were all closed, expensive, and generally collected in at least mildly sketchy ways.

Google’s ad planner tool moves dramatically toward filling this big hole in public knowledge about web site traffic. To try it out, visit the above url and click on the ‘Begin Research’ button.

The ad planner tool is:

a free media planning tool that can help you identify websites your audience is likely to visit so you can make better-informed advertising decisions.

With Google Ad Planner, you can:

* Define audiences by demographics and interests.
* Search for websites relevant to your audience.
* Access aggregated statistics on the number of unique visitors, page views, and other data for millions of websites from over 40 countries.
* Create lists of websites where you’d like to advertise and store them in a media plan.
* Generate aggregated website statistics for your media plan.

What the tool actually does is provide a list of total traffic numbers for the 250 most visited sites that meet a number of different demographic queries, including by country and by site type. This lets you, for instance, find out the 250 most visited sites in India, along with the total traffic and number of unique visitors for each site. Or the 250 sites most visited by women. Or by women between 25 and 34. Or by women between 25 and 34 who make more than $150,000 a year:

It’s hard to overstate the power of this tool and the orders of magnitude improvement it is over the Alexa data. You can filter the data by category (newspapers, liberal blogs, flower stores, etc, though the categories seem very poorly assigned). You can choose just sites that allow advertising or all sites (note that the tool shows just advertising sites by default). You can choose sites visited by users who have visited some other site. Or sites visited by users who have searched for some word.

Did you know that the New York Times has twice as many visitors (21 million) as the next closest newspaper, the Washington Post (11 million)? That the Washington Post has half again as much traffic as the next newspaper? That the Huffington Post has basically as much traffic (6.8 million) as every newspaper but the New York Times and the Washington Post? That Daily Kos is less than a quarter of the size of the Huffington Post? That unlike in any of the other 20+ included countries, only 2 of the top 25 sites in China are U.S. hosted sites (yahoo at #8 and microsoft at #23)?

And the data used for the tool is the Good Stuff:

How is the data in Google Ad Planner generated?

Google Ad Planner combines information from a variety of sources, such as aggregated Google search data, opt-in anonymous Google Analytics data, opt-in external consumer panel data, and other third-party market research. The data is aggregated over millions of users and powered by computer algorithms; it doesn’t contain personally-identifiable information.

In other words, they use all of the very expensive, somewhat-to-very privacy questionable methods that we privacy interested folks worry about. They tap into their own extensive search logs, the even more extensive data from the adwords system, the extensive data from their analytics tool, and “market research” companies that install spyware that is difficult to distinguish from malware.

But hey, now at least we get the data.

What’s fascinating about this tool is that it’s a market research tool for folks who want to figure out what list of sites to advertise on. It’s little known because it has not been marketed like google trends as a general use tool, even though it is hugely useful as such. In fact, the terms of service only explicitly allow that: “You may use the Program to choose sites on which to target ads” (oddly, the terms also mandate that “The existence of this Program will be deemed Confidential Information” and must be protected with stringent security safeguards, notwithstanding the publication of the tool by google). The fantastic power of this tool for monitoring and understanding the Internet and the wide and deep and invasive methods used to collect the data for the tool point to the very strong connection between surveillance and advertising. The release of this tool and its data output as an ‘ad planner’ shows that in the world of adwords, doubleclick’s use of near universal third party cookies, and Phorm’s tapping of UK Internet connections, advertising has become very difficult to distinguish from surveillance.

Where are the AdWords jingles?

12-Nov-08

I’ve been reading up on the history of media and advertising lately, including a book by Stephen Fox on the history of advertising called Mirror Makers. Fox’s core argument is that advertising strategies are cyclical over time, varying between straightforward, plain text advertising that describes the price and value of the product to atmospheric advertising that attempts to attract attention and build up the reputation of a brand. He includes lots of examples of early advertising, including the following jingles about “Sunny Jim” used to sell Force cereal in 1902:

Jim Dumps was a most unfriendly man,
who lived his life on the hermit plan;
In his gloomy way he’d gone through life,
And made the most of woe and strife;
Till Force one day was served to him –
Since then they’ve called him “Sunny Jim.”

Jim Dumps a little girl possessed,
Whom loss of appetite distressed;
“I des tan’t eat!” the child would scream;
Jim fixed a dish of Force and cream –
She tasted it — then joy for him –
She begged for more from “Sunny Jim.”

The Sunny Jim character became a national cultural icon through these jingles. “A giant likeness adorned the sides of two eleven-story buildings in New York,” says Fox, “Songs, musical comedies and vaudeville skits were written about him. Anybody with a cheery personality and the name of James risked being called Sunny Jim.” Unfortunately, the jingles did not help sell much cereal, and they were eventually replaced by straightforward descriptions of the nutritional and economic advantages of the cereal. Some other jingle campaigns in the era worked, others like Force did not.

These jingles were published in written form, obviously because there was no radio in 1900. The idea of advertising as poetry seems quaint today, but is actually more possible in the age of the text only AdWords format. It’s striking that AdWords today consists only of straightforward sells. See for instance the following AdWords ads currently running for ‘cereal:’

The strict text limits of the format might be a factor in discouraging jingles, but that same constraint could also serve as a creative force. It would certainly be possible to create an interesting, compelling jingle campaign one line of text at a time, and such an approach would encourage folks to pay more attention to the easily ignored AdWords boxes.

One possibility is that advertisers feel they don’t need jingles to capture attention when their ads are well targeted, as with AdWords. Another is that companies advertise directly through AdWords rather than through an ad agency and so don’t have the access to creative advertising expertise. Another is that AdWords is just young and hasn’t hit the jingle cycle yet.

Whatever the reason, an effect of the lack of AdWords jingles is that the cultural impact of AdWords is mostly limited to the impact the ads have on the creation of content, rather than on the content of the ads themselves. This is a significant divergence from most other modern mass media forms of advertising, in which the ads themselves are arguably as impactful as their impact on the content supported by them.

Surveillance Project Blog

12-Nov-08

We’ve started a blog for the Berkman Surveillance Project. I’ve been posting all of my surveillance stuff there, including the following stories:

Google Privacy Interview

30-Sep-08

Here’s an interview on google privacy issues I did last week for IT Business Edge.

OpenDNS and Firefox Search

07-Sep-08

Doc asks why opendns has broken the search from the address bar feature of his firefox. The problem is that the address bar fail over to google relies on the dns request failing, but opendns requests never fail. Instead, if an opendns server gets a request for a non-existent address, it displays the opendns search/advertising page instead. That’s how they make their money. One of the many side effects of this behavior is that when you type ‘cotcaro’ into the firefox address bar, firefox first tries to lookup ‘cotcaro.com’ and ‘cotcaro.org’. On a normal dns server, those lookups would fail, and firefox would then try a search for ‘cotcaro’. But with opendns, the name lookup never fails. Instead, it returns the address for the opendns search/advertising page. So firefox doesn’t get the chance to fall through to the search.
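The failure mode can be sketched in a few lines. This is a toy model under stated assumptions, not firefox’s or opendns’s actual code: the two resolver functions, the `address_bar` helper, the zone data, and the placeholder addresses (taken from the ranges reserved for documentation) are all hypothetical.

```python
KNOWN_HOSTS = {"example.com": "192.0.2.1"}  # hypothetical zone data
AD_PAGE_IP = "203.0.113.10"  # placeholder address for the search/ad page


def standard_resolver(name):
    """Return None for a non-existent name, like a failed dns lookup."""
    return KNOWN_HOSTS.get(name)


def wildcard_resolver(name):
    """Never fail: unknown names resolve to the advertising page."""
    return KNOWN_HOSTS.get(name, AD_PAGE_IP)


def address_bar(text, resolve):
    """Try text.com and text.org; fall back to a search only if both fail."""
    for suffix in (".com", ".org"):
        ip = resolve(text + suffix)
        if ip is not None:
            return ("visit", text + suffix, ip)
    return ("search", text, None)


# With a normal resolver both lookups fail, so the search fallback runs.
print(address_bar("cotcaro", standard_resolver))
# With a resolver that never fails, the first lookup "succeeds" and the
# fallback is never reached.
print(address_bar("cotcaro", wildcard_resolver))
```

The point of the sketch is that the browser side is unchanged in both calls; only the resolver's refusal to report failure removes the fall through to search.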

I’d advise Doc not to use opendns in any case. In addition to creating ugly side effects like those described above by breaking the dns protocol, it makes its money by selling information about its clients to other companies in the way that all search / advertising companies do:

We are affiliated with a variety of businesses and work closely with them in order to provide our services to users. We will only share personal information with affiliates to the extent that is necessary for such affiliates to provide the services. For example, when a website visitor searches on OpenDNS, the IP address and query are shared with OpenDNS’s advertising partners.

The given example may describe what almost every search engine does to make its money (search on google, google displays some ads for other companies, when you click on one of those ads, the company gets your ip address and the term you searched for). But the language in the clause allows opendns to sell any of your personal information to any of its customers “to the extent that is necessary .. to provide the services.”

Update: language edited slightly to make clear that opendns just seems to be selling information about its clients in the same way that other search / advertising companies do.

Midnight Piggybacking

14-Aug-08

So I’m sitting here at my excellent local Memphis honda repair shop getting Little Tokyo’s oil changed. In addition to being locally run, honest, and professional, the shop has wifi, so I can sit and work (or blog!) while getting my car fixed. The wifi wasn’t working today, so I asked the owner if he still offered it. The owner said that he does, but he only turns it on when asked now because someone has been “stealing” from him. Further questioning revealed that on three separate occasions, someone was working late at the shop and noticed a car idling outside the shop with a bright screen inside. Every time, when he turned off the wifi router, the car left.

Piggybacking someone else’s wifi is obviously nothing new. I’ve gone wardriving in a neighborhood in a pinch a few times (and even been accosted for sitting on a sidewalk in front of someone’s house once!). But in at least one case the car was idling in front of the shop from 12 midnight until 5 in the morning. I’m struggling to think of a reason for sitting on the router for so long so late at night other than the need for anonymity for some illicit activity. I suppose it might be a group of teenagers just looking for some private place to access facebook away from prying (and possibly surveilling) parents, but that seems a stretch. Individual anecdotes are obviously dangerous to draw conclusions from, but the fact that this is happening in Memphis at my local car shop makes me wonder how common it is. Memphis is far from the cutting edge of Internet activity.

I keep my wireless network at home open on the principle that I don’t trust the network to be secure with or without transport layer security and that I’m happy to share access with anyone who wants to use it. I’ve always judged the risk of someone using the access to do something I could be liable for to be small enough not to worry about it. This encounter makes me wonder whether I, like Bruce Schneier, should think harder about securing my home wireless network.

I also find it interesting that he was able to defend himself pretty effectively from what he viewed as an attack on his computers and network. As sophisticated as the wardriving attackers may have been, his simple defense of turning off the router until access is requested is pretty effective (though I strongly advised him to encrypt the network as well as turn it on on demand). Even more effectively, the owner noted the license plate of the car on at least one occasion. So now if the police show up at his door and try to arrest him for child pornography, he’ll have a license plate number to identify the users of his network. So midnight piggybacking as an anonymity technique could in many cases be less effective than just showing up at a local coffee shop in the middle of the day. The worst scenario in that case would be a physical identification, which would in most cases be much more difficult to track back to a person than a license plate number.

Nigerian Searches for Spam

12-Aug-08

More google insights fun. Here’s the list of the top google searches from Nigeria:

Note that five of the top ten searches are for a tool called email extractor lite 1.4, which is a tool that pulls emails from a block of text. In other words, it is useful for harvesting email addresses for spam. I won’t link to it for fear of google juicing it, but here’s a screen shot:

This agrees with the perception of Nigeria as the source of the ubiquitous Nigerian Scam spam, but it is surprising in that it seems to suggest that a very large proportion of Nigerian Internet users are involved in spam production. I’m having a hard time coming up with an alternative explanation of this finding. If some botnet were running email extraction on lots of Nigerian computers, it wouldn’t be bothering with a google search for the tool (and would in fact just be doing the email extraction itself). One possible explanation is that email harvesting is contracted out to individuals who are left on their own to troll the Internet for pages with email addresses. Constant searches for the email extractor page would be consistent with not very technical folks getting paid for finding and harvesting email addresses.

Also note on the results page that the top rising search currently is Oceanic Bank, which seems to be a legitimate Nigerian Bank. But the web page for the bank includes a bright red Scam Alert that warns of widespread use of impostor Oceanic Bank sites for Nigerian scams.

Digital Cameras v. Nigeria

12-Aug-08

One of my guiding theories of the modern media / advertising landscape is that the extensive real time surveillance of consumers by online advertisers and content providers encourages the growth of content about digital cameras (the content about which is easily monetized) at the expense of hard news, especially international news about developing countries like Nigeria.

The following google insights chart of digital camera v. Nigeria searches over time strikes a blow against that theory:

Of course, this data say nothing about the amount of content produced about the respective topics, but the whole point of the google insights tool (which is targeted at advertisers) is to tell advertisers and content providers what sorts of content consumers are interested in. Content about digital cameras is likely still more profitable, since digital camera ad clicks presumably pay more than Nigeria ad clicks, but the decline in digital camera searches is still striking. It’s possible that this trend is merely the result of declining interest in digital cameras (which is surprising), but the fact that searches about Nigeria have not decreased over time is interesting in itself. Quick checks of similar comparisons show that consumer product content is more popular than hard news content, but that there is no accelerating trend in that direction.