Help Research On On-Line Harassment

Colette Vogele is a researcher and attorney who works to combat revenge porn, and who runs the site Without My Consent. She’s launching a survey to measure the incidence of on-line harassment. I’m writing on the topic at the moment, and I can say unequivocally that the field badly needs reliable empirical data. Please participate in the survey! Details are below:

Are you a person, 18 years old or over, who has experienced harassment on the Internet? If so, we would appreciate your taking the time to complete a survey.

Our names are Dan Taube, Keely Kolmes and Colette Vogele, and we would like to request your participation in our research on the experience of online harassment. Being harassed online includes things like having someone intrude into your privacy online, someone using the Internet or mobile phone technology to harm your reputation, or someone stalking you online. We want to learn about the kinds of experiences people have, how they cope, and the resources they use to address the problem.

This study has been approved by the Institutional Review Board of Alliant International University.

As a participant, you will be asked a number of multiple-choice questions, and some short-answer questions, regarding your experiences of harassment on the Internet. Next, we will ask you for information about your age, sex, and similar things. It will take at most 15 minutes to finish the survey.

Your input may help in developing better services and resources for people who have experienced online harassment.

No names or personal information will be linked to the study and your participation will be completely anonymous so long as you do not put your name in your responses. If you should wish to contact the researchers directly, your participation may become confidential rather than anonymous, although your name will not be linked to any of the data you submit.

To be eligible for the study, you must be 18 or older, have had (or be currently having) an experience of online harassment, and be able to read and understand English.

If you meet these requirements and want to participate, you can find the survey at:

If you do not qualify for the study but you know others who might be interested in participating, feel free to forward this notice or URL.

Thank you for your interest and participation.

The Law of Internet Intermediaries: Meet the New Boss, Same as the Old Boss

I have a short essay, Middlemen, up at the Florida Law Review Forum. It’s a response to Jacqui Lipton‘s thought-provoking article, Law of the Intermediated Information Exchange (bonus: first page is at 1337!). And, it has a footnote about turtles. Here’s the introduction:

Meet the new boss, same as the old boss.

The Internet was supposed to mean the death of middlemen. Intermediaries would fade into irrelevance, then extinction, with the advent of universal connectivity and many-to-many communication. The list of predicted victims was lengthy: record labels, newspapers, department stores, travel agents, stockbrokers, computer stores, and banks all confronted desuetude. Most commentators lauded the coming obsolescence as empowering consumers and achieving greater efficiency; a few bemoaned it. But disintermediation was inevitable.

Jacqueline Lipton’s article “Law of the Intermediated Information Exchange” shows how foolish that conclusion was. Tower Records and Borders bookstores folded, and newspapers struggle to survive. But music fans don’t buy directly from Universal Music or Sony Music—they get the latest Jay-Z or Muse tracks from Apple’s iTunes Music Store. College students surf Craigslist to find listings of apartments for rent. News junkies stay glued to Reddit, or Twitter. Kayak collects cheap flight reservations for us. And Google helps us find . . . everything. We simply swapped one set of middlemen for another.

Gene Patents, Oil-Eating Bacteria, and the Common Law

The Supreme Court issued its decision in Association for Molecular Pathology v. Myriad Genetics today. A unanimous Court (with a short, quirky concurrence from Justice Scalia) held that the patent claims directed to isolated, purified DNA sequences did not recite patentable subject matter under 35 U.S.C. 101; by contrast, those directed to complementary DNA (DNA with the introns removed) did recite patentable subject matter. The case has generated much discussion but little controversy. Myriad’s stock price soared (presumably because the opinion wasn’t even more damaging) and then dipped. And the entire contretemps may be overtaken by whole-genome sequencing.

I think there are three interesting points to the case. First, it exemplifies the wonders and terrors of entangling the common law with complex statutory schemes. Second, it continues the War of the Roses between the Supreme Court and the Federal Circuit. Finally, despite the Court’s invocation of Chakrabarty, it shows how far the law and the society in which it is embedded have traveled since the fights over recombinant DNA in the 1970s and 1980s.

Unlike in copyright law, most of the Supreme Court’s precedent on patent law – at least, its modern precedent – deals with questions of statutory interpretation. Thus, we have KSR v. Teleflex interpreting obviousness (section 103), Microsoft v. AT&T interpreting extraterritorial infringement (section 271(f)(1)), and Merck v. Integra Lifesciences interpreting the research exemption to infringement (section 271(e)(1)). The question is not whether Congress has exceeded its powers under Article I, section 8, clause 8 of the Constitution. Rather, it is how to figure out what Congress meant when drafting the Patent Act. Section 101, which describes what constitutes patentable subject matter, is admirably, dangerously concise: “Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor…”

Genes are a “composition of matter,” as are inoculants of multiple strains of plant root nodule bacteria. But the Supreme Court has held that neither is entitled to a patent. Why? The Court has read into Section 101 three exceptions: laws of nature, natural phenomena, and abstract ideas. The list is short, and yet the Court has had to return repeatedly to this trinity of exceptions to define their scope and meaning. The Court’s precedent has both principled and practical justifications for the exclusions. The principled theory is that none of these three is a product of human ingenuity; rather, they are pre-existing rules of nature that are not, in any sense, inventions. The practical reason is that granting patents over subject matter falling within these zones would confer too great a property right to the alleged inventor – it would hinder, rather than promote, innovation. The challenge is that the theoretical rationale forces the Court into difficult and even absurd line-drawing problems, and the practical one seems to invade Congressional prerogatives as to the best way to generate innovative effort. In short, the Court’s common law drafting of statutory exemptions may have arrived, eventually, at a workable balance, but at the significant costs of uncertainty, judicial effort, and institutional conflict. We might be better off if the trinity had never existed.

Second point: the Court lives to reverse the Federal Circuit. It did so on automatic injunctions for victorious patent plaintiffs, on the test for whether an invention comprises an ineligible abstract idea, on whether the CAFC has exclusive jurisdiction over patent malpractice cases, and now on whether isolated DNA is patent-eligible. (There are more!) My standard explanations for this perpetual battle are that the Federal Circuit was instituted to be pro-patentee, while the Supreme Court carries no such mandate, and that this debate has devolved into a contest of rules (the practitioner-oriented CAFC) versus standards (the more academically-inclined Court). But, if the goal of jurisprudence in the lower courts and courts of appeals is to do justice while not being reversed, the Federal Circuit has a pretty poor recent track record. On the other hand, the uncertainty in the Supreme Court’s opinions often means that the CAFC’s approach has significant gravitational effect, as the emphasis on the “machine or transformation” test in USPTO guidelines and post-Bilski jurisprudence proves.

Lastly, when the Supreme Court decided Diamond v. Chakrabarty in 1980, holding that a General Electric research scientist could patent an invented oil-eating bacterium carrying four hydrocarbon-metabolizing plasmids, the result was met with no small amount of terror. It was the beginning of corporate control over life itself. It enabled soulless firms to manipulate the very stuff of living tissue, and to block countermeasures with the force of intellectual property law. Today, while advocates have complained about the high cost of Myriad’s breast cancer detection regime, and worried about the effects on women’s health, there is no such comparable disturbance in the Force. We’ve accepted the biotechnology industry (even with the occasional jitter about GMO crops). That’s an interesting commentary on technological change, and perhaps one that makes it easier for the Court to issue its decision in a contentious area.

We haven’t seen the last opinion from the Court on its trio of exceptions. Let the parsing of the opinion begin!

Search and the First Amendment

Jane and I are in Arlington, Virginia, for a conference on Competition Policy in Search and Social Media at George Mason University. Jane, Neil Richards, Dawn Nunziato, and Stuart Benjamin will discuss the interplay of the First Amendment, regulation, and search / social media. I expect an entertaining fight over whether search results are speech, not speech, or something in between.

Reporting Fail: The Reidentification of Personal Genome Project Participants

Last week, a Forbes article by Adam Tanner announced that a research team led by Latanya Sweeney had re-identified “more than 40% of a sample of anonymous participants” in Harvard’s Personal Genome Project. Sweeney is a progenitor of demonstration attack research. Her research was extremely influential during the design of HIPAA, and I have both praised and criticized her work before.

Right off the bat, Tanner’s article is misleading. From the headline, a reader would assume that research participants were re-identified using their genetic sequence. And the “40% of a sample” line suggests that Sweeney had re-identified 40% of a random sample. Neither of these assumptions is correct. Even using the words “re-identified” and “anonymous” is improvident. Yet the misinformation has proliferated, with rounding up to “nearly half” or “97%.”

Here’s what actually happened: Sweeney’s research team scraped data on 1,130 random (presumably) volunteers in the Personal Genome Project database. Of those 1,130 volunteers, 579 had voluntarily provided their zip code, full date of birth, and gender. (Note that if the data had been de-identified using the HIPAA standard, zip code and date of birth would have been truncated.) From this special subset, 115 research participants had uploaded files to the Personal Genome Project website with filenames containing their names. (Or the number might be 103—there are several discrepancies in the report’s text and discrimination matrix which frustrate any precise description.) Another 126 of the subgroup sample could be matched to likely identities found in voter registration records and other (unidentified) public records, for a total of 241 re-identifications.
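To make the parenthetical concrete: under HIPAA’s Safe Harbor rule, the ZIP code would have been truncated to its first three digits and the birth date reduced to a year. A quick sketch of that generalization (my own illustration, not from the study; Safe Harbor has further conditions, such as zeroing the three-digit ZIP prefix for sparsely populated areas and top-coding ages over 89, which are omitted here):

```python
# Illustrative sketch of HIPAA Safe Harbor generalization of the three
# quasi-identifiers at issue: ZIP code, birth date, and gender.

def safe_harbor(zip_code: str, birth_date: str, gender: str) -> dict:
    """Truncate ZIP to its first three digits and the birth date to its
    year; gender may be retained as-is. (Simplified: ignores the rule's
    special handling of low-population ZIP prefixes and ages over 89.)"""
    return {
        "zip3": zip_code[:3],          # e.g. "85721" -> "857"
        "birth_year": birth_date[:4],  # ISO date "1975-06-14" -> "1975"
        "gender": gender,
    }

print(safe_harbor("85721", "1975-06-14", "F"))
# -> {'zip3': '857', 'birth_year': '1975', 'gender': 'F'}
```

Because the PGP participants supplied full ZIP codes and full birth dates, their records were far more linkable than HIPAA-de-identified data would have been.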

So, from the subset of 579 research participants who provided birth date, zip code, and gender, Sweeney’s team was able to provide a guess for 241 of them—about 42%. Sweeney’s research team submitted these 241 names to the PGP and learned that almost all of them (97%) were correct, allowing for nicknames.

A few things are noteworthy here. First, the 42% figure includes research participants who were “re-identified” using their names. This may be a useful demonstration to remind participants to think about the files they upload to the PGP website. Or it might not; if the files also contained the participants’ names, the participants may have proceeded with the conscious presumption that users of the PGP website were unlikely to harass them. In any case, the embedded names approach is not relevant to an assessment of re-identification risk because the participants were not de-identified. Including these participants in the re-identification number inflates both the re-identification risk and the accuracy rate.

However, if these participants hadn’t uploaded records containing their names, some of them nevertheless would have been re-identifiable through the other routes. Sweeney’s team reports that 35 of those 115 participants with embedded names were also linkable to voter lists or other public records, and 80 weren’t. So, taking out the 80 who could not be linked using public records and voter registers (and assuming that the name was not used to inform and improve the re-identification process for these other 35), Sweeney’s team could claim to have reidentified 161 of the 579 participants who had provided their birthdates, zip codes, and gender. Even if we assume that all of the matches are accurate, the team provided a guess based on public records and voter registration data for only 28% of the sample who had provided their birth dates, zip codes, and genders.
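The arithmetic in the preceding paragraphs can be laid out explicitly. A quick sketch (the counts are those reported in the post; the variable names are mine):

```python
# Arithmetic behind the re-identification figures discussed above.

scraped = 1130            # PGP profiles scraped by Sweeney's team
with_demographics = 579   # provided ZIP code, full birth date, and gender
embedded_names = 115      # uploaded files whose filenames contained their names
records_matched = 126     # matched via voter rolls / other public records
total_guesses = embedded_names + records_matched   # 241 names submitted to PGP

headline_rate = total_guesses / with_demographics
print(f"headline rate: {headline_rate:.0%}")       # ~42%

# Strip out participants identifiable *only* via embedded filenames:
# 35 of the 115 were also linkable through public records, so 80 were not.
names_only = embedded_names - 35                   # 80
records_based = total_guesses - names_only         # 161
adjusted_rate = records_based / with_demographics
print(f"records-based rate: {adjusted_rate:.0%}")  # ~28%
```

The gap between the 42% headline figure and the 28% records-based figure is exactly the set of participants who were never de-identified in the first place.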

In the context of the reidentification risk debate today, 28% is actually quite low. After all, Sweeney has said that “87% of the U.S. population are uniquely identified” by the combination of these three pieces of information. The claim has been repeated so many times that it has achieved nearly axiomatic status.

If anything, the findings from this exercise illustrate the chasm between uniqueness and re-identifiability. The latter requires effort (triangulation between multiple sources), even when the linkable information is basic demographics. Sweeney’s team acknowledges this, reframing the 87% figure as an “upper bound” based on estimates of uniqueness that do not guarantee identifiability. The press has not grasped that this study shows that reidentification risk is lower than many would have expected. Unfortunately, reidentification risk is just technical enough to confuse the uninitiated. For a breathtaking misunderstanding of how Sweeney’s results here relate to her earlier 87% estimate, check out MIT’s Technology Review.

When there is a match, the question is whether the zip, birth date and sex uniquely identify an individual. Sweeney has argued in the past that it does with an accuracy of up to 87 per cent, depending on factors such as the density of people living in the zip code in question.

These results seem to prove her right.

Oh my god, no, MIT Tech Review. That is not correct.

Though Sweeney’s study has some lapses in critical detail, it is much more careful and much less misleading than the reporting on it. I am especially disappointed by Tanner’s Forbes article. Since Tanner is a colleague and collaborator of Sweeney’s and is able to digest her results, I am disturbed by the gap between Tanner’s reporting and Sweeney’s findings. The significance that many participants were re-identified using their actual names should not have escaped his notice. His decision to exclude this fact contributes to the fearmongering so common in this area.

Smoke If You Got ‘Em

I’m here in rainy, lovely Eugene, Oregon watching the Oregon Law Review symposium, A Step Forward: Creating a Just Drug Policy for the United States. (You can watch it live.) Jane is presenting her paper Defending the Dog – here’s the conclusion:

The narcotics dog doesn’t deserve the bad reputation it has received among scholars. The dog is the first generation of police tools that can usher a dramatic shift away from human criminal investigation and the attendant biases and conflicts of interests. Moreover, the reaction to the narcotics dog, as compared to the cadaver-sniffing dog, reveals an unsettling tendency to exploit criminal procedure when we are not enthusiastic about the underlying substantive criminal law. The natural instinct to do so may be counterproductive because drug enforcement will persist, with uneven results, and without a critical mass of public outrage.

Drug policy is a little far afield from my usual interests, but given the overwhelming use of Title III warrants (about 85% in 2011) to combat drug trafficking, and pending bills such as CISPA (which allows sharing for national security purposes – trafficking has long qualified), it seems well worth a Friday to learn more. (And, Jane’s empirical work brings some helpful rigor to the issue.) Updates as events warrant…

Privacy Law in Sixty Seconds (or so)

I am occasionally struck by my good fortune to write in an area that has such a supportive community. Much credit is due to the influence, ingenuity, and incessant hard work of Paul Schwartz and Dan Solove. Invariably, every privacy scholar has benefited from Dan’s and Paul’s support. This promotional video for their informal treatise Privacy Law Fundamentals nicely captures the combination of thoughtfulness and goofiness that Paul and Dan have fostered. There are some jokes at the expense of celebrities and Europe, which is always good fun. (Privacy Law Fundamentals also happens to be the book I recommend to people who are new to privacy law.)

Is Data Speech?

Jane Yakowitz Bambauer has a new article, “Is Data Speech?”, forthcoming in 66 Stanford Law Review __ (2014). Here’s the abstract:

Privacy laws rely on the unexamined assumption that the collection of data is not speech. That assumption is incorrect. Privacy scholars, recognizing an imminent clash between this long-held assumption and First Amendment protections of information, argue that data is different from the sort of speech the Constitution intended to protect. But they fail to articulate a meaningful distinction between data and other, more traditional forms of expression. Meanwhile, First Amendment scholars have not paid sufficient attention to new technologies that automatically capture data. These technologies reopen challenging questions about what “speech” is.

This Article makes two bold and overdue contributions to the First Amendment literature. First, it argues that when the scope of First Amendment coverage is ambiguous, courts should analyze the government’s motive for regulating. Second, it highlights and strengthens the strands of First Amendment theory that protect the right to create knowledge. Whenever the state regulates in order to interfere with knowledge, that regulation should draw First Amendment scrutiny.

In combination, these theories show clearly why data must receive First Amendment protection. When the collection or distribution of data troubles lawmakers, it does so because data has the potential to inform, and to inspire new opinions. Data privacy laws regulate minds, not technology. Thus, for all practical purposes, and in every context relevant to the privacy debates, data is speech.

Fairmont Fail

Jane and I had a fantastic time at Eric Goldman’s Internet Law Works-In-Progress conference, but we’re facing an early flight, and our room at the Fairmont is shaking with the noise from the ballroom level (one level down). It’s supposed to quiet down… at some point. Apparently the Fairmont hasn’t figured out that conference folks might have different preferences from late-night partiers about proximity to noise. Next time, we’ll stay at a hotel that has. The Fairmont used to be our benchmark for thoughtful service. No more.

Cyberwar and Cyberespionage

My paper “Ghost in the Network” is available from SSRN. It’s forthcoming in the University of Pennsylvania Law Review. I’m appending the abstract and (weirdly, but I hope it will become apparent why) the conclusion below. Comments welcomed.


Cyberattacks are inevitable and widespread. Existing scholarship on cyberespionage and cyberwar is undermined by its futile obsession with preventing attacks. This Article draws on research in normal accident theory and complex system design to argue that successful attacks are unavoidable. Cybersecurity must focus on mitigating breaches rather than preventing them. First, the Article analyzes cybersecurity’s market failures and information asymmetries. It argues that these economic and structural factors necessitate greater regulation, particularly given the abject failures of alternative approaches. Second, the Article divides cyber-threats into two categories: known and unknown. To reduce the impact of known threats with identified fixes, the federal government should combine funding and legal mandates to push firms to redesign their computer systems. Redesign should follow two principles: disaggregation, dispersing data across many locations; and heterogeneity, running those disaggregated components on variegated software and hardware. For unknown threats – “zero-day” attacks – regulation should seek to increase the government’s access to markets for these exploits. Regulation cannot exorcise the ghost in the network, but it can contain the damage it causes.


Something terrible is going to happen in cyberspace. That may help.

The U.S. suffers serious but less visible cyberattacks daily. Complex technology, mixed with victims’ reluctance to disclose the scale of harms, leads to underappreciation of cyber-risks. This disjunction generates the ongoing puzzle of cybersecurity: the gap between dramatic assessments of risks the U.S. faces and minimalist measures the country has taken to address them. America’s predictions do not match its bets. One of those positions is wrong. But the economic and structural factors that impede regulation suggest reform will not occur without a dramatic focusing event.[1] The U.S. did not address its educational deficiencies in math and science until the Soviets launched Sputnik into orbit.[2] Until the near-meltdown at Three Mile Island, America was complacent about nuclear energy safety.[3] And it required the attacks of 9/11 for the country to address the rise in international terrorism, the gaps in its intelligence systems, and the weaknesses in aviation security.[4] This Article’s role is to sit on the shelf, awaiting with dread that focusing event. When it occurs, regulators will need a model for a response. This Article offers one.

Cybersecurity offers copious challenges for future research. Two are particularly relevant for this Article. First, data integrity is a difficult puzzle. Restoring data after attacks is unhelpful if one cannot tell good information from bad – we must be able to distinguish authorized updates from unauthorized ones. This seemingly technical puzzle has important implications for provenance in other areas, from rules of evidence to intellectual property, which struggle with similar authentication problems. Second, nation-states are now engaged in the long twilight struggle of espionage and hacking in cyberspace. At present, there are neither formal rules nor tacit norms that govern conduct. Eventually, though, countries must arrive at accommodations. Spying[5], assassination[6], and armed combat[7] all benefited from shared rules, even during the Cold War. Lawyers can raise awareness of these benefits and help shape the system that emerges. Future research can contribute to both these inquiries.

For now, ghosts roam the network. They cannot be driven out. We must lessen the effects of their touch.

[1] John W. Kingdon, Agendas, Alternatives, and Public Policies 165 (2003).

[2] Cornelia Dean, When Science Suddenly Mattered, in Space and in Class, N.Y. Times (Sept. 25, 2007),….

[3] Perrow, supra note 80, at 29-30.

[4] Thomas H. Kean et al., Final Report of the National Commission on Terrorist Attacks Upon the United States 254-65 (2004).

[5] Geoffrey B. Demarest, Espionage in International Law, 24 Denv. J. Int’l L. & Pol’y 321 (1996).

[6] Nathan A. Sales, Self-Restraint and National Security, 6 J. Nat’l Security L. & Pol’y 227, 249-50 (2012).

[7] Geoffrey S. Corn, Back to the Future: De Facto Hostilities, Transnational Terrorism, and the Purpose of the Law of Armed Conflict, 30 U. Pa. J. Int’l L. 1345, 1346-47 (2009).