Sharing Shortcomings

I have a new essay coming out in Loyola University Chicago Law Journal titled Sharing Shortcomings. Comments and feedback are very much welcomed. Here’s the abstract:

Current cybersecurity policy emphasizes increasing the sharing of threat and vulnerability information. Legal reform is seen as crucial to enabling this exchange, both within the public and private sectors and between them. Information sharing is due for some skepticism, though, and this Essay (part of a symposium on Privacy in a Data Collection Society) attempts to provide it. Not only are there few real legal barriers to data exchange, but greater sharing will generate little benefit and will create significant privacy risks. This Essay creates a typology of vertical and horizontal information sharing, and argues that while top-down communication could be useful, it faces important practical impediments. The present focus on sharing increases the scope of the surveillance state unnecessarily and displaces more effective cybersecurity policy measures.

Ground Control to Major Dumb

The St. Louis Cardinals, one of baseball’s most famous teams, are under investigation (by both Major League Baseball and the FBI) for allegedly hacking into a data warehouse compiled by the Houston Astros. At first blush, this seems strange: the Cardinals play in the National League Central, and the Astros in the American League West. While all teams compete, this isn’t a bitter divisional rivalry, such as the one between the Red Sox and Yankees. So why break in?

In this case, it appears to be personal. Jeff Luhnow, currently general manager of the Astros, was formerly a player development executive for the Cardinals. He left, and took his (money)ball with him to Houston. There, Luhnow set up a data warehouse called Ground Control for the Astros, which the organization uses to catalog player data and rate its prospects. (It seems to be going well: the Astros, previously a laughingstock, are currently in first place in their division.) For the Cardinals, he’d done something similar, creating a system called Redbird to play the same role. Cardinals executives appeared concerned that Luhnow had engaged in theft of trade secrets or confidential information about how to evaluate players algorithmically. So, it seems that the Cardinals people tried Luhnow’s old password from Redbird on Ground Control, and it worked.

As Deadspin brilliantly notes, there is a lot of stupid in the story as currently understood. First, Luhnow apparently didn’t bother to change passwords when he changed teams. (This may be the only case study ever in favor of password change requirements.) Second, the Cardinals hacker team broke into Ground Control from a team member’s home. Third, ESPN “legal analyst” Lester Munson makes a genuinely hilarious series of errors in his screed. Just for fun, let’s tackle those:

  • Lester claims it’s not clear that this activity (if as alleged) by the Cardinals is a crime. Dead wrong. It’s a clear violation of the federal Computer Fraud and Abuse Act, 18 U.S.C. 1030. Take a look at 1030(a)(2). For criminal liability, there are but three elements: 1) intentional access without authorization (or exceeding authorized access), to 2) a protected computer (defined as “used in or affecting interstate or foreign commerce or communication” by 1030(e)(2)(B)), and 3) obtaining information. The Cardinals folks (allegedly again) intentionally accessed Ground Control. Ground Control affects interstate commerce – that’s both the business of Major League Baseball, and being connected to the Internet – so it’s a protected computer. And the Cardinals retrieved information from Ground Control. That’s it. Lester claims that “the prosecutor must be able to show that the information was the work product of significant efforts by Astros officials and, more importantly, was not available elsewhere.” This is completely wrong. Lester appears to be channelling trade secret theft, which is 1) a state crime, not a federal one, under these circumstances, and 2) totally unrelated to computer crime statutes. (Texas has a state-based computer crime offense that prosecutors could charge, too. Check out Section 33.02 of Title 7 of the Texas Penal Code: “A person commits an offense if the person knowingly accesses a computer, computer network, or computer system without the effective consent of the owner.” Even easier to prove than the CFAA violation.) [Update 11:16AM: Of course, federal prosecutors could also charge under 18 U.S.C. 1832, which I just re-checked. It is frequently used against hacking to benefit non-US interests, but the language covers interstate commerce, too.]
  • Lester needs a refresher on how intent works in criminal law. Here he is again: “the prosecutor must be able to show that Cardinals executives knew they were committing a crime. If the Cardinals’ activity was just a dirty trick or an attempt at getting even with a former colleague, the hacking might not qualify as a crime.” NO NO NO. Ignorance of the law is no defense. You have to look at the applicable statute. For the CFAA, for example, the key is intentional access to a computer. That’s the mens rea element – the defendants don’t have to know anything about computer crime law. They just have to have the intent to access a computer, and then carry out such access. This is Crim Law 101.
  • Lester doesn’t bother to consider criminal liability for trade secret theft under Texas law. Section 31.05(b) of the Texas Penal Code makes it a felony if: “without the owner’s effective consent, he knowingly: (1) steals a trade secret; (2) makes a copy of an article representing a trade secret; or (3) communicates or transmits a trade secret.” If Ground Control contains trade secrets (and I bet it does) and the Cardinals stole them, they can be liable under Texas law. Lester is even incorrect about prosecution here – you have to show that the thing stolen / transmitted is a trade secret, defined in 31.05(a)(4) as “the whole or any part of any scientific or technical information, design, process, procedure, formula, or improvement that has value and that the owner has taken measures to prevent from becoming available to persons other than those selected by the owner to have access for limited purposes.” The statute doesn’t state that the information must not be publicly available, although courts at times read in such a requirement based on common law precedent.
  • I’d also disagree with Lester’s conclusion that it’s a mistake for the FBI to tackle this intrusion. MLB is big business, and we’ve decided to have prosecutors go after computer hacking, especially when it targets big business. Sure, maybe we’d like the FBI to spend more time on hacking of government data and less on attacks against private firms, but given where we are on hacking enforcement, there doesn’t seem to be anything improper about this investigation, especially since prosecution of executives of a famous baseball franchise would likely have significant deterrence effects.

The whole episode tastes of fail.

The Crane Kick and the Unlocked Door

Cybersecurity legislative and policy proposals have had to grapple with when (if ever) firms ought to be held liable for breaches, hacks, and other network intrusions. Current approaches tend to focus on the data that spills when bad things happen: if it’s sensitive, then firms are in trouble; if not personally identifiable, then it’s fine; if encrypted, then simply no liability. This approach is a little bit strange, by which I mean daft: it uses the sensitivity of the information as a proxy for both harm (how bad will the consequences be?) and precautions (surely firms will protect more sensitive information more rigorously?).

I propose a different model. We should condition liability – via tort, or data breach statute, or even trade secret misappropriation – based upon how the intruders gained access. Let’s take two canonical examples. One exemplifies the problem of low-hanging fruit – or, put another way, the trampling of the idiots. Sony’s PlayStation Network (Sony is a living model for how not to deal with cybersecurity) apparently failed to patch a simple, widely known vulnerability in its database server (an SQL injection flaw, for the cognoscenti). Arthur the dog would have patched that vulnerability, and he is a dog who is continually surprised to learn that farts are causally connected to his own butt. On the other hand, Stuxnet and Flame depended upon zero-day vulnerabilities: there is, by definition, no way to defend against these attacks. They are like the Crane Kick from “The Karate Kid”: if do right, no can defense.
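
To make the low-hanging fruit concrete, here is a minimal sketch (in Python, and obviously not Sony’s actual code) of the kind of query construction that leaves a database open to SQL injection, next to the one-line parameterized fix that closes the hole:

```python
import sqlite3

# Tiny throwaway database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

def login_vulnerable(name, password):
    # Vulnerable: user input is pasted directly into the SQL text, so an
    # input like "' OR '1'='1" rewrites the query itself.
    query = f"SELECT * FROM users WHERE name = '{name}' AND password = '{password}'"
    return conn.execute(query).fetchall()

def login_safe(name, password):
    # Parameterized query: the driver treats the inputs as data, never as SQL.
    return conn.execute(
        "SELECT * FROM users WHERE name = ? AND password = ?",
        (name, password),
    ).fetchall()

# The classic injection string bypasses the password check in the vulnerable
# version and does nothing against the parameterized one.
print(login_vulnerable("alice", "' OR '1'='1"))  # [('alice', 'hunter2')]
print(login_safe("alice", "' OR '1'='1"))        # []
```

The liability point is that the fix is trivial and has been standard practice for years; skipping it is the unlocked front door.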

So why would we measure liability based on data rather than precautions? The latter is a classic tort move: we look at whether the defensive measures taken are reasonable, rather than whether the harm that resulted is large. I would suggest a similar calculus for cybersecurity (ironic in light of software’s immunity from tort liability): if you get pwned based on something you could have easily patched, then you’re liable for every harm that a plaintiff can reasonably allege. In fact, I’m perfectly happy with overdeterrence here: it’s fine with me if you get hit for every harm a creative lawyer can think of. But if your firm gets hit by a zero-day attack against your Oracle database, you’re not liable. (There are some interesting issues here about who can best insure against this residual risk; I’m assuming that companies are not the best bearers of that risk.)

This leaves some hard questions: what about firms that have stupid employees who open e-mails loaded with zero day exploit code? We might need a more sophisticated analysis of precautions. How was your desktop A/V? Did you segment your network? Did you separate your data to make it harder to identify or exploit?

To take up one obvious objection: this scheme requires some forensics. One must determine why a breach occurred to fix liability. But firms do this analysis already: they have to figure out how someone broke in. We can design rules to protect secrets such as network defenses, and any litigation is likely to take place months if not years after the fact. I think it’s unlikely that firms will be able to game the system effectively to show that intrusions resulted from impossible attacks rather than from someone jiggling doorknobs to find unlocked ones. And, we could play with default rules to deal with this problem: companies could be liable for breaches unless they could show that attackers exploited unknown weaknesses. If we’re worried about fakery, we could require that firms prove their case to a disinterested third party, such as Veracode or FireEye – companies with no incentive to cut a break to weak organizations. Or, we could set up immunity for firms that follow best practices: encrypt your data, patch known vulnerabilities in your installed software base, provide for resilience / recovery, and you’re safe.

I think we should differentiate liability for cybersecurity problems based on how the attackers broke in. Were you defeated by the Crane Kick? If so, then you get sympathy, but not liability. But if it turns out that you left the front door unlocked, then you’re going to have to pay the freight. We can’t expect miracles from IT companies, but it makes sense to require them to do the easy things.

In Memoriam: Greg Lastowka

I am deeply saddened to learn of the death of my friend Greg Lastowka, a professor at Rutgers-Camden School of Law. Greg was a pioneer in studying virtual worlds and video games, and his work forms a good part of the foundation of that field. His work had that wonderful quality of the best scholarship: it was utterly new, and yet once you’d read it, you couldn’t believe that anyone had ever thought differently. It made the instant transition from novelty to accepted wisdom that only the best work achieves.

Far more important, though, was that Greg was a terrific person. He was wise and funny, and genuinely unassuming. He supported those, like me, who were junior cyberlaw scholars, and he could give insightful feedback to senior profs in a way that they could hear. He had a fascinating life – as a lawyer, as a member of the Peace Corps, as an academic, and as a husband and father. Scrolling through Facebook posts, I’d notice that Greg had added a new image of one of his sons’ latest pieces of artwork, or a new bit of software code one of them had written. His quiet pride in and love for his family was plain. And, in the face of a terrible diagnosis, Greg demonstrated remarkable courage, humor, and tenacity.

My heartfelt sympathies and condolences go out to his wife Carol, his sons Adam and Daniel, his colleagues at Rutgers-Camden, and his many friends.

Greg’s dean, John Oberdiek, has a moving tribute to him (with thanks to Larry Solum).

Is De-Identification Dead Again?

Earlier this year, the journal Science published a study called “Unique in the Shopping Mall: On the Reidentifiability of Credit Card Metadata” by Yves-Alexandre de Montjoye et al. The article has reinvigorated claims that deidentified research data can be reidentified easily. These claims are not new, but their recitation in a vaunted science journal led to a new round of panic in the popular press.

The particulars of the actual study are neither objectionable nor enlightening. The authors demonstrate that in high-dimensional databases (that is, those with many variables that can take many different values), each person in the database is distinguishable from the others. Indeed, each person looks distinguishable from the others based on just a small subset of details about them. This will not surprise anybody who actually uses research data, because the whole point of accessing individual-level data is to make use of the unique combinations of factors that the people represented in the database possess. Otherwise, aggregated tables would do. What is surprising, however, is the authors’ bold conclusions that their study somehow proves that data anonymization is an “inadequate” concept and that “the open sharing of raw deidentified metadata data sets is not the future.” How Science permitted this sweeping condemnation of open data based on such thin evidence is itself a study in the fear and ideology that drive policy and scientific discourse around privacy.

What the de Montjoye Study Actually Demonstrated

The credit card metadata study used a database consisting of three months of credit card records for 1.1 million clients in an unspecified OECD country. The bank removed names, addresses, and other direct identifiers, but did nothing else to mask the data. The authors used this database to evaluate the chance that any given person is unique among clients in the database based on X number of purchase transactions. So, using an example from the paper, if Scott was the only person who made a purchase at a particular bakery on September 23rd and at a particular restaurant on September 24th, he would be unique with only two transactions within the database. The authors use these “tuples” (place-date combinations) to estimate the chance that a person in the database looks unique compared to the other data subjects. They found that 90% of the data subjects were unique in the database based on just four place-date tuples. And the rate of uniqueness increased if approximate price information was added to each tuple.
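
To see how the uniqueness estimate works mechanically, here is a toy Python sketch (invented data, not the study’s records or code): for each person, pick a handful of their place-date tuples at random and check whether anyone else in the sample shares all of them.

```python
import random

# Invented place-date data: person -> set of (merchant, date) tuples.
transactions = {
    "p1": {("bakery_42", "09-23"), ("cafe_7", "09-24"), ("grocer_3", "09-25")},
    "p2": {("bakery_42", "09-23"), ("cafe_7", "09-24"), ("gym_12", "09-26")},
    "p3": {("cinema_5", "09-23"), ("cafe_7", "09-24"), ("grocer_3", "09-27")},
}

def sample_uniqueness(transactions, k, trials=2000, seed=0):
    """Estimate the fraction of people whose k randomly chosen place-date
    tuples are not all shared by any other person in the sample."""
    rng = random.Random(seed)
    total = 0.0
    for person, tuples in transactions.items():
        hits = 0
        for _ in range(trials):
            known = set(rng.sample(sorted(tuples), min(k, len(tuples))))
            # Who in the sample is consistent with these k known tuples?
            matches = [p for p, t in transactions.items() if known <= t]
            if matches == [person]:
                hits += 1
        total += hits / trials
    return total / len(transactions)

# With two known tuples, only some draws single a person out in this toy data;
# with more tuples (higher dimensionality), uniqueness climbs toward 100%.
print(sample_uniqueness(transactions, k=2))
print(sample_uniqueness(transactions, k=3))
```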

The authors treat database uniqueness and reidentifiability as one and the same. That is, the authors treat the chance that a person is unique in the dataset based on X number of tuples as the chance that the person can be reidentified.

I am sympathetic to the authors’ goal of finding a concrete, quantifiable measure of privacy risk. But database uniqueness should not be that measure. Measures of sample uniqueness systematically exaggerate the risk of reidentification. Consequently, any research and data sharing policy that relies only on sample uniqueness as the measure of reidentification risk will strike the balance of privacy and data utility interests in the wrong place.

Problem 1: Sample Uniqueness is Not Reidentification. (It’s Not Even Actual Uniqueness.)

The greatest defect in the Science article is treating uniqueness within a sample database as equivalent to “reidentification,” which the authors do several times. For example, the authors state that 90% of individuals can be “uniquely reidentified” with just four place-date tuples. I suspect that most readers interpreted the article and its subsequent coverage in the popular media to mean that if you know just four pieces of place-date purchase information for a person, you are 90% likely to be able to figure out who they are in the de-identified research database. But the authors did not come close to proving that.

The problem is that uniqueness in a deidentified research database cannot tell us whether the data subject is actually unique in the general population. The research database will describe only a sample of the population, and may be missing a lot of information about each of its data subjects. Inferring actual uniqueness from database uniqueness requires some extra information and modeling about what proportion of the population is sampled, and how complete the data about them is.
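
A toy simulation (hypothetical numbers, sketched in Python) illustrates the gap: when each person’s purchase pattern is drawn from a limited set of possibilities, most people can look unique in a small sample even though almost no one is unique in the full population.

```python
import random

def sample_vs_population_uniqueness(pop_size=100_000, sample_size=1_000,
                                    n_patterns=2_000, seed=1):
    """Toy model with made-up numbers: each person's purchase 'fingerprint'
    is one of n_patterns possible place-date combinations. A person can be
    the only holder of their fingerprint in a 1% sample while sharing it
    with dozens of people in the full population."""
    rng = random.Random(seed)
    population = [rng.randrange(n_patterns) for _ in range(pop_size)]
    sample = population[:sample_size]

    def counts(group):
        c = {}
        for fingerprint in group:
            c[fingerprint] = c.get(fingerprint, 0) + 1
        return c

    pop_counts, sample_counts = counts(population), counts(sample)
    unique_in_sample = [f for f in sample if sample_counts[f] == 1]
    still_unique_in_pop = [f for f in unique_in_sample if pop_counts[f] == 1]

    frac_sample = len(unique_in_sample) / sample_size
    frac_pop = len(still_unique_in_pop) / max(len(unique_in_sample), 1)
    print(f"unique within the sample:      {frac_sample:.0%}")
    print(f"of those, unique in population: {frac_pop:.0%}")

sample_vs_population_uniqueness()
```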

To give an extreme example, let’s go back to “Scott”—the credit card-holder who went to a bakery on September 23rd and a restaurant on September 24th. Suppose that his data was part of a research dataset that included the purchase histories of just ten credit card customers. Using this database on ten people, could we reliably say anything about whether Scott was the only person in his city to go to the bakery and the restaurant? Of course not. We may have a hunch that the city’s inhabitants are unlikely to go to this bakery and that restaurant on the same days that Scott did, but we’d be using our intuitions rather than the research data to draw our conclusions about uniqueness. Read more…

Privacy in a Data Collection Society

Jane and I are here with a great group of presenters and attendees at a conference at Loyola University Chicago School of Law, Privacy in a Data Collection Society. I’m speaking this afternoon on the folly of information sharing as a means of improving cybersecurity, and I’ll post a cleaned-up draft of my remarks here (hopefully, eventually to become an essay). And, I’ll try to post some ad hoc updates on what the speakers have to say.

Update 1: Here is Jane’s abstract:

All Life Is an Experiment. (Sometimes It’s a Controlled Experiment.)

What the Facebook Emotion Contagion Study Can Teach Us About the Policy and Public Perception of Research

Thesis: Our unexamined instincts about social science research lead us to craft laws and public opinions that are backwards. Our disapprobation and legal restrictions apply most strongly to research that is performed by academics and other neutral investigators, that is more methodologically sound, that distributes its burdens more evenhandedly, and that shares its insights with the general public.

Update 2: Meg Jones, on the Right to be Forgotten

  • Google v. Spain – Spanish newspaper had right to process information on Gonzales, but Google did not.
  • Google assesses individual’s claims under national law
  • Lauber / Werle v. Wikipedia – brothers convicted of murdering an actor, and sought to have references to them removed from Web sites referring to the crime.
  • Martin v. Hearst – CT erasure statute nullified Martin’s arrest. She sued newspaper for publishing about her arrest. Second Circuit: newspaper’s truth is different from her truth.
  • Clash of values between Europe and U.S. over forgetting
  • [shows clip of Phineas and Ferb “Cyberspace Rules of the Road”]
  • Link rot and other ways that information disappears
  • Digital immortality? Internet is not the perfect memory we’re afraid of
  • Poll on whether Americans ought to have right to remove irrelevant information from search results (39% Yes, 21% No too hard to define, 18% No public record, 15% Yes only minors, 6% Yes except public figures)

Update 3: Felix Wu, How EU Right to be Forgotten Relates to US Law

  • Conventional wisdom: EU approach is crazy and would never work in US
  • Felix: less incompatible than we think, and the incompatibility is different than commonly believed
  • US does have areas where information is removed: Fair Credit Reporting Act (bankruptcies – 10 years)
  • Key is sectoral vs. over-arching approach
  • We would be surprised to see US adopt, as first omnibus right, a right to be forgotten
  • Why not adopt a sector-specific RtbF?
  • HIPAA – already specifies certain sensitive information where access is restricted (though HIPAA applies only to covered entities)
  • How to think about Google in this context? Is it a new sort of credit report?
  • Credit report is defined, in part, by use – Google is used for commercial and non-commercial purposes
  • Removal in certain contexts as intermediate step
  • Mention of data obscurity as term rather than RtbF! Hailing Woody Hartzog!
  • How do we know about periods of data retention by companies?

Update 4: Jane Bambauer, All Life Is an Experiment

  • Using Facebook emotional contagion study as vehicle for instincts and laws about research
  • Reactions most harsh when research most legitimate – we criticize academics far more than industry
  • Sanctions are strongest when study authors disclose results to public
  • Facebook’s alteration of scale of emotion in postings led to effect on postings by users seeing them
  • Why did this experiment engender controversy, rather than “poke to vote,” for example?
  • Objections to ethics of research: lack of informed consent, surreptitious intervention, violation of Common Rule
    • FB study undoubtedly violates FIPPs (respect for context)
    • God punishes King David with plague for taking a census – only God is to know that information
    • Good research requires repurposing data – Google has identified unreported side effects of drugs this way
    • Piketty repurposed tax data for his book on wealth distribution
  • Surreptitious manipulation of Newsfeed
    • Standard part of metrics-driven research
    • Bricks-and-mortar retail observes traffic to optimize shelf display
    • Individual physicians may select among equally effective treatment options for each patient – may be useful to formalize the experiment since it has better controls
    • Sunstein’s “50 Shades of Manipulation” – promotes self interest of manipulator, and designed to bypass cognitive reasoning
    • Downstream use of research can fit within this definition, but the research itself does not – it’s a cost to the company, and the company does not know if it bypasses reasoning
    • How do we know status quo is preferable?
    • Research is less self-serving when it’s shared publicly
    • Researchers at Cornell were the ones who took the real hit, but Cornell’s IRB says it’s in compliance
    • Even if their research was not exempt from IRB review, it would have qualified for expedited review and exemption from informed consent
    • The most legally exposed people were the researchers, not FB or the journal
  • Problematic outcomes
    • Companies are at a disadvantage when they work with neutral / academic researchers
    • Firms are benefited when they avoid formally testing hypotheses and assumptions using randomized control trials
    • It’s safer to avoid sharing results with media / public
  • Sensible to reform Common Rule
    • Require IRB review when intervention would create physical or legal risk if performed for non-research purposes

Brett Frischmann – Being Human in the Twenty-First Century: How Social and Technological Tools are Reshaping Humanity

  • Machines and technologies steer us in ways that make us increasingly predictable and manipulable, and ultimately less human
  • Post-WWII: concerns about computers overtaking humans – Turing test as exemplar
  • We want to be humans who use computers, not humans who are computers
  • When does technology replace or diminish our humanity? Can we detect it?
  • Hard definitional baseline – what is human?
  • Interconnected sensor networks, Internet of Things, Big Data will expand scale / scope of human engineering – ubiquity is key
  • Technology / humanity are abstract and complex
  • 3 parts to project
    • Humans and tools – technological dehumanization
    • Human-focused Turing-type tests
    • Applications (critique of nudging) – each incremental nudge can be justified, but path of nudging itself may be unjustifiable
  • Focus is techno-social engineering of humans: influence, manipulate, construct
  • Internet has transformed environments within which we live our lives
  • Demand for Big Data is dependent upon sensors on / around humans
  • IBM’s Watson as an example of technology approaching Turing line
  • Brett is interested in whether humans are approaching Turing line – conditions under which they’re indistinguishable from a machine
    • What happens if human passes test and appears machine? Consequences?
    • On-line contracting: designed to nudge you to click I Agree


Deven Desai – Associational Freedom and Data Hoarding

  • FBI has stated preference for using warrant for GPS tracking
  • Concern for associational freedom and interplay with Fourth Amendment
  • Freedom to develop ideas before speaking – vital to self-governance
  • Sedition Act criminalizes speech and assembly separately
  • Meet-ups and activists are current incarnations of assembly concerns – fear of backward-looking surveillance
  • Protect precursors to speech
  • Bugging in public places undercuts associational freedom
  • Digital data can be hoarded, and lack of rules on law enforcement use leads inexorably to accumulation
  • Key limits
    • Duration
    • Minimization
    • Apply limits retrospectively as well for searches in data troves
    • Return – government must return or delete data


Helen Nissenbaum – Big Data’s End Run Around Informed Consent

  • Full title of paper: Anonymity and Consent
  • Big Data: epistemological paradigm – faith in power of data to produce knowledge
  • Ethics of big data – what happens when the data is about individuals?
    • Anonymity breaks link between data and identifiable individual
    • Thesis: big data poses insurmountable challenges to anonymity and consent – renders them ineffective in the quest for privacy
  • Notice & consent enshrined in U.S. privacy regulation (FIPPs, GLBA, FERPA, VPPA, notice and opt-out requirements)
    • Require consent from subjects if one deviates from substantive rules
    • Notice and choice regime of ToS online
    • GLBA gives you very little chance to opt-out
    • Critiques of notice and consent as theoretical matter and in operational challenges
  • Challenges to N&C increasing
    • More actors, information, flow
    • Impossible to predict future uses or consequences
  • Transparency dilemma: impossible to have a policy that is both comprehensible and comprehensive
  • Public lives of others: inferences based on network analysis, social networks, representative sample
  • Informed consent may have to be abandoned, which is acceptable because informed consent is a means rather than an end (which is privacy)
  • Privacy as control over information is wrong definitional approach
  • Instead, privacy as contextual integrity
    • Ideal informational norms: settle competing interests / preferences / desires best; promote ethical and political values; promote context-specific ends and values for social integrity
  • Patient consent operates as permission for limited departures from standards / expectations
  • Key role of background assumptions and societal constraints
  • Privacy policies should shrink in importance, and societal limitations should wax in importance, in terms of constraining information flow

Against Jawboning

I’d be grateful for feedback on a new draft article, Against Jawboning, coming out in volume 100 of the Minnesota Law Review. Here’s the abstract:

Despite the trend towards strong protection of speech in U.S. Internet regulation, federal and state governments still seek to regulate on-line content. They do so increasingly through informal enforcement measures, such as threats, at the edge of or outside their authority – a practice this Article calls “jawboning.” The Article argues that jawboning is both pervasive and normatively problematic. It uses a set of case studies to illustrate the practice’s prevalence. Next, it explores why Internet intermediaries are structurally vulnerable to jawboning. It then offers a taxonomy of government pressures based on varying levels of compulsion and specifications of authority. To assess jawboning’s legitimacy, the Article employs two methodologies, one grounded in constitutional structure and norms, and the second driven by process-based governance theory. It finds the practice troubling on both accounts. To remediate, the Article considers four interventions: implementing limits through law, imposing reputational consequences, encouraging transparency, and labeling jawboning as normatively illegitimate. In closing, it extends the jawboning analysis to other fundamental constraints on government action, including the Second Amendment. The Article concludes that the legitimacy of informal regulatory efforts should vary based on the extent to which deeper structural limits constrain government’s regulatory power.

The Antidote for “Anecdata”: A Little Science Can Separate Data Privacy Facts from Folklore

Guest post by Daniel Barth-Jones

For anyone who follows the increasingly critical topic of data privacy closely, it would have been impossible to miss the remarkable chain reaction that followed the New York TLC’s (Taxi and Limousine Commission’s) recent release of data on more than 173 million taxi rides in response to a FOIL (Freedom of Information Law) request by urbanist and self-described “Data Junkie” Chris Whong. It wasn’t long after the data went public that the sharp eyes and keen wit of software engineer Vijay Pandurangan detected that taxi drivers’ license numbers and taxi plate (or medallion) numbers hadn’t been anonymized properly and could be decoded because of a botched hashing process.
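
To see why an unsalted hash of a small identifier space is barely an obstacle, here is a minimal Python sketch (the medallion format below is a simplified stand-in, not the TLC’s actual scheme): because every possible medallion number can be enumerated, an attacker can hash them all and build a reverse lookup table almost instantly.

```python
import hashlib
import itertools
import string

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

# Simplified stand-in for a medallion format (digit, letter, digit, digit,
# e.g. "9Y64"); the real formats are similarly small and fully enumerable.
def all_medallions():
    for d1, letter, d2, d3 in itertools.product(string.digits,
                                                string.ascii_uppercase,
                                                string.digits,
                                                string.digits):
        yield f"{d1}{letter}{d2}{d3}"

# Hash every candidate once and build a reverse lookup table.
# Only 10 * 26 * 10 * 10 = 26,000 candidates, so this is effectively instant.
lookup = {md5_hex(m): m for m in all_medallions()}

# An unsalted MD5 value as it might have appeared in the released data.
pseudonym = md5_hex("9Y64")
print(pseudonym)            # the "anonymized" field
print(lookup[pseudonym])    # -> 9Y64: the pseudonym reverses immediately
```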

Soon after Pandurangan’s revelation of the botched unsalted MD5 cryptographic hash in the TLC data, Anthony Tockar, working on a summer data science internship with Neustar, posted his blog entry “Riding with the Stars: Passenger Privacy in the NYC Taxicab Dataset” with the aim of introducing the concept of “differential privacy” and announcing Neustar’s expertise in this area. (It’s well worth checking out both Tockar’s short but informative tutorial on differential privacy and his application of the method to maps of the TLC taxi data, as his smartly designed graphics allow you to interactively adjust differential privacy’s “epsilon” parameter and see its impact on the results.)
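
For readers who want the nutshell version of the mechanism Tockar’s tutorial describes, here is an illustrative Python sketch (my own toy example, not Neustar’s implementation) of the basic Laplace mechanism: noise is calibrated to the query’s sensitivity divided by epsilon, so a smaller epsilon buys stronger privacy at the cost of accuracy.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """One draw of Laplace(0, scale) noise via inverse transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy. A counting query
    has sensitivity 1 (adding or removing one rider changes it by at most 1),
    so the Laplace scale is 1 / epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
true_pickups = 87   # hypothetical: taxi pickups observed at one location

for epsilon in (0.01, 0.1, 1.0, 10.0):
    noisy = dp_count(true_pickups, epsilon, rng)
    # Smaller epsilon -> more noise -> stronger privacy, less accuracy.
    print(f"epsilon={epsilon:>5}: released count = {noisy:9.1f}")
```

This accuracy-for-privacy trade-off is exactly what Tockar’s interactive maps let you explore visually.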

To illustrate possible rider privacy risks in the TLC taxi data, Tockar, armed with some celebrity paparazzi photos and some clever insights as to when, where, and how to find potential vulnerabilities, produced a blog post replete with attention-grabbing tales of miserly celebrities who stiffed drivers on their tips and of cyber-stalking strip club patrons, which quickly went viral. And so as to up the fear, uncertainty, and doubt (FUD) factors surrounding his attacks, Tockar further gravely warned us all in his post that:

Equipped with this [TLC Taxi] dataset, and just a little auxiliary information about you, it would be quite trivial for someone to follow your movements, collecting data on your whereabouts and habits, while you remain blissfully unaware. A stalker could find out where you live and work. Your partner may spy on you. A thief could work out when you’re away from home, based on your habits.

However, as I’ll explain in more detail, sorting out these quite concerning claims in a rational fashion – one that lets us weigh the possible trade-offs between Freedom of Information and open government principles on the one hand and data privacy concerns on the other – requires that we move beyond mere citation of anecdotes (or worse, collections of anecdotes in which carefully targeted and especially vulnerable, non-representative cases have been repackaged as “anecdata”). Instead, we must base our risk assessment on a systematic investigation appropriately founded in the principles of scientific study design and statistically representative samples. Regrettably, though, this wasn’t the case here, and it has quite often not been the case for the many headline-snatching re-identification attacks that have repeatedly made the news in recent years.

Read more…

Big Pharma: the New Hustler

That’s the provocative thesis of Jane’s post over at Balkinization for the conference Public Health in the Shadow of the First Amendment. Worth a read! And here’s her second post.

The Cambridge University Press decision and Educational Fair Use

The Eleventh Circuit released its 129-page opinion in Cambridge University Press v. Patton (which most of us probably still think of as the Becker case) last Friday. Although the appeals court reversed what I thought was a pretty solid opinion of the district court upholding Georgia State University’s practice of distributing digital “course packs” of reading materials to its students, it is very far from a big win for the publishers who challenged the practice. There is a lot to like in the opinion for advocates of educational fair use, and it is difficult to imagine that the district court on remand will rule in favor of the publisher plaintiffs with respect to very many of the works at issue even though the appeals court directed changes in some aspects of its fair use analysis. Although it found some errors in the district court’s treatment of the second and third fair use factors, the appeals court sensibly and correctly rejected several arguments that would have materially constricted the scope of educational fair use in the digital arena. (Full disclosure: I joined Jason Schultz’s excellent amicus brief on behalf of Georgia State.)

Although the Court of Appeals’ opinion deserves a close look, I’ll confine myself here just to noting a few highlights. Read more…