- Why Libraries Matter
I’ve written up a piece on Medium on why libraries matter — you can find it here:
Vital parts of the Web are censored, poisoned, and lost amidst truthiness. Libraries are our unusual defense.
With thanks to Knight Foundation for its new Library Challenge.
- Righting the right to be forgotten
The F-T just published a piece I wrote about the implementation of the right to be forgotten in Europe. Here is a draft from which the op-ed was drawn:
Last week Google formally launched a blue-ribbon committee of advisors to help it implement the European Court of Justice’s new “right to be forgotten.” Its work is cut out for it, as the search giant processes the more than 70,000 requests it has received since May to decouple a claimant’s name from possibly true but still “irrelevant” (and presumably reputation-damaging) search results. Turning theory into practice has revealed unanswered questions – and some outright flaws – in the Court’s decision, regardless of where you might stand on the right’s philosophical merits.
The first puzzle is transparency. Other types of compelled redactions, such as for alleged copyright infringement, occasion a notification to searchers that results have been altered. But a specific notice that a search on someone’s name is missing something could lead to a negative inference about the person even worse than the substance of whatever has been removed. So how to report on compelled takedowns in a way that is neither Orwellian nor self-defeating?
One idea is for Google and other affected search engines to contribute to a database of takedowns that independent academics can analyze in order to produce credible insights about how the new right is working in practice. Are public figures looking to scrub their records to avoid scrutiny, or are the requestors more often private citizens? Are the takedowns focusing on content within obscure Web-originating message boards, or on archives of government records or newspaper articles? Without a record of takedowns, there will be no way to understand how the use and impact of the right are unfolding.
The second puzzle is accountability. With Google’s European market share around 90%, name-specific content that’s delisted might as well be gone entirely – indeed, it’s Google’s power that makes the assertion of the right meaningful. But here state power is being exercised without the involvement of the state: a request is made of Google for a redaction, and Google decides how to handle it. If the request is denied, the claimant might escalate the issue to his or her local data protection authority. But if the request is agreed to, there’s no means for review. Under the Court’s decision, the public’s right to know is to be balanced against a claimant’s right to privacy – but there’s no easy way for the public to remonstrate against poor balancing.
That should change, and there is an admirable start: Google has begun alerting affected sites when content has been taken down. Thus, BBC and Guardian reporters could disclose last week, disapprovingly, that some of their articles had been eliminated from some Google searches. The ensuing controversy resulted in Google restoring some of the links. The sites can thus stand in for the public, objecting to overly broad takedowns so long as they know that they’re taking place. That’s why Google’s decision to notify creators of content that’s at issue is vital to achieving the Court’s stated purpose, rather than a subversion of it, as some have alleged. But not every affected site enjoys the platform of a major newspaper or state-funded broadcaster. A more comprehensive solution would be for sites to be able to answer the original takedown request before Google even makes a decision, and to have standing to appeal an adverse determination the way that original claimants can – something that the Court itself would have to bless.
But once we’ve gone so far as to allow a properly adversarial process in deciding upon takedowns, we highlight the incongruity of having Google – or any private party, for that matter – as a decision maker about rights. To place Google in that role is to diminish Europe’s sovereign power, not enhance it, even if the role is compelled by European authorities. It turns a rights problem into a customer service issue, and one that Google and others in its position no doubt rightly disdain. If Google can process 70,000 requests, so can and should the data protection authorities. And not every public decision needs the full, lawyer-heavy trial format to be sufficient to the cause – any more than Google is using it now.
This would place decisions about rights in the public sphere where they belong, and limit the scope to the sovereign’s jurisdiction, so a European decision would still not affect use beyond the relevant country-specific Google portals.
Finally, the Court needs to recognize that the Web is protean. Sites and content change, including such ever-evolving pages as Wikipedia biographies, which means that a decision rendered at a point in time may lose its rationale later on – just as the Court acknowledges that something that was once relevant could become irrelevant over time, and thus subject to a takedown. Its argument cuts both ways. One way to deal with this is for redaction decisions to be limited in time. Successful claimants should register and maintain an email address for a reminder that a redaction is about to expire. Prior to expiration a claimant should have to seek to renew the redaction. That way the memory hole is temporary rather than permanent – and a redaction must be justified to account for changing circumstances.
Those who are against the right to be forgotten in the first place should be cheered to see its first uncertain implementation pared back. And those who favor it should want to get it right – especially as, troublingly, there may be more types of requests, from more sources, to come. Such a treacherous path cannot be navigated without the transparency and accountability that we have come to demand of the sovereigns who govern us.
- Time capsule crypto can help us commit our secrets to history
More than a decade ago, researchers at Boston College interviewed people from both sides of the Troubles in Northern Ireland, promising each contributor to the “Belfast Project” that his or her interview recording wouldn’t be released until the contributor died. In the meantime, the tapes would be deposited at the College’s rare books library under lock and key. On the basis of those promises, some people spoke for the first time about painful actions that remain murky in the public eye, including unsolved murders arising from the conflict that they’d helped commit or cover up.
When the British government learned of the Belfast Project about ten years later, it invoked a mutual legal assistance treaty to demand immediate access to some of the tapes. After months of legal wrangling, some of the tapes were turned over, resulting in the arrest last month of Sinn Féin leader Gerry Adams for involvement in one of the killings discussed in the interviews. Adams was released, but Northern Ireland officials are now seeking the entire set of interviews – perhaps to balance inquiry into the Irish Republican Army with investigation of possible crimes by members of the Ulster Volunteer Force as well.
Libraries like Boston College’s are familiar with making promises about the “dark archiving” of materials like these, whether for the papers of a Supreme Court Justice, an interview with a soldier ready to give a sustained look at the conduct of war, or the records of the university’s own faculty and students. But just as it has become easier to quietly maintain such records, the reach of the subpoena has also increased. These records are more accessible and searchable than ever, whether for intelligence or law enforcement purposes, or to benefit a party to a divorce or other private lawsuit.
Those anxious about the increasing use and scope of legal pressure against archives include researchers, librarians, and journalists who point out the value of protecting sources who wish to make a record for posterity, and the difficulties of ever procuring documents and interviews from those sources if the fruits are only one subpoena away from disclosure. On the other side are those who simply want to solve awful crimes and have those behind them made to answer on the law’s timetable rather than their own.
Both sides of the debate around overriding a promise of confidentiality share an assumption: that there are records that can be accessed upon a judge’s order that might solve a crime or meet some other vital purpose – whether or not that access betrays a promise of confidentiality to the people who made those records possible. The Belfast Project is simply a sharp and high profile example of an issue that reaches into the lives of nearly every institution integrated into the digital world – and us, since we are those institutions’ users.
Corporations are increasingly aware of the fact that whatever they store is discoverable through judicial process – or all too leakable by a disgruntled employee. That’s why any business beyond mom and pop tends to have a formal document “retention” policy for its internal secrets – which is in fact a document destruction policy, intended to ensure that the business regularly deletes its mountains of accrued bits. It’s more complicated when those businesses are merely custodians of their customers’ data. Google, Facebook, and Microsoft are routinely caught in the middle when, for example, Brazilian authorities demand information about a subscriber and don’t want to use the cumbersome mutual legal assistance treaty process to get it. The Brazilians threaten penalties for holding back information that American law may insist not be disclosed – or vice versa. And the public has been inundated with descriptions of the U.S. government’s mining of digital databases for foreign intelligence – in large part thanks to a leak of the government’s own materials.
Are we stuck with having either to destroy our secrets or to leave them exposed to near-instant disclosure? It might be possible to split the difference: to develop an ecosystem of contingent cryptography for libraries, companies, governments, and citizens. Instead of either using new technologies to preserve for ready discovery material that might in the past never have been stored, or deleting everything as soon as possible, we can develop systems that place sensitive information beyond reach until a specified amount of time has passed, or other conditions are met. There has been fitful research done on “time capsule cryptography,” by which something can be encoded so that not even its creator can access it until a certain amount of time has passed – usually represented by the kinds of “proof of work” puzzles requiring vast computing power that undergird the operation of bitcoin and other cryptocurrencies. Cryptocurrencies use these puzzles to prevent any one entity from taking over the distributed operation of the currencies and thereby falsifying the records of who’s given what to whom. What works to prevent any one party from subverting a currency could also place some of the data increasingly comprising our lives beyond the reach of a simple subpoena, by forcing the curious to wait a designated period of time before they can see what they want – whether or not they have legal paperwork that purports to entitle them to it sooner.
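To make the idea of a time-released secret concrete, here is a minimal sketch of one published approach, a "time-lock puzzle" built on repeated modular squaring. It is a different construction from the proof-of-work puzzles mentioned above, chosen because it fits in a few lines. The primes, the base, and the function names `seal` and `unseal` are all illustrative assumptions; a real puzzle would use large, randomly generated primes that the creator throws away after sealing, so that the creator has to wait like everyone else.

```python
# Toy parameters (assumptions): two well-known Mersenne primes.
# A real puzzle would use large, secret, randomly generated primes.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
PHI = (P - 1) * (Q - 1)  # known only to the creator, then discarded
BASE = 2


def seal(key: int, squarings: int) -> int:
    """Creator's side: hide `key` behind BASE**(2**squarings) mod N.

    Knowing PHI lets the creator reduce the huge exponent first,
    so sealing is fast no matter how large `squarings` is.
    """
    if not 0 <= key < N:
        raise ValueError("key must be smaller than the modulus")
    exponent = pow(2, squarings, PHI)
    mask = pow(BASE, exponent, N)
    return (key + mask) % N


def unseal(sealed: int, squarings: int) -> int:
    """Solver's side: without PHI, square one step at a time.

    Each squaring depends on the previous one, so the work is
    sequential rather than something to spread across many machines.
    """
    value = BASE
    for _ in range(squarings):
        value = value * value % N
    return (sealed - value) % N
```

Calling `seal(key, t)` and later `unseal(sealed, t)` returns the original key; the delay comes from choosing `t` large enough that the sequential squarings take roughly the intended amount of time on fast hardware, which is an estimate rather than a guarantee.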
Even without relying on such complicated technologies, sensitive material can be encrypted using a key that is split into fragments, the way that it can take two simultaneous keys to launch a missile. Imagine key fragments distributed around the world to, say, ten parties, requiring the cooperation of at least six of them to reassemble the key needed to get the documents. The parties would be instructed to announce their fragments only when the original owner’s specified conditions are met. Early disclosure wouldn’t be impossible, but it would require a sustained effort that would only be worth undertaking if the access were a genuine priority, and one justifiable to the authorities of several countries who could each in turn pressure their respective keyholders. That kind of encryption is easy to do, and it can further be used to provide decent assurances that the material encrypted has not been altered in any way since it was first locked up.
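The six-of-ten arrangement described above matches a standard technique called threshold secret sharing. What follows is a minimal sketch using Shamir's scheme, under stated assumptions: the key is treated as an integer smaller than a fixed prime, and the names `PRIME`, `split_secret`, and `recover_secret` are illustrative rather than drawn from any particular library. It shows only the splitting and reassembly, not the encryption of the documents themselves.

```python
import secrets

# Assumption: keys are integers below this Mersenne prime (2**127 - 1).
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, num_shares: int):
    """Split `secret` into points on a random polynomial of degree threshold-1."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be smaller than PRIME")
    # The constant term is the secret; the remaining coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 to get the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `shares = split_secret(key, 6, 10)`, any six of the ten shares passed to `recover_secret` yield the key, while five or fewer reveal nothing useful about it. Each keyholder would hold one `(x, y)` pair and publish it only when the agreed conditions are met.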
The original conception of a trust company was as a firm that would solemnly represent the interests of its beneficiaries – which is why a bank worthy enough to entrust one’s savings to might also be worth entrusting decisions about a child’s college fund to in the event that the parents became incapacitated. Banks may not be among the most trusted institutions today, but libraries are – and they can together embrace a new generation of encryption technologies to safeguard materials that otherwise will never be created or saved for fear of early discovery. Imagine if the records of private firms, government agencies, and individuals from earlier eras were coming free now as trustees combined their keys to release them as time passed or other conditions were met. (In the case of Boston College’s promises, it might be that a keyholder would commit to publish its part of a key only upon the announcement of the death of a Belfast Project interviewee.) As a trust-restoring measure, secrets about government intelligence gathering could themselves be subject to time capsule accountability by those governments. Some actions today might reasonably remain secret – but with a guarantee that they will be revealed at a later date certain, even if the government in question feels later regret over entering into the bargain.
The last refuge of privacy cannot be placed solely in law or technology. It must repose in both, and a thoughtful combination of the two can help us thread a path between having all our secrets trivially discoverable and preserving nothing for our later selves for fear of that discovery.
[A version of this piece has been adapted for the Boston Globe.]
- The ten things that define you
I’ve written an op-ed for the New York Times about the European Court of Justice’s ruling finding a “right to be forgotten.” After that and my initial blog post in reaction to the court’s ruling, I wanted to share some further thoughts on this fascinating and potentially far-reaching development.
First, a refresher on the facts:
A man named Mario Costeja González objected that a Google search on his name turned up two foreclosure announcements published in a newspaper from 1998 seeking buyers of his property to satisfy unpaid debts — debts that were apparently genuine, but that were old enough that, in his view, they should remain obscure rather than a quick search away.
The court agreed, in a ruling and press release that noted, with his name, the very facts that Mr. González sought to bury. That oddity points to a subtlety in the court’s holding: for the first time, the legal problem isn’t in the availability of material on the Web, but rather in its searchability.
So the court implies that Google should be ready to remove links specific to searches on an objecting person’s name. How will it know whether to go ahead and remove the information? Well, says the court,
if it is found, following a request by the data subject [...], that the inclusion in the list of results displayed following a search made on the basis of his name of the links to web pages published lawfully by third parties and containing true information relating to him personally is, at this point in time[...] appears, having regard to all the circumstances of the case, to be inadequate, irrelevant or no longer relevant, or excessive in relation to the purposes of the processing at issue carried out by the operator of the search engine, the information and links concerned in the list of results must be erased.
Adds the court:
[I]t should in particular be examined whether the data subject has a right that the information relating to him personally should, at this point in time, no longer be linked to his name by a list of results displayed following a search made on the basis of his name. In this connection, it must be pointed out that it is not necessary in order to find such a right that the inclusion of the information in question in the list of results causes prejudice to the data subject. [...]
[These] rights override, as a rule, not only the economic interest of the operator of the search engine but also the interest of the general public in finding that information upon a search relating to the data subject’s name. However, that would not be the case if it appeared, for particular reasons, such as the role played by the data subject in public life, that the interference with his fundamental rights is justified by the preponderant interest of the general public in having, on account of inclusion in the list of results, access to the information in question.
This is coherent in theory – the court is trying to balance competing values – but it seems nearly hopeless in practice. It’s tricky enough to ask that search engines eliminate links to allegedly copyright-infringing material – too often the party demanding the deletion isn’t really describing an infringement and isn’t even the party holding the copyright, and search engines are poorly positioned to judge. “Inadequate, irrelevant or no longer relevant” is an unanchored standard, and I imagine that, to be safe, Google will just start eliding nearly anything on request – especially if it will owe damages if a court later finds it blew the balancing. It’s even more complicated when the complexities of implementing ECJ decisions throughout the EU’s respective state court systems are taken into account. That’s what makes me much less sanguine than, say, the author of this CNN opinion piece placing a lot of weight on the court’s balancing test to vindicate genuine free speech interests. If the court is serious about seeing this test applied, perhaps, as Alex Karman suggests, aggrieved people should make a stop at the courthouse first, having a judge review the request and then make an order to Google. That could also help create a formal record of takedowns – after all, as the ECJ decision says, something formerly relevant could become irrelevant, but the opposite is also true: something irrelevant could become relevant, such as when a private figure becomes a public one. How to restore those relevant disappeared search results?
Early reports suggest lots of understandable interest by Europeans seeking line item vetoes on search result pages. (Indeed, people in other countries will start wanting it, too.) As my colleague Samuel Klein points out, Google could even be caught in the middle as spurious requests are made for removal – what happens for those who discover that the search results that reflect best upon them have been removed at the request of a mischief-making imposter? If Google limits these redactions to those accessing it from Europe, will Americans need to cadge access from a European IP address to check to see what’s been wrongly redacted in their name?
All of this might be reason to rue the court’s decision and be done with it.
Except: What are the ten things that most define you in the eyes of others? That would be the ten organic links at the other end of:
Google enjoys 93% market share in Europe. If you want to learn about a stranger, you search on his or her name, and if you’re searching, you’re using Google.
And that is why I found myself ruminating on the idea I unpack in the NYT op-ed. That landing page on a search for someone’s name has outsized importance. Our only solace in the status quo is that what appears there is largely untouched by human hands, for better or for worse — Google spits out whatever, in its inscrutable AI wisdom, is “relevant” to the words your name comprises. But given the special status of that page to the people whose names are represented by the search terms, there might be something worthwhile to appear there that isn’t just ten links out of the Google sorting hat. The second page — you know, the one with links 11-20 that might as well be in Siberia — could contain the unadulterated search. We’re already trained to expect some smarter processing by Google and Bing when we are searching for flights, or shoes, the weather, or even how many centimeters are in 42 inches. House ads can appear, and, of course, precious sponsored links.
To include a free “house ad” by the people implicated by a search on their name – like the free credit report they’re entitled to, along with a shot at correcting inaccurate information held by a credit bureau – would do far less violence to search engines’ business models, and more important, their integrity, than the court’s current decision. When a single corporate actor becomes the gatekeeper for our identities, using formulas it can’t fairly be asked to reveal, there’s reason to think something more might be offered. Without taking into account the meaning of that landing page to the identity and reputation of the person searched, the AI will simply get better on its own terms – and perhaps the next refinement of “relevance” will be to assemble political donations, arrests, home address, and kids’ names all on that first landing page. That public data is all typically available with a few searches, a level of practical obscurity we may realize we value only if it, too, vanishes. It’s worth thinking more broadly about this before that happens.
Additional recommended reading: Zeynep Tufekci on the controversy.
- Is the EU compelling Google to become about.me?
Today the EU’s highest court interpreted the EU’s 1995 Data Protection Directive to mean that individuals should have a shot at insisting that Google and other search engines remove certain search results found upon a search for their names, not because they are false, or infringe copyright, but because they violate a “respect for private life” or a “right to protection of personal data.” What does that mean specifically? Not easy to say. Neither the opinion nor the Court’s press release is clear on that. Among the many cases pending about it, the one that the Court heard involved a Spanish citizen who did not like that people could find the public records of a foreclosure sale of one of his properties. So that’s not personal, secret information that was somehow uncovered; it’s a public record or fact made more searchable. And it’s not in the narrow category of things like social security numbers that might be in public documents, but for which Google and other search engines have taken some steps to make them not work as search terms. (Same with credit card numbers.)
In fact, I can’t tell if the Spanish citizen actually won anything. The Court’s own press release names him, and the fact that he at one point owed so much money that he had a property foreclosed. Not only does that illustrate the Streisand Effect, giving attention to exactly the thing he wanted to keep private, but more important, it appears to show that the Court doesn’t see a problem with publishing the very data it thinks sensitive enough to be worthy of an entirely new category of protection.
The answer might lie in the limits of the ruling: it appears that the idea is not to remove certain indexed Web pages, such as public records, from a search engine entirely, but rather only to give people a shot at removing that which appears as a search result under their names. So a document called “Jonathan Zittrain foreclosure of 123 Main St.” might be (if I were an EU citizen) ripe for removal as a result under “Jonathan Zittrain,” but not under “123 Main St. foreclosure.”
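The distinction between removing a page from the index and removing it only from results for a person's name can be sketched in a few lines. This is a toy illustration, not a description of how any search engine implements the ruling: the class name `DelistingFilter`, the example URLs, and the choice to match only queries that equal the name exactly (after normalizing case and spacing) are all assumptions.

```python
def normalize(query: str) -> str:
    """Lower-case the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


class DelistingFilter:
    """Hides specific URLs only when the query is a particular person's name."""

    def __init__(self):
        self._delisted = {}  # normalized name -> set of hidden URLs

    def delist(self, name: str, url: str) -> None:
        """Record that `url` should not appear for searches on `name`."""
        self._delisted.setdefault(normalize(name), set()).add(url)

    def apply(self, query: str, results: list) -> list:
        """Return `results` minus anything delisted for this exact name query."""
        hidden = self._delisted.get(normalize(query), set())
        return [url for url in results if url not in hidden]
```

After `delist("Jonathan Zittrain", url)`, calling `apply("Jonathan Zittrain", results)` drops that URL, while `apply("123 Main St. foreclosure", results)` returns the list untouched, since the page itself stays in the index. How to treat queries that contain the name alongside other words is left open here, as it is in the ruling.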
Is this terrible for search engines? It’s not great, since it will mean more work implementing a sort of notice-and-takedown regime of the kind that’s become commonplace for copyright – which is already more tolerable than somehow having to proactively police the search engine’s contents for information that might be subject to this newly articulated right. Where before Google and others could turn away people not happy about old foreclosures being tied to their names in searches, now those people can petition Google, which is somehow to perform a balancing test of the interest of the person in her privacy versus the interest of the public at large in finding the indexed Web page she wants removed. Exactly who’d be qualified to do that I don’t know, and if the penalty for getting it wrong means lots of litigation and eventually potentially money damages – though it’s not clear if that’s on the table – then I could see search engines coming to remove anything from a list of search results under a name for the person requesting it to avoid further trouble. Hence a search for a name becomes more like an about.me page – curated by the person named, or in the case of “John Smith,” the 5.2 million John Smiths out there (adjusted, I guess, by how many are in the EU).
As a process matter, big search engines won’t have to shut down over this. It will cost them, but so do lots of things. Would it amount to anything more than a fig leaf? Part of that depends on whether an order under this new system is one that must apply to google.com as well as, say, google.es and the rest of the EU-localized googles. Must results in google.com be geo-filtered when displayed in the EU (call it the YouTube model, where certain videos at universal youtube.com links are withheld from certain jurisdictions), or is filtering in google.es enough (call it the google.de model, where links to neo-Nazi speech are removed from google.de but not from google.com, even when the user of google.com is in Germany)? In the German case, the government’s proscriptions against neo-Nazi speech are perhaps symbolic: that’s why the German government stops at demanding that Google remove these results only from google.de.
Another procedural matter: the Court says it can weigh in on Google’s behavior because Google is selling ads targeted at EU customers, and has boots on the ground (a corporate subsidiary, servers, salespeople, etc.) in Europe. What if a more modestly scaled search engine like DuckDuckGo were to index the same information at issue in this case? It might be that the Court wouldn’t ask it to do what Google is being asked to do. So there could be an odd regulatory arbitrage by smaller search engines that want to make available exactly the information that Google and Bing may be told they can’t. The Court is likely wise to stop where it does if it’s going to get started at all, though — consider a much more lurid right-to-be-forgotten case in which two German murderers sought to have facts about their deeds expunged from Wikipedia. (You can read about it on … Wikipedia.)
As a substantive matter, it really joins the battle that Viktor Mayer-Schoenberger has been following in his book Delete: whether true but regrettable facts in the public domain are something that a person should be able to control. I’m skeptical of allowing such a right, even as we must acknowledge that pre-search engines there were tons of facts like this that were, in effect, deleted or unfindable over time. So we can see restrictions as some effort to approve of, and restore, a status quo of circa 1995. The most important harm of this decision is not to the search engine companies, but to the public at large, and its ability to find accurate public information.
Update: I’ve put down some further thoughts in this NYT op-ed and this followup blog post.