- Why Libraries Matter
I’ve written up a piece on Medium on why libraries matter — you can find it here:
Vital parts of the Web are censored, poisoned, and lost amidst truthiness. Libraries are our unusual defense.
With thanks to Knight Foundation for its new Library Challenge.
- Time capsule crypto can help us commit our secrets to history
More than a decade ago, researchers at Boston College interviewed people from both sides of the Troubles in Northern Ireland, promising each contributor to the “Belfast Project” that his or her interview recording wouldn’t be released until the contributor died. In the meantime, the tapes would be deposited at the College’s rare books library under lock and key. On the basis of those promises, some people spoke for the first time about painful actions that remain murky in the public eye, including unsolved murders arising from the conflict that they’d helped commit or cover up.
When the British government learned of the Belfast Project about ten years later, it invoked a mutual legal assistance treaty to demand immediate access to some of the tapes. After months of legal wrangling, some of the tapes were turned over, resulting in the arrest last month of Sinn Féin leader Gerry Adams on suspicion of involvement in one of the killings discussed in the interviews. Adams was released, but Northern Ireland officials are now seeking the entire set of interviews – perhaps to balance inquiry into the Irish Republican Army with investigation of possible crimes by members of the Ulster Volunteer Force as well.
Libraries like Boston College’s are familiar with making promises about the “dark archiving” of materials like these, whether for the papers of a Supreme Court Justice, an interview with a soldier ready to give a sustained look at the conduct of war, or the records of the university’s own faculty and students. But just as it has become easier to quietly maintain such records, the reach of the subpoena has also increased. These records are more accessible and searchable than ever, whether for intelligence or law enforcement purposes, or to benefit a party to a divorce or other private lawsuit.
Those anxious about the increasing use and scope of legal pressure against archives include researchers, librarians, and journalists who point out the value of protecting sources who wish to make a record for posterity, and the difficulty of ever procuring documents and interviews from those sources if the fruits are only one subpoena away from disclosure. On the other side are those who simply want to solve awful crimes and have those behind them made to answer on the law’s timetable rather than their own.
Both sides of the debate around overriding a promise of confidentiality share an assumption: that there are records that can be accessed upon a judge’s order that might solve a crime or meet some other vital purpose – whether or not that access betrays a promise of confidentiality to the people who made those records possible. The Belfast Project is simply a sharp and high-profile example of an issue that reaches into the lives of nearly every institution integrated into the digital world – and us, since we are those institutions’ users.
Corporations are increasingly aware that whatever they store is discoverable through judicial process – or all too leakable by a disgruntled employee. That’s why any business beyond mom-and-pop scale tends to have a formal document “retention” policy for its internal secrets – which is in fact a document destruction policy, intended to ensure that the business regularly deletes its mountains of accrued bits. It’s more complicated when those businesses are merely custodians of their customers’ data. Google, Facebook, and Microsoft are routinely caught in the middle when, for example, Brazilian authorities demand information about a subscriber and don’t want to use the cumbersome mutual legal assistance treaty process to get it. The Brazilians threaten penalties for holding back information that American law may insist not be disclosed – or vice versa. And the public has been inundated with descriptions of the U.S. government’s mining of digital databases for foreign intelligence – in large part thanks to a leak of the government’s own materials.
Are we stuck with either having to destroy our secrets or leave them exposed to near-instant disclosure? It might be possible to split the difference: to develop an ecosystem of contingent cryptography for libraries, companies, governments, and citizens. Instead of either using new technologies to preserve for ready discovery material that in the past might never have been stored, or deleting everything as soon as possible, we can develop systems that place sensitive information beyond reach until a specified amount of time has passed, or other conditions are met. There has been fitful research done on “time capsule cryptography,” by which something can be encoded so that not even its creator can access it until a certain amount of time has passed – time usually represented by the kinds of “proof of work” puzzles requiring vast computing power that undergird the operation of bitcoin and other cryptocurrencies. Cryptocurrencies use these puzzles to prevent any one entity from taking over the distributed operation of the currencies and thereby falsifying the records of who’s given what to whom. What works to prevent any one party from subverting a currency could also place some of the data increasingly comprising our lives beyond the reach of a simple subpoena, by forcing the curious to wait a designated period of time before they can see what they want – whether or not they have legal paperwork that purports to entitle them to it sooner.
Even without relying on such complicated technologies, sensitive material can be encrypted using a key that is split into fragments, the way that it can take two simultaneous keys to launch a missile. Imagine key fragments distributed around the world to, say, ten parties, requiring the cooperation of at least six of them to reassemble the key needed to get the documents. The parties would be instructed to announce their fragments only when the original owner’s specified conditions are met. Early disclosure wouldn’t be impossible, but it would require a sustained effort that would only be worth undertaking if the access were a genuine priority, and one justifiable to the authorities of several countries who could each in turn pressure their respective keyholders. That kind of encryption is easy to do, and it can further be used to provide decent assurances that the material encrypted has not been altered in any way since it was first locked up.
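The six-of-ten arrangement is what cryptographers call threshold secret sharing. The essay doesn't name a particular method, so the minimal Python sketch below uses Shamir's classic scheme as one plausible choice. The key becomes the constant term of a random degree-5 polynomial over a prime field, and each keyholder gets one point on that polynomial. Any six points pin the polynomial down, while five reveal nothing about the key. The `fingerprint` helper stands in for the integrity check: publishing a SHA-256 digest of the sealed (encrypted) material at lock-up lets anyone later confirm that it hasn't changed. Function names and parameters are hypothetical. The document itself would be encrypted under the key with a standard cipher, which is not shown.

```python
import hashlib
import secrets

# The Mersenne prime 2^521 - 1 defines a field larger than any 256-bit key.
PRIME = 2**521 - 1


def split(secret: int, n: int = 10, k: int = 6) -> list[tuple[int, int]]:
    """Split `secret` into n shares, any k of which can rebuild it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be smaller than PRIME")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def fingerprint(sealed: bytes) -> str:
    """SHA-256 digest to publish at lock-up, so later tampering is detectable."""
    return hashlib.sha256(sealed).hexdigest()
```

With a 256-bit key, `combine(split(key)[:6])` gives back the key, as does any other set of six shares. Any five shares almost certainly yield an unrelated number.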
The original conception of a trust company was as a firm that would solemnly represent the interests of its beneficiaries – which is why a bank worthy enough to entrust one’s savings to might also be worth entrusting decisions about a child’s college fund to in the event that the parents became incapacitated. Banks may not be among the most trusted institutions today, but libraries are – and they can together embrace a new generation of encryption technologies to safeguard materials that otherwise will never be created or saved for fear of early discovery. Imagine if the records of private firms, government agencies, and individuals from earlier eras were coming free now as trustees combined their keys to release them as time passed or other conditions were met. (In the case of Boston College’s promises, it might be that a keyholder would commit to publish its part of a key only upon the announcement of the death of a Belfast Project interviewee.) As a trust-restoring measure, secrets about government intelligence gathering could themselves be subject to time capsule accountability by those governments. Some actions today might reasonably remain secret – but with a guarantee that they will be revealed at a later date certain, even if the government in question later comes to regret entering into the bargain.
The last refuge of privacy cannot be placed solely in law or technology. It must repose in both, and a thoughtful combination of the two can help us thread a path between having all our secrets trivially discoverable and preserving nothing for our later selves for fear of that discovery.
[A version of this piece has been adapted for the Boston Globe.]
- The ten things that define you
I’ve written an op-ed for the New York Times about the European Court of Justice’s ruling finding a “right to be forgotten.” After that and my initial blog post in reaction to the court’s ruling, I wanted to share some further thoughts on this fascinating and potentially far-reaching development.
First, a refresher on the facts:
A man named Mario Costeja González objected that a Google search on his name turned up two foreclosure announcements published in a newspaper from 1998 seeking buyers of his property to satisfy unpaid debts — debts that were apparently genuine, but that were old enough that, in his view, they should remain obscure rather than a quick search away.
The court agreed, in a ruling and press release that noted, with his name, the very facts that Mr. González sought to bury. That oddity points to a subtlety in the court’s holding: for the first time, the legal problem isn’t in the availability of material on the Web, but rather in its searchability.
So the court implies that Google should be ready to remove links specific to searches on an objecting person’s name. How will it know whether to go ahead and remove the information? Well, says the court,
if it is found, following a request by the data subject [...], that the inclusion in the list of results displayed following a search made on the basis of his name of the links to web pages published lawfully by third parties and containing true information relating to him personally is, at this point in time[...] appears, having regard to all the circumstances of the case, to be inadequate, irrelevant or no longer relevant, or excessive in relation to the purposes of the processing at issue carried out by the operator of the search engine, the information and links concerned in the list of results must be erased.
Adds the court:
[I]t should in particular be examined whether the data subject has a right that the information relating to him personally should, at this point in time, no longer be linked to his name by a list of results displayed following a search made on the basis of his name. In this connection, it must be pointed out that it is not necessary in order to find such a right that the inclusion of the information in question in the list of results causes prejudice to the data subject. [...]
[These] rights override, as a rule, not only the economic interest of the operator of the search engine but also the interest of the general public in finding that information upon a search relating to the data subject’s name. However, that would not be the case if it appeared, for particular reasons, such as the role played by the data subject in public life, that the interference with his fundamental rights is justified by the preponderant interest of the general public in having, on account of inclusion in the list of results, access to the information in question.
This is coherent in theory (the court is trying to balance competing values), but it seems nearly hopeless in practice. It’s tricky enough to ask that search engines eliminate links to allegedly copyright-infringing material: too often the party demanding the deletion isn’t really describing an infringement and isn’t even the party holding the copyright, and search engines are poorly positioned to judge. Figuring out what’s “inadequate, irrelevant or no longer relevant” means applying an unanchored standard, and I imagine that, to be safe, Google will just start eliding nearly anything on request, especially if it will owe damages should a court later find it blew the balancing. It’s more complicated still once we account for how ECJ decisions are implemented across the court systems of the EU’s member states. That’s what makes me much less sanguine than, say, the author of this CNN opinion piece, who places a lot of weight on the court’s balancing test to vindicate genuine free speech interests. If the court is serious about seeing this test applied, perhaps, as Alex Karman suggests, aggrieved people should make a stop at the courthouse first, having a judge review the request and then issue an order to Google. That could also help create a formal record of takedowns. After all, as the ECJ decision says, something formerly relevant could become irrelevant, but the opposite is also true: something irrelevant could become relevant, such as when a private figure becomes a public one. How would those disappeared, now-relevant search results be restored?
Early reports suggest lots of understandable interest among Europeans seeking line-item vetoes on search result pages. (Indeed, people in other countries will start wanting it, too.) As my colleague Samuel Klein points out, Google could even be caught in the middle as spurious requests are made for removal: what happens to those who discover that the search results that reflect best upon them have been removed at the request of a mischief-making imposter? If Google limits these redactions to those accessing it from Europe, will Americans need to cadge access from a European IP address to check what’s been wrongly redacted in their name?
All of this might be reason to rue the court’s decision and be done with it.
Except: What are the ten things that most define you in the eyes of others? That would be the ten organic links returned by a Google search on your name.
Google enjoys 93% market share in Europe. If you want to learn about a stranger, you search on his or her name, and if you’re searching, you’re using Google.
And that is why I found myself ruminating on the idea I unpack in the NYT op-ed. That landing page on a search for someone’s name has outsized importance. Our only solace in the status quo is that what appears there is largely untouched by human hands, for better or for worse: Google spits out whatever, in its inscrutable AI wisdom, is “relevant” to the words your name comprises. But given the special status of that page to the people whose names are represented by the search terms, there might be something worthwhile to show there that isn’t just ten links out of the Google sorting hat. The second page (you know, the one with links 11–20 that might as well be in Siberia) could contain the unadulterated search. We’re already trained to expect some smarter processing by Google and Bing when we search for flights, shoes, the weather, or even how many centimeters are in 42 inches. House ads can appear, and, of course, precious sponsored links.
To include a free “house ad” by the people implicated by a search on their name (like the free credit report they’re entitled to, along with a shot at correcting inaccurate information held by a credit bureau) would do far less violence to search engines’ business models, and, more important, to their integrity, than the court’s current decision. When a single corporate actor becomes the gatekeeper for our identities, using formulas it can’t fairly be asked to reveal, there’s reason to think something more might be offered. Without taking into account the meaning of that landing page to the identity and reputation of the person searched, the AI will simply get better on its own terms, and perhaps the next refinement of “relevance” will be to assemble political donations, arrests, home address, and kids’ names all on that first landing page. That public data is all typically available with a few searches, a level of practical obscurity we may realize we value only if it, too, vanishes. It’s worth thinking more broadly about this before that happens.
Additional recommended reading: Zeynep Tufekci on the controversy.
- Is the EU compelling Google to become about.me?
Today the EU’s highest court interpreted the EU’s 1995 Data Protection Directive to mean that individuals should have a shot at insisting that Google and other search engines remove certain search results found upon a search for their names, not because they are false, or infringe copyright, but because they violate a “respect for private life” or a “right to protection of personal data.” What does that mean specifically? Not easy to say. Neither the opinion nor the Court’s press release is clear on that. Among the many cases pending about it, the one that the Court heard involved a Spanish citizen who did not like that people could find the public records of a foreclosure sale of one of his properties. So that’s not personal, secret information that was somehow uncovered; it’s a public record or fact made more searchable. And it’s not in the narrow category of things like social security numbers that might be in public documents, but for which Google and other search engines have taken some steps to make them not work as search terms. (Same with credit card numbers.)
In fact, I can’t tell if the Spanish citizen actually won anything. The Court’s own press release names him, and the fact that he at one point owed so much money that he had a property foreclosed. Not only does that illustrate the Streisand Effect, giving attention to exactly the thing he wanted to keep private, but more important, it appears to show that the Court doesn’t see a problem with publishing the very data it thinks sensitive enough to be worthy of an entirely new category of protection.
The answer might lie in the limits of the ruling: it appears that the idea is not to remove certain indexed Web pages, such as public records, from a search engine entirely, but rather only to give people a shot at removing that which appears as a search result under their names. So a document called “Jonathan Zittrain foreclosure of 123 Main St.” might be (if I were an EU citizen) ripe for removal as a result under “Jonathan Zittrain,” but not under “123 Main St. foreclosure.”
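The name-scoped removal described above can be sketched as a filter that runs only on queries containing the requester's name. In this minimal Python sketch, several things are assumptions, because the ruling doesn't specify how to draw the line: the class name, the word-character tokenizer, and the rule that a query counts as "a search on a name" when it contains the name as a contiguous phrase.

```python
import re
from dataclasses import dataclass, field


def _tokens(text: str) -> list[str]:
    """Lowercase and keep runs of word characters (letters, digits)."""
    return re.findall(r"\w+", text.lower())


def _contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    k = len(phrase)
    return any(tokens[i:i + k] == phrase for i in range(len(tokens) - k + 1))


@dataclass
class NameScopedIndex:
    documents: dict[str, str]  # url -> indexed text (here, just a title)
    delisted: dict[tuple[str, ...], set[str]] = field(default_factory=dict)

    def delist(self, name: str, url: str) -> None:
        """Record that `url` should not appear in searches on `name`."""
        key = tuple(_tokens(name))
        if not key:
            raise ValueError("name must contain at least one word")
        self.delisted.setdefault(key, set()).add(url)

    def search(self, query: str) -> list[str]:
        q = _tokens(query)
        # A document matches if it contains every query word.
        hits = [url for url, text in self.documents.items()
                if set(q) <= set(_tokens(text))]
        # Suppress a hit only when the query is a search on a delisted name.
        blocked: set[str] = set()
        for name, urls in self.delisted.items():
            if _contains_phrase(q, list(name)):
                blocked |= urls
        return [url for url in hits if url not in blocked]
```

Once "Jonathan Zittrain" has been delisted for a page titled "Jonathan Zittrain foreclosure of 123 Main St.", a search on that name no longer returns the page, while "123 Main St. foreclosure" still does. Under this particular matching rule, a search on the surname alone also still finds the page, which shows how much the outcome depends on where the line is drawn.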
Is this terrible for search engines? It’s not great, since it will mean more work implementing a sort of notice-and-takedown regime of the kind that’s become commonplace for copyright, though that is at least more tolerable than having somehow to proactively police the search engine’s contents for information that might be subject to this newly articulated right. Where before Google and others could turn away people unhappy about old foreclosures being tied to their names in searches, now those people can petition Google, which is somehow to balance the person’s interest in her privacy against the public’s interest in finding the indexed Web page she wants removed. Exactly who’d be qualified to do that I don’t know, and if the penalty for getting it wrong means lots of litigation and eventually potentially money damages (though it’s not clear that’s on the table), then I could see search engines simply removing anything from the results under a name at the request of the person named, to avoid further trouble. Hence a search for a name becomes more like an about.me page, curated by the person named, or in the case of “John Smith,” by the 5.2 million John Smiths out there (adjusted, I guess, by how many are in the EU).
As a process matter, big search engines won’t have to shut down over this. It will cost them, but so do lots of things. Would it amount to anything more than a fig leaf? Part of that depends on whether an order under this new system must apply to google.com as well as, say, google.es and the rest of the EU-localized googles. Must results on google.com be geo-filtered when displayed in the EU (call it the YouTube model, where certain videos at universal youtube.com links are withheld from certain jurisdictions), or is filtering google.es enough (call it the google.de model, where links to neo-Nazi speech are removed from google.de but not from google.com, even when the user of google.com is in Germany)? In the German case, the government’s proscriptions against neo-Nazi speech are perhaps symbolic: that’s why the German government stops at demanding that Google remove these results only from google.de.
Another procedural matter: the Court says it can weigh in on Google’s behavior because Google is selling ads targeted at EU customers, and has boots on the ground (a corporate subsidiary, servers, salespeople, etc.) in Europe. What if a more modestly scaled search engine like DuckDuckGo were to index the same information at issue in this case? It might be that the Court wouldn’t ask it to do what Google is being asked to do. So there could be an odd regulatory arbitrage by smaller search engines that want to make available exactly the information that Google and Bing may be told they can’t. The Court is likely wise to stop where it does if it’s going to get started at all, though — consider a much more lurid right-to-be-forgotten case in which two German murderers sought to have facts about their deeds expunged from Wikipedia. (You can read about it on … Wikipedia.)
As a substantive matter, it really joins the battle that Viktor Mayer-Schoenberger has been following in his book Delete: whether true but regrettable facts in the public domain are something that a person should be able to control. I’m skeptical of allowing such a right, even as we must acknowledge that before search engines there were tons of facts like this that were, in effect, deleted or unfindable over time. So we can see these restrictions as an effort to approve of, and restore, the status quo of circa 1995. The most important harm of this decision is not to the search engine companies, but to the public at large, and its ability to find accurate public information.
Update: I’ve put down some further thoughts in this NYT op-ed and this followup blog post.
- Reconciling lifestreaming and privacy: tech-facilitated negotiations
I’ve long thought that, as tough as the problems of privacy against government intrusion and corporate surveillance are, the most novel and complex privacy challenges will be peer-to-peer. With government and corporate privacy issues, the players to be affected are better known and more manageable, and impinging on their freedom to collect data on us (or to report what they find) feels like “regular” regulation.
But what happens when the information being gathered about us comes from someone wearing a headset and simply streaming anything interesting that he or she sees, helpfully auto-tagged with our identities? Some bars and restaurants may try to ban Google Glass at the door, but lessons from everything from mobile phones to hats tell us who’s going to win that war in the longer term. Especially once streaming devices are widely distributed, so that it’s not just the occasional freak behaving anti-socially but all of us doing so, we’ll need to look for other solutions if we don’t want to be stuck simply reconciling ourselves to having no private moments in public.
One place to mine is the realm of digital rights management. DRM has not worked out so well for copyrighted material in the public mainstream, like movies and music. But what if the kind of tagging by which stuff can ask, if not require, “don’t copy me” could be deployed for privacy purposes, more in the spirit of Creative Commons than of the ill-fated Macrovision VHS copy protection scheme?
How to do this? A start would be to allow people to set their expectations for a given environment, and to be able to broadcast them (without having to share their names, of course). If enough people in, say, a classroom, agree that the meeting is off the record, then recording devices will be alerted accordingly. They’ll still function, but they’ll show a message that the environment is expected to be off the record — and perhaps they’ll have a glowing LED or some other gentle indicator to tell others in the room that someone has chosen to record despite the norm. Perhaps, too, those recorded will be able to have some form of pseudonymous contact information embedded in the recording — so that if it should become public, they can choose to show that they were indeed the ones recorded (again without necessarily having to reveal identity) and then ask — not demand — some privilege in contextualizing or commenting upon the recording.
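Here is a minimal Python sketch of that negotiation. Every name and parameter in it is hypothetical. Each attendee's device broadcasts a beacon carrying an off-the-record preference and a pseudonym, derived by hashing a secret the attendee keeps. A recording device tallies the beacons against an assumed quorum of one half. When the room's norm is off the record, the device shows a warning and keeps working. If its owner records anyway, it lights an indicator and embeds the pseudonyms in the recording. Revealing the secret later lets a person show they were among those recorded without disclosing who they are. Real devices would also need a radio protocol and protection against spoofed beacons, and neither is modeled here.

```python
import hashlib
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Beacon:
    pseudonym: str        # SHA-256 of a secret the attendee keeps; no name
    off_the_record: bool


@dataclass(frozen=True)
class RecorderState:
    warning: bool               # show "this room expects no recording"
    indicator_on: bool          # gentle LED telling others someone is recording
    contacts: tuple[str, ...]   # pseudonyms embedded in the recording


def make_beacon(off_the_record: bool) -> tuple[Beacon, bytes]:
    """Return a broadcastable beacon plus the secret its owner keeps."""
    secret = secrets.token_bytes(16)
    return Beacon(hashlib.sha256(secret).hexdigest(), off_the_record), secret


def evaluate(beacons: list[Beacon], recording: bool,
             quorum: float = 0.5) -> RecorderState:
    """Decide what a recording device should display, given nearby beacons.

    The device never refuses to record; it only informs and signals.
    """
    if not beacons:
        return RecorderState(False, False, ())
    share = sum(b.off_the_record for b in beacons) / len(beacons)
    off_record_norm = share >= quorum
    return RecorderState(
        warning=off_record_norm,
        indicator_on=off_record_norm and recording,
        contacts=tuple(b.pseudonym for b in beacons) if recording else (),
    )


def proves_presence(secret: bytes, contacts: tuple[str, ...]) -> bool:
    """Let someone show they were recorded by revealing only their secret."""
    return hashlib.sha256(secret).hexdigest() in contacts
```

The design follows the paragraph's emphasis on informing rather than forbidding. `evaluate` only changes what the device displays and embeds. Whether to record stays with the person holding the device.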
Many of us might appreciate an opportunity to know about others’ preferences and expectations in a quiet, low-impact way, and then to respect them — or if not, to realize that that choice entails overriding the preferences of others. The function of the technology is not to impede certain uses by fiat — the way the old DRM did — but rather to allow people to see that other people are implicated by what they do, permitting the moral dimension of our enthusiastic use of technology to become more apparent.
Update: PlaceAvoider appears to seek to implement some of this functionality.