(echoes of a broken system)
UPDATE: Aaron committed suicide on January 11, 2013.(!) More on his life here.
Aaron Swartz is a friend and Cambridge-area polymath whose projects focus on access to knowledge, open government, and an informed civil society. He has worked as a software architect, digital archivist, social analyst, Wikipedia analyst, and political organizer. Last year he co-founded the Progressive Change Campaign Committee and the non-profit political advocacy group Demand Progress.
He is also currently charged with computer fraud by the US Attorney’s office, in what appears to be the latest example of “a sweeping expansion of federal criminal jurisdiction” based on the broad applicability of wire fraud and computer fraud statutes. An overview:
Aaron has studied institutional influence and ways to work with large datasets. In 2008, he founded watchdog.net, “the good government site with teeth”, to aggregate and visualize data about politicians – including where their money comes from. That year he also worked with Shireen Barday at Stanford Law School to assess “problems with remunerated research” in law review articles (i.e., articles funded by corporations, sometimes to help them in ongoing legal battles), by downloading and analyzing over 400,000 law review articles to determine the source of their funding. The results were published in the Stanford Law Review. Most recently, he served for 10 months as a Fellow at Harvard’s Safra Center for Ethics, in their Lab on Institutional Corruption.
He contributed to the field of digital archiving, designing and implementing the Open Library, which serves as a global digital resource today, and as a foundation for any digital libraries in the future. And he collected 2 million public-domain court decisions from the US PACER system — a system that nominally makes all such decisions available to the public, but in practice keeps them hidden behind a paywall – to add to Carl Malamud’s collection at resource.org. (That work in turn gave rise to the crowdsourced RECAP project.)
The Case of the Over-Downloader
Last week, Aaron was charged by a grand jury with computer fraud, for allegedly downloading millions of academic articles hosted by the journal archive JSTOR, and exceeding authorization on MIT and JSTOR servers to do so.
JSTOR claims no interest in pursuing a legal case. However, they are not part of the prosecution, and Aaron faces a possible fine and up to 35 years in prison, with trial set for September. You can support his legal efforts online.
The Association of College and Research Libraries notes that both the prosecution and Swartz’s supporters have characterized the trial with “superficial, and deeply incorrect, messages about libraries and licensed content”.
So how did this come to pass, and what does it mean for the Internet?
This past winter, JSTOR observed “systematic downloading” of millions of articles from MIT’s campus – in violation of their terms of service. According to the indictment, this at one point brought down JSTOR computers, and led to MIT’s campus access to the service being twice blocked for a few days. In January, MIT changed their access policy for using JSTOR as a result. According to JSTOR’s public statement, once they identified Aaron as the source of the downloading and ‘secured the content’, they had “no interest in this becoming an ongoing legal matter.”
However, the US Attorney’s Office was already looking into the situation. An investigation under Carmen Ortiz, the US Attorney for Massachusetts, led to the indictment, on charges of wire fraud, computer fraud, unlawfully obtaining information from a protected computer, and damaging a protected computer.
Why was the government moved to react so strongly, if there was no civil case?
Attorney Max Kennerly points out some legal problems in a detailed critique, “Examining The Outrageous Aaron Swartz Indictment For Computer Fraud”, and notes the great power prosecutors have over the lives of defendants.
Attorney Jerry Cohen, a Boston IP lawyer, suggests this aggressive use of criminal charges rather than civil charges is part of a trend in government prosecution of such cases, like taking “a sledgehammer to drive a thumb tack… It’s intended to terrorize the person who’s indicted and others who might be thinking of the same thing.”
[Swartz's actions] sure sound suspicious, but what, exactly, was Swartz’s crime? Sneaking into a building at M.I.T. might seem like trespassing, but that’s not a federal crime. He’s charged instead with wire and computer fraud—knowingly accessing a computer with the intent to defraud, and gaining some value from it. (A JSTOR subscription like M.I.T.’s could go for fifty thousand dollars.) Critics compare the act to breaking and entering, while supporters note a better analogy is that JSTOR gave Swartz the keys to its house, then got upset when he drank all the milk.
JSTOR, for its part, says the milk was returned—Swartz gave back the downloaded data—and considered its dealings with Swartz complete. (Can one “steal” and then “return” data, when the original data remain on JSTOR’s servers all along?) But that doesn’t appear to satisfy the government, which has been waging something of a war against “hacking,” broadly defined.
Public reaction to the case
Other archives and journals have been quiet about the affair. JSTOR has said no more than necessary in their public statement, and MIT has remained mum. And if the prosecution means to take a hard line to send a message, they have not yet clarified what it is.
Unlike the awkward institutional statements, the public response on the Internet has been thoughtful and at times inspiring. A number of academics and writers have covered the case.
Glyn Moody responded with an essay on the art of liberating knowledge, and what that should mean to us.
Some commentators point out that the sort of data harvesting at issue here is done by many groups that engage in meta-analysis: search engines, other large web properties, academic researchers, and other analysts. Any of these may have ideas to test on metadata that they can most easily get by spidering and scraping the web, sometimes including sites they have to jump through hoops to access. But larger organizations have their reputation and legal teams to protect them from challenges.
Finally, on Thursday, Greg Maxwell published a bittorrent archive of 18,592 public domain papers from the Philosophical Transactions of the Royal Society from 1665 to 1922, which he says he has had for some time. Like many old journals, the Philosophical Transactions were digitized for their publisher by JSTOR. The text of these works is being cleaned up online on Wikisource.
Enclosing the public domain
This highlights one of the grayer areas of IP in modern digital publishing: the use of the public domain by groups that could make it easy for others to share and reuse public domain documents, but choose not to. In extreme cases, this involves actual enclosure: limiting access to the only source of such works, or suggesting to users that they do not have the right to reuse them.
Maxwell introduces his archive with an alternately meticulous and scathing essay about why scientific knowledge should be free, and how we can improve inefficient social policies. In it he notes:
"I've had these files for a long time, but I've been afraid that
if I published them I would be subject to unjust legal harassment
by those who profit from controlling access to these works.
I now feel that I've been making the wrong decision."
By legal harassment, he is likely referring to the practice among some archive owners of claiming a new copyright on the images produced by scans of public domain materials, even though such scans are often considered uncopyrightable. A classic case of enclosure.
In the case of the old Royal Society journals, canonical hosts such as JSTOR and the Royal Society’s archives only offer them for rent, at heady rates of $5-$20 per article per month. However, they are also already freely available online, if less visibly so, thanks to independent library-scanning efforts (including the Internet Archive, Google Books, and Hathi Trust) and an independent digital curator (the tireless John Mark Ockerbloom at UPenn).
This illustrates one of the grayer areas of archiving and preservation — how public domain documents are made available to the public. Torn between optimizing access and revenue, publishers sometimes feel obliged to put these works behind a paywall. This issue affects federal initiatives (PACER) and institutions (the Smithsonian), as well as publishers such as the Royal Society whose archives extend back more than a century.
What do you think? Related stories and anecdotes are welcome.