I’ve had a few responses to my last post that take issue with my seemingly one-size-fits-all, here’s-a-silver-bullet proposal, including “Magical thinking in data curation” from Dorothea Salo (whose previous blog post I had cited in mine). That was not my intended message, but I suspect I left enough unconnected dots, mushed together enough different ideas, and left enough terms undefined that I gave the wrong impression.
I’ll also confess that my post was not really about data curation per se (at least in the sense that Dorothea means it), but rather the tools we use to interact with data. I do think that better tools will make the hard work of data curation easier, or will at least (in many cases) push the complexity into a more manageable space. I’ll also note that what I am proposing is in no way a “novel” approach — in fact, it’s based completely on decade-or-more-old standards and is quite common. Examples of what I am proposing are all over the place, but we simply don’t see them often enough in academia or the library.
Here is an attempt to bullet-point a few conclusions/take-aways:
- There’s no silver bullet. If someone suggests there is, tell them they are wrong or write a blog post telling them as much ;-).
- We’re all using the Internet to share data. It’s a good place to be doing so. The Web in particular (by which I specifically mean HTTP-based technologies) is excellent for this. Email’s pretty good, too, as is FTP, etc. But the Web rules.
- The Web is based on some basic principles that are not nearly well enough understood. A better understanding, especially by the folks who build the web applications and write the specifications we use, is crucial. Many of the tools we use in libraries and repositories are poorly attuned to the core principles of the Web. Seeing those systems evolve is in everyone’s interest.
- These principles are actually quite elegant and powerful. A better understanding of these principles by librarians, information technologists, and administrators is also quite important, and (I’ll contend) something we should strive for.
- All data has structure. Much of that structure can be captured by the tools we all use to create data. It’s critically important that we advocate for and use tools that do so and that make that data portable, i.e., available for reuse by other applications. When the tool does not or cannot capture the structure of the data automatically (and I’m really talking about metadata here), make it as easy as possible for the user to add that metadata at the point of creation.
- Human-created metadata is vastly more difficult and expensive to create than metadata that can be captured automatically. When a human has to create a piece of metadata, make sure it does not have to be re-created by someone else later. That’s a huge waste of time.
- The tasks we are engaged in in academia and the library (esp. when it comes to managing data) are not special. At all. The extent to which the library/repository sees its mission as unique (i.e., demands that it “solves its own problems”) is the extent to which it is doomed to extinction (and the sooner the better).
- To put it another way — “we are all librarians/data curators now.” There are great strides being made. We ought not miss out due to some notion of the “specialness” of The Library.
- Try to see analogs to our work in unlikely places. Look at Twitter, Google, Amazon, Facebook, but look beneath the obvious use cases and think about the implications: can using the library (for some cases) be as easy as using Amazon or Google? Could the rich ecosystem of client applications we see forming around Twitter form around our OPAC? Can our systems be offered as a platform to client app developers, the way Facebook is? Following the basic web principles makes this much more possible.
- Why is our Institutional Repository not more like Amazon S3? Would something like SimpleDB (Amazon’s key-value store) be a better way to capture information about our resources?
- Why authorities are not available as web resources is mind-boggling to me. Oh wait, they are! Ed Summers et al. at the Library of Congress are at the forefront of this whole approach I am talking about. Keep an eye on their work for ideas/inspiration.
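To make the point about capturing structure automatically a bit more concrete, here is a minimal sketch of a tool recording what it can on its own at the point of creation and writing it out in a portable form. Everything in it is an assumption for illustration: the function names `capture_metadata` and `to_portable_json`, the field names, and the choice of JSON are mine, not part of any particular repository system or standard.

```python
import hashlib
import json
import mimetypes
import os
from datetime import datetime, timezone


def capture_metadata(path, creator=None):
    """Collect metadata the tool can record without asking the user.

    Hypothetical helper: the field names are illustrative only.
    """
    stat = os.stat(path)
    with open(path, "rb") as handle:
        checksum = hashlib.sha256(handle.read()).hexdigest()
    # Guess the media type from the file extension; fall back to a generic type.
    media_type, _ = mimetypes.guess_type(path)
    record = {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "media_type": media_type or "application/octet-stream",
        "sha256": checksum,
    }
    # The one human-supplied field: asked for once, stored with the record,
    # so nobody has to re-create it later.
    if creator is not None:
        record["creator"] = creator
    return record


def to_portable_json(record):
    """Serialize the record so other applications can reuse it."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Calling `capture_metadata("notes.txt", creator="A. Person")` on an existing file returns a plain dictionary; only `creator` had to come from a person, and `to_portable_json` turns the result into something any other application can read.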
There’s lots more to say, but I’ll leave that for another post….