DASe

REST Presentation

Here’s the PDF slide set of a talk I gave the IT community at UT Austin on June 25, 2008.
Real World REST with Atom/AtomPub

Roy Fielding’s latest post discusses the differences between software implementations, architectures, and architectural styles. Having spent the last few months rewriting DASe to adhere more closely to the REST architectural style, I have come to know the occasionally dizzying challenge of making good design decisions based on a set of principles operating two or three levels of abstraction above the nitty-gritty implementation that I am putting down in code. I had thought that DASe was fairly RESTful in its original design, but in fact important REST constraints were NOT being followed. It has been an interesting adventure, to say the least, but I am as convinced as ever that the ultimate benefits will be worth it, both for DASe and for my own understanding of distributed information systems.

The architecture of DASe relies heavily on the Atom and AtomPub specifications, which I think is a pretty good base, since those specs are shot through with an understanding of, and a desire to capitalize on, the principles of REST. When REST is discussed it is as likely as not in terms of HTTP, and half again as likely to be in terms of Atom/AtomPub. So there’s much to be learned from some very intelligent folks on the subject. And frankly, when a design decision needs to be made, it can be difficult to bring abstract concepts down to the concrete, and thus “what does Google do?” or “what does Amazon do?” is often a pretty good start. Same with Atom. Why not let the Atom spec’s design decisions flow back up into my design? So my database tables tend to have an “updated” column and an “updated_by” column that map nicely to atom:updated and atom:author. And my primary domain classes generally have an “asAtom” method. It’s not so different, I think, from Mark Nottingham’s recent post about allowing the four principal HTTP verbs to inform his data model. Although ostensibly about getting “beyond” methods, it strikes me as simply “convention over configuration” that says “OK, tight coupling with HTTP methods is fine” so we can move on to the next level of abstraction up. Design is ALWAYS a balance, and I was completely mistaken when I assumed REST would be a cookbook of answers for the design challenges of web application architecture. But it does, I think, provide a framework for considering those decisions and an awareness of the possible trade-offs and benefits lurking when we choose one path over another.
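To make the column-to-element mapping concrete, here is a minimal sketch. It is written in Python rather than the PHP that DASe uses, and the `Item` class, its fields other than “updated” and “updated_by”, the `as_atom` name, and the example.org URL are all assumptions for illustration, not DASe’s actual code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


@dataclass
class Item:
    """Hypothetical domain object mirroring one database row."""
    id: int
    title: str
    updated: datetime   # the "updated" column
    updated_by: str     # the "updated_by" column

    def as_atom(self, base_url: str = "http://example.org/item/") -> ET.Element:
        """Build an atom:entry element from this row's fields."""
        entry = ET.Element(f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"{base_url}{self.id}"
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = self.title
        # "updated" column -> atom:updated (an RFC 3339 timestamp)
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = self.updated.isoformat()
        # "updated_by" column -> atom:author/atom:name
        author = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
        ET.SubElement(author, f"{{{ATOM_NS}}}name").text = self.updated_by
        return entry


# Made-up sample row, for illustration only.
item = Item(1, "Sample image", datetime(2008, 6, 25, 12, 0, tzinfo=timezone.utc), "editor1")
entry = item.as_atom()
xml_text = ET.tostring(entry, encoding="unicode")
```

The point of the sketch is only that the two bookkeeping columns carry over to the Atom entry with no translation layer in between.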

The Roy Fielding post drew a comment from Sam Ruby, and that in turn drew a response from Roy stating “…but hypertext as the engine of hypermedia state is also about late binding of application alternatives that guide the client through whatever it is that we are trying to provide as a service.” I must have read it somewhere before (perhaps in the dissertation?), but I often use that phrase “late binding” in describing REST. It really is key: the state of a representation need not be set until I choose to interact with it. In my interaction with it, that state is bound (and thus usable and interesting to me), but there is no contract regarding that state. It can continue to grow and change, and everything it links to can grow and change in its own time. I (a resource owner) am NOT bound by others’ interactions with the resource. Well, as a librarian, that is a compelling/revolutionary/subversive/threatening (not to mention terrifying) proposition! But it certainly does seem to offer incredible opportunity, and it captures much of the promise and messiness of information flow in the real world.

DASe & Metadata

Early in the development of the DASe project we decided/realized that the ONLY way we would be able to quickly and efficiently deal with all of the various digital collections we hoped to incorporate would be to NOT enforce any kind of metadata scheme on anyone, but rather simply let folks describe their “stuff” any way they wish. Since many of these were legacy collections set up in a FileMaker or Access database or even an Excel spreadsheet, there was often already a schema in place and folks (rightly) didn’t want to change. Note that we are talking about faculty members and department administrators who have much better things to do than figure out how to use Dublin Core to describe the images that they have already been using for years in their classes, research, and publications.

We (Liberal Arts Instructional Technology Services at The University of Texas at Austin) had an interest in “rationalizing” this hodge-podge of data and metadata towards two ends: one, we wanted folks to be able to share their collections easily if they wished, and two, we wanted a means by which we could easily repurpose the digital assets in all sorts of ways: podcasts, websites, specialized search interfaces, etc. So we went with what are essentially key-value pairs: collection managers create “attributes” (e.g., title, description, person depicted, time period, etc.) that best describe their assets, and we provide an interface that allows them to add metadata to any item by filling in a value for any/all attributes that apply. Well, turns out this works REALLY well. We currently have 88 collections, comprising over 300,000 items (images, audio, video, documents, etc.), and the system holds over 4 million pieces of metadata (i.e., the “values” table has over 4 million rows). Searching is fast, adding new collections is easy, and application maintenance (including backing up collections as XML documents) is painless.
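A rough sketch of that key-value layout, using Python and an in-memory SQLite database purely for illustration. The table and column names, the sample collection, and the `search` and `describe` helpers are all hypothetical; the value table is called `metadata_value` here only because `VALUES` is a reserved word in SQL, and DASe’s real schema is surely different in its details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, collection TEXT NOT NULL);
    -- attributes are defined per collection by the collection manager
    CREATE TABLE attribute (
        id INTEGER PRIMARY KEY,
        collection TEXT NOT NULL,
        name TEXT NOT NULL
    );
    -- one row per piece of metadata: (item, attribute) -> value
    CREATE TABLE metadata_value (
        item_id INTEGER NOT NULL REFERENCES item(id),
        attribute_id INTEGER NOT NULL REFERENCES attribute(id),
        value_text TEXT NOT NULL
    );
""")

# Made-up sample data.
conn.executemany("INSERT INTO item (id, collection) VALUES (?, ?)",
                 [(1, "art_history"), (2, "art_history")])
conn.executemany("INSERT INTO attribute (id, collection, name) VALUES (?, ?, ?)",
                 [(1, "art_history", "title"), (2, "art_history", "time period")])
conn.executemany(
    "INSERT INTO metadata_value (item_id, attribute_id, value_text) VALUES (?, ?, ?)",
    [(1, 1, "Temple of Apollo"), (1, 2, "Archaic"), (2, 1, "Temple of Hera")],
)


def search(term):
    """Return ids of items with any metadata value containing the term."""
    rows = conn.execute(
        "SELECT DISTINCT item_id FROM metadata_value "
        "WHERE value_text LIKE ? ORDER BY item_id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


def describe(item_id):
    """Return (attribute name, value) pairs for one item, sorted by attribute name."""
    return conn.execute(
        "SELECT a.name, v.value_text FROM metadata_value v "
        "JOIN attribute a ON a.id = v.attribute_id "
        "WHERE v.item_id = ? ORDER BY a.name",
        (item_id,),
    ).fetchall()
```

Because no collection’s schema is baked into the tables, adding a collection with entirely different attributes means inserting rows, not altering tables.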

The current version of DASe runs on PHP4 with a PostgreSQL back end. The next rev, which is a significant retooling of the current architecture and code base, will be PHP5 and will be able to use PostgreSQL, MySQL, SQLite, or XML files as a back end. How that all works, where Atom, REST, RDF and more fit in, problems encountered along the way, as well as solutions settled on (tentative and otherwise) will be some of the topics explored in future posts.

Sam Ruby has a post that is particularly on target for answering (or at least exploring) a couple of questions I posed to the atom-syntax group (see Threads: “Why Use Atom?” and “Atom Inside web application architecture”). His was a follow-up to Yaron Goland’s entertaining Revenge of Babble post. The comments are enlightening, both in the ways folks try to answer/address the problem AND in the ways that no one really DOES, at least definitively (or so it seems to me). The closest is Sam Ruby:

Yaron Goland: “What are the criteria that identify a problem that ATOM is a good solution for? And even more importantly what are some key flags to look for that identify a problem that ATOM is probably a bad idea for?”

Sam Ruby: “My take: Atom is good for circumstances where data can be organized into “chunks” that you can identify by title, location, who made the last change, when that change was made, and the data itself is either textual or a brief textual summary can be obtained/synthesized. When is it bad? While the above list suggests several pieces of data that should be present, these requirements don’t tend to be equally weighted. The most fundamental pieces of information in my experience are ones that allow a client to answer the following two questions: have I seen this information before, and did it change? Where can I find it and did it meaningfully change are next.
(…) But my experience is that being able to answer “did this resource change?” in both machine and human terms is essential for proper use of HTTP and Atom respectively.”
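Those two questions (“have I seen this information before, and did it change?”) can be sketched from the client’s side. This is a Python illustration under stated assumptions: the `classify` function, the `seen` dictionary, and the sample feed are hypothetical, and comparing atom:updated as plain strings is a shortcut; a more careful client would parse the timestamps.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up feed for illustration.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:example:item:1</id><title>One</title><updated>2008-06-25T12:00:00Z</updated></entry>
  <entry><id>urn:example:item:2</id><title>Two</title><updated>2008-06-26T09:30:00Z</updated></entry>
  <entry><id>urn:example:item:3</id><title>Three</title><updated>2008-06-26T10:00:00Z</updated></entry>
</feed>"""


def classify(feed_xml, seen):
    """Label each entry 'new', 'changed', or 'unchanged'.

    `seen` maps atom:id -> the atom:updated value recorded on the last visit;
    it is updated in place so the next call compares against this visit.
    """
    result = {}
    for entry in ET.fromstring(feed_xml).iter(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id")
        updated = entry.findtext(f"{ATOM}updated")
        if entry_id not in seen:
            result[entry_id] = "new"          # never seen this atom:id
        elif seen[entry_id] != updated:
            result[entry_id] = "changed"      # seen, but atom:updated moved
        else:
            result[entry_id] = "unchanged"
        seen[entry_id] = updated
    return result


seen = {
    "urn:example:item:1": "2008-06-25T12:00:00Z",
    "urn:example:item:2": "2008-06-20T00:00:00Z",
}
status = classify(FEED, seen)
```

Here atom:id answers the first question and atom:updated the second, which is why those two elements carry so much of the weight.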

The question I need to answer is: “what is the cost associated with Atom, and what are the benefits?” DASe is built in PHP5, and the XML tools available (XMLReader, XMLWriter, SimpleXML, etc.) are REALLY easy to use when dealing with XML that’s not too complex. In all cases, the XML I am passing around is very simple: lists of collections, lists of items, lists of media files. Seems to me there IS some cost to using Atom internally in the application, since it inevitably makes those very simple XML structures more complex. I’m beginning work on a generic DASe Atom class that will help simplify the process of serializing and unserializing PHP objects and object arrays into Atom entries and/or feeds, and we’ll see what that gives us. Two things I hope to discover. One: does Atom help me answer that question “did this resource change?” and/or help me “bake” that into the representation of the data? Two: Atom transforms any/all data endpoints into service endpoints, so what’s gained there? Perhaps a lot, in fact, since I can “subscribe” to internal processes in the application (e.g., error logging) as a monitoring/maintenance tool.
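As a rough idea of what such a serialize/unserialize round trip might look like, here is a sketch in Python (not PHP, and not the DASe Atom class itself). The `to_feed` and `from_feed` functions, the record keys, and the sample error-log records are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM = f"{{{ATOM_NS}}}"


def to_feed(feed_id, title, records):
    """Serialize a list of dicts into an Atom feed string."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(f"{ATOM}feed")
    ET.SubElement(feed, f"{ATOM}id").text = feed_id
    ET.SubElement(feed, f"{ATOM}title").text = title
    # Feed-level atom:updated is the newest entry's timestamp. String max()
    # is only safe because every sample timestamp uses the same UTC "Z" form.
    newest = max((r["updated"] for r in records), default="1970-01-01T00:00:00Z")
    ET.SubElement(feed, f"{ATOM}updated").text = newest
    for r in records:
        entry = ET.SubElement(feed, f"{ATOM}entry")
        ET.SubElement(entry, f"{ATOM}id").text = r["id"]
        ET.SubElement(entry, f"{ATOM}title").text = r["title"]
        ET.SubElement(entry, f"{ATOM}updated").text = r["updated"]
        author = ET.SubElement(entry, f"{ATOM}author")
        ET.SubElement(author, f"{ATOM}name").text = r["author"]
        ET.SubElement(entry, f"{ATOM}content", {"type": "text"}).text = r["content"]
    return ET.tostring(feed, encoding="unicode")


def from_feed(xml_text):
    """Unserialize an Atom feed string back into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": e.findtext(f"{ATOM}id"),
            "title": e.findtext(f"{ATOM}title"),
            "updated": e.findtext(f"{ATOM}updated"),
            "author": e.findtext(f"{ATOM}author/{ATOM}name"),
            "content": e.findtext(f"{ATOM}content"),
        }
        for e in root.findall(f"{ATOM}entry")
    ]


# Hypothetical error-log records exposed as a feed one could subscribe to.
errors = [
    {"id": "urn:example:log:1", "title": "Database error",
     "updated": "2008-06-25T12:00:00Z", "author": "app", "content": "Connection refused"},
    {"id": "urn:example:log:2", "title": "Missing media file",
     "updated": "2008-06-25T12:05:00Z", "author": "app", "content": "File not found"},
]
feed_xml = to_feed("urn:example:log", "Error log", errors)
round_trip = from_feed(feed_xml)
```

Even in this toy form the cost is visible: each simple record picks up an id, a timestamp, and an author whether or not it needed them. The benefit is that any feed reader could watch the error log.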
