OPML Chapter Five: Search engines, OPML and Google
Dec 15th, 2006 by jim
Consider search engines. At this time in history there is really only one general-purpose, web-wide search engine: Google. How does Google PageRank really work? Do you know? I don’t. Larry Page does. Larry Page is a good guy, as far as I can tell. I shared a bus ride with him a couple of years ago, and enjoyed talking with him. But is it really best for the peace that one person is in charge of connections? How long will that be ok? For a year? For ten years? For a hundred years?
Google is an intermediary between me and the riches of the world wide web. It is an amazingly powerful and effective intermediary. I use it regularly, and I appreciate its power. I genuinely like Google ads, because I often find useful sources among the advertisers.
On the other hand, I envision a world where there are many search engines, many directories, many alternative ways to connect and create montages of the content available on the web. I envision a world where each person and each team and each organization continually putters with a kind of “knowledge surround,” a knowledge household that provides dynamic, continuously updated access to the pulsing, transforming, continually evolving digital universe.
Interestingly, in the world of blogs there has not been a consolidation of search capabilities. Instead, there are hundreds of meme-trackers, aggregators, and specialized search engines available. Indeed, on many blog hosting sites, including this one, each blog has its own search capability.
In my view, this is good. We benefit from diversity and competition and collaboration and evolution in search.
How has this happened? Because blog content is highly standardized, and most of it is available in XML-based RSS and similar formats. It is pre-structured, and thus open to searching by any tool that can read those formats.
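To make "pre-structured" concrete, here is a minimal sketch in Python of how little it takes to search a feed. The sample feed, its example.com URLs, and the function name search_feed are all invented for illustration; a real feed would be fetched from a blog and would carry more fields.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, inlined so the sketch runs without a network.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>OPML and search</title>
      <link>http://example.com/opml-and-search</link>
      <description>Notes on outlines and search engines.</description>
    </item>
    <item>
      <title>Bus rides</title>
      <link>http://example.com/bus-rides</link>
      <description>A conversation on the way to a conference.</description>
    </item>
  </channel>
</rss>"""


def search_feed(xml_text, term):
    """Return (title, link) pairs for items whose title or description mentions term."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    matches = []
    # Every post is an <item> with predictable child elements,
    # so no page scraping or layout guessing is needed.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        link = item.findtext("link", default="")
        if term in (title + " " + description).lower():
            matches.append((title, link))
    return matches


if __name__ == "__main__":
    for title, link in search_feed(SAMPLE_RSS, "search"):
        print(title, link)
```

Because every feed shares the same small set of elements, the same few lines work against any blog, which is one reason so many small search tools can exist side by side.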
One would hope that all content would be as open to multiple search approaches. A few days ago Charlie Nesson of Harvard Law School pointed out that Google has an agreement with Harvard to digitize the contents of Harvard’s libraries. These libraries are second only to the Library of Congress in size, and broader (i.e. global and historical) in scope. One would hope that the results of this digitization would be made available in bite-sized bits (bad pun intended) so they can be searched by many different sorts of machines. One would hope that the scanned “pages” made available could be read directly by any web-connected individual, and could be linked-to like blog-posts-with-permalinks. These resources will then become part of the rich “stuff” of common digital knowledge.
What does this all have to do with OPML? OPML is a practical approach to pulling together a collection of elements available on the web. OPML is an elegant and powerful approach to user-created and user-controlled access to elements of the web. Resources that have been catalogued in OPML trees and webs are available without recourse to general-purpose search engines. Sophisticated OPML collections assemble links to direct resources and express relationships among these links and resources. These collections in turn can be searched by any of hundreds of engines, including but not limited to Google and other major players.
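As a rough illustration of such a collection, the Python sketch below walks a small OPML outline, records each link together with the chain of headings above it, and then searches those paths. The sample outline, the example.com URLs, and the function names collect_links and search_links are assumptions made up for this example. The attribute names (text, type, xmlUrl, url, htmlUrl) are the ones commonly used on OPML outline elements.

```python
import xml.etree.ElementTree as ET

# A made-up OPML outline: nesting expresses how the resources relate.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Search resources</title></head>
  <body>
    <outline text="Search">
      <outline text="Blog search">
        <outline text="Example feed" type="rss" xmlUrl="http://example.com/feed.xml"/>
      </outline>
      <outline text="Example directory" type="link" url="http://example.com/directory.opml"/>
    </outline>
  </body>
</opml>"""


def collect_links(opml_text):
    """Return (path, url) pairs, where path is the chain of outline headings."""
    root = ET.fromstring(opml_text)
    results = []

    def walk(node, path):
        # Only direct children, so the path mirrors the outline's nesting.
        for child in node.findall("outline"):
            here = path + [child.get("text", "")]
            url = child.get("xmlUrl") or child.get("url") or child.get("htmlUrl")
            if url:
                results.append((" > ".join(here), url))
            walk(child, here)

    body = root.find("body")
    if body is not None:
        walk(body, [])
    return results


def search_links(links, term):
    """Keep the links whose heading path mentions term (case-insensitive)."""
    term = term.lower()
    return [(path, url) for path, url in links if term in path.lower()]


if __name__ == "__main__":
    links = collect_links(SAMPLE_OPML)
    for path, url in search_links(links, "blog"):
        print(path, url)
```

The path string is one simple way to keep the relationships among resources alongside the links themselves. A richer tool might follow link-type outlines into other OPML files and search those too.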