Wikipedia and a Controlled Vocabulary
I’ve been playing with Wikipedia a bit over the last few days and have noticed problems with many of the terms in the articles that should link elsewhere.
One problem related to the wiki software is that it’s easy to turn any words into a link without checking whether a Wikipedia article exists for those terms or what its URL is. On the one hand, this makes it easy to start new articles; on the other, it can lead people astray when an article already exists under a slightly different heading.
Take place names as an example. Some place names appear in page titles followed by another place name, like a city, in parentheses. If someone types the name without the parenthetical part and makes it a link, it goes to the wrong place. Geographic names ending in ’s sometimes keep the apostrophe and sometimes just end in s. First of all, it can be tricky to know which form to use. Second, it’s left to the author to find out whether a Wikipedia article already exists under either form.
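To make the place-name problem concrete, here is a minimal sketch in Python of how a link could be checked against existing titles before it is saved. Everything in it is an assumption for illustration: the title list is a made-up sample, and normalize and suggest_titles are hypothetical helpers, not anything the wiki software actually provides.

```python
import re

# Hypothetical sample of existing article titles (not real Wikipedia data).
EXISTING_TITLES = [
    "Springfield (Illinois)",
    "Springfield (Massachusetts)",
    "Pikes Peak",
    "Martha's Vineyard",
]


def normalize(title):
    """Reduce a title to a comparison key.

    Drops a trailing parenthetical qualifier, removes straight and curly
    apostrophes, collapses whitespace, and lowercases the result.
    """
    key = re.sub(r"\s*\([^)]*\)\s*$", "", title)
    key = key.replace("'", "").replace("\u2019", "")
    return " ".join(key.split()).lower()


def suggest_titles(link_text, titles=EXISTING_TITLES):
    """Return the exact title if it exists, else titles sharing its key."""
    if link_text in titles:
        return [link_text]
    key = normalize(link_text)
    return [t for t in titles if normalize(t) == key]


print(suggest_titles("Springfield"))   # two candidates, one per qualifier
print(suggest_titles("Pike's Peak"))   # the apostrophe-free title
print(suggest_titles("Atlantis"))      # no match at all
```

An author who typed the bare name would see the candidate titles instead of silently creating a link to a page that doesn't exist.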
Also, some terms vary slightly, like American Civil War and United States Civil War. Because Wikipedia is international and there have been many civil wars around the world and throughout history, an author must specify /which/ civil war. It’s also important to know the preferred term for each war in order to choose the best link. Automatic redirects from some terms to others solve some of these problems. (Note: I borrowed a link for the terms above that stops the redirect for the full effect.)
I talked to a Wikipedian about these issues. “Are there Wikipedians who patrol the encyclopedia looking for name and link issues like these?” I wondered. As far as he knew, it’s up to the article authors to resolve all of them.
As I pondered these issues and how Wikipedia might address them, a concept from my library school days came to mind: a controlled vocabulary. Very basically, a controlled vocabulary is a set of preferred terms that a system uses consistently, perhaps at the expense of their synonyms. The Library of Congress Subject Headings, the departments on this weblog, and the categories on a retail Web site are examples of controlled vocabularies. Because the words of a Wikipedia article title double as instant links to the article, retyping titles correctly in an entry becomes critical. The biggest advantage of using a controlled vocabulary, and of making it easy for editors to find the right terms, is improved linking among the articles. Instead of landing on a page saying that an article on a topic doesn’t exist, a reader would go to where the article actually is.
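For the curious, a controlled vocabulary lookup can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the tables below are hypothetical samples, and resolve is a made-up function, not a description of how Wikipedia handles links.

```python
# Hypothetical set of preferred terms (the article titles themselves).
PREFERRED = {"American Civil War", "English Civil War", "Spanish Civil War"}

# Hypothetical "use for" table: variant term (lowercased) -> preferred term.
USE_FOR = {
    "united states civil war": "American Civil War",
    "us civil war": "American Civil War",
}

# Ambiguous terms list several candidates, like a disambiguation page.
AMBIGUOUS = {
    "civil war": ["American Civil War", "English Civil War", "Spanish Civil War"],
}


def resolve(term):
    """Return (status, candidates) for a term an author wants to link."""
    cleaned = " ".join(term.split())
    if cleaned in PREFERRED:
        return ("preferred", [cleaned])
    key = cleaned.lower()
    if key in USE_FOR:
        return ("redirect", [USE_FOR[key]])
    if key in AMBIGUOUS:
        return ("ambiguous", list(AMBIGUOUS[key]))
    return ("missing", [])


print(resolve("United States Civil War"))
print(resolve("Civil War"))
print(resolve("Moon Civil War"))
```

The point of the status value is that an editing tool could react differently to each case: link straight through for a preferred term, swap in the preferred term for a variant, ask the author to choose for an ambiguous one, and only offer to create a new article when nothing matches.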
How exactly would this work in Wikipedia? In some ways, the controlled vocabulary is already established by the article titles. Going beyond that (related terms, broader terms, narrower terms, use fors, and the other pointers found in controlled vocabularies) might get way too complicated. Wikipedia already uses redirects and disambiguation pages (instant see references) and see also links. I don’t know the stats, but I would guess that far more contributors are anonymous than are members of the Wikipedia community who regularly work in the wikiuniverse. How can Wikipedia make it easy for someone outside its community to quickly learn about its controlled vocabulary, its importance, and how to apply it?




