Unstopping Stop Terms
I’ve been pondering whether Feedster has too many stop terms because of a post on the Feedster Support blog. My name happens to be a stop word, so it’s not possible to search for "j Baumgart." I might be a bit biased because of that.
I think a long list of stop words is only useful until it starts hindering searches. If a search engine is going to have such a list, allowing searchers to disable a stop term could be very powerful. Whether that is possible probably depends on how the database/engine is configured. Some databases might leave stop-listed words out of the index entirely, in which case there is nothing to match against and a stop term can’t be disabled at query time. Where it is possible, disabling stop terms can be incredibly useful for phrase and known-item searching.
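Here is a rough sketch of what letting a searcher unstop a term could look like at query time. Everything in it is made up for illustration: the stop list, the `query_terms` function, and its `use_stop_list` and `keep` options are hypothetical and say nothing about how Feedster actually works. It also assumes the index kept the stop words in the first place.

```python
import re

# Hypothetical stop list for illustration only; not Feedster's actual list.
STOP_TERMS = {"a", "in", "j", "of", "the"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_terms(query, use_stop_list=True, keep=()):
    """Return the terms a query would actually be searched on.

    use_stop_list=False switches the stop list off for this query.
    keep names individual stop terms the searcher wants to let through.
    """
    keep = {k.lower() for k in keep}
    terms = tokenize(query)
    if not use_stop_list:
        return terms
    return [t for t in terms if t not in STOP_TERMS or t in keep]


print(query_terms("Computers in Libraries"))                # ['computers', 'libraries']
print(query_terms("Computers in Libraries", keep=["in"]))   # ['computers', 'in', 'libraries']
print(query_terms("J Baumgart", use_stop_list=False))       # ['j', 'baumgart']
```

The per-term `keep` option is the gentler of the two switches: the rest of the stop list still does its job, and only the word the searcher cares about gets through.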
To illustrate, try this search for posts about the Computers in Libraries conference. “In” is on the stop list, so it’s omitted. On the front page, I see a number of posts with various forms of “computers” and “libraries,” but only a few with the specific text “computers in libraries.” Those few posts are scattered amidst posts with “library” and “computer,” posts that aren’t relevant to my search. The search appears to have high recall, but low precision. (Recall is the share of all the relevant items in the database that actually turn up in the search results. Precision is the share of the results that are directly relevant to the inquiry.)
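For reference, the usual information-retrieval definitions measure both numbers against the relevant items: precision is relevant results divided by all results, and recall is relevant results divided by all relevant items in the collection. A small sketch with invented numbers (the post IDs and counts below are not from my actual Feedster search):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of results.

    precision = relevant results / all results returned
    recall    = relevant results / all relevant items in the collection
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 10 posts on the front page, 3 of them about the
# conference, and 4 conference posts in the whole database.
retrieved = [f"post{i}" for i in range(1, 11)]
relevant = ["post2", "post5", "post9", "post42"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.3 0.75
```

In this made-up case most of the relevant posts were found (recall 0.75), but most of what came back was noise (precision 0.3), which is the pattern described above.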
Try the search in an engine that can search on the entire phrase “Computers in Libraries,” like Google. Notice the difference? Google has much higher precision. The items specifically about the conference are together, not mixed with other results with variations of the words.
Several other things might be happening in Feedster, too. The engine might be disconnecting the phrase, perhaps because of the stop word. That might account for the posts with “library” and “computer” in a different order than the query. It’s also possible, though I doubt this from other searches I’ve done in Feedster, that there’s so little content about the conference that it’s giving me more just so it can try to impress me. Nice precision impresses me. Being able to unstop stop words impresses me.
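To show what a “disconnected” phrase would do to results, here is a small sketch comparing two ways of matching. It is a guess at the general behavior, not a description of Feedster’s engine: the sample posts and both functions are invented, and there is no stemming here, so it only matches exact word forms.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_terms(doc, terms):
    """Disconnected match: every term appears somewhere, in any order."""
    words = set(tokenize(doc))
    return all(t in words for t in terms)


def matches_phrase(doc, phrase):
    """Phrase match: the words appear next to each other, in order."""
    words = tokenize(doc)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


# Invented sample posts.
posts = [
    "Heading to Computers in Libraries next week",
    "Our libraries bought new computers",
    "Libraries, computers, and the budget",
]

# With "in" stopped and the phrase broken apart, all three posts match.
loose = [p for p in posts if matches_all_terms(p, ["computers", "libraries"])]
# With the full phrase intact, only the conference post matches.
exact = [p for p in posts if matches_phrase(p, "Computers in Libraries")]

print(len(loose), len(exact))  # 3 1
```

The phrase match only works if the stop word is still there to be matched, which is the whole argument for being able to unstop it.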

November 25th, 2004 at 4:31 pm
Could you test my “Inside search”? When I do the search, Feedster shows the “Redhead” on my blog’s results page. Is that normal? How can I use a different image?
November 25th, 2004 at 11:41 pm
Scott R. reports Feedster’s stop terms list has been narrowed down.