Localizing Acts-As-Taggable-On

ActiveSupport::Multibyte is part of Rails, so everyone can enjoy multibyte safety in their applications. Within it, the Chars class lets you work transparently with UTF-8 text through a proxy around the Ruby String class, without needing extensive knowledge of the encoding. So while in the past you needed to download special gems and libraries to work with foreign-language character sets, now you have a multibyte-safe proxy for string methods.  Problem solved, right?

While string operations like downcase and normalize, called through the Chars proxy, will keep your UTF-8 characters intact, you need to be wary when you use a regex.

The Acts-As-Taggable-On plugin calls a cleanup method in the Tag.rb library.  See if you can spot the problem…

def self.cleanup(name)
  n = name.to_s.downcase.gsub(/[^a-z0-9_-]+/, '').strip
  n.blank? ? nil : n
end

The regex removes every character that is not a lowercase ASCII letter, a digit, an underscore, or a hyphen.  So foreign language tags are being stripped away!  Instead, you can use the /u regex option to parse UTF-8 strings containing multibyte characters:

gsub(/[^a-z0-9_-]+/u, '')

So, one little character localizes this plugin: the regex now treats Far East (and other) multibyte characters as individual characters rather than as runs of separate bytes.

def self.cleanup(name)
  n = name.to_s.downcase.gsub(/[^a-z0-9_-]+/u, '').strip
  n.blank? ? nil : n
end
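
To make "multibyte" concrete: in UTF-8, a single character outside the ASCII range occupies two or more bytes, so a byte-oriented pattern sees several units where a reader sees one. A small sketch, assuming Ruby 1.9 or later and a UTF-8 string (the variable names are only illustrative):

```ruby
# Illustrative only: assumes Ruby 1.9+ and a UTF-8 encoded string.
tag = "\u65E5\u672C\u8A9E"    # the three characters of "日本語"

byte_count = tag.bytesize     # number of bytes in the UTF-8 encoding
char_count = tag.length       # number of characters a reader would count

puts "#{char_count} characters, #{byte_count} bytes"
```

Each of these three characters takes three bytes in UTF-8, so the two counts differ by a factor of three.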

Ah, but there’s one caveat here.  While plain alphanumeric tags are created with unwanted special characters stripped out, tags containing multibyte UTF-8 characters are not.  So I could end up with a tag that looks like this:

;)%$%+عربيةعربيةعربيةعربي

Hmm!
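
One possible way around this, sketched here as an assumption rather than as the plugin's own fix: on Ruby 1.9 and later, regexes understand Unicode property classes such as \p{L} (letters), \p{M} (combining marks) and \p{N} (numbers), so the whitelist can name letters and digits from any script instead of only a-z. On those Ruby versions the ASCII-only class strips every non-ASCII character whether or not /u is present, so a whitelist along these lines would be needed anyway. The method name cleanup_unicode is hypothetical, and empty? stands in for Rails' blank? so the snippet runs without ActiveSupport.

```ruby
# Hypothetical helper, not part of acts-as-taggable-on.
# Assumes UTF-8 input. On Ruby 2.4+ String#downcase also lowercases
# non-ASCII letters; on older versions it only affects A-Z.
def cleanup_unicode(name)
  # Remove everything except letters, combining marks and digits from
  # any script, plus underscore and hyphen.
  n = name.to_s.downcase.gsub(/[^\p{L}\p{M}\p{N}_-]+/, '').strip
  n.empty? ? nil : n
end
```

With this sketch, cleanup_unicode(';)%$%+عربية') keeps only عربية, while 'Ruby on Rails!' becomes rubyonrails, just as it does with the original method.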

2 comments August 18th, 2009

