ActiveSupport::Multibyte is part of Rails so that everyone can enjoy multibyte safety in their applications. Within it, the Chars class lets you work transparently with UTF-8 strings through the Ruby String class without extensive knowledge of the encoding. So while in the past you needed to download special gems and libraries to work with foreign-language character sets, you now have a multibyte-safe proxy for string methods. Problem solved, right?
The Acts-As-Taggable-On plugin calls a cleanup method in the Tag.rb library. See if you can spot the problem…
def self.cleanup(name)
  n = name.to_s.downcase.gsub(/[^a-z0-9_-]+/, '').strip
  n.blank? ? nil : n
end
The regex strips every character that is not a lowercase ASCII letter, a digit, an underscore, or a hyphen. So foreign language tags are being stripped away! Instead, you can use the /u regex modifier to parse UTF-8 strings containing multibyte characters:
def self.cleanup(name)
  n = name.to_s.downcase.gsub(/[^a-z0-9_-]+/u, '').strip
  n.blank? ? nil : n
end
So, one little character makes this plugin localized by treating Far East (and other) characters as individual characters.
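If you are on Ruby 1.9 or later, where every string carries its own encoding, a negated ASCII class like the one above matches non-ASCII characters with or without /u. One possible way around that, sketched here in plain Ruby, is to whitelist Unicode letters, marks, and digits explicitly. This is an illustration rather than the plugin's code: the method names ascii_cleanup and unicode_cleanup are made up for the example, and String#empty? stands in for ActiveSupport's blank? so the snippet runs without Rails.

```ruby
# encoding: utf-8
# Plain-Ruby sketch (assumes Ruby 1.9+ and UTF-8 input); no Rails required.

# Mirrors the plugin's original pattern. String#empty? is an assumed
# stand-in for ActiveSupport's blank?.
def ascii_cleanup(name)
  n = name.to_s.downcase.gsub(/[^a-z0-9_-]+/, '').strip
  n.empty? ? nil : n
end

# Hypothetical variant: keeps any Unicode letter (\p{L}), combining mark
# (\p{M}), or digit (\p{N}), plus underscore and hyphen.
def unicode_cleanup(name)
  n = name.to_s.downcase.gsub(/[^\p{L}\p{M}\p{N}_-]+/, '').strip
  n.empty? ? nil : n
end

p ascii_cleanup("Ruby on Rails!")   # => "rubyonrails"
p ascii_cleanup("日本語")            # => nil
p unicode_cleanup("日本語!")         # => "日本語"
p unicode_cleanup("Café #1")        # => "café1"
```

Because the whitelist is built from character properties, punctuation is stripped from multibyte tags as well as ASCII ones.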
Ah, but there’s one caveat with the /u change. While plain alphanumeric tags have unwanted special characters stripped out, multibyte UTF-8 tags do not. So I could end up with a tag that looks like this:
August 18th, 2009