Archive for August, 2008

URL Management and Acts As Nested Set

I’ve been tasked with redesigning our database to optimize the way we store and search for URLs. My first attempt involved Acts_as_Tree, but I’ve since thrown that out in favor of Acts_as_Nested_Set.

With the nested set model, we can retrieve a single path without multiple self-joins. The full tree is retrieved with one self-join that links parents to nodes, on the basis that a node’s lft value always falls between its parent’s lft and rgt values. For a more in-depth explanation of how nested sets work, check out this article on Managing Hierarchical Data in MySQL.
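To make the lft/rgt idea concrete, here is a minimal plain-Ruby sketch with no database involved. The Node struct, the helper names, and the lft/rgt numbers are all made up for illustration; they are not part of any plugin.

```ruby
# Plain-Ruby illustration of the nested set model (no database involved).
# Node, NODES and the helper methods are hypothetical names for this sketch.
Node = Struct.new(:name, :lft, :rgt)

NODES = [
  Node.new("com",        1, 8),
  Node.new("mychildren", 2, 7),
  Node.new("lillian",    3, 4),
  Node.new("mark",       5, 6)
]

# A node's ancestors are the rows whose lft..rgt range encloses the node.
def ancestors_of(node, nodes)
  nodes.select { |n| n.lft < node.lft && n.rgt > node.rgt }.sort_by(&:lft)
end

# Descendants are the rows whose lft falls strictly inside the node's range.
def descendants_of(node, nodes)
  nodes.select { |n| n.lft > node.lft && n.lft < node.rgt }.sort_by(&:lft)
end

# A leaf has no room between lft and rgt for a child.
def leaf?(node)
  node.rgt == node.lft + 1
end

lillian = NODES.find { |n| n.name == "lillian" }
puts ancestors_of(lillian, NODES).map(&:name).inspect  # ["com", "mychildren"]
```

In SQL the same comparisons become a single BETWEEN condition on lft and rgt, which is why one query can return a whole path.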

You can’t look up nested sets without tripping over the plugin called BetterNestedSet. BetterNestedSet is an extension of ActsAsNestedSet that provides an enhanced acts_as_nested_set mixin for ActiveRecord.

>> script/plugin install svn://rubyforge.org/var/svn/betternested...

So below I’ve detailed my implementation of how to store and retrieve URLs using Acts_As_Nested_Set and the BetterNestedSet plugin. There are two tables, Domains and Directories. I split this out because in most cases we’re really not interested in the directory structure of a URL. Rather than clutter up the Domains table with additional rows, we can simply access the Directories table by domain_id if we find the need.

Domain levels are separated by a dot, or period symbol. According to ICANN, a full domain name is limited to 255 characters, and each second level or lower level domain is limited to 63 characters. So by splitting out a URL by subdomain, you’ll never do a string search over more than 63 characters. In addition, for performance we should specify in the database that the string column be set to a char(63) as opposed to a varchar(255), since varchar is used to keep table sizes down and char is used to make searches faster.
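As a rough sketch of that splitting step, the snippet below breaks a host name into its dot-separated levels and rejects any level longer than 63 characters. The method name domain_labels and the constant are hypothetical, chosen only for this example.

```ruby
# Sketch: split a host name into its levels and enforce the 63-character limit.
# domain_labels and MAX_LABEL_LENGTH are hypothetical names for this example.
MAX_LABEL_LENGTH = 63

def domain_labels(host)
  # Lowercase, split on dots, and drop empty pieces caused by stray dots.
  labels = host.downcase.split(".").reject(&:empty?)
  too_long = labels.find { |label| label.length > MAX_LABEL_LENGTH }
  raise ArgumentError, "label too long: #{too_long}" if too_long
  labels
end

p domain_labels("lillian.mychildren.com")  # ["lillian", "mychildren", "com"]
```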

class CreateDomains < ActiveRecord::Migration
  def self.up
    create_table :domains do |t|
      t.integer :parent_id
      t.string :subdomain, :limit => 63, :null => false
      t.integer :lft
      t.integer :rgt
      t.timestamps
    end
  end
  def self.down
    drop_table :domains
  end
end
class CreateDirectories < ActiveRecord::Migration
  def self.up
    #table name matches the Directory model and the drop_table below
    create_table :directories do |t|
      t.integer :parent_id
      t.integer :domain_id
      t.integer :lft
      t.integer :rgt
      t.string :directory, :limit => 63, :null => false
      t.timestamps
    end
  end
  def self.down
    drop_table :directories
  end
end

To store my Urls in my new database tables, I created a method that loops through an array of urls and calls the find_or_create_nestedset method where the meat of the code is stored. The reason I’m showing you this simple loop is to demonstrate the Addressable::URI class and the heuristic_parse method used to parse the urls into a standardized format. You can also use the URI::extract method.

You might want to note that I’m catching the InvalidURIError that is thrown for nonstandard URLs. This is a must when working with user-entered URLs.

#This method loops through an array of urls and calls
#the find_or_create_nestedset method on each
def self.parse_urls(urls)
  require 'rubygems'
  require 'addressable/uri'
  #Loop through urls and create sitetree records
  urls.each do |url|
    begin
      uri = Addressable::URI.heuristic_parse(url)
      domain = find_or_create_nestedset(uri)
    rescue Addressable::URI::InvalidURIError
      #skip the record for now
      next
    end
  end
end
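
If you want to try the skip-on-error pattern without installing the Addressable gem, here is a minimal sketch that swaps in the standard library’s URI.parse and its own URI::InvalidURIError. The method name parseable_uris is hypothetical. Note that URI.parse does no heuristic parsing, so input without a scheme (e.g. "example.com") parses but comes back with a nil host.

```ruby
require 'uri'

# Returns URI objects for the strings that parse and skips the rest.
# parseable_uris is a hypothetical helper name for this sketch.
def parseable_uris(urls)
  urls.each_with_object([]) do |url, parsed|
    begin
      parsed << URI.parse(url)
    rescue URI::InvalidURIError
      next  # skip malformed input instead of aborting the whole batch
    end
  end
end

uris = parseable_uris(["http://www.example.com/a/b.html", "http://bad host/"])
puts uris.map(&:host).inspect  # ["www.example.com"]
```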

The find_or_create_nestedset method parses a uri into the domain table and (if the uri contains a directory structure as well) into a separate directory table. The domain string is read right to left to find or create a domain record, which means the TLD (top level domain e.g. .com, .net, .org) will always be a root node.

Note: I did try using a find_or_create_by dynamic finder rather than checking whether the find_by returned a record. While finding a record by subdomain and parent_id is fine, creating a record throws an error because the dynamic finder assumes that you are trying to assign a parent_id on create. Like the id column in a Rails table, you’re not allowed to assign the parent_id column of a nested set directly. I couldn’t find an option that would allow me to bypass this issue, but please let me know if there’s a better way.

#This method parses a uri into the
#domain/directory nested set structure:
#   * normalize the uri
#   * parse the domain and insert into the domains table
#   * parse the directories and insert into the directories table
def self.find_or_create_nestedset(uri)
  return Directory.new if uri.host.nil?
  #normalize uri
  uri.normalize!
  #strip off a leading www. only (gsub would also remove "www." mid-host)
  sitename = uri.host.downcase.sub(/\Awww\./, '')
  #declared outside the block so it is still in scope for the final return
  domain = nil
  #begin transaction
  Domain.transaction do
    #domains
    parent = Domain.new #unsaved placeholder, so parent.id is nil for root nodes
    #split up the domains into an array
    subdomains = sitename.split('.').delete_if {|i| i.empty? }.reverse
    #loop through the domain strings right to left to find or create a domain record
    subdomains.each do |subdomain|
      node = Domain.find_by_subdomain_and_parent_id(subdomain, parent.id)
      if node.nil?
        node = Domain.create(:subdomain => subdomain, :parent_id => parent.id)
        node.move_to_child_of parent unless parent.new_record? #for acts_as_nested_set
      end
      parent = node
    end
    domain = parent
    #directories
    parent = Directory.new #reinitialize
    #split path directories into an array
    patharray = uri.path.strip.split('/').delete_if {|i| i.empty? }
    #remove extensions (e.g. foo.html)
    patharray.pop unless uri.extname.empty?
    return domain if patharray.empty?
    #loop through directories left to right to find or create a directory record
    patharray.each do |directory|
      node = Directory.find_by_directory_and_domain_id_and_parent_id(directory, domain.id, parent.id)
      if node.nil?
        node = Directory.create(:directory => directory, :domain_id => domain.id, :parent_id => parent.id)
        node.move_to_child_of parent unless parent.new_record? #for acts_as_nested_set
      end
      parent = node
    end
  end
  return domain
end
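
To see what the two arrays look like for a concrete URL, here is a small standalone sketch of just the string handling, with no database calls. It uses the standard library’s URI and File.extname in place of Addressable, and the method name split_url and the sample URL are made up for this example.

```ruby
require 'uri'

# Returns [subdomains, patharray] for a url string.
# split_url is a hypothetical helper; it mirrors only the string handling.
def split_url(url)
  uri = URI.parse(url)
  # Strip a leading "www." only, then read the levels right to left.
  sitename   = uri.host.downcase.sub(/\Awww\./, "")
  subdomains = sitename.split(".").reject(&:empty?).reverse
  # Split the path and drop a trailing file name such as index.html.
  patharray = uri.path.strip.split("/").reject(&:empty?)
  patharray.pop unless File.extname(uri.path).empty?
  [subdomains, patharray]
end

subdomains, patharray = split_url("http://www.lillian.mychildren.com/photos/2008/index.html")
p subdomains  # ["com", "mychildren", "lillian"]
p patharray   # ["photos", "2008"]
```

Because the levels are reversed, the first element is always the TLD, which is what makes it the root node of the domain tree.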

Now that our tables are populated, we can create the models to access our URLs in pleasing ways.

class Domain < ActiveRecord::Base
  has_many :directories
  acts_as_nested_set
  #nodes that have at least one child
  def self.all_parents
    self.find(:all, :conditions => "rgt <> lft + 1")
  end
  #nodes that have no children
  def self.all_leaves
    self.find(:all, :conditions => "rgt = lft + 1")
  end
  #full host name built from this node and its ancestors
  def host
    subdomain + ancestors.collect {|i| "." + i.subdomain }.join
  end
  #all urls under this domain, or just the host if it has no directories
  def urls
    return [host] if directories.empty?
    directories.collect {|i| i.url }
  end
  #
  def tld
    self.root
  end
end
class Directory < ActiveRecord::Base
  belongs_to :domain
  acts_as_nested_set :scope => :domain
  #path string built from this directory and its ancestors
  def subdir
    return "" if directory.nil?
    path = ancestors.reverse.collect {|i| "/" + i.directory.to_s unless i.directory.nil? }.join
    path + "/" + directory
  end
  #
  def url
    domain.host + subdir
  end
end

Given the leaf record of our nested set, our host method returns the full host string by using the BetterNestedSet ancestors method. Our urls method will return all the URL directories found under a domain. This is useful because now you have access to a bunch of information about a URI that can be accessed rather quickly. The key here is speed. With any Domain object, I can quickly return all the URLs in my database that contain the word “mychildren” with one line of code:

Domain.find_by_subdomain("mychildren").urls

If you want the “mychildren” root domain and we know the tld is “.com”, we can trim away significant sections of the set rather than doing a full table scan. For example:

dotcom = Domain.find_by_subdomain_and_parent_id("com", nil)
domaindotcom = Domain.find_by_subdomain_and_parent_id("mychildren", dotcom.id)
#
#return all the subdomains of "mychildren.com"
puts domaindotcom.urls
#
# output:
# lillian.mychildren.com
# george.harry.mychildren.com
# mark.mychildren.com

2 comments August 28th, 2008

Restful Authentication on Multi-Databases

Recently, we had a need to pull out the users table into a separate database for security reasons. So I’ve heard that DHH isn’t down with multiple databases (?), but last year he mentioned a cool new gem called Magic Multi-Connections.

I might try it out at some point, but it still seems like jumping through more hoops than I really need. After completing the RESTful Authentication tutorial, I instead referred to Recipe #15 from Rails Recipes and defined a parent class called Security:


class Security < ActiveRecord::Base
  self.abstract_class = true
  set_table_name "users"
  set_primary_key "id"
  establish_connection :security
end

Modified the existing User:

class User < Security

And that’s basically it. It just kind of worked.

I did have some problems with the redirect_to root_path line in the update method of the accounts_controller and had to change it to render :action => 'edit'. But this was just a quirk from the tutorial.

My new concern is migrations. If you rake db:migrate, it migrates your tables to your primary database and that’s it. I’ve seen a way to fix this on the net, but I haven’t gone down that road yet. In the meantime you’ll find the SQL to create the 2nd database and user table below:


CREATE DATABASE security;


CREATE TABLE users
(id int,
login char(40),
email char(40),
crypted_password char(40),
salt char(40),
created_at datetime,
updated_at datetime,
remember_token char(40),
remember_token_expires_at datetime,
activation_code char(40),
activated_at datetime,
password_reset_code char(40),
enabled boolean,
PRIMARY KEY (id)
);

So far so good. Everything seems to be working in Application 1. Since I don’t really need to worry about sharing sessions, this implementation should work fine for us. I’ll need to reimplement everything in Application 2, but I get to keep my user table secure, and my users can log in with the same username and password they use in App 1. I’ll let you know how it all pans out.

Add comment August 26th, 2008

Authentication On Its Own

So, recently I implemented Restful Authentication using the Restful Authentication with all the bells and whistles tutorial.

So far so good.

Now we find ourselves wanting to pull out user authentication into a separate database. Why would we want to do this non-Railsy thing? Well, there are loads of good reasons.

I’m a little annoyed by the attitude of some that accessing multiple databases is a bad thing. It’s a realistic thing, people. Sometimes I get the feeling that Rails folks are a little inexperienced when it comes to databases.

If RoR is ever to be accepted in big-name companies, folks will need to acknowledge that you just don’t always have the luxury of writing an app or creating a database from scratch. Specifically, there are many Enterprise Solutions designed so that login and user information is stored in its own database and accessed by multiple applications each having their own databases.

So what’s the best way to implement this model in Rails? I’ve tried just accessing the authentication models from a separate db a la Recipe #15 from Rails Recipes. But this just seems to confuse the Mailer. I’ll blog again if I can get any further.

We’ve been thinking about playing with ActiveResource. Maybe we’ll create a standalone Security App that can be accessed by our various web apps. A colleague came across an interesting blog post, Authenticate like SSO with ActiveResource. Of course, this article pointed us towards RubyCAS, which looks really fun.

Any thoughts?

Add comment August 20th, 2008

