Project Info
Please be advised that this document is in need of an update and doesn’t accurately reflect all aspects of our current blogging infrastructure.
-BCIS Webmaster, 5/14/2009
We get frequent requests for information about the technology, history, and policies of the Weblogs at Harvard Law School project. We have prepared this brief synopsis and background of our offerings. We hope this proves helpful and are happy to attempt to answer additional questions if time allows. W@HLS began in 2003 as an academic and social research project:
Berkman Center fellow Dave Winer wants to get Harvard blogging:
http://www.hno.harvard.edu/gazette/2003/04.17/13-blogging.html
An active blogging discussion group developed around the service; the Berkman Center provides it with meeting space, but the group is otherwise independent. The system we originally used was Manila, a platform developed by UserLand Software. Our deployment was very successful, with about 500 blogs created in the first two years. In 2006, we transitioned from Manila to the evolving WordPress MU platform, and we used the transition to close down old and abandoned blogs. The transition was difficult and complex, but it gave us a more stable and flexible blogging platform on a more powerful server.
The launch was covered by the Harvard Crimson:
We currently offer free weblogs to any member of the Harvard community. We enforce this requirement by allowing registration only to users with an email address ending in harvard.edu, hbs.edu, or radcliffe.edu. We use terms of service and a privacy policy that were developed in consultation with the community by the Berkman Center’s Clinical Program in Cyberlaw. We welcome others to use our policies, but we recommend that you customize them to suit your unique needs and have your general counsel’s office review them first. If you do use our policies, we request, but do not insist on, a link back to W@HLS.
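To illustrate the domain restriction, here is a minimal Python sketch; it is not the code W@HLS actually runs, and the function name `is_allowed_signup` is hypothetical. It compares whole domain labels rather than raw string suffixes, because a plain "ends in harvard.edu" test would also accept an address at a domain such as notharvard.edu. It assumes that subdomains such as law.harvard.edu should be accepted.

```python
# Hypothetical sketch: accept a signup only if the email's domain is one of the
# allowed domains or a subdomain of one (e.g. law.harvard.edu).
ALLOWED_DOMAINS = ("harvard.edu", "hbs.edu", "radcliffe.edu")


def is_allowed_signup(email: str) -> bool:
    """Return True if the address belongs to an allowed domain or subdomain."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return False  # not a well-formed user@domain address
    domain = domain.lower().rstrip(".")
    # Match whole labels: "harvard.edu" or "*.harvard.edu", never "notharvard.edu".
    return any(
        domain == allowed or domain.endswith("." + allowed)
        for allowed in ALLOWED_DOMAINS
    )


if __name__ == "__main__":
    for address in ("student@law.harvard.edu", "someone@notharvard.edu"):
        print(address, is_allowed_signup(address))
```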
We have always supported the use of RSS to syndicate content on our blogs server. In fact, Berkman is the copyright holder of the RSS specification (http://cyber.law.harvard.edu/rss/). Through RSS, content from our blogs server is syndicated across the web and is also used by other Harvard sites that support RSS, such as iSites, the Harvard course management platform.
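As a rough illustration of how another site might consume one of these feeds, the Python sketch below pulls item titles and links out of an RSS 2.0 document using only the standard library. The sample feed, its example.edu URLs, and the helper name `item_titles_and_links` are hypothetical; a real consumer would first fetch the XML from a blog’s feed URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the example.edu URLs are placeholders.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://blogs.example.edu/example/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>First post</title>
      <link>http://blogs.example.edu/example/first-post/</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blogs.example.edu/example/second-post/</link>
    </item>
  </channel>
</rss>"""


def item_titles_and_links(rss_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        return []  # no <channel> element, so there are no items to read
    return [
        (item.findtext("title", default="").strip(),
         item.findtext("link", default="").strip())
        for item in channel.findall("item")
    ]


if __name__ == "__main__":
    for title, link in item_titles_and_links(SAMPLE_FEED):
        print(f"{title}: {link}")
```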
With our recent (2007) upgrade to a newer version of WordPress MU, we have installed several dozen themes that support customization through features such as header image uploads, color and font changes, and configurable sidebar widgets. Unfortunately, we cannot yet let users write or modify their own themes securely, but we continue to look for a solution to this oft-requested feature.
What follows are a few details about our setup that may interest other systems administrators looking to implement a blog solution at their school or organization. WordPress MU is the best system we’ve found; another alternative is the free software that runs LiveJournal. We did not investigate any non-free blogging platforms. WordPress MU lives at http://mu.wordpress.org, and although its release cycle and support are unfortunately not very professional, we have had few problems as long as we set aside some time each month to check for updates, read the forums for newly reported bugs, and apply updates to a staging site before moving them live. Maintaining a blogs server carries some non-trivial costs, including occasional community policing, conflict resolution, and software upgrades. Our blogs project no longer has its own funding, but we do not find maintaining and upgrading the blogs server overly burdensome for our small but highly skilled technology staff.
We currently host a few hundred active blogs at any one time (they come and go as students enter and leave and as courses begin and end). As of February 2007, we were serving approximately 250,000 page views per day, or about 7 million views per month. Some of this traffic is podcasts and other large files. In February we served 200GB of content.
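A quick back-of-the-envelope check of these February 2007 figures, assuming a 28-day month (2007 was not a leap year) and decimal units (1GB = 10^9 bytes):

```python
# Sanity check of the February 2007 traffic figures quoted above.
# Assumptions: 28 days in the month, 1 GB = 10**9 bytes.
DAYS_IN_FEB_2007 = 28
PAGE_VIEWS_PER_DAY = 250_000
BYTES_SERVED = 200 * 10**9  # 200GB served in February

monthly_views = PAGE_VIEWS_PER_DAY * DAYS_IN_FEB_2007
avg_bits_per_second = BYTES_SERVED * 8 / (DAYS_IN_FEB_2007 * 24 * 60 * 60)
avg_bytes_per_view = BYTES_SERVED / monthly_views

if __name__ == "__main__":
    print(f"page views in February: {monthly_views:,}")             # 7,000,000
    print(f"average throughput: {avg_bits_per_second / 1e6:.2f} Mbit/s")  # 0.66
    print(f"bytes per page view: {avg_bytes_per_view / 1000:.1f} KB")     # 28.6
```

The per-view figure lumps podcasts and other large files together with ordinary pages, so it says little about typical page size, and peak-hour traffic will run well above the average rate.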
Our primary blogs server is a dual-processor Dell PowerEdge 1850 with 4GB of RAM and hardware RAID-1; it runs at an average load of about 50%. The blogs database is stored on another, more powerful machine that Berkman uses for most of our databases, and the blogs server generally has over 200 connections open to that MySQL server at any one time.
We provide each user with 256MB of file storage, though few come close to that limit; for select users, we raise the limit to as much as 1GB. Files are stored on local disk, although we plan to eventually move them to our NFS server. For safety, we back up all files each night and dump the MySQL database from a replication server four times each day.
We currently use the perchild MPM in Apache and eAccelerator for PHP caching, which we find provides a significant performance improvement. Watch out, though: the kses.php file must be excluded from caching, or you will have lots of problems. We also have WordPress’s built-in disk caching enabled. Finally, we reduce load by keeping most plugins, including the very heavy SlimStats plugin, turned off by default, letting users enable plugins as they see fit for their individual blogs.
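The database backup scheme can be sketched roughly as follows. This is a hypothetical Python helper, not our actual script: the replica host name, database name, backup directory, and helper names are placeholders, and credentials are assumed to come from a MySQL option file. A scheduler such as cron (for example, an entry at `0 */6 * * *`) would call `run_dump` four times a day. Dumping from a replica rather than the primary keeps the table locks that mysqldump takes by default away from the server the live site depends on.

```python
import datetime as dt
import subprocess
from pathlib import Path

# Hypothetical placeholders, not the real W@HLS configuration.
REPLICA_HOST = "db-replica.example.edu"
DATABASE = "wpmu"
BACKUP_DIR = Path("/var/backups/mysql")


def dump_command(now: dt.datetime) -> tuple[list[str], Path]:
    """Build a mysqldump command that reads from the replica, plus its output path.

    Credentials are assumed to live in a MySQL option file (e.g. ~/.my.cnf),
    so no password appears on the command line.
    """
    out = BACKUP_DIR / f"{DATABASE}-{now:%Y%m%d-%H%M}.sql"  # timestamped file
    cmd = [
        "mysqldump",
        f"--host={REPLICA_HOST}",
        f"--result-file={out}",
        DATABASE,
    ]
    return cmd, out


def run_dump() -> Path:
    """Run one dump; intended to be invoked by a scheduler four times a day."""
    cmd, out = dump_command(dt.datetime.now())
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return out
```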
Blog spam is a big problem. We used to use a combination of custom banning scripts and the Spam Karma plugin to prevent comment and trackback spam. Now we use Automattic’s Akismet product under an educational license, along with a few custom rules and scripts specific to our setup and mod_security in Apache. In addition to restricting signups to Harvard email addresses, we use a modified version of the WPMU-Signup-Captcha plugin found at http://valery.bgit.net/ to discourage bots.
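Inside WordPress, the Akismet plugin makes this call for you; the Python sketch below only illustrates the shape of an Akismet comment-check request and how its plain-text reply is read. It assumes the `https://rest.akismet.com/1.1/comment-check` endpoint with the key sent as an `api_key` form field; the helper names are hypothetical, and the request is built but not sent.

```python
from urllib.parse import urlencode

# Assumed endpoint for Akismet's comment-check call.
AKISMET_ENDPOINT = "https://rest.akismet.com/1.1/comment-check"


def build_comment_check(api_key: str, blog_url: str, user_ip: str,
                        user_agent: str, content: str) -> tuple[str, bytes]:
    """Return the endpoint and form-encoded POST body for a comment-check."""
    body = urlencode({
        "api_key": api_key,
        "blog": blog_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "comment",
        "comment_content": content,
    }).encode("utf-8")
    return AKISMET_ENDPOINT, body


def is_spam(response_text: str) -> bool:
    """Interpret Akismet's plain-text reply: 'true' is spam, 'false' is not."""
    reply = response_text.strip()
    if reply == "true":
        return True
    if reply == "false":
        return False
    # Anything else (e.g. "invalid" for a bad key) is treated as an error.
    raise ValueError(f"unexpected Akismet response: {reply!r}")
```

To submit the check, one could POST `body` to the URL with `urllib.request.urlopen(urllib.request.Request(url, data=body))` and pass the decoded reply to `is_spam`.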
While we plan to run scripts in the future to clean out old accounts, we currently feel no pressing need to do so given the size of our deployment. We have been generally satisfied with the power and flexibility of our system and feel that it serves our community’s needs well. If the W@HLS research project is ever reactivated, we may invest additional time and effort into improving our offerings and building research tools on top of the platform.
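If such cleanup scripts are ever written, a cautious first step might be flagging idle blogs for human review rather than deleting them, since course blogs legitimately go quiet between terms. The Python sketch below assumes a hypothetical `BlogRecord` holding each blog’s last-update time and a one-year idle threshold; both are illustrative choices, not W@HLS policy.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass
class BlogRecord:
    """Hypothetical minimal view of a blog: an id, a path, and its last update."""
    blog_id: int
    path: str
    last_updated: dt.datetime


def stale_blogs(blogs: list[BlogRecord], now: dt.datetime,
                max_idle_days: int = 365) -> list[BlogRecord]:
    """Return blogs not updated within max_idle_days, oldest first, for review."""
    cutoff = now - dt.timedelta(days=max_idle_days)
    idle = [b for b in blogs if b.last_updated < cutoff]
    return sorted(idle, key=lambda b: b.last_updated)
```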
