~ Archive for news ~

About our WordPress deployment

We are frequently asked about our WordPress deployment by universities, NGOs and other institutions interested in setting up their own multiuser blogging platform. We’ve been answering those questions on an ad-hoc basis – this page collects the most common ones so that we have something to point interested parties to. First – see our Project Info page if you’re interested in the early history of blogging at Berkman.

Notes: Answers are current as of 1/27/2011. This document does not represent the official position of the Berkman Center, Harvard Law School or any other entity.

What OS do you use?

Ubuntu LTS (multiple flavors). Any *nix would be great.

What is your current hardware for the blogs web server, and database server (we assume MySQL)? Does any of this run in virtual machines?

WordPress requires MySQL. There was a PostgreSQL fork a while back that died off pretty quickly. We have a well-appointed database server (you’ll get a cert warning from this link as it’s using a self-signed SSL cert: the SHA1 fingerprint is CF:DD:34:9D:B8:CD:E0:B9:EE:E8:1D:0F:FE:A9:1F:33:36:58:0D:7C) that shares duties with many other sites and applications. The database server is not a virtual machine and has directly attached storage to maximize IO (all the normal stuff you’d do to create a high-performance database server).
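If you’d rather check a self-signed cert against a published fingerprint than click through the warning, a minimal Python sketch might look like the following. The function names and the host name in the usage comment are placeholders of ours, not part of the deployment described here.

```python
import hashlib
import ssl


def sha1_fingerprint(der_bytes: bytes) -> str:
    """Format the SHA-1 digest of a DER-encoded certificate as AA:BB:..."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def server_fingerprint(host: str, port: int = 443) -> str:
    """Fetch a server's certificate (without validating it) and fingerprint it."""
    pem = ssl.get_server_certificate((host, port))
    return sha1_fingerprint(ssl.PEM_cert_to_DER_cert(pem))


# The fingerprint quoted in the answer above.
EXPECTED = "CF:DD:34:9D:B8:CD:E0:B9:EE:E8:1D:0F:FE:A9:1F:33:36:58:0D:7C"

# Usage (hypothetical host name, needs network access):
#   server_fingerprint("db.example.edu") == EXPECTED
```

A match only tells you that you reached the server that published the fingerprint; it says nothing about who operates it.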

Our WordPress application server (again, you’ll get a cert warning because of a self-signed SSL cert, same SHA1 as above) is a Xen VM with 3 GB of RAM and 4 cores. We run nginx as a caching front-end proxy to our Apache backend. I packaged up this nginx config as a plugin, along with sample configs; info here. My talk about high-performance WordPress (along with an overview of our nginx deployment) at WordCamp Boston 2010 is here.

We’ve read about some of the improvements you have made via your news page: which improvement has been most important?

Hands down – the nginx caching proxy. Some requests are VERY expensive – RSS feeds, for instance. A caching proxy (or perhaps WP Super Cache) is a necessity. A default, uncached WordPress deploy IS NOT going to get you far.
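To make the idea concrete, here is a rough Python sketch of what a caching layer does for an expensive URL such as a feed. It is not our nginx config: the class name, the `render` callback and the 60-second lifetime are all made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPageCache:
    """Serve a stored copy of a page until it is older than ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the cache can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]      # fresh copy: skip the expensive render
        body = render(url)     # stands in for a full WordPress request
        self._store[url] = (now, body)
        return body
```

With a lifetime of 60 seconds, a feed requested hundreds of times a minute is rendered by the backend only about once a minute; every other request is answered from the stored copy.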

You definitely want a physical machine to maximize MySQL IO. You should tune it properly for the large amount of RAM you’ve surely installed in it.

Your wordpress app server needs multiple cores to maximize concurrency.

Be sure to use a PHP opcode cache – APC has been nothing but unicorns and rainbows for us.

We could probably handle double the traffic with our current hardware, and nginx can load balance for us if/when we need to use multiple wordpress application servers. Our performance problems have not been related to our MySQL server so far.

Do you let users install custom themes or plugins?

No. We will install custom themes or plugins occasionally for special projects, but only after a thorough audit and after all development has taken place on a completely separate system.

We have been known to develop custom themes and charge back developer time to university departments. We love child theming the new-ish twentyten theme.

Do you allow SSH access to blog owners / theme developers?

No.

Any core hacks?

A couple, but we’re factoring those out and have even contributed one to the wordpress core. We expect to be on a completely clean wordpress core by Summer 2011.

Do you integrate with LDAP or another directory service?

Yes and no. We use Apache’s mod_auth_ldap to protect some private blogs, but we don’t use it to populate users inside WordPress. This has worked out fine, with few complaints from users about having a separate account. It also has the advantage of allowing those who wouldn’t be in a university LDAP server to have accounts – alumni, contractors, collaborators, consultants, etc.

Who gets a blog?

Anyone with a harvard.edu address.

How do you deal with spam?

For comments, we use Akismet. It does a pretty good job, but it seems to be losing effectiveness over time. Either that, or the sheer volume of blog spam has been increasing – most likely it’s a combination of both. We also suggest that blog authors have comments close automatically on old posts (after, say, 30 days), and that they moderate comments to devalue us as a target.
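WordPress exposes the auto-close rule as a discussion setting, so nothing needs to be coded. As a sketch of what the rule amounts to, assuming the 30-day window suggested above (the function name is ours):

```python
from datetime import datetime, timedelta


def comments_open(published: datetime, now: datetime, close_after_days: int = 30) -> bool:
    """Return True while a post is still young enough to accept comments."""
    return now - published < timedelta(days=close_after_days)
```

Old posts attract most of the spam and the fewest real comments, which is why a short window costs authors very little.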

For spam blogs or malicious users – requiring a harvard.edu address is a pretty high barrier. That said, we do have issues with compromised accounts, or university affiliates attempting to exploit us via linkfarming. We enforce our terms of service and view linkfarming as injurious to the university and against the spirit of this endeavor. Defining what’s spam can be a bit like defining obscenity – to paraphrase Justice Potter Stewart’s concurrence in Jacobellis v. Ohio, “you know it when you see it.”

What kind of traffic do you see? How many blogs are you hosting?

  • 800+ live blogs, probably 200 are what you’d consider active, and maybe 100 are what you’d consider REALLY active.
  • 700k visits per month, with 3 million+ page views by actual humans. Probably 7 to 8 million total page views counting bots.
  • TONS of bot visits. It’s kinda like we’re under a continuous DDoS attack. See our minimal robots.txt – we attempt to enforce the Crawl-delay value through the excellent rate-limiting features provided by nginx.
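Enforcing a crawl delay boils down to refusing a client that comes back too soon. The Python sketch below shows that idea only; it is a simplified stand-in, not nginx’s actual limiting algorithm, and the class name and 10-second delay are assumptions for the example.

```python
from typing import Dict


class CrawlDelayLimiter:
    """Allow each client at most one request per delay seconds."""

    def __init__(self, delay: float):
        self.delay = delay
        self._last_allowed: Dict[str, float] = {}

    def allow(self, client: str, now: float) -> bool:
        last = self._last_allowed.get(client)
        if last is not None and now - last < self.delay:
            return False   # too soon: a proxy would reject or delay this request
        self._last_allowed[client] = now
        return True


limiter = CrawlDelayLimiter(delay=10)
limiter.allow("bot", now=0.0)    # first request goes through
limiter.allow("bot", now=5.0)    # refused: only 5 seconds have passed
```

Clients are tracked separately, so one greedy crawler does not slow anyone else down.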

What’re people blogging about?

We host personal blogs, project blogs, the entire web presence for various working groups, archives of administrative updates, and a whole slew of other types of content. It’s perhaps best shown rather than told through a very small selection:

Why wordpress?

We wanted an open source multi-user blogging platform and it seemed the best choice at the time. We’ve been very happy with it, and there have been real improvements to the core features WITHOUT the core team throwing backwards compatibility under the bus.

Upgraded to 3.0.3, code contributed to WordPress core

Boring but important – we’re now running wordpress 3.0.3. The last couple point releases have served to plug a few privilege escalation bugs – nothing too scary, but needing attention nonetheless.

We’ve also contributed a patch to the wordpress core that should be released in version 3.2. Hopefully we’ll have more to contribute in the future.

– Happy Holidays!

Upgraded to wordpress 3.0.1, new analytics on the horizon.

Upgrade to WordPress 3.0.1

The changes that matter most to me (as the sysadmin) aren’t of much interest to most: wordpress 3.0 merged vanilla wordpress and wordpress-mu into a single codebase. This means there’s only one wordpress to contend with, and as plugins get “certified” to work under wordpress 3.x, they should work for us as well (ceteris paribus, especially around security and privacy considerations).

Changes that might matter to you – as a blogger – include:

  1. A new default theme – “twentyten”. It’s clean and very customizable – check it out on my blog,
  2. Further refinement of the backend interface,
  3. A “get shortlink” feature on the post edit page for use on Twitter and other size-constrained social services.

Analytics Upgrades

We’re still building out the infrastructure, but we’re going to offer Piwik and Google Analytics as improved analytics options. Piwik is a fairly impressive open source analytics program (demo here) that’s a great option for the privacy-conscious: we will run the Piwik analytics server and your visitor data won’t be leaked to third parties, ever. We’ll provide Google Analytics through a plugin if that’s your preference.

Stay tuned!

New Plugin and Theme nominations for blogs.law.harvard.edu. . .

We’re looking to expand our selection of plugins and themes in our WordPress install, and would love your help figuring out what would be most useful to the blogs.law community.

Here’s the official WordPress plugin directory. Keep in mind that not all plugins can be run in a WordPress Mu environment, and that we reserve the right to reject plugins of dubious quality, security, or function.

Here’s the official WordPress theme directory. We don’t have a budget to pay for any premium themes (so only nominate free ones, please) and the same caveats apply around quality, security, and function.

Please comment below with your nominations, including the full URL to the item of concern. Happy blogging, and thanks for your help.

New page caching implemented on the blogs.law.harvard.edu server

We’ve implemented a new page caching system that should help improve response time and ensure better uptime for the blogs.law.harvard.edu server. A “page cache” will intercept requests and serve pages WITHOUT invoking all of the WordPress code – allowing our site to serve many more requests overall. The page cache only comes into play for visitors who aren’t logged in and haven’t recently posted a comment to a blog we host.
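One common way to make that decision is to look at the cookies on the request: WordPress sets cookies whose names begin with `wordpress_logged_in_` for logged-in users and `comment_author_` for recent commenters. The Python sketch below assumes that approach; it is an illustration rather than our production rule, and the function name is made up.

```python
from typing import Iterable

# Cookie-name prefixes WordPress sets for logged-in users and recent commenters.
BYPASS_PREFIXES = ("wordpress_logged_in_", "comment_author_")


def should_bypass_cache(cookie_names: Iterable[str]) -> bool:
    """Return True if the request must be answered by WordPress itself."""
    return any(name.startswith(BYPASS_PREFIXES) for name in cookie_names)
```

Anyone carrying one of these cookies gets a freshly generated page; everyone else can be served the cached copy.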

The page cache has one significant side effect: WP-slimstats will be even less accurate, because most pages served to non-logged-in users come directly from the page cache and WP-slimstats never sees that traffic. WP-slimstats also over-counts illegitimate traffic and is inefficient and unsupported – we’re looking into other options to provide some kind of analytics for our sites that will work with the new page cache.

The backstory:

We get a significant amount of legitimate – and not so legitimate – traffic. We served over 3 million legitimate page views and around 300,000 unique visitors this August – on top of all the ‘bot traffic. Badly behaved ‘bots request too many pages simultaneously and can cause a significant service interruption in combination with all the legitimate traffic we’re already handling.

Our traffic – bad and good – has been increasing over time and we have been seeing more service interruptions due to high load: we have reached the point where we simply can’t continue to provide a high quality service without a page cache in place. We’ve already implemented a behaviour based ‘bot catcher and numerous other tricks to optimize how we serve content: the page cache is the newest weapon in our arsenal.

Thanks for your patience! The site should feel snappier. If you want to browse a fully cached version of your site, log out and clear any “blogs.law.harvard.edu” cookies. Logging out by itself isn’t enough, you have to clear your blogs.law cookies, too.

Server Upgraded to WordPress 2.8.4a

And there you have it. We’ve been upgraded to WordPress 2.8.4a, the latest stable WordPress Mu release.

You should notice some minor changes in the administrator backend for your blog – nothing major, just a nice set of refinements to the look-and-feel and a few new features (like the redone “widget” control under appearance -> widgets).

Please contact techhelp at cyber dot law dot harvard dot edu if you’re seeing any oddness. Thanks!

WordPress Mu upgrade scheduled for 9/8, 5pm

WordPress Mu 2.8.4 is primarily a bug fix release that also helps to refine the new administrator backend introduced in the 2.7 branch.

If you’re already used to WordPress 2.7, there’s not a lot that’ll surprise you in 2.8.4. This isn’t a huge change and we don’t expect a significant amount of downtime (if any).

New plugin installed – Source code highlighting

We’ve installed the SyntaxHighlighter Plus plugin to allow you to post formatted, highlighted source code.

There appear to be a few quirks related to open / closed brackets. We’ll continue to look into them. See the link above for details on how this plugin works – you must enable it under “appearance -> plugins” before you can use it.

Blog server upgraded to WordPress 2.7.1

Whew. We’ve upgraded the blogs.law.harvard.edu server to WordPress Mu 2.7.1 – which explains the completely new backend for blog managers.

Next we’re going to move onto plugin upgrades and start a process to evaluate the requests we’ve had for new plugins – any ideas? Please comment on this post.

We’re in much better shape than before to quickly apply upgrades, so expect us to be more in line with WordPress Mu releases in the future.

Thanks for your patience!

Upgrade to 2.6.5 went fine, new feature added. . .

So the upgrade to 2.6.5 appears to have gone off without a hitch. We’ll be working on the upgrade to 2.7.1 today, with a planned rollout at 4pm.

I’ve added a new feature: apparently WordPress does not adjust for daylight saving time by default. I’ve added the “timezone” plugin to let you set your timezone to a city near you; WordPress will then automatically adjust for daylight saving time where appropriate.

More details here:
 http://wordpress.org/extend/plugins/auto…

You can set your timezone under “settings” -> “timezone”.