New page caching implemented on the blogs.law.harvard.edu server
We’ve implemented a new page caching system that should improve response time and uptime for the blogs.law.harvard.edu server. A “page cache” intercepts requests and serves pages WITHOUT invoking all of the WordPress code, allowing our site to serve many more requests overall. The page cache only comes into play for users who are not logged in and who haven’t recently posted a comment on a blog we host.
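To illustrate the rule above, here is a minimal sketch of how a cache layer might decide whether a request can be answered from the cache. It is not the code running on this server. It assumes the decision is made from cookie names, using the prefixes WordPress normally sets for logged-in users and recent commenters; the function name `should_bypass_cache` is hypothetical.

```python
# Assumed cookie-name prefixes. WordPress sets cookies of this shape for
# logged-in users and for visitors who recently commented, but the exact
# rules used on this server are not stated in the post.
BYPASS_COOKIE_PREFIXES = ("wordpress_logged_in_", "comment_author_")


def should_bypass_cache(cookie_names):
    """Return True if the request must go to WordPress instead of the cache.

    cookie_names is an iterable of the cookie names sent with the request.
    """
    # str.startswith accepts a tuple and matches any of the prefixes.
    return any(name.startswith(BYPASS_COOKIE_PREFIXES) for name in cookie_names)


if __name__ == "__main__":
    print(should_bypass_cache(["wordpress_logged_in_abc123"]))  # logged-in user
    print(should_bypass_cache(["some_other_cookie"]))           # anonymous visitor
```

This also shows why logging out alone may not give you the cached view: a leftover commenter cookie would still trigger the bypass under these assumptions.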
The page cache has one significant side effect: WP-Slimstats will be even less accurate, because most pages served to non-logged-in users now come directly from the page cache and WP-Slimstats never sees that traffic. WP-Slimstats also over-counts illegitimate traffic and is inefficient, unsupported and inaccurate, so we’re looking into other analytics options for our sites that will work with the new page cache.
The backstory:
We get a significant amount of legitimate – and not so legitimate – traffic. We served over 3 million legitimate page views and around 300,000 unique visitors this August – on top of all the ‘bot traffic. Badly behaved ‘bots request too many pages simultaneously and can cause a significant service interruption in combination with all the legitimate traffic we’re already handling.
Our traffic, bad and good, has been increasing over time and we have been seeing more service interruptions due to high load: we have reached the point where we simply can’t continue to provide a high-quality service without a page cache in place. We’ve already implemented a behaviour-based ‘bot catcher and numerous other tricks to optimize how we serve content: the page cache is the newest weapon in our arsenal.
Thanks for your patience! The site should feel snappier. If you want to browse a fully cached version of your site, log out and clear any “blogs.law.harvard.edu” cookies. Logging out by itself isn’t enough; you have to clear your blogs.law cookies, too.

Ahmed
September 19, 2009 @ 5:38 pm
Can we outsource analytics to Google with a plugin like Google Analyticator?
“Google Analyticator adds the necessary JavaScript code to enable Google Analytics logging on any WordPress blog. This eliminates the need to edit your template code to begin logging. Google Analyticator also includes several widgets for displaying Analytics data in the admin and on your blog.”
Link: http://wordpress.org/extend/plugins/google-analyticator/
djcp
September 20, 2009 @ 12:15 pm
The problem isn’t getting the Google Analytics tracking code onto our blogs; it’s figuring out a way to get hundreds of authors to set up their own analytics accounts and then configure tracking codes properly. It’s not trivial and would require significant support resources, I expect. But I’ll look into it. Thanks!
Brandon Haynes
October 20, 2009 @ 10:14 am
I experienced what can only be a cache-invalidation issue today, where changes to an already published entry were reflected for my (authenticated) request, but the older version was served to unauthenticated users. As a result, a broken image in a newly published entry persisted for almost an hour, even after I had long since corrected it. The cache seems to auto-invalidate periodically, but it would be nice if it were updated whenever an entry was modified.
Just wanted to bring this to your attention; wasn’t sure if you were already aware of this issue.
B
djcp
October 20, 2009 @ 2:24 pm
Thanks for the heads up. Your analysis is correct.
The frontend cache isn’t aware of backend changes, so if you were to publish a broken URL that’d continue to be served to non-authenticated users until the cache expires.
The cache times by file type (again, only for non-logged-in-users) are:
* Images, JavaScript, CSS, and office-like documents (.doc, .pdf, etc.): 2 hours.
* Pages (pretty much anything that’s HTML): 20 minutes.
* RSS/Atom feeds: 45 minutes. XML feeds are very “expensive” for WordPress, especially on large blogs.
We only cache when the backend returns a file successfully; we DO NOT cache error codes or the non-existence of files.
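The rules above can be sketched as a small lookup function. This is an illustration under stated assumptions, not the frontend's actual configuration: it treats "successfully" as HTTP status 200, classifies static files by extension and pages and feeds by content type, and leaves anything else uncached. The extension list beyond .doc and .pdf, the feed MIME types, and the name `cache_ttl` are all assumptions.

```python
import posixpath

# Cache lifetimes from the list above, in seconds.
TTL_STATIC = 2 * 60 * 60   # images, JavaScript, CSS, office-like documents
TTL_PAGE = 20 * 60         # HTML pages
TTL_FEED = 45 * 60         # RSS/Atom feeds

# Illustrative only: the source names .doc and .pdf and says "etc.".
STATIC_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".js", ".css", ".doc", ".pdf"}
# Assumed content types for feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


def cache_ttl(path, status, content_type):
    """Return the cache lifetime in seconds, or None if the response is not cached."""
    # Assumption: only a 200 response counts as "returned successfully".
    if status != 200:
        return None
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return TTL_STATIC
    # Drop parameters such as "; charset=UTF-8" before comparing.
    mime = content_type.split(";")[0].strip().lower()
    if mime in FEED_TYPES:
        return TTL_FEED
    if mime == "text/html":
        return TTL_PAGE
    return None  # Assumption: unknown types are passed through uncached.


if __name__ == "__main__":
    print(cache_ttl("/myblog/", 200, "text/html; charset=UTF-8"))
    print(cache_ttl("/myblog/missing", 404, "text/html"))
```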
I’ll try to see if I can come up with a solution in the next bit of time I spend on blog server maintenance. It would be nice to have the backend notify the frontend when a page should be de-cached.
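One way to picture the "backend notifies the frontend" idea is a cache that supports both time-based expiry and an explicit purge. The sketch below is hypothetical and in-memory only: the class `PageCache` and its methods are invented for illustration, and how the backend would actually call `purge` (for example, from a hook that runs when an entry is saved) is left open.

```python
import time


class PageCache:
    """Tiny in-memory cache with per-entry expiry and explicit purging."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}   # url -> (body, expiry timestamp)
        self._clock = clock  # injectable so expiry can be tested

    def put(self, url, body, ttl_seconds):
        """Store a response body for ttl_seconds."""
        self._entries[url] = (body, self._clock() + ttl_seconds)

    def get(self, url):
        """Return the cached body, or None if it is missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[url]
            return None
        return body

    def purge(self, url):
        """Drop an entry immediately; return True if something was removed."""
        return self._entries.pop(url, None) is not None


if __name__ == "__main__":
    cache = PageCache()
    cache.put("/myblog/2009/10/20/post/", "<html>old version</html>", 20 * 60)
    # The backend would trigger this when the entry is edited.
    cache.purge("/myblog/2009/10/20/post/")
    print(cache.get("/myblog/2009/10/20/post/"))  # next request reaches the backend
```

With a purge in place, a corrected entry would be visible to non-logged-in readers on the next request rather than after the remaining cache time.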
Brandon Haynes
October 21, 2009 @ 12:21 pm
Thanks for those additional data; knowing the relative cache times is helpful. I’m of course willing to tolerate some cache staleness in exchange for (greatly) increased performance.
B
cynthia rockwell
October 25, 2009 @ 10:17 am
i have been getting loads of comment spam showing up in my regular comment moderation queue rather than in a spam queue. is akismet no longer working?
djcp
October 25, 2009 @ 1:50 pm
Cynthia,
Thanks for reporting this problem. I’ve upgraded Akismet to the latest version and I believe the issue should now be fixed. PLEASE contact us again if you continue to see problems.
cynthia rockwell
October 28, 2009 @ 4:04 pm
thanks very much… it seems to be continuing though, but not quite as badly. i checked in this morning and found 13 new spam in my regular comment queue. many more new ones in my spam queue too, so akismet is working but for some reason it’s not catching everything anymore, i guess…