First Media Cloud Code Release

March 19th, 2009

We’re happy to announce that we’re releasing the first version of the Media Cloud code.

This code does three things:

  • Runs a web app that allows you to manage a set of media sources and their feeds.
  • Periodically crawls the feeds set up in the web app and downloads any new stories found in those feeds.
  • Extracts the substantive text from the downloaded story content (minus the ads, navigation, comments, etc.) and associates a set of tags with each story based on that extracted text.
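
To make the crawl, extract, and tag steps concrete, here is a minimal sketch in Python. It is not the Media Cloud code (which is Perl) and does not reflect its actual extraction or tagging methods. Every name in it (`new_story_urls`, `extract_text`, `tag_story`, `SKIP_TAGS`, `STOPWORDS`) is hypothetical, and the heuristics are simple stand-ins: RSS 2.0 `<item><link>` parsing, dropping text inside a few non-content tags, and tagging by word frequency.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from html.parser import HTMLParser

# Hypothetical names and heuristics; not the Media Cloud implementation.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer"}
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def new_story_urls(feed_xml, seen_urls):
    """Return item links from an RSS 2.0 document that are not in seen_urls."""
    root = ET.fromstring(feed_xml)
    links = [item.findtext("link", default="").strip() for item in root.iter("item")]
    return [url for url in links if url and url not in seen_urls]


class TextExtractor(HTMLParser):
    """Collect text that sits outside the tags listed in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the page text with navigation, scripts, and similar parts removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tag_story(text, limit=3):
    """Tag a story with its most frequent non-stopword terms."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(limit)]
```

In this sketch a crawler would call `new_story_urls` on each fetched feed, download every returned URL, and pass the HTML through `extract_text` and then `tag_story`. A production extractor would need much stronger heuristics than a fixed list of tags to skip.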

The system is written in Perl on top of PostgreSQL and uses the Catalyst web application framework for the web app.

We’ve been running the code for almost a year now in production, but we’re publishing this as an alpha release because we have not extensively documented the installation or use of the system.

  1. Mimi
    March 29th, 2009 at 09:37
    #1

    Great job. It would also be good to filter out duplicate articles and perhaps just report how many times each one has been recycled through different feeds.

  2. Ben Reilly
    March 31st, 2009 at 01:37
    #2

    The lib/Bundle/MediaWords.pm file contains a line that reads “HTTP::CLIENT::PARALLEL/m”. I believe the “/m” should not be present.

  3. April 10th, 2009 at 16:40
    #3

    I just read about the Wolfram|Alpha search/discovery tool. Somehow (in my blissful ignorance) it seems that your application could make good use of their computational architecture and algorithms. http://www.hplusmagazine.com/articles/ai/wolframalpha-searching-truth
