<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"
	>
<channel>
	<title>Comments on: Public records, one JPEG at a time?</title>
	<atom:link href="http://blogs.law.harvard.edu/infolaw/2008/06/02/public-records-one-jpeg-at-a-time/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.law.harvard.edu/infolaw/2008/06/02/public-records-one-jpeg-at-a-time/</link>
	<description>Information, Law, and the Law of Information</description>
	<lastBuildDate>Sat, 28 Nov 2009 11:40:16 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Daniel Bell</title>
		<link>http://blogs.law.harvard.edu/infolaw/2008/06/02/public-records-one-jpeg-at-a-time/comment-page-1/#comment-96468</link>
		<dc:creator>Daniel Bell</dc:creator>
		<pubDate>Wed, 09 Sep 2009 21:59:31 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.law.harvard.edu/infolaw/2008/06/02/public-records-one-jpeg-at-a-time/#comment-96468</guid>
		<description>JPEGs are not ideal because, being images, they are more difficult to print. Documents scanned this way are sometimes a nightmare to read on screen because they are slow to load, and printing them sometimes means printing a gray, blurry background (depending on the scan quality).</description>
		<content:encoded><![CDATA[<p>JPEGs are not ideal because, being images, they are more difficult to print. Documents scanned this way are sometimes a nightmare to read on screen because they are slow to load, and printing them sometimes means printing a gray, blurry background (depending on the scan quality).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris S</title>
		<link>http://blogs.law.harvard.edu/infolaw/2008/06/02/public-records-one-jpeg-at-a-time/comment-page-1/#comment-54251</link>
		<dc:creator>Chris S</dc:creator>
		<pubDate>Tue, 10 Jun 2008 00:11:07 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.law.harvard.edu/infolaw/2008/06/02/public-records-one-jpeg-at-a-time/#comment-54251</guid>
		<description>That's quite a nice effort there; the content looks good!

Personally, I find that the real problem here is not a lack of available data; it's the lack of a uniform method of accessing that data.  It is simply not sufficient to have so many different ways to access this data, mostly HTTP and web scraping.  THOMAS has done a better job than most in providing a standardized means of access, but it's all &quot;one off.&quot;  If I write code to access data via THOMAS, it's not the same as what I have to do to get S. Ct. cases, or what I have to do to get patents.

I have thought of numerous intriguing things that I could do if I were able to access this wealth of information programmatically.  Before we can do any of these neat things, we need to be able to mine the data.  Before we can do real mining, we need to be able to build a searchable repository (all real search efficiency is driven by indexed search data, depending on a single (if distributed) search index).  Before we can do all that, we need a uniform API to access this diverse data.

The world of imaginable research becomes your oyster if you have this API acting as a wrapper to all of these data sources.

However, developing this API is no small task, and as much of a fan of open software as I am, I have little faith that a community development process could produce such a complex API efficiently.</description>
		<content:encoded><![CDATA[<p>That&#8217;s quite a nice effort there; the content looks good!</p>
<p>Personally, I find that the real problem here is not a lack of available data; it&#8217;s the lack of a uniform method of accessing that data.  It is simply not sufficient to have so many different ways to access this data, mostly HTTP and web scraping.  THOMAS has done a better job than most in providing a standardized means of access, but it&#8217;s all &#8220;one off.&#8221;  If I write code to access data via THOMAS, it&#8217;s not the same as what I have to do to get S. Ct. cases, or what I have to do to get patents.</p>
<p>I have thought of numerous intriguing things that I could do if I were able to access this wealth of information programmatically.  Before we can do any of these neat things, we need to be able to mine the data.  Before we can do real mining, we need to be able to build a searchable repository (all real search efficiency is driven by indexed search data, depending on a single (if distributed) search index).  Before we can do all that, we need a uniform API to access this diverse data.</p>
<p>The world of imaginable research becomes your oyster if you have this API acting as a wrapper to all of these data sources.</p>
<p>However, developing this API is no small task, and as much of a fan of open software as I am, I have little faith that a community development process could produce such a complex API efficiently.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
