Using SSL to Prove Document Authenticity


This blog post describes an idea that I’ve been kicking around for a while but haven’t had the time to research or implement. I’ve finally decided just to post it speculatively. I’m really hoping to get feedback from those in the community who are more knowledgeable about SSL than I am. Note: this is a relatively geeky topic; if you don’t understand what https:// and SSL are, this post won’t make much sense…

Introduction

Does anyone know anything about the internals of https? I was wondering whether there is any way to prove that a document downloaded over https really came from the site you claim it came from. In other words, if you download a document over https, is there any way for you to prove to a third party that it actually came from that web site?

For example, let’s say that Alice downloads doc.pdf from https://foobar.com/doc.pdf. https gives Alice assurance that doc.pdf really came from foobar.com (assuming that the certificate is legitimate). But if doc.pdf does not have a digital signature and Alice simply sends the downloaded file to Bob, he has no proof that the file actually came from foobar.com. (Obviously, the ideal solution would be for the maintainer of foobar.com to digitally sign the PDF file. But few websites digitally sign the files they distribute, and individual users often have no means of convincing a web site to do so.)

My question is whether there is any way for Alice to prove to Bob that she really obtained the file from foobar.com. I thought that it might be possible for Alice to prove the file’s origin by sending some of the raw network traffic establishing the SSL connection along with the file. (I’m using a PDF file to simplify the example, but presumably the same issues would apply to a web page.)
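As a rough illustration of what makes this tricky (a simplified model, not real TLS): TLS protects the data it carries with symmetric keys that both the browser and the server derive during the handshake, and the server’s certificate authenticates the handshake rather than each byte of the download. If that model holds, anything the server could have tagged, Alice could have tagged too, so Bob cannot tell a genuine record from one Alice made up. A publisher’s signature is different: only the holder of the private key can create it, but anyone with the public key can check it. This sketch uses HMAC-SHA256 as a stand-in for a TLS record MAC and textbook RSA with toy, unpadded parameters (two Mersenne primes) as a stand-in for a real signature scheme; `SESSION_KEY` and the sample documents are made up.

```python
"""Toy model: why a captured TLS session may not convince a third party,
while a publisher's signature would. Not real TLS and not secure RSA."""
import hashlib
import hmac

# --- Symmetric authentication (simplified stand-in for a TLS record MAC) ---
# Both endpoints derive the same key during the handshake, so whatever the
# server can tag, the client can tag as well.
SESSION_KEY = b"hypothetical-shared-session-key"  # made-up key for illustration


def tag_record(key: bytes, payload: bytes) -> bytes:
    """HMAC-SHA256 over the payload, standing in for a record MAC."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def record_is_authentic(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Constant-time comparison of the expected and supplied tags."""
    return hmac.compare_digest(tag_record(key, payload), tag)


# --- Asymmetric signature (textbook RSA, toy parameters, no padding) ---
P = 2**61 - 1  # Mersenne prime M61
Q = 2**89 - 1  # Mersenne prime M89
N = P * Q      # public modulus
E = 65537      # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the publisher


def digest_int(document: bytes) -> int:
    """SHA-256 of the document, reduced into the RSA group."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Only the holder of D (the publisher) can compute this."""
    return pow(digest_int(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Anyone holding the public pair (N, E) can check this."""
    return pow(signature, E, N) == digest_int(document)


if __name__ == "__main__":
    real = b"%PDF-1.4 original court filing"
    fake = b"%PDF-1.4 altered court filing"

    # The server tags the real file; Alice, who shares the key, tags a fake one.
    server_tag = tag_record(SESSION_KEY, real)
    alice_tag = tag_record(SESSION_KEY, fake)
    print(record_is_authentic(SESSION_KEY, real, server_tag))  # True
    print(record_is_authentic(SESSION_KEY, fake, alice_tag))   # True: indistinguishable to Bob

    sig = sign(real)
    print(verify(real, sig))  # True
    print(verify(fake, sig))  # False
```

Both HMAC checks pass, which is exactly the problem: Alice’s forged tag verifies as well as the server’s. The signature check passes only for the file the publisher actually signed.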

Use Cases

PACER is an online service used by the United States federal courts to provide online access to court records and documents. The documents on PACER are generally thought to be in the public domain but remain behind a paywall. Efforts such as the PACER Recycling Project and RECAP allow users to upload PDF documents obtained from PACER to a central server where the documents can then be freely downloaded by others. However, while PACER uses SSL, it does not provide digitally signed PDF files. Thus users currently have no way to prove that the documents really came from PACER.

Another use case is as a replacement for web screen shots. Because web pages can easily be altered or taken down, screen shots are often offered as “proof” that a web page used to exist even if it has since been altered or removed. For example, this CNET news story describes how pranksters from 4chan retaliated against AT&T for blocking their site by posting a fake report saying that AT&T’s CEO had died. The story includes this screen shot of the pranked web page prior to its removal. Of course, screen shots can easily be faked or altered using tools such as Photoshop, or just by saving and editing the HTML. Presumably web screen shots posted by CNET are relatively trustworthy, but what about screen shots posted by unknown users?

Ideal Solution

I envision a Firefox extension that would allow a user to easily create an archive bundle for an https web page, containing the page along with SSL information proving its legitimacy. (Obviously this would need to work for single files as well as web pages.) This bundle would allow other users to view the web page or file as it existed and would provide easily verifiable proof that it really came from the site in question.
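The bundle itself could be little more than a manifest. Here is a minimal sketch, assuming a JSON layout whose field names are made up for illustration (`ArchiveBundle`, `tls_evidence_b64` and the rest are hypothetical, not an existing format). The `tls_evidence_b64` field is only a placeholder for whatever handshake data would actually establish the page’s origin, which is the open question below; the hash check here shows only that the stored bytes haven’t changed since the bundle was made.

```python
"""Hypothetical archive-bundle manifest for a page fetched over https.
The layout and field names are illustrative assumptions, not a real format."""
import base64
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class ArchiveBundle:
    url: str                    # where the page or file was fetched from
    retrieved_at: str           # ISO 8601 timestamp recorded by the archiver
    content_b64: str            # the page or file exactly as downloaded
    content_sha256: str         # hex digest of the raw downloaded bytes
    cert_chain_pem: List[str]   # server certificate chain as seen by the browser
    tls_evidence_b64: str       # placeholder for whatever data would prove origin


def make_bundle(url: str, retrieved_at: str, content: bytes,
                cert_chain_pem: Iterable[str], tls_evidence: bytes) -> ArchiveBundle:
    """Package downloaded bytes plus connection details into one record."""
    return ArchiveBundle(
        url=url,
        retrieved_at=retrieved_at,
        content_b64=base64.b64encode(content).decode("ascii"),
        content_sha256=hashlib.sha256(content).hexdigest(),
        cert_chain_pem=list(cert_chain_pem),
        tls_evidence_b64=base64.b64encode(tls_evidence).decode("ascii"),
    )


def content_matches(bundle: ArchiveBundle) -> bool:
    """Integrity check only: the stored bytes match the stored hash.
    It says nothing about whether the bytes really came from bundle.url."""
    content = base64.b64decode(bundle.content_b64)
    return hashlib.sha256(content).hexdigest() == bundle.content_sha256


def to_json(bundle: ArchiveBundle) -> str:
    """Serialize the bundle so it can travel alongside the file."""
    return json.dumps(asdict(bundle), indent=2, sort_keys=True)


def from_json(text: str) -> ArchiveBundle:
    """Rebuild a bundle from its JSON form."""
    return ArchiveBundle(**json.loads(text))
```

Keeping the manifest as plain JSON means anyone could inspect it without the extension; the hard part, filling in `tls_evidence_b64` with something Bob can actually check, is left open here.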

My Questions for the SSL Knowledgeable

Is this doable at all?

Screen shots are trivial to fake. If this approach can’t provide perfect proof of the origin of a document, how much more assurance would it give you than just a screen shot?

Would releasing the raw https traffic also mean that Alice would be releasing her user name and password?

A minor concern is that a web site hosting or displaying a particular page is slightly different from the web site signing a file. Furthermore, there may be issues with XSS vulnerabilities that allow attackers to make an https web site display arbitrary content. However, XSS attacks are already a problem with the screen shots being passed around now, and XSS-altered pages could probably be detected by viewing the HTML source.
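Reading the source of a large page by eye is tedious. As a minimal sketch (a hypothetical helper, not an existing tool), Python’s standard `html.parser` module can list every `<script>` element and inline event-handler attribute such as `onload` or `onerror` in a saved page, so a reviewer knows where to look. It is only a reading aid: it doesn’t decide whether anything is malicious, and it misses other injection routes such as `javascript:` URLs or injected CSS.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ScriptSpotter(HTMLParser):
    """Records <script> elements and on* event-handler attributes for manual review."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: List[Tuple[int, str, str]] = []

    def handle_starttag(self, tag, attrs):
        line, _col = self.getpos()  # position of this tag in the saved source
        if tag == "script":
            src = dict(attrs).get("src")
            self.findings.append((line, "script", src or "(inline)"))
        for name, value in attrs:
            # HTMLParser lowercases attribute names, so "onError" arrives as "onerror".
            if name.startswith("on"):
                self.findings.append((line, name, value or ""))


def spot_scripts(html_text: str) -> List[Tuple[int, str, str]]:
    """Return (line, kind, value) tuples for every script-related item found."""
    parser = ScriptSpotter()
    parser.feed(html_text)
    parser.close()
    return parser.findings
```

Calling `spot_scripts` on the saved HTML of a page returns a short list of places to inspect; an unexpected entry would be a reason to look more closely, not proof of an XSS attack.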

But Not All Web Sites Use SSL

It has been repeatedly shown that web 2.0 applications such as Gmail and Facebook cannot be used securely over an unencrypted connection. For example, hijacking the account of a Facebook user on the same network is trivial. Perhaps I’m being overly optimistic, but I believe that once these vulnerabilities become more widely known and attack scripts and exploits become widely available, web applications will move to SSL as the default or at least offer https as an option. (Gmail already has an option to enable https, though it is buried deep within the settings.)

Please Comment

There you have it: my first real blog post.  Please let me know what you think.

First Blog Post


“First Post,” as they used to say on Slashdot.

This is my first blog post.

After spending far too long considering blogging and agonizing about the optimal approach, I’ve decided to just jump into it.

As such, I’m not sure how this blog will evolve over time. For example, I expect to focus on technical topics, but I reserve the right to write about other things. Similarly, I’m using blogs.law.harvard.edu because I work at Berkman and it is a convenient platform that has good Google juice, but I might decide to move to another server in the future.

Some of my goals for blogging are:

  • Increasing my Google rank so I’m the first search result for “David Larochelle”.
  • Getting ideas out of my head and into a place where other people can see them.

    I occasionally think of projects or ideas that I haven’t seen mentioned elsewhere. Unfortunately, I don’t have time to implement everything, so a lot of this stuff gets stuck in my head. As they say in “Getting Things Done”, you should get stuff out of your head. So by posting this stuff here, I hope to get it out of my head and get feedback from the Internet community on what’s worth implementing.

  • Having a web site that I can point people to (my old home page is now hopelessly out of date). I know that home pages are less popular since the rise of social networking, but I still think that Twitter and Facebook are not always enough. Facebook is quasi-private, and Twitter’s message length limit makes it impossible to discuss things in depth. Also, neither service archives well; both can be thought of as temporary media.

Well, there you have it: the intro to my blog. Check back soon for a blog post about something other than blogging.
