Atom Link Extensions Use Case
Ed Summers <ehs@pobox.com> Fri, 08 June 2012 13:11 UTC
Date: Fri, 08 Jun 2012 09:04:03 -0400
Message-ID: <CABzDd=4pwK3Ao=fGOL4K+vN3po9iwd2QBkmL8OwEw3ZmYvW=Xw@mail.gmail.com>
Subject: Atom Link Extensions Use Case
From: Ed Summers <ehs@pobox.com>
To: atom-syntax <atom-syntax@imc.org>
Cc: James Snell <jasnell@gmail.com>

Hi all,

I am using Atom to syndicate access to data dumps at the Library of Congress. We have a web application that provides access to historic newspapers [1], and we have received requests for access to the underlying OCR data for research and commercial purposes. Although this is historic data, we are routinely adding new content as it is digitized.

Rather than require clients to issue millions of requests to get at the OCR data (which is actually web addressable), the plan is to periodically create a tarred and compressed dump file of new OCR content and publish the availability of the file in an Atom feed, which interested parties can subscribe to. It's a similar model to what Wikimedia does for various Wikipedia projects [2].

Here's a minimal example to give you an idea of what I mean (warning: the URLs don't currently resolve):

    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>Chronicling America OCR Dumps</title>
      <link rel="self" type="application/atom+xml"
            href="http://chroniclingamerica.loc.gov/dumps/ocr/feed/" />
      <id>info:lc/ndnp/dumps/ocr</id>
      <author>
        <name>Library of Congress</name>
        <uri>http://loc.gov</uri>
      </author>
      <updated>2012-06-08T08:35:27-04:00</updated>
      <entry>
        <title>part-00001.tar.bz2</title>
        <link rel="alternate" type="application/x-bzip2"
              href="http://chroniclingamerica.loc.gov/data/dumps/ocr/part-00001.tar.bz2" />
        <id>info:lc/ndnp/dump/ocr/part-00001.tar.bz2</id>
        <updated>2012-06-07T13:57:23-04:00</updated>
        <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">OCR dump file <a href="http://chroniclingamerica.loc.gov/data/dumps/ocr/part-00001.tar.bz2">part-00001.tar.bz2</a> with size 162.7 MB generated June 7, 2012, 1:57 p.m.</div></summary>
      </entry>
    </feed>

The reason I am writing here is that I would like to add checksum information to the feed so that clients can verify that they have downloaded the data dump file correctly. An argument could be made that it's not necessary, since a corrupted bz2 file would likely not decompress. An argument could also be made that the Content-MD5 header could be used. But I like the idea of making an explicit assertion about the checksum in the Atom document.

After a bit of googling I ran across James Snell's Atom Link Extensions draft [3], which provides a pattern for including an md5 checksum in the <link> element like so:

    <link rel="alternate" type="application/x-bzip2"
          hash="md5:579758192095fde80896058af4ce0aee"
          href="http://chroniclingamerica.loc.gov/data/dumps/ocr/part-00001.tar.bz2" />

Unfortunately it looks like the draft has expired. I was wondering:

a) whether there are other established patterns for adding checksum information for resources in Atom

b) whether it's worth it for James to update the draft and try to push it forward to Informational status

As more and more data providers make dumps of their data available to reduce crawling (like Wikipedia), it seems like a good use case for Atom to support.

//Ed

[1] http://chroniclingamerica.loc.gov
[2] http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml-rss.xml
[3] http://tools.ietf.org/html/draft-snell-atompub-link-extensions-08
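
For anyone wondering what the client side of this might look like, here is a rough Python sketch, not a definitive implementation. It assumes the hash attribute is written exactly as in the <link> example in this message (unqualified, in the form "algorithm:hexdigest"), and that the dump file has already been downloaded to a local path. The function names parse_link_hash and file_matches, and the constant EXAMPLE_LINK, are made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

# The <link> element from the message, with the Atom namespace declared so it
# parses on its own. The hash attribute layout is an assumption taken from
# the example in this thread.
EXAMPLE_LINK = (
    '<link xmlns="http://www.w3.org/2005/Atom" rel="alternate" '
    'type="application/x-bzip2" '
    'hash="md5:579758192095fde80896058af4ce0aee" '
    'href="http://chroniclingamerica.loc.gov/data/dumps/ocr/part-00001.tar.bz2" />'
)


def parse_link_hash(link_xml):
    """Return (algorithm, hex_digest, href) for a <link> with a hash attribute."""
    link = ET.fromstring(link_xml)
    # Split "md5:5797..." at the first colon into algorithm name and digest.
    algorithm, _, digest = link.get("hash", "").partition(":")
    return algorithm.lower(), digest.lower(), link.get("href")


def file_matches(path, algorithm, expected_digest, chunk_size=1 << 20):
    """Hash the file at path in chunks and compare with the expected digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as dump:
        # Read in 1 MB chunks so a large dump is never held fully in memory.
        for chunk in iter(lambda: dump.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_digest.lower()
```

A client would call parse_link_hash on each entry's <link>, download the href, and then call file_matches on the saved file before unpacking it. Reading in chunks matters here because the dumps are on the order of hundreds of megabytes.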