Re: Atom Link Extensions Use Case
James M Snell <jasnell@gmail.com> Fri, 08 June 2012 14:48 UTC
In-Reply-To: <CABzDd=4pwK3Ao=fGOL4K+vN3po9iwd2QBkmL8OwEw3ZmYvW=Xw@mail.gmail.com>
References: <CABzDd=4pwK3Ao=fGOL4K+vN3po9iwd2QBkmL8OwEw3ZmYvW=Xw@mail.gmail.com>
From: James M Snell <jasnell@gmail.com>
Date: Fri, 08 Jun 2012 07:39:20 -0700
Message-ID: <CABP7RbduNRpCZ2aTEqKd+TtUmVKmYFVHihzfZDBzZaV=kjAbhQ@mail.gmail.com>
Subject: Re: Atom Link Extensions Use Case
To: Ed Summers <ehs@pobox.com>
Cc: atom-syntax <atom-syntax@imc.org>
List-Archive: <http://www.imc.org/atom-syntax/mail-archive/>
List-Unsubscribe: <mailto:atom-syntax-request@imc.org?body=unsubscribe>
List-ID: <atom-syntax.imc.org>
Good stuff. I can definitely resurrect the draft and move it forward
if there's enough interest.

- James

On Fri, Jun 8, 2012 at 6:04 AM, Ed Summers <ehs@pobox.com> wrote:
> Hi all,
>
> I am using Atom to syndicate access to data dumps at the Library of
> Congress. We have a web application that provides access to historic
> newspapers [1], and we have received requests for access to the
> underlying OCR data for research and commercial purposes. Despite the
> fact that this is historic data, we are routinely adding new content
> as it is digitized. Rather than require clients to issue millions of
> requests to get at the OCR data (which is actually web addressable)
> the plan is to periodically create a tarred and compressed dump file
> of new OCR content, and publish the availability of the file in an
> Atom feed, which interested parties can subscribe to. It's a similar
> model to what Wikimedia does for various Wikipedia projects [2].
>
> Here's a minimal example, to give you an idea of what I mean (warning
> URLs don't currently resolve):
>
> <?xml version="1.0" encoding="utf-8"?>
> <feed xmlns="http://www.w3.org/2005/Atom">
>   <title>Chronicling America OCR Dumps</title>
>   <link rel="self" type="application/atom+xml"
>     href="http://chroniclingamerica.loc.gov/dumps/ocr/feed/" />
>   <id>info:lc/ndnp/dumps/ocr</id>
>   <author>
>     <name>Library of Congress</name>
>     <uri>http://loc.gov</uri>
>   </author>
>   <updated>2012-06-08T08:35:27-04:00</updated>
>   <entry>
>     <title>part-00001.tar.bz2</title>
>     <link rel="alternate" type="application/x-bzip2"
>       href="http://chroniclingamerica.loc.gov/data/dumps/ocr/part-00001.tar.bz2"
>       />
>     <id>info:lc/ndnp/dump/ocr/part-00001.tar.bz2</id>
>     <updated>2012-06-07T13:57:23-04:00</updated>
>     <summary type="xhtml"><div
>       xmlns="http://www.w3.org/1999/xhtml">OCR dump file <a
>       href="http://chroniclingamerica.loc.gov/data/dumps/ocr/part-00001.tar.bz2">part-00001.tar.bz2</a>
>       with size 162.7 MB generated June 7, 2012, 1:57 p.m.</div></summary>
>   </entry>
> </feed>
>
> So the reason why I am writing here is that I would like to add
> checksum information to the feed to let clients verify that they have
> downloaded the data dump file correctly. An argument could be made
> that it's not necessary since a corrupted bz2 file would likely not
> decompress. An argument could also be made that the Content-MD5 header
> could be used. But I like the idea of making an explicit assertion
> about the checksum in the Atom document.
>
> After a bit of googling I ran across James Snell's Atom Link
> Extensions draft, which provides a pattern for including an md5
> checksum in the <link> element like so:
>
> <link rel="alternate" type="application/x-bzip2"
>   hash="md5:579758192095fde80896058af4ce0aee"
>   href="http://chroniclingamerica.loc.gov/data/dumps/ocr/part-00001.tar.bz2"
>   />
>
> Unfortunately it looks like the draft has expired. I was wondering:
>
> a) are there other established patterns for adding checksum
> information for resources in Atom
> b) if it's worth it for James to update the draft and try to push it
> forwards to an Informational status
>
> As more and more data providers make dumps of their data available to
> reduce crawling (like Wikipedia) it seems like a good use case for
> Atom to support.
>
> //Ed
>
> [1] http://chroniclingamerica.loc.gov
> [2] http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml-rss.xml
> [3] http://tools.ietf.org/html/draft-snell-atompub-link-extensions-08
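
For anyone wondering what the client side of this might look like, here is a rough Python sketch, not anything taken from the draft itself. It assumes the hash attribute appears un-namespaced on <link> in the "algorithm:hexdigest" form quoted above, and that the dump has already been downloaded to a local path. The function names link_hashes and file_matches are made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def link_hashes(feed_xml):
    """Map each link href to (algorithm, hex digest) for links that carry
    a hash attribute of the assumed form "algorithm:hexdigest"."""
    root = ET.fromstring(feed_xml)
    found = {}
    for link in root.iter(ATOM_NS + "link"):
        value = link.get("hash")
        if value and ":" in value:
            algorithm, digest = value.split(":", 1)
            found[link.get("href")] = (algorithm.lower(), digest.lower())
    return found


def file_matches(path, algorithm, expected_hex, chunk_size=1 << 20):
    """Hash the file in chunks (dumps can be large) and compare digests."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_hex.lower()
```

A subscriber would call link_hashes on the fetched feed, download each href, and then pass the local path plus the (algorithm, digest) pair to file_matches before unpacking the archive. Reading in chunks keeps memory use flat for multi-hundred-megabyte dumps.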