[urn] Comments on PWID -05

worley@ariadne.com (Dale R. Worley) Thu, 28 February 2019 03:19 UTC

From: worley@ariadne.com
To: Eld Zierau <elzi@kb.dk>
Cc: L.Svensson@dnb.de, urn@ietf.org
Sender: worley@ariadne.com
Date: Wed, 27 Feb 2019 22:19:30 -0500
Message-ID: <87d0ncha65.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/BAWQqe-fzfc7PNoXyEZsRFmF-w4>
Subject: [urn] Comments on PWID -05

The discussion of archive-id is considerably clearer than before, but it
seems to me that there should be some discussion of how to distinguish
domain name archive-ids (e.g., netarkivet.dk) from archive-ids that have
to be looked up in the (future) registry.  Being able to make this
decision reliably seems to be the first step in resolving a PWID.

I think arranging this is straightforward, since the syntax of
archive-id is "+( unreserved )", and many of those strings are not
allowed as DNS names for hosts.  E.g., one could require that any
archive-id that is not intended to be interpreted as a DNS name start
with one of "-", ".", "_", "~".
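
A minimal sketch of that rule in Python, assuming the prefix convention
suggested above were adopted.  The function name classify_archive_id is
hypothetical, and the code does not check that a "dns" result is actually
a resolvable host name.

```python
import re

# "unreserved" from RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = re.compile(r"[A-Za-z0-9\-._~]+")

# Assumption: the leading characters proposed above mark registry entries.
REGISTRY_PREFIXES = ("-", ".", "_", "~")


def classify_archive_id(archive_id: str) -> str:
    """Return "registry" or "dns" for a syntactically valid archive-id."""
    if not UNRESERVED.fullmatch(archive_id):
        raise ValueError("archive-id must match +( unreserved )")
    if archive_id.startswith(REGISTRY_PREFIXES):
        return "registry"
    return "dns"
```

With this convention a resolver can decide from the first character alone,
without consulting the registry.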

Similar considerations apply to distinguishing archived-item-id from
URI.  (Every URI begins with a scheme, which must start with a letter.)
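
As a rough illustration, the check could look like the following Python
sketch.  It assumes a convention (not in the draft) that an
archived-item-id which is not a URI never starts with a letter;
looks_like_uri is a hypothetical name.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":"
_SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def looks_like_uri(archived_item: str) -> bool:
    """True if the string begins with a syntactically valid URI scheme."""
    return _SCHEME_PREFIX.match(archived_item) is not None
```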

           precision-spec = "part" / "page" / "subsite" / "site"
                    / "collection" / "recording" / "snapshot"
                    / "other"

Is the inclusion of "other" the best way to handle this?  Usually a
component like this would allow "extension values" (that conform to the
same syntax as the defined values, e.g., "+letter").  As written,
everything that cannot be classified as "part", "page", ..., "snapshot"
would have to be labeled "other", even if a particular archive had
several different additional precision values that it operated with
internally.
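
A sketch of how a parser might treat extension values, assuming they are
restricted to "+letter" as suggested above.  The name
parse_precision_spec is hypothetical.  ABNF literals are
case-insensitive; folding extension values to lower case as well is an
assumption.

```python
import re

# Values defined in the draft, with "other" replaced by open extensions.
DEFINED = {"part", "page", "subsite", "site",
           "collection", "recording", "snapshot"}

# Assumption: extension values use the same syntax, one or more letters.
_LETTERS = re.compile(r"[A-Za-z]+")


def parse_precision_spec(value: str) -> tuple[str, str]:
    """Return ("defined", v) or ("extension", v) for a precision-spec."""
    if not _LETTERS.fullmatch(value):
        raise ValueError("precision-spec must be one or more letters")
    folded = value.lower()
    kind = "defined" if folded in DEFINED else "extension"
    return kind, folded
```

A resolver that does not recognize an extension value can still parse
the PWID and decide for itself how to handle the unknown precision.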

      *  'URI' is defined as in [RFC3986] but where occurrences of "[",
         "]", "?" and "#" are %-encoded in order not to clash with URN
         reserved characters [RFC8141].

This gets complicated.  For example, "http://example.com/foo#bar" is a
different URL than "http://example.com/foo%23bar", and might have
different contents.  You can't use "http://example.com/foo%23bar" as the
archived-item part of PWIDs for the saved contents of both of these
URLs.

One possibility is to set the archived-item string to be the URI with
"[", "]", "?", "#", and "%" all %-encoded, so that the two URLs have
these archived-item values:
    http://example.com/foo%23bar
    http://example.com/foo%2523bar
That would be laborious, though, if many URLs contain %-escapes and
humans have to copy PWID URNs by hand.
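
A minimal Python sketch of that encoding and its inverse, to show that
the mapping is reversible once "%" itself is escaped.  The function
names are hypothetical, and accepting lower-case hex digits when
decoding is an assumption.

```python
import re

# The five characters escaped under the proposal above.
_ESCAPES = {"%": "%25", "[": "%5B", "]": "%5D", "?": "%3F", "#": "%23"}


def encode_archived_item(uri: str) -> str:
    """Escape "%", "[", "]", "?" and "#"; leave everything else alone."""
    return "".join(_ESCAPES.get(ch, ch) for ch in uri)


def decode_archived_item(item: str) -> str:
    """Undo encode_archived_item in a single left-to-right pass."""
    return re.sub(
        r"%(25|5B|5D|3F|23)",
        lambda m: chr(int(m.group(1), 16)),
        item,
        flags=re.IGNORECASE,
    )


print(encode_archived_item("http://example.com/foo#bar"))
# prints http://example.com/foo%23bar
print(encode_archived_item("http://example.com/foo%23bar"))
# prints http://example.com/foo%2523bar
```

Because every "%" in the encoded form introduces one of the five
escapes, decoding never touches %-escapes that were present in the
original URL.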

      *  'archival-time' is a UTC timestamp as described in the W3C
         profile of [ISO8601] [W3CDTF] (also defined in [RFC3339]), for
         example YYYY-MM-DDThh:mm:ssZ.

Looking at RFC 3339, I see:

   date-fullyear   = 4DIGIT
   date-month      = 2DIGIT  ; 01-12
   date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                             ; month/year
   time-hour       = 2DIGIT  ; 00-23
   time-minute     = 2DIGIT  ; 00-59
   time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                             ; rules
   time-secfrac    = "." 1*DIGIT
   time-numoffset  = ("+" / "-") time-hour ":" time-minute
   time-offset     = "Z" / time-numoffset

   partial-time    = time-hour ":" time-minute ":" time-second
                     [time-secfrac]
   full-date       = date-fullyear "-" date-month "-" date-mday
   full-time       = partial-time time-offset

   date-time       = full-date "T" full-time

But comparing that to W3CDTF, I see no single nonterminal which
corresponds to the set of formats allowed in W3CDTF.  I suggest you make
a more rigid specification as to what is allowed for archival-time.
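
For illustration, one rigid choice would be to accept only the
YYYY-MM-DDThh:mm:ssZ form given as the example in the draft.  The Python
sketch below assumes that choice (no fractional seconds, no numeric
offsets, no truncated forms); parse_archival_time is a hypothetical
name.

```python
import re
from datetime import datetime, timezone

# Fixed-width fields, literal "T" and "Z", ASCII digits only.
_ARCHIVAL_TIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"
)


def parse_archival_time(text: str) -> datetime:
    """Parse a strict YYYY-MM-DDThh:mm:ssZ timestamp as a UTC datetime."""
    if not _ARCHIVAL_TIME.fullmatch(text):
        raise ValueError("archival-time must be YYYY-MM-DDThh:mm:ssZ")
    # strptime rejects out-of-range fields such as month 13.
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```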

   [W3CDTF]   W3C, "Date and Time Formats: note submitted to the W3C. 15
              September 1997", 1997,
              <http://www.w3.org/TR/NOTE-datetime>.

              W3C profile of ISO 8601 urn:pwid:archive.org:2017-04-
              03T03:37:42Z:page:http://www.w3.org/TR/NOTE-datetime

The final two lines of this block look like a mis-formatted
bibliographic reference.

Dale