Re: [urn] Suggested PWID URN for Persistent Web IDentifiers - version 3

worley@ariadne.com (Dale R. Worley) Sat, 21 July 2018 03:12 UTC

From: worley@ariadne.com
To: Eld Zierau <elzi@kb.dk>
Cc: urn@ietf.org
In-Reply-To: <76f19e6c8892422d9d9475549218d82d@kb.dk> (elzi@kb.dk)
Sender: worley@ariadne.com
Date: Fri, 20 Jul 2018 23:11:57 -0400
Message-ID: <87va99qo3m.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/Jzd9INZxhpNFiQH1ZxUhZ_0wkNY>

Comments on this draft:

I am by no means an archivist, so my comments will be largely technical.

The draft seems to me to be prolix, giving a lot of discussion of
details that are not reflected in the URN proposal itself.  Careful
editing will probably fix this.

A major component of pwid-urn is archive-id.  My assumption is that the
archive-id is the top-level component and defines what abstract
"archive" the URN is a reference into.  And that the "archive" defines
the exact interpretation of archival-time, coverage-spec, and
archived-item.  However, in order for PWID URNs to be unambiguous, it
must be unambiguous what "archive" a given archive-id refers to.  The
draft suggests "it is recommended to use the web domain as the identifier
for the web archive".

What if an institution decides to create PWID URNs that use an
archive-id which is a domain name that the institution does not own?  As
written, the draft does not forbid *me* from creating PWID URNs with
archive-id "netarkivet.dk".

In regard to archival-time, the draft states "The 'archival-time' [...]
can therefore be specified at any of the levels of granularity as
described in [W3CDTF]."  However, W3CDTF gives a number of formats for
time designations, and calls those formats "granularities", whereas this
draft allows only one format.  I suspect that a more careful description
of the meaning of the quoted sentence would fix this problem.
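
To make the "granularities" point concrete, here is a minimal Python
sketch that classifies a string by which of the six W3CDTF formats it
matches.  The names w3cdtf_granularity and W3CDTF_GRANULARITIES are my
own, not from the draft or from W3CDTF.  The check is purely syntactic
(it does not reject month 13, for example), and it says nothing about
which of these formats the draft's archival-time grammar actually
admits.

```python
import re

# Time zone designator: "Z" or a +hh:mm / -hh:mm offset.
_TZD = r"(?:Z|[+-]\d{2}:\d{2})"

# The six formats ("granularities") listed in the W3CDTF note,
# coarsest first.  The labels on the left are illustrative only.
W3CDTF_GRANULARITIES = [
    ("year", r"\d{4}"),
    ("year-month", r"\d{4}-\d{2}"),
    ("date", r"\d{4}-\d{2}-\d{2}"),
    ("date-hh:mm", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}" + _TZD),
    ("date-hh:mm:ss", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}" + _TZD),
    ("date-hh:mm:ss.s", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+" + _TZD),
]


def w3cdtf_granularity(text):
    """Return the label of the W3CDTF format that text matches, or None."""
    for label, pattern in W3CDTF_GRANULARITIES:
        if re.fullmatch(pattern, text):
            return label
    return None
```

A string such as "2018-07-21T03:12Z" is classified as "date-hh:mm",
while a compact string such as "20180721031200" matches none of the
six formats.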

The syntax assumes that knowing archive-id, archival-time,
coverage-spec, and archived-item (the archived URI) will always be
sufficient to specify exactly the resource in question.  However, it
seems likely that special situations will arise when two distinct
resources will have attributes that are similar enough that the
distinction between them is not easily expressed in terms of archive-id,
archival-time, coverage-spec, or archived URI.  In this case, additional
information is needed to create an exact reference, and the URN syntax
needs to have an expansion facility to allow this.

archived-item can be a URI, and a URI can contain the characters '?',
'#', '[', and ']', among others.  However, those 4 characters may not
appear in the NSS part of a URN.  Note that the first 2 of these
characters are used in URIs only to introduce the "query" and "fragment"
parts, but as the draft is written, nothing prevents the archived-item
URI from having a query or fragment part.
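
As an illustration of the character problem, here is a minimal Python
sketch, assuming the RFC 8141 rule that an NSS consists of pchar
characters (as defined in RFC 3986) plus '/'.  The function names are
my own.  Percent-encoding the archived URI is only one possible
remedy, shown as an assumption; the draft does not specify it.

```python
import string
from urllib.parse import quote

# RFC 3986 character classes used by the RFC 8141 NSS rule:
#   NSS = pchar *(pchar / "/")
#   pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
UNRESERVED = string.ascii_letters + string.digits + "-._~"
SUB_DELIMS = "!$&'()*+,;="
NSS_LITERALS = set(UNRESERVED + SUB_DELIMS + ":@/")


def chars_not_allowed_in_nss(text):
    """List the distinct characters of text that may not appear
    literally in an NSS, in order of first appearance.

    '%' is skipped because it is legal as the start of a
    percent-escape; this sketch does not check that two hex digits
    follow it.
    """
    found = []
    for ch in text:
        if ch not in NSS_LITERALS and ch != "%" and ch not in found:
            found.append(ch)
    return found


def percent_encode_for_nss(text):
    """Percent-encode every character that is not an NSS literal.

    Hypothetical remedy: an existing '%' is itself encoded as '%25',
    so the original URI can be recovered by decoding exactly once.
    """
    return quote(text, safe=SUB_DELIMS + ":@/")
```

For the hypothetical URI "http://example.org/a?b=1#c", the first
function reports '?' and '#', and the second yields
"http://example.org/a%3Fb=1%23c".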

Dale