Re: [urn] Suggested PWID URN for Persistent Web IDentifiers - version 3

worley@ariadne.com (Dale R. Worley) Sat, 08 September 2018 14:08 UTC

From: worley@ariadne.com
To: elzi@kb.dk, urn@ietf.org
In-Reply-To: <f5b4lgl9htu.fsf@troutbeck.inf.ed.ac.uk>
Sender: worley@ariadne.com
Date: Sat, 08 Sep 2018 10:08:37 -0400
Message-ID: <87y3cculne.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/iDVhRqQtdsX1zzfGrZyY0UAULV8>

Re-reading this thread, I think Henry's comments are significant in
regard to how this proposal will interact with actual archive use.  I
don't know enough about archiving to speak well to that, but there are
significant technical points that seem to connect with or parallel
Henry's archiving considerations.


One critical element is the archive identifier.  There seem to be two
cases regarding what is intended.

One case is where the PWID URNs are constructed by the archive organization
itself.  Thus, the archive organization can choose the archive
identifier, and presumably that will be a domain name that it controls.
That is not a perfect solution (we've had trouble with it in other URN
proposals), but it's clear.  But in that situation, the archive is
probably already minting URLs that can be used to retrieve archived
resources, so it's not clear that there is a large benefit to be gained.

The other case is where the URNs are constructed by third parties, that
is, neither the archive nor the consumer of the URN.  As Henry says,
this usage has the nature of a citation.  What is unclear in this case
is who selects the archive identifier of the archive.

One possibility in this case is that it is expected that there will be
only a small number of archives, and there's an IANA registry of archive
identifiers, and any person is allowed to register an identifier for any
archive.  Presumably, expert review could be used to keep the situation
under control.  This is alluded to by:

      On long term, there should be created a registry that keeps track
      of identifiers of archives over time, since they are likely to
      change names, merge etc. when taking about a 100 year period.


The proposal generally assumes that an archived resource can be
identified using just:
- the archive identifier
- the URL whose contents were archived
- the time at which the contents were archived

Embedding the URL into the URN presents syntactic problems.  The
characters [ and ] can be used in the host-address part of a URL, and ?
is used for queries.  The latter I consider to be particularly
important, as web links can often include query-parts.  This problem
needs to be solved, as a syntax that "covers about 80-95% of all cases"
is a syntax that doesn't suffice for the problem at hand.
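
To make the syntactic problem concrete, here is a minimal sketch of one
possible fix: percent-encode every character of the archived URL that
RFC 8141 does not allow unescaped in a namespace-specific string.  This
is an illustration under stated assumptions, not the proposal's method.
The function names, the "urn:pwid:" field order, and the
"archive.example" identifier are all hypothetical.

```python
from urllib.parse import quote, unquote

# Characters allowed unescaped in a URN namespace-specific string
# (RFC 3986 unreserved and sub-delims, plus ":", "@" and "/").
# "%" is deliberately NOT listed, so escapes already present in the URL
# are escaped again and decoding restores the URL exactly.
NSS_SAFE = "-._~!$&'()*+,;=:@/"


def encode_url_for_urn(url):
    """Escape "?", "[", "]", "#", "%" and anything else outside NSS_SAFE."""
    return quote(url, safe=NSS_SAFE)


def decode_url_from_urn(encoded):
    """Reverse encode_url_for_urn."""
    return unquote(encoded)


def build_pwid(archive_id, archived_at, url):
    # Hypothetical layout: archive identifier, archival time, encoded URL.
    # The real field order and separators are whatever the proposal defines.
    return f"urn:pwid:{archive_id}:{archived_at}:{encode_url_for_urn(url)}"


example = build_pwid(
    "archive.example",
    "2018-09-08T14:08:37Z",
    "http://[2001:db8::1]/search?q=urn&page=2",
)
print(example)
# urn:pwid:archive.example:2018-09-08T14:08:37Z:http://%5B2001:db8::1%5D/search%3Fq=urn&page=2
```

With an encoding of this kind the query-part and an IPv6 host-address
survive intact, and the URN contains no literal "?" or "#" that a
generic URN parser would treat as the start of a component.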


I don't see a need to worry about fragment-parts of archived URLs, as
the fragment structure is inherently embedded in the resource retrieved
via the URL.  So an archive can archive the URL-without-fragment, the
URN can reference that URL, and the user can attach the required
fragment-part to the URN to specify the desired fragment.
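
A small sketch of that division of labor, assuming the citing party
starts from a URL that carries a fragment.  The helper names and the
example URN string are hypothetical; only urldefrag is a real standard
library call.

```python
from urllib.parse import urldefrag


def split_for_archiving(cited_url):
    """Return (URL-without-fragment, fragment) for a cited URL."""
    base, fragment = urldefrag(cited_url)
    return base, fragment


def attach_fragment(urn, fragment):
    """Append a fragment-part to a URN, leaving the URN itself unchanged."""
    return f"{urn}#{fragment}" if fragment else urn


base, frag = split_for_archiving("http://example.org/doc.html#section-2")
# The archive stores `base`; a URN naming that capture is built elsewhere.
# The string below is a stand-in for such a URN, not real PWID syntax.
urn = "urn:pwid:archive.example:2018-09-08T14:08:37Z:http://example.org/doc.html"
print(attach_fragment(urn, frag))
```

The URN names the archived resource alone, and the fragment is
interpreted against whatever representation the archive returns, just
as it would be for the original URL.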


There is some lack of clarity about coverage-spec.  If the "coverage" is
*part* of the archived resource, then it is, or should be, a fragment,
and that can be specified by appending a fragment-part to the URN.  If
the coverage is metadata about the resource, it seems to be undefined
what forms the metadata resource could take.  But some coverage values
seem to suggest that the referenced information is a set of resources
which contains as a member the resource designated by the recorded URL.
This concept gets very interesting indeed.  There doesn't seem to be any
defined resource type for "a web site", or even "all the files needed to
display a web page".  Also, it's not clear what the archival-time means
in this context, since presumably the archive need not contain archived
copies of everything in the aggregate that were all made at the
specified archival-time.

Dale