[urn] Comments on PWID -05
worley@ariadne.com (Dale R. Worley) Thu, 28 February 2019 03:19 UTC
From: worley@ariadne.com
To: Eld Zierau <elzi@kb.dk>
Cc: L.Svensson@dnb.de, urn@ietf.org
Sender: worley@ariadne.com
Date: Wed, 27 Feb 2019 22:19:30 -0500
Message-ID: <87d0ncha65.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/BAWQqe-fzfc7PNoXyEZsRFmF-w4>
The discussion of archive-id is considerably clearer than before, but it seems to me that there should be some discussion of how to distinguish domain name archive-ids (e.g., netarkivet.dk) from archive-ids that have to be looked up in the (future) registry. Being able to reliably make this decision seems to be the first step in resolving a PWID.

I think arranging this is straightforward, since the syntax of archive-id is "+( unreserved )", and many of those strings are not allowed as DNS names for hosts. E.g., one could require any archive-id that is not intended to be interpreted as a DNS name to start with one of "-", ".", "_", "~".

Similar considerations apply to archived-item-id and distinguishing it from URI. (All URIs must start with a letter.)

   precision-spec = "part" / "page" / "subsite" / "site"
                  / "collection" / "recording" / "snapshot"
                  / "other"

Is the inclusion of "other" the best way to handle this? Usually a component like this would allow "extension values" (that conform to the same syntax as the defined values, e.g., "+letter"). As written, everything that cannot be classified as "part", "page", ..., "snapshot" would have to be labeled "other", even if a particular archive had several different additional precision values that it operated with internally.

   *  'URI' is defined as in [RFC3986] but where occurrences of "[",
      "]", "?" and "#" are %-encoded in order not to clash with URN
      reserved characters [RFC8141].

This gets complicated. For example, "http://example.com/foo#bar" is a different URL than "http://example.com/foo%23bar", and might have different contents. You can't use "http://example.com/foo%23bar" as the archived-item part of PWIDs for the saved contents of both of these URLs.
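
To make the clash concrete, here is a minimal Python sketch under one reading of the draft's rule: percent-encode only "[", "]", "?" and "#", and leave everything else (including existing "%") untouched. The helper name `encode_draft_rule` and the table `DRAFT_ESCAPES` are hypothetical and do not come from the draft.

```python
# Hypothetical illustration of the draft's stated rule, as read here:
# only "[", "]", "?" and "#" are percent-encoded; "%" itself is left alone.
DRAFT_ESCAPES = {"[": "%5B", "]": "%5D", "?": "%3F", "#": "%23"}


def encode_draft_rule(uri: str) -> str:
    """Return the archived-item string for a URI under the assumed rule."""
    return "".join(DRAFT_ESCAPES.get(ch, ch) for ch in uri)


a = encode_draft_rule("http://example.com/foo#bar")
b = encode_draft_rule("http://example.com/foo%23bar")

print(a)
print(b)
print(a == b)  # the two distinct URLs collapse to the same string
```

Under this reading both inputs produce "http://example.com/foo%23bar", so the encoded form cannot be mapped back to a unique original URL.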

One possibility is to set the archived-item string to be URI with [, ], ?, #, and % all %-encoded, so that the two URLs have these archived-item values:

   http://example.com/foo%23bar
   http://example.com/foo%2523bar

That would be laborious, though, if many URLs contain %-escapes and humans have to copy PWID URNs by hand.

   *  'archival-time' is a UTC timestamp as described in the W3C profile
      of [ISO8601] [W3CDTF] (also defined in [RFC3339]), for example
      YYYY-MM-DDThh:mm:ssZ.

Looking at RFC 3339, I see:

   date-fullyear   = 4DIGIT
   date-month      = 2DIGIT  ; 01-12
   date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                             ; month/year
   time-hour       = 2DIGIT  ; 00-23
   time-minute     = 2DIGIT  ; 00-59
   time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                             ; rules
   time-secfrac    = "." 1*DIGIT
   time-numoffset  = ("+" / "-") time-hour ":" time-minute
   time-offset     = "Z" / time-numoffset

   partial-time    = time-hour ":" time-minute ":" time-second
                     [time-secfrac]
   full-date       = date-fullyear "-" date-month "-" date-mday
   full-time       = partial-time time-offset

   date-time       = full-date "T" full-time

But comparing that to W3CDTF, I see no single nonterminal which corresponds to the set of formats allowed in W3CDTF. I suggest you make a more rigid specification as to what is allowed for archival-time.

   [W3CDTF]   W3C, "Date and Time Formats: note submitted to the W3C.
              15 September 1997", 1997,
              <http://www.w3.org/TR/NOTE-datetime>.

              W3C profile of ISO 8601 urn:pwid:archive.org:2017-04-
              03T03:37:42Z:page:http://www.w3.org/TR/NOTE-datetime

The final two lines of this block look like a mis-formatted bibliographic reference.

Dale
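
As an illustration of what a more rigid archival-time rule could look like, here is a minimal Python sketch. It assumes that only the single form YYYY-MM-DDThh:mm:ssZ (the draft's own example) is allowed. That restriction, the pattern `STRICT_ARCHIVAL_TIME` and the function `is_strict_archival_time` are all hypothetical. They are not taken from the draft, from RFC 3339 or from W3CDTF.

```python
import re
from datetime import datetime

# Assumption: exactly YYYY-MM-DDThh:mm:ssZ, ASCII digits only, no fractional
# seconds, no numeric offset, and none of the shorter W3CDTF forms.
STRICT_ARCHIVAL_TIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"
)


def is_strict_archival_time(text: str) -> bool:
    """Check the fixed syntax first, then the calendar ranges."""
    if STRICT_ARCHIVAL_TIME.fullmatch(text) is None:
        return False
    try:
        # Rejects impossible values such as month 13 or 30 February.
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


print(is_strict_archival_time("2017-04-03T03:37:42Z"))
print(is_strict_archival_time("2017-04-03"))
print(is_strict_archival_time("2017-04-03T03:37:42+01:00"))
```

The second and third inputs are rejected by the fixed pattern, which is the point of having one nonterminal instead of a family of formats. The sketch makes no attempt to handle the ":60" leap-second value that RFC 3339 permits.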