Comments on draft-seantek-rdf-urn-00

worley@ariadne.com (Dale R. Worley) Thu, 13 November 2014 02:37 UTC

Date: Wed, 12 Nov 2014 21:37:40 -0500
Message-Id: <201411130237.sAD2behU001729@hobgoblin.ariadne.com>
From: worley@ariadne.com (Dale R. Worley)
Sender: worley@ariadne.com (Dale R. Worley)
To: Sean Leonard <dev+ietf@seantek.com>
In-reply-to: <05E89947-5180-40BB-A14A-9D97E92DDAB1@seantek.com> (dev+ietf@seantek.com)
Subject: Comments on draft-seantek-rdf-urn-00
References: <05E89947-5180-40BB-A14A-9D97E92DDAB1@seantek.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn-nid/56oLfVh36MyxO7VpJFYsLUFk8xE
Cc: urn-nid@ietf.org

(Many of these comments apply to draft-seantek-xmlns-urn-00 as well.)

   1.  Introduction

   The Resource Description Framework [RDF] is a framework for
   representing information in the web.  RDF contains nodes that are
   identified by URI references.  The URI reference is basically an
   opaque string with semantics applied onto it by the RDF standard; RDF
   applications are not required or expected to dereference the URI.

You almost certainly mean "URI" not "URI reference" ("RDF contains
nodes that are identified by URIs.") -- a "URI reference" is an
appearance of a URI in a particular place, in the same way a footnote
in a book is a reference.  There can be many URI references to the
same URI.

(You can see this by expanding the acronym "URI":  "RDF contains nodes
that are identified by Uniform Resource Identifiers." not "RDF
contains nodes that are identified by Uniform Resource Identifier
references.")

   This document defines a URN specifically for identifying RDF URI
   references.

This should read "This document defines a URN namespace identifier [or
alternatively, NID] specifically for use in constructing RDF URIs."

   The abstract resource does not have any particular concrete
   representation (such as a type of content identified by Internet
   media type), although concrete representations may be associated
   with it.

This is true, but you say later that one or more URIs may be
associated with the URN via the IANA registration.  I believe that
these are an important feature of the NID, and you should probably
discuss them here, as they act somewhat like a representation of the
URN.

   Abstract parts of the abstract resource can be identified with
   fragment identifiers.

How does this statement interact with the fact that RFC 2141 does not
admit fragment identifiers for URNs, and that there is work afoot to
allow fragment identifiers for URNs?

   Declaration of syntactic structures:

   The structure of the Namespace Specific String is any
   valid XML name corresponding to the "Name" production in
   Section 2.3 of [XML] (production 5), with the following restrictions:
      1. The name MUST be at least four characters.
      2. Colons MAY be used as arbitrary intra-name dividers.
      3. Colons MUST NOT appear at the beginning or end of the name.
      4. Consecutive colons are PROHIBITED.
   and the following relaxation:
      5. The first part of the name preceding the first colon MAY
         be a whole decimal number as discussed in
         "Process of identifier assignment".

"PROHIBITED" is not an RFC 2119 word, so it should probably appear in
lower-case.

While it's a convenient mnemonic to compare the NSS syntax to the Name
syntax, it's easier for implementers if the NSS syntax is given
directly and in one place.

It's not clear what item 2 means, as there is no definition of
"divider", and the syntax of Name allows a colon in any position.

The phrasing of item 5 is peculiar, since Name allows digits in all
positions except the first -- it would seem that the natural
expression would be "The first character may be a digit".  But if that
is the intended meaning, why was it not stated in that simpler way?

The text suggests that the use of leading digits is only permitted 'as
discussed in "Process of identifier assignment"', suggesting that the
use of leading digits is further restricted in some manner which is
not specified in this section.  Whatever restrictions are placed on
the use of leading digits should be expressed unambiguously in this
section.

The stated syntax does not allow fragment identifiers, but the rest of
the document presumes that a fragment identifier may be present.
Probably this section only intends to define the RDF URN NSS, leaving
implicit the syntax of the full RDF URN.  That should be clarified by
providing an explicit production for <rdf-urn>.
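To make the ambiguity concrete, here is a minimal Python sketch of one
possible reading of rules 1-5, applied to the decoded Unicode name
before any percent-encoding.  The character ranges are those of the
NameStartChar and NameChar productions in XML 1.0 (Fifth Edition).
Counting length in code points, the reading of rule 5 (a leading digit
is allowed only when the whole segment before the first colon is
decimal digits), and the function names are my assumptions, not the
draft's.

```python
# NameStartChar and the extra NameChar ranges from XML 1.0 (5th ed.), 2.3.
_NAME_START = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
_NAME_EXTRA = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch: str) -> bool:
    return _in_ranges(ord(ch), _NAME_START)


def is_name_char(ch: str) -> bool:
    return is_name_start_char(ch) or _in_ranges(ord(ch), _NAME_EXTRA)


def is_valid_rdf_nss(name: str) -> bool:
    """One hypothetical reading of rules 1-5 (length in code points)."""
    if len(name) < 4:                                 # rule 1
        return False
    if name.startswith(":") or name.endswith(":"):    # rule 3
        return False
    if "::" in name:                                  # rule 4
        return False
    if not all(is_name_char(c) for c in name):        # Name body
        return False
    if is_name_start_char(name[0]):                   # ordinary Name start
        return True
    # Assumed reading of rule 5: a non-NameStartChar first character is
    # allowed only if the segment before the first colon is all digits.
    head, sep, _ = name.partition(":")
    return bool(sep) and head.isascii() and head.isdigit()
```

Under this reading "20:10-250" and "en:us:ca:lax" are accepted and
"20abc" is rejected; another reading of rule 5 could accept "20abc",
which is why the rule should be stated directly in the syntax section.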

   When encoded in a URN, Unicode code points beyond U+007F
   are encoded as percent-encoded UTF-8. Conveniently, all XML name
   characters in the US-ASCII range are in the [RFC3986] unreserved set.

Describing the syntax of the NSS by specifying a set of Unicode
strings and then an encoding to be applied to that set of strings to
produce the URNs is formally correct but puts a burden on an
implementer.  It would be better if that aspect of the syntax were also
described as a combination of an informal description of the intention
and a complete and correct ABNF.
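For what it's worth, the encoding step itself is small.  A minimal
Python sketch, assuming the name has already been validated as a
Unicode string; the function names are hypothetical.  Note that ":"
is not in the RFC 3986 unreserved set (ALPHA, DIGIT, "-", ".", "_",
"~"), although RFC 2141 permits it in an NSS, so the sketch exempts it
from encoding explicitly.

```python
from urllib.parse import quote, unquote


def encode_nss(name: str) -> str:
    # quote() percent-encodes the UTF-8 octets of every character outside
    # ALPHA / DIGIT / "-" / "." / "_" / "~" and the characters in `safe`.
    # ":" is not unreserved, so it is listed in `safe` to keep it literal.
    return quote(name, safe=":")


def decode_nss(encoded: str) -> str:
    # Reverse step: decode %XX sequences as UTF-8, failing on bad octets.
    return unquote(encoded, encoding="utf-8", errors="strict")


print(encode_nss("caf\u00e9:menu"))  # caf%C3%A9:menu
print(encode_nss("20:10-250"))       # 20:10-250
```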

   Identifier uniqueness considerations:
   Once a name is registered in the IANA registry, it is unique.

   Identifier persistence considerations:
   Once a name is registered in the IANA registry, it is permanent.

These are awkwardly phrased.  They would be better phrased as "The
meaning of an identifier is registered in the registry, and thus is
unique." and "Once an identifier is registered, its meaning cannot be
changed."  (However, if a registration can be withdrawn as is
mentioned in passing later in the draft, can a URN whose previous
registration was withdrawn later be registered again -- with a
different description/registrant/associated URIs?)

   Process of identifier assignment:

   Identifiers are registered with IANA on a First-Come, First-Served
   basis. One-character names and prefixes are RESERVED for further
   use. Two- and three-character names and prefixes are RESERVED
   for language tags and regional codes; however, those names
   have no such semantic content when used in an RDF URN. Whole number
   prefixes are RESERVED for IANA Private Enterprise Numbers.
   Registrants are free to register names with reserved two-character
   and three-character prefixes, such as "au:flag" or "en:us:ca:lax".
   Registrants are also free to register names with reserved whole
   number prefixes, such as "20:10-250".

There are a number of difficulties here.

The word RESERVED is not an RFC 2119 word, and so should probably be
in lower-case.

The second and third sentences describe one-, two-, and three-
character "names".  It's not clear what "name" means here.  By
default, I expect it to be the same as "URN", but of course all URNs
have at least 7 characters.  So perhaps "name" means "NSS".  But the
syntax definition restricts NSSs to have at least 4 characters.

I don't understand what "however, those names have no such semantic
content when used in an RDF URN" means.  You've just said that such
(syntactically excluded!) "names" are reserved, which presumably means
that they can't be used as NSSs to form URNs that are registered.  But
if they can't be used to form URNs, how could they be "used in an RDF
URN"?  Or is the important consequence based on one-character
"prefixes"?  But every string has a one-character prefix.

When you say, "Whole number prefixes are RESERVED for IANA Private
Enterprise Numbers.", what do you mean by "reserved"?  There is no
obvious semantic correspondence between private enterprise numbers and
URN NSSs.  And by "prefix" to you mean the default meaning of "an
initial substring of characters in a string", or (as I suspect you
mean) "an initial substring of characters in an NSS which is followed
by a colon"?

I suspect what you mean is that any NSS of the form
<digits>:<something> is implicitly associated in some way with the
registrant of the enterprise number <digits>, but you don't say that
explicitly.
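If that is the intent, the check IANA would have to perform is roughly
the following minimal Python sketch.  The registry contents, contact
address, and function names are hypothetical, and treating a prefix
with leading zeros (e.g. "007") as the same enterprise number is a
further assumption the draft does not address.

```python
from typing import List, Optional


def pen_prefix(name: str) -> Optional[int]:
    # Assumed reading: a "whole number prefix" is an all-digit segment that
    # is followed by a colon.  int() ignores any leading zeros.
    head, sep, _ = name.partition(":")
    if sep and head.isascii() and head.isdigit():
        return int(head)
    return None


# Hypothetical stand-in for the IANA Private Enterprise Numbers registry.
PEN_CONTACTS = {20: "pen-20-contact@example.com"}


def parties_to_warn(name: str) -> List[str]:
    pen = pen_prefix(name)
    if pen is None:
        return []
    # An unassigned enterprise number leaves nobody to warn.
    contact = PEN_CONTACTS.get(pen)
    return [contact] if contact else []
```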

And at the end you say "registrants are free to register names with
reserved ... prefixes".  In what manner are prefixes "reserved" if
registrants are free to register them?  In particular, can a
registrant register a URN starting with a private enterprise number
even if it is not the registrant of that private enterprise number?

    Process for identifier resolution:

The fact that one or more URIs may be attached to an RDF URN's
registration to provide a resolution for the URN is a very important
feature of your definition, and it deserves to be described more
explicitly than just as part of the "Process for identifier
resolution".  Indeed, in practice, it's a fundamental part of the
semantics of an RDF URN as you have described it.

Conversely, since there is (as far as I know) no defined way for a URN
resolver to look up the information associated with *any* IANA
registration, it's not clear how this information can be used by
actual resolution software.  More discussion is needed on how a
queryable database of these associations is to be accessed.

Similar considerations apply to "Validation mechanism".

   Fragments (delimited by the # character) are not considered part of
   the namespace-specific string, so a fragment would not affect
   lexical equivalence.

This sentence seems to be part of the rules for lexical equivalence,
but is not in that section of the template.

Assuming that URNs with fragment identifiers are compared in the ways
described in RFC 3986 for generic URIs, the fragment identifiers of
two URNs *are* compared when testing the URNs for equivalence.
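The difference between the two readings is easy to show.  A minimal
Python sketch, using the "example" NID from RFC 6963 as a stand-in for
the draft's NID.  The normalization approximates RFC 2141 lexical
equivalence ("urn" and the NID case-insensitive, hex digits in
%-escapes case-insensitive, the rest of the NSS exact); the function
names are hypothetical.

```python
import re


def _normalize(urn: str) -> str:
    # Approximate RFC 2141 lexical equivalence: lower-case "urn" and the
    # NID, upper-case the hex digits of %-escapes, keep the rest as is.
    scheme, nid, nss = urn.split(":", 2)
    nss = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), nss)
    return f"{scheme.lower()}:{nid.lower()}:{nss}"


def equal_ignoring_fragment(a: str, b: str) -> bool:
    # The draft's reading: everything from "#" on is dropped first.
    return _normalize(a.split("#", 1)[0]) == _normalize(b.split("#", 1)[0])


def equal_with_fragment(a: str, b: str) -> bool:
    # Generic-URI reading: the fragment (and whether "#" is present at all)
    # takes part in the comparison.
    base_a, hash_a, frag_a = a.partition("#")
    base_b, hash_b, frag_b = b.partition("#")
    return (_normalize(base_a) == _normalize(base_b)
            and (hash_a, frag_a) == (hash_b, frag_b))


a = "URN:Example:caf%c3%a9#one"
b = "urn:example:caf%C3%A9#two"
print(equal_ignoring_fragment(a, b))  # True
print(equal_with_fragment(a, b))      # False
```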

    3.  IANA Considerations

These considerations place various burdens on IANA.  Has anyone
checked that IANA is in a position to undertake them?  In particular:

   The registration template SHALL be encoded in UTF-8.

Can IANA process registrations containing arbitrary Unicode characters?

   If a registrant attempts to register a name that is confusingly
   similar to other registered names (such as only differing by case, or
   differing by code points but generating the same or confusingly
   similar visual representations), the registrants of the prior names
   are to receive a warning notification of the impending registration.
   However, there is no protest mechanism; the registration will still
   succeed unless withdrawn by the registrant.  IANA SHOULD implement a
   modern algorithm to detect such confusingly similar names.

What is the definition of "confusing"?  Can an RFC apply a SHOULD to
IANA?
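To illustrate why a definition is needed, here is a minimal Python
sketch of one crude notion of "confusingly similar": equality after
NFKC normalization and case folding.  This is my assumption, not the
draft's, and the function names are hypothetical.  It catches names
that differ only in case or by compatibility characters, but it misses
cross-script look-alikes, which the "skeleton" mapping in Unicode
Technical Standard #39 is designed to catch.

```python
import unicodedata


def crude_key(name: str) -> str:
    # NFKC folds compatibility characters (e.g. the ligature U+FB01 to
    # "fi"); casefold() then removes case distinctions.
    return unicodedata.normalize("NFKC", name).casefold()


def crudely_confusable(a: str, b: str) -> bool:
    return crude_key(a) == crude_key(b)


print(crudely_confusable("AU:Flag", "au:flag"))       # True (case only)
print(crudely_confusable("\ufb01le:x", "file:x"))     # True (ligature)
print(crudely_confusable("\u0430u:flag", "au:flag"))  # False (Cyrillic a)
```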

   If a registrant attempts to register a name that contains a whole
   number prefix, the registrant of the corresponding IANA Private
   Enterprise Number is to receive a warning notification of the
   impending registration.

Is IANA prepared to undertake this?

Given that there is no protest mechanism, what is the purpose of
these notifications?  As far as I know, there is no non-IETF
mechanism that any
registrant could use to oppose such a registration.

There is no defined mechanism for withdrawing any registration that I
know of.  Is there intended to be one for this registry?

Dale