Re: [urn] I-D Action: draft-ietf-urnbis-rfc2141bis-urn-01.txt

Alfred Hönes <ah@TR-Sys.de> Thu, 19 January 2012 17:02 UTC

From: Alfred Hönes <ah@TR-Sys.de>
Message-Id: <201201191701.SAA02623@TR-Sys.de>
To: L.Svensson@dnb.de
Date: Thu, 19 Jan 2012 18:01:48 +0100
Cc: urn@ietf.org

[[ speaking as the draft author ]]


On 2012-01-19, Lars Svensson wrote:

> Alfred,
>
> thanks for moving this forward. I have added some inline comments,
> below.

Lars,
Thanks for taking the time to review the draft.
I have been eagerly waiting for comments.


> Alfred wrote on October 31, 2011
>
>> So, in order to bring forward the discussion on the draft, please
>> currently focus on the other open issues tagged in (editorial) Notes
>> inside the draft, which have not received much comments so far.
>> In particular, we should hopefully be able to close the NID syntax
>> issues discussed in section 2.1 soon, with your help!
>
> Sec 2.1 says:
>
> [[
>    Note for discussion:
>       The above definition is taken from RFC 2141 and changed to reflect
>       requirements from RFC 3406[bis].  RFC 3406[bis] contains further
>       syntax restrictions on NID strings.
>       Should this be further restricted, e.g., to avoid possible
>       confusion caused by multiple adjacent hyphens and NIDs looking
>       like a numerical value or a numerical range?
>       Does the WG opt to move the more restrictive NID syntax details
>       from the rfc3406bis draft to this section?
>       Such restrictions would be fully backward compatible because no
>       NIDs have been defined so far that would violate these
>       restrictions.  Hyphens have been used only in the naming pattern
>       for "Informal Namespace IDs" per RFC 3406[bis].
>
>    Namespace Identifiers are case-insensitive, so that for instance
>    "ISBN" and "isbn" refer to the same namespace.
>
>    To avoid confusion with the URI Scheme name "urn", the NID "urn" is
>    permanently reserved by this RFC and MUST NOT be used or registered.
>
>    Note for discussion:
>       This reservation is carried over unchanged from RFC 2141, for
>       historical reasons.  Further possible reservations and/or details
>       are out of scope for this document, but within the scope of the
>       rfc3406bis draft!
> ]]
>
> If we say that 2141bis reserves the NID 'urn' but that all other
> reserved NIDs are specified by 3406bis, 3406bis should be the master
> document for all restrictions/requirements on NID syntax. That way
> we'd have all in one place which makes it easier for implementers
> (they only need to look at 3406bis when registering a namespace).

The main reason for the current placement in the drafts of the
reservation for the NID string 'urn' is that -- as already pointed out
in the 2nd 'Note for discussion' quoted above -- it was in RFC 2141
(last paragraph of section 2.1, at the bottom of page 2 of RFC 2141).

IMO, this placement is justified because it seems to be an essential
detail to disambiguate the leading part of a URN string (whether it
is written as the full URN or as the -- not recommended -- colloquial
abbreviated form comprised of only the NID and subsequent components).

The other exclusions under consideration are more for the
convenience of human users, and need only be respected during
the process of registration of a URN namespace;
hence, the placement in rfc3406bis seems appropriate to me.
Nevertheless, for clarity, the banning of 'urn' as an NID string
is repeated in rfc3406bis (as a quote from rfc2141bis), in the same
way it has been in RFC 3406 vs. RFC 2141.

However, implementers of *generic* URN parsers need to follow
rfc2141bis, since this is the document entitled "URN syntax".
Implementations that want/need to go into NID specifics can
(and should) always derive the current set of registered NIDs
from the IANA registry -- i.e. the result of the application of
RFC 3406 / rfc3406bis, not these documents themselves -- and obtain
the NID-specific additional syntax rules from the registration
documents proper.
IMO, generic tools do not need to be aware of NID-specific syntax
details, only URN resolver systems need to do that -- to the extent
of the NIDs they specifically support.

So, in a nutshell, the exclusion specified in rfc2141bis is intended
to provide disambiguation in the general case, ensuring that a
single (leading) instance of "urn" will always be the URI scheme
and never be an NID.
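
To illustrate the disambiguation for a generic parser, here is a
minimal Python sketch. It is not taken from the draft: the function
name split_urn is hypothetical, and the NID pattern assumes the
RFC 2141 shape (a letter or digit followed by up to 31 letters,
digits, or hyphens), which rfc2141bis may still change.

```python
import re

# Assumed NID shape (RFC 2141 style): one letter or digit, then up to
# 31 letters, digits, or hyphens. The final rfc2141bis rule may differ.
_NID = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")


def split_urn(urn: str) -> tuple[str, str]:
    """Split a full URN into (lower-cased NID, NSS); hypothetical helper."""
    scheme, sep1, rest = urn.partition(":")
    nid, sep2, nss = rest.partition(":")
    # The scheme name is matched case-insensitively.
    if scheme.lower() != "urn" or not sep1 or not sep2:
        raise ValueError("not a URN: expected 'urn:<NID>:<NSS>'")
    if _NID.fullmatch(nid) is None:
        raise ValueError(f"malformed NID: {nid!r}")
    # The NID 'urn' is reserved, so a leading 'urn' is always the scheme.
    if nid.lower() == "urn":
        raise ValueError("the NID 'urn' is reserved")
    if not nss:
        raise ValueError("empty NSS")
    # NIDs are case-insensitive, so return a canonical lower-case form.
    return nid.lower(), nss
```

With this sketch, split_urn("URN:ISBN:0451450523") yields the pair
("isbn", "0451450523"), while "urn:urn:x" is rejected. Note that the
NSS is passed through untouched; NID-specific checks are left to
whoever supports that namespace.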


Following other voices, I conclude that the WG does _not_ desire
further formal restrictions on the NID syntax in rfc2141bis,
because the common sense of prospective registrants (likely
seeking short, expressive, mnemonic NID strings) and of
the janitors of the IANA registry (a.k.a. IETF experts, and in
the case of dispute, the IESG) will prevent the registration
of confusing/dangerous/abusive NIDs.

If you don't share this conclusion, please state your opinion
and arguments NOW, on the urn list; thanks!


> Some further comments:
>
> The document does not give a complete ABNF for the URN syntax.
> If we supply that, it would be fairly straightforward to implement
> a URN validation service and possibly a service to check for lexical
> equivalence.

It's important, but a bit tricky, to avoid overspecification
(with its attendant danger of inconsistencies).
Therefore, the rfc2141bis draft follows the IETF tradition of
including other ABNF rules by reference; this seems to be proper
in this case because RFC 5234 (ABNF) and RFC 3986 (URI Syntax)
are the only sources of ABNF rules referred to, and knowledge of
these documents seems to be essential for all implementers of
rfc2141bis.

Further, the *informative* Appendix B of the current rfc2141bis draft
contains the fully expanded version of all the ABNF rules that are
referred to in the normative part of the draft, so that -- assuming
that Appendix B is precisely quoting RFCs 3986 and 5234 -- readers
of rfc2141bis can have the full picture of all ABNF syntax details.


> Regarding NSS Syntax: Is it possible to give an ABNF rule that
> makes it clear that some characters MUST be percent-encoded?
> I'm not that deep into ABNF, but I guess that there are experts
> on this list.

Answer to the question:  Essentially, NO!
(At least it would be huge, onerous, unreadable, and confusing.)

The deeper reason likely is that what you would like to have relates
to the _semantics_ of <pct-encoded>, which cannot be given in ABNF.

Note that the ABNF standard (RFC 5234) points out that textual
explanations can place additional restrictions on ABNF rules that
cannot be formulated in ABNF proper, and that such explanations
are at the same normative level as the formal, directly machine-
parseable, ABNF rules.
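
As a rough illustration of this split between formal grammar and
prose rules, consider the following Python sketch. The character set
in the pattern is a simplified assumption (not the draft's ABNF), the
function names are hypothetical, and the prose rule chosen here
("the percent-encoded octets must form valid UTF-8") is only an
example of a restriction that a grammar cannot conveniently express.

```python
import re
from urllib.parse import unquote_to_bytes

# Simplified, assumed NSS grammar: a few literal ASCII characters or
# well-formed %HH triplets. This is NOT the ABNF from the draft.
_NSS_SYNTAX = re.compile(r"(?:[A-Za-z0-9()+,\-.:=@;$_!*']|%[0-9A-Fa-f]{2})+")


def nss_syntax_ok(nss: str) -> bool:
    """Grammar-level check: only says whether the triplets are well formed."""
    return _NSS_SYNTAX.fullmatch(nss) is not None


def nss_prose_rule_ok(nss: str) -> bool:
    """Example prose-level check: decoded octets must be valid UTF-8."""
    try:
        unquote_to_bytes(nss).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For instance, "%C3%85" passes both checks, whereas "%C3" is
syntactically fine but fails the prose rule, because a lone octet
C3 is not a complete UTF-8 sequence. A raw non-ASCII character is
already rejected by the grammar-level check.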


> And a final thought about Lexical Equivalence (out of scope for
> 2141bis, but relevant to 3406bis):
> If I have a namspace that allows characters which I must percent-
> encode in a URN (like 'ÅÄÖåäö'), how can I specify that the NSS
> 'ÅÄÖ' is lexically equivalent to 'åäö'?

A namespace that needs/wants to incorporate non-ASCII characters
(which will be first UTF-8 encoded and then percent-encoded for
inclusion in URNs) needs to determine its appropriate rules.
In case of roman characters with accents, the case mapping
properties and normalization rules/forms specified by the
Unicode Standard might be good candidates to draw from.

In general and ultimately, any equivalence not easily expressible
in syntax rules will need to be instantiated by the registration /
resolution systems for a specific namespace, and the methods for
implementation are strictly a "local" matter for the maintainers
of the registration system(s).

A similar problem is well known for domain names: in particular
for non-roman scripts, "human-friendly" equivalence of identifiers
cannot be specified on an absolute basis and achieved in a
distributed manner, because human-perceived equivalence is
frequently culture and context dependent; hence, in the context of
the DNS, only the name registration system and the authoritative
servers for a domain can implement and enforce the non-ASCII
equivalence rules intended for that particular domain.

So, in your example above, the respective namespace document could
specify a "normal form" of the NSS for that namespace and the rules
to achieve it (e.g., Unicode NFxx normalization and case mapping to
lower-case), or it could simply state that any appropriate
equivalence will be delivered by the resolution system.
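
A minimal Python sketch of such a "normal form", purely as an
illustration: the function name normalize_nss is hypothetical, and the
choice of NFC plus simple lower-casing is an assumption that a real
namespace document would have to make (or reject) for itself.

```python
import unicodedata
from urllib.parse import quote, unquote


def normalize_nss(nss: str) -> str:
    """Return an assumed normal form of a percent-encoded NSS."""
    # Undo the percent-encoding and interpret the octets as UTF-8.
    text = unquote(nss, encoding="utf-8", errors="strict")
    # Assumed namespace rule: Unicode NFC, then map to lower case.
    text = unicodedata.normalize("NFC", text).lower()
    # Re-encode as UTF-8 and percent-encode everything except
    # ASCII letters, digits, and "_", ".", "-", "~".
    return quote(text, safe="")
```

Under these assumptions, the percent-encoded forms of 'ÅÄÖ'
("%C3%85%C3%84%C3%96") and 'åäö' ("%C3%A5%C3%A4%C3%B6") map to the
same string, so a client could compare the normal forms octet by
octet. Note that quote() with safe="" also percent-encodes ASCII
punctuation such as ":", which a real namespace might prefer to keep
literal.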

Therefore, for some namespaces, _lexical_ equivalence
of URNs (i.e., what a client system can determine) might in fact
become less important than _semantic_ equivalence (implemented in
the registration / resolution services).


>> I plan to submit another revision of this draft during the IETF 82
>> week, once draft submission is open again.
>
> I could not find a newer revision than 01, so my comments refer
> to that one. If there is a new revision, please accept my apologies
> and point me to the newest one and I will have a look ASAP.

Indeed, -01 is still current; I've been waiting for more review
comments (like your message!) to address before preparing the next
draft version, but there haven't been many so far.


> All the best,
>
> Lars
>
>  ...
>
> --
> Dr. Lars G. Svensson
> Deutsche Nationalbibliothek / Informationstechnik
> http://www.dnb.de/
> l.svensson@dnb.de


Kind regards,
  Alfred.

-- 

+------------------------+--------------------------------------------+
| TR-Sys Alfred Hoenes   |  Alfred Hoenes   Dipl.-Math., Dipl.-Phys.  |
| Gerlinger Strasse 12   |  Phone: (+49)7156/9635-0, Fax: -18         |
| D-71254  Ditzingen     |  E-Mail:  ah@TR-Sys.de                     |
+------------------------+--------------------------------------------+