Re: Last Call: <draft-ietf-urnbis-rfc2141bis-urn-20.txt> (Uniform Resource Names (URNs)) to Proposed Standard

John C Klensin <john-ietf@jck.com> Tue, 21 February 2017 17:40 UTC

Date: Tue, 21 Feb 2017 12:40:48 -0500
From: John C Klensin <john-ietf@jck.com>
To: "tom p." <daedulus@btconnect.com>, ietf@ietf.org
Subject: Re: Last Call: <draft-ietf-urnbis-rfc2141bis-urn-20.txt> (Uniform Resource Names (URNs)) to Proposed Standard
Message-ID: <00068EB71CB3F63F36092E69@PSB>
In-Reply-To: <006c01d28c5e$fbb19760$4001a8c0@gateway.2wire.net>
References: <148699853178.24969.7610250025952470052.idtracker@ietfa.amsl.com> <052201d28c3d$c8061060$4001a8c0@gateway.2wire.net> <092474D20D151EB6C7C0E280@PSB> <006c01d28c5e$fbb19760$4001a8c0@gateway.2wire.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/WXTDI0X_B-MLtmsGiD_HgdgzBzg>
Cc: alexey.melnikov@isode.com, barryleiba@computer.org, draft-ietf-urnbis-rfc2141bis-urn@ietf.org, urnbis-chairs@ietf.org


--On Tuesday, February 21, 2017 16:04 +0000 "tom p."
<daedulus@btconnect.com> wrote:

> OK on 'Updates'
> 
> Meanwhile,
> s.2.2 'the basic Latin repertoire [RFC20] '
> I don't know what you mean - RFC20 does not use the term 'Latin
> repertoire' and, being European, I tend to think of the Latin
> repertoire as being the 160 or so characters of the Belgian
> language.

Mea culpa.  I knew that unqualified reference was going to get
us into trouble and let it go.  Whether you (and future readers
of 2141bis) should have known this (European or not) is
debatable, but "Basic Latin" is a term of art in
internationalization work and "repertoire" keeps us away from
debates about particular coded character sets, encodings, etc.,
and, in particular, avoids confusion with %-encoded arbitrary
Unicode characters (see the introduction to Section 2), whose
encoded forms are certainly still ASCII characters.  The
definition of "basic Latin repertoire" is more or less
equivalent to "undecorated Latin characters", but that
terminology raises other issues... to the point that I'm tempted
to say "can't win no matter what one does".
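
To illustrate the distinction drawn above, here is a minimal
Python sketch.  It assumes that "basic Latin repertoire" means
the letters and digits of ASCII (the wording proposed later in
this note); it is not the <namestring> syntax of the draft, and
the names BASIC_LATIN, is_ascii, and in_basic_latin are
hypothetical helpers introduced only for this example.

```python
import string
from urllib.parse import quote

# Assumption: the repertoire is just the ASCII letters and digits.
BASIC_LATIN = set(string.ascii_letters + string.digits)


def is_ascii(text: str) -> bool:
    """True if every character has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)


def in_basic_latin(text: str) -> bool:
    """True if every character is an ASCII letter or digit."""
    return all(ch in BASIC_LATIN for ch in text)


# Percent-encode the UTF-8 octets of a non-ASCII character.
encoded = quote("\u00e9", safe="")  # U+00E9, LATIN SMALL LETTER E WITH ACUTE

# The encoded form consists only of ASCII characters...
encoded_is_ascii = is_ascii(encoded)
# ...but it is not drawn solely from the letters-and-digits repertoire,
# because it contains "%".
encoded_in_repertoire = in_basic_latin(encoded)
```

The point of the sketch is only that "is ASCII" and "is in the
basic Latin repertoire" are different tests: a %-encoded string
passes the first and fails the second.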

Personally, I've always objected to "Basic Latin" because
several characters we consider to be part of the repertoire
today (e.g., "w" and distinct "j" and "u") were late additions
and would not have been part of the writing system as distinct
characters for writers of Latin in the time of, e.g., Virgil.
But that argument was lost long ago and "basic Latin" is the
prevailing terminology today.

> I think you need to specify the code points here.

I think that would cause confusion with syntax rules that
specify what is possible.  The paragraph containing the phrase
you picked up simply imposes a "SHOULD NOT" restriction and
explains why it should be adhered to when possible and what the
exceptions should be. It really clarifies and is perhaps
partially redundant with the previous paragraph.  How would you
feel about making the phrase something closer to "the basic
Latin repertoire, i.e., the letters and digits of ASCII as
described above" and moving the RFC 20 citation to the first use
of "ASCII" in that previous paragraph?

Other textual suggestions welcome but, again, I think listing
code points would cause confusion with the formal syntax for
<namestring> and some of the carefully-constructed language
around it.

     john

p.s. personal note: 2141bis has either the advantage of, or
suffers from, the fact that Peter and I have been immersed in
internationalization (i18n) issues for the last several years
with various involvements in PRECIS, IDNA, SMTPUTF8 ("EAI"), and
other efforts.   That has probably resulted in our trying to be
a little more precise in this draft than many IETF documents
about URI schemes and naming and perhaps, as in this case,
having slightly higher expectations about community
understanding of those issues than might be the case.  I think
that precision is an advantage and hope that community knowledge
and understanding will gradually improve.  YMMV, but I hope
that anyone who is actually inclined to advocate for less
precision (you have not) will be careful about any advocacy, even on
a diversity basis, for IETF consideration of writing systems
that cannot be expressed in ASCII and/or any language other than
English.