Re: [apps-discuss] i18n intro, Sunday 14:00-16:00

John C Klensin <john@jck.com> Thu, 21 July 2011 15:22 UTC

Date: Thu, 21 Jul 2011 11:22:22 -0400
From: John C Klensin <john@jck.com>
To: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>, Peter Saint-Andre <stpeter@stpeter.im>
Message-ID: <215FF088D6A2C5D95D969DFF@PST.JCK.COM>
In-Reply-To: <4E27CF30.5050205@it.aoyama.ac.jp>
References: <4E25D187.7010901@stpeter.im> <4E25D8FE.9030402@stpeter.im> <4E27CF30.5050205@it.aoyama.ac.jp>
Cc: apps-discuss@ietf.org
Subject: Re: [apps-discuss] i18n intro, Sunday 14:00-16:00

--On Thursday, July 21, 2011 16:03 +0900 "\"Martin J. Dürst\""
<duerst@it.aoyama.ac.jp> wrote:

> Some comments:
>

I already sent Peter a rather long list of comments privately,
many of which overlap these (I will forward them to you, Martin,
or to others, if you are interested).  I have remarks on a few
of these...


> Slide 7: there are thousands of languages and scripts:
> Thousands of languages: yes; thousands of scripts: no
> (http://www.unicode.org/Public/UNIDATA/Scripts.txt currently
> has 95 (including 'Common'), so "well over a hundred" would be
> much more appropriate).

Agreed, although I read that as measuring a language-script
product, which would certainly be in the thousands.  It is
always hard to know how much detail a presentation like this
should go into but, if one wanted to push further, it might be
worth noting that the Unicode script list as found at that URL
is controversial -- some would claim that Unicode joins things
together as a single script that should be separated and
identifies things as separate scripts that are mostly
typographical variations plus a few added characters.

>...
> Slide 131: "UTF-8 is the preferred IETF encoding (RFC 3629)":
> RFC 3629 is the reference for UTF-8 per se, the IETF
> preference is expressed in RFC 2277. So the text should say
> "UTF-8 (RFC 3629) is the preferred IETF encoding (RFC 2277)"
> (or some such), and add RFC 2277 to the references.

2277 and 5198 in both cases.  5198 is already in the references.

> Slide 132: integers -> bytes (or octets)
> (we are really now on a lower, somewhat more physical level,
> and byte/octet is completely adequate here; indeed, anything
> else would be needlessly confusing).

I assumed Peter was trying to make the distinction between
abstract integer code points and their instantiation in an
encoding.  Given that this distinction raised issues in EAI and
in 3536bis, it is probably worth trying to keep.  Whether the hint
of using "integer" is sufficient for that purpose probably
depends on what Peter says as the slides are flying by.
Conversely, I don't know whether it is worth trying to preserve
the distinction in a talk at this level.
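
To make the code point versus encoding distinction concrete, here is
a minimal Python sketch. It is not taken from the slides; the sample
character and the variable names are assumptions chosen only for
illustration. It shows one abstract integer code point and two
different octet sequences that instantiate it.

```python
# Illustration only: the sample character is an assumption, not from the slides.
ch = "\u00df"  # LATIN SMALL LETTER SHARP S

code_point = ord(ch)               # the abstract integer code point
utf8_octets = ch.encode("utf-8")   # its instantiation as UTF-8 octets
utf16_octets = ch.encode("utf-16-be")  # a different instantiation (UTF-16, big-endian)

print(f"code point: U+{code_point:04X}")
print("UTF-8 octets: ", utf8_octets.hex(" "))
print("UTF-16 octets:", utf16_octets.hex(" "))
```

The integer stays the same while the octets depend on the encoding,
which is the distinction that "integer" on the slide may be hinting at.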

>...
> Slide 168: Fussball vs. Fußball isn't a normalization issue
> (not even NFKC).

Agreed.  But it is an issue with the use of toCaseFold.  One of
the suggestions I made to Peter is that it may be useful to
differentiate between the level of aggressiveness (i.e.,
vulnerability to controversy) of lower-casing (or upper-casing)
and that of applying case folding.
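
The difference can be checked with a short Python sketch. It uses
only the standard library; `str.casefold()` is assumed here to stand
in for toCaseFold, and the variable names are illustrative.

```python
import unicodedata

word = "Fu\u00dfball"  # "Fußball"

# None of the four Unicode normalization forms touches the sharp s.
normalized = {
    form: unicodedata.normalize(form, word)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

lowered = word.lower()    # simple lower-casing keeps the sharp s
folded = word.casefold()  # case folding expands the sharp s to "ss"

print(normalized)
print(lowered, folded)
print(folded == "Fussball".casefold())
```

Under these assumptions the two spellings stay distinct after
normalization and after lower-casing, and only become equal once
case folding is applied.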

> Of the two HenryIV, IDNA only allows one (2008) or maps to
> one (2003).

True of several others of the examples the slides use or hint at.
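
A rough sketch of the mapping case follows, in Python. It assumes
that the second "HenryIV" on the slides is spelled with U+2163 ROMAN
NUMERAL FOUR, which is a guess and not confirmed by this thread. It
also only approximates the IDNA2003 mapping step with NFKC plus case
folding; it is not a full Nameprep or IDNA implementation.

```python
import unicodedata

# Assumption: the variant spelling uses U+2163 ROMAN NUMERAL FOUR.
ascii_form = "HenryIV"
roman_form = "Henry\u2163"

# NFKC replaces the compatibility character with the letters "I" and "V".
roman_nfkc = unicodedata.normalize("NFKC", roman_form)

# Approximation of an IDNA2003-style mapping: NFKC, then case folding.
mapped_roman = roman_nfkc.casefold()
mapped_ascii = unicodedata.normalize("NFKC", ascii_form).casefold()

print(roman_form == ascii_form)      # distinct as raw strings
print(mapped_roman == mapped_ascii)  # identical after mapping
```

A character that changes under NFKC in this way is the kind that
IDNA2008 rejects outright instead of mapping.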

>...

best,
   john