Re: [apps-discuss] i18n intro, Sunday 14:00-16:00

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Fri, 22 July 2011 06:55 UTC

Message-ID: <4E291EA2.6010500@it.aoyama.ac.jp>
Date: Fri, 22 Jul 2011 15:54:26 +0900
From: =?UTF-8?B?Ik1hcnRpbiBKLiBEw7xyc3Qi?= <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4
MIME-Version: 1.0
To: John C Klensin <john@jck.com>
References: <4E25D187.7010901@stpeter.im> <4E25D8FE.9030402@stpeter.im> <4E27CF30.5050205@it.aoyama.ac.jp> <215FF088D6A2C5D95D969DFF@PST.JCK.COM>
In-Reply-To: <215FF088D6A2C5D95D969DFF@PST.JCK.COM>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: apps-discuss@ietf.org
Subject: Re: [apps-discuss] i18n intro, Sunday 14:00-16:00

Hello John,

On 2011/07/22 0:22, John C Klensin wrote:
>
>
> --On Thursday, July 21, 2011 16:03 +0900 "\"Martin J. Dürst\""
> <duerst@it.aoyama.ac.jp>  wrote:
>
>> Some comments:
>>
>
> I already sent Peter a rather long list of comments, many of
> which overlap these, privately (I will forward them to you,
> Martin, or others if you are interested).

Yes, I am interested; please send them.

> I have remarks on a
> few of these...
>
>
>> Slide 7: there are thousands of languages and scripts:
>> Thousands of languages: yes; thousands of scripts: no
>> (http://www.unicode.org/Public/UNIDATA/Scripts.txt currently
>> has 95 (including 'Common'), so "well over a hundred" would be
>> much more appropriate).
>
> Agreed, although I read that as measuring a language-script
> product, which would certainly be in the thousands.

In that case, it would be fine.

> It is
> always hard to know how much detail a presentation like this
> should go into but, if one wanted to push further,

I didn't.

> it might be
> worth noting that the Unicode script list as found at that URL
> is controversial -- some would claim that Unicode joins things
> together as a single script that should be separated and
> identifies things as separate scripts that are mostly
> typographical variations plus a few added characters.

I think "controversial" is too strong, because it suggests that there
are great fights going on. It is indeed not the only way to count
scripts. But I only counted that way to give a rough starting value, and
I don't think there is a reasonable way to define scripts that results
in thousands.

>> ...
>> Slide 131: "UTF-8 is the preferred IETF encoding (RFC 3629)":
>> RFC 3629 is the reference for UTF-8 per se, the IETF
>> preference is expressed in RFC 2277. So the text should say
>> "UTF-8 (RFC 3629) is the preferred IETF encoding (RFC 2277)"
>> (or some such), and add RFC 2277 to the references.
>
> 2277 and 5198 in both cases.  5198 is already in the references.

I'd guess then this should be:

"UTF-8 (RFC 3629, RFC 5198) is the preferred IETF encoding (RFC 2277)"


>> Slide 132: integers -> bytes (or octets)
>> (we are really now on a lower, somewhat more physical level,
>> and byte/octet is completely adequate here; indeed, anything
>> else would be needlessly confusing).
>
> I assumed Peter was trying to make the distinction between
> abstract integer code points and their instantiation in an
> encoding.

That would make a lot of sense. But exactly for that reason, it has to
be bytes. "UTF-8 encodes each codepoint as a sequence of 1 to 4 integers"
just doesn't make sense.
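
A minimal Python sketch of the distinction, assuming four sample characters that are not taken from the slides: the code point is the abstract integer, and UTF-8 turns it into a sequence of 1 to 4 bytes (octets). The helper name utf8_length is hypothetical and exists only for this illustration.

```python
# Illustrative sample characters (an assumption of this sketch,
# not taken from the slides under discussion).
samples = ["A", "ü", "€", "𝄞"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single code point."""
    return len(ch.encode("utf-8"))


for ch in samples:
    code_point = ord(ch)              # the abstract integer code point
    encoded = ch.encode("utf-8")      # its instantiation as bytes (octets)
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{code_point:04X} -> {utf8_length(ch)} byte(s): {hex_bytes}")
```

Each code point is one integer, while the encoded form is a variable-length byte sequence, which is why "bytes" is the natural word on the encoding slide.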

Regards,   Martin.

> Given that distinction raised issues in EAI and in
> 3536bis, it is probably worth trying to keep.  Whether the hint
> of using "integer" is sufficient for that purpose probably
> depends on what Peter says as the slides are flying by.
> Conversely, I don't know whether it is worth trying to preserve
> the distinction in a talk at this level.
>
>> ...
>> Slide 168: Fussball vs. Fußball isn't a normalization issue
>> (not even NFKC).
>
> Agreed.  But it is an issue with the use of toCaseFold.  One of
> the suggestions I made to Peter is that it may be useful to
> differentiate between the level of aggressiveness (i.e.,
> vulnerability to controversy) of lower-casing (or upper-casing)
> and applying case folding.
>
>> Of the two HenryIV, IDNA only allows one (2008) or maps to one
>> (2003).
>
> True of several others of the examples the slides use or hint at.
>
>> ...
>
> best,
>     john
>
>
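
The Fussball vs. Fußball remark on slide 168 can be checked with a short Python sketch. It assumes that Python's unicodedata.normalize and str.casefold are acceptable stand-ins for the Unicode normalization forms and for toCaseFold; the variable names are chosen only for this illustration.

```python
import unicodedata

word_sharp_s = "Fußball"   # spelled with U+00DF LATIN SMALL LETTER SHARP S
word_ss = "Fussball"       # spelled with "ss"

# None of the four normalization forms maps U+00DF to "ss",
# so the two spellings stay distinct under normalization.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, word_sharp_s)
    print(form, normalized == word_ss)

# Simple lower-casing leaves the sharp s untouched ...
lower_equal = word_sharp_s.lower() == word_ss.lower()
print("lower:", lower_equal)

# ... whereas full case folding maps it to "ss", so the words collide.
casefold_equal = word_sharp_s.casefold() == word_ss.casefold()
print("casefold:", casefold_equal)
```

Only the case-folding comparison treats the two spellings as equal, which matches the point that this is a case-folding issue rather than a normalization issue.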