Re: [apps-discuss] i18n intro, Sunday 14:00-16:00

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Thu, 21 July 2011 13:30 UTC

Some comments:

Slide 7: there are thousands of languages and scripts:
Thousands of languages: yes; thousands of scripts: no
(http://www.unicode.org/Public/UNIDATA/Scripts.txt currently has 95 
(including 'Common'), so "well over a hundred" would be much more 
appropriate).

Slide 10: we need to encode more than [A-Z][a-z]:
This should probably be: [A-Za-z]
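To illustrate the difference (a minimal sketch, assuming the slide's notation is meant as a regular-expression character class; the helper name matches_fully is made up for this example):

```python
import re

# As written on the slide: exactly one uppercase letter followed by
# exactly one lowercase letter (a two-character sequence).
two_char = re.compile(r"[A-Z][a-z]")

# Suggested form: any single basic Latin letter, upper or lower case.
one_char = re.compile(r"[A-Za-z]")


def matches_fully(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


print(matches_fully(two_char, "a"))   # a single letter is not matched
print(matches_fully(one_char, "a"))   # a single letter is matched
print(matches_fully(one_char, "é"))   # outside the class, which is the slide's point
```

Only the second form describes "the basic Latin letters" as a set of single characters.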

Slide 22: Why not UPPER vs. lower vs. Title?

Slide 24: You could do full, half

Slide 55: There actually is U+1E9E LATIN CAPITAL LETTER SHARP S. But 
case conversion only goes downwards, not upwards.
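A quick way to see this (a sketch, assuming Python 3 and its built-in str.lower/str.upper, which follow the Unicode default case mappings):

```python
capital_sharp_s = "\u1E9E"  # LATIN CAPITAL LETTER SHARP S
small_sharp_s = "\u00DF"    # LATIN SMALL LETTER SHARP S

# Downwards: the capital form lowercases to the ordinary small sharp s.
lowered = capital_sharp_s.lower()

# Upwards: the small form does not uppercase to U+1E9E; the default
# full case mapping expands it to the two letters "SS".
uppered = small_sharp_s.upper()

print(lowered == small_sharp_s)    # downward mapping exists
print(uppered == capital_sharp_s)  # upward mapping does not land on U+1E9E
print(uppered)
```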

Slide 113: There is no such thing as Compat Recomp!
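What I mean, sketched with Python's unicodedata module (the sample string is my own, not from the slides): NFKC is compatibility decomposition followed by canonical composition, so nothing is ever recomposed into a compatibility character.

```python
import unicodedata

# Hypothetical sample: the "fi" ligature (U+FB01) followed by
# "e" plus COMBINING ACUTE ACCENT (U+0301).
sample = "\uFB01e\u0301"

nfkd = unicodedata.normalize("NFKD", sample)  # compatibility decomposition
nfkc = unicodedata.normalize("NFKC", sample)  # ... then canonical composition

# Applying canonical composition (via NFC) to the NFKD result gives NFKC.
recomposed = unicodedata.normalize("NFC", nfkd)

print(nfkd == "fie\u0301")   # ligature split, accent still separate
print(nfkc == "fi\u00E9")    # e + accent recomposed canonically to U+00E9
print("\uFB01" in nfkc)      # the ligature is never put back together
print(recomposed == nfkc)
```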

Slide 114: "Fastest processing": That depends on various assumptions, in 
particular on the assumption that you actually implement processing as 
in slide 113 (which typically is not done).

Slide 117: "Compared to NFKC: Produces more false positives during
comparison operations": This is confusing/wrong. If matches are 
positive, then NFKC will match more than NFC, and if some of these 
matches are considered false, then NFKC will produce more false positives.
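A small sketch of why NFKC matches more (Python's unicodedata; the helper equal_under and the sample pairs are made up for illustration):

```python
import unicodedata


def equal_under(form, a, b):
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


pairs = [
    ("\u00E9", "e\u0301"),  # precomposed e-acute vs. e + combining acute
    ("\uFF21", "A"),        # FULLWIDTH LATIN CAPITAL LETTER A vs. ASCII A
    ("x\u00B2", "x2"),      # SUPERSCRIPT TWO vs. DIGIT TWO
]

for a, b in pairs:
    print(repr(a), repr(b),
          "NFC:", equal_under("NFC", a, b),
          "NFKC:", equal_under("NFKC", a, b))
```

Every pair that matches under NFC also matches under NFKC, but not the other way round. If matching "x²" with "x2" counts as a false positive, then it is NFKC that produces it.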

Slides 114-118: Some of the arguments given for a single NF apply to a 
certain aspect of NFs, e.g. C or D or K.

Slide 123: Good to see that. By the way, I seem to remember both John 
and me begging you for an explanation of why Jabber wants to use NFD a 
few months ago, and I'm not sure I have seen an answer. Now might be a 
good time (if you already sent one, a pointer would be appreciated).

Slide 131: "UTF-8 is the preferred IETF encoding (RFC 3629)":
RFC 3629 is the reference for UTF-8 per se, the IETF preference is 
expressed in RFC 2277. So the text should say
"UTF-8 (RFC 3629) is the preferred IETF encoding (RFC 2277)"
(or some such), and add RFC 2277 to the references.

Slide 132: integers -> bytes (or octets)
(we are really now on a lower, somewhat more physical level, and 
byte/octet is completely adequate here; indeed anything else would be 
needlessly confusing).
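For example (a sketch in Python; the sample word is just my own name): the code points are the abstract integers, and what UTF-8 produces from them is a sequence of octets.

```python
text = "D\u00FCrst"  # "Dürst": five characters

code_points = [f"U+{ord(ch):04X}" for ch in text]      # character level
octets = text.encode("utf-8")                          # UTF-8 level: bytes
octets_hex = " ".join(f"{byte:02X}" for byte in octets)

print(code_points)   # five code points
print(octets_hex)    # 44 C3 BC 72 73 74
print(len(text), len(octets))
```

The u-umlaut (U+00FC) becomes the two octets C3 BC, so five characters turn into six octets.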

Slide 135: This should come close to normalization. I'd move the part on 
encoding earlier, but maybe that's just me.

Slide 168: Fussball vs. Fußball isn't a normalization issue (not even 
NFKC). Of the two HenryIV, IDNA only allows one (2008) or maps to one 
(2003).
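This is easy to check (a sketch using Python's unicodedata; I am assuming the second "HenryIV" on the slide uses U+2163 ROMAN NUMERAL FOUR, which I have not verified against the slides):

```python
import unicodedata


def nfkc(s):
    return unicodedata.normalize("NFKC", s)


# Sharp s has no decomposition, so no normalization form touches it.
print(nfkc("Fu\u00DFball") == "Fu\u00DFball")  # unchanged under NFKC
print(nfkc("Fu\u00DFball") == "Fussball")      # so the two spellings stay distinct

# The two spellings only meet under case folding, a different operation.
print("Fu\u00DFball".casefold() == "Fussball".casefold())

# By contrast, the Roman numeral character is a compatibility character,
# so NFKC does unify the two "HenryIV" spellings.
print(nfkc("Henry\u2163") == "HenryIV")
```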

Slide 184, middle bullet: I'd add 'occasionally' to put that issue in 
perspective

I remember having seen this talk before, but I don't know where. I 
thought it was very good. I'd be a bit worried if I had to spend 2h 
presenting it; it's written for a fast pace.

Regards,   Martin.


On 2011/07/20 4:20, Peter Saint-Andre wrote:
> On 7/19/11 12:48 PM, Peter Saint-Andre wrote:
>> You might have noticed a curious item on the agenda at 14:00 on Sunday:
>> "Apps Area Preparatory Meeting for Internationalization Working Groups".
>>
>> At that time, I will present an introduction to internationalization,
>> assisted by Pete Resnick (who will correct me where I go wrong). The
>> intent of this session is to help apps-area folks learn more about
>> internationalization, especially in preparation for the PRECIS WG
>> meeting on Thursday. The room we've been assigned (2103) holds up to 60
>> people so we should have plenty of space, and there is no need to sign
>> up if you want to attend.
>>
>> If this session goes well, Pete and I might offer a more general
>> tutorial at a future IETF meeting. Consider Sunday's session a dry run.
>
> Sorry, I neglected to provide a pointer to my slides:
>
> http://www.saint-andre.com/ietf/i18n-intro.pdf
>
> Peter
>