Re: [rfc-i] Unicode in ABNF (in RFC) draft-seantek-unicode-in-abnf-01.txt
Sean Leonard <dev+ietf@seantek.com> Tue, 04 October 2016 18:06 UTC
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>,
"abnf-discuss@ietf.org" <abnf-discuss@ietf.org>
Cc: Chris Newman <chris.newman@oracle.com>,
"rfc-interest@rfc-editor.org" <rfc-interest@rfc-editor.org>
On 10/3/2016 2:53 AM, Martin J. Dürst wrote:
> I don't see the need to use %su for Unicode strings. The code points
> speak for themselves, just use %s. Leaving %i/%iu undefined for
> Unicode is indeed advisable, although it could be based on default
> case folding, but we know that this would be imperfect, in particular
> for Turkish.

I like %su because it notifies the reader, and a parser, to expect UTF-8 and "deal with it" in a way that %s alone doesn't.

For example, accented e can be é (U+00E9) or é (U+0065 U+0301). When printed, or in a medium that doesn't provide direct access to the encoded data (screenshot? mobile app? etc.), the quoted string is ambiguous. Saying %s"foo" means you know that foo is always in the ASCII range and can't possibly be composed of anything else (including, for example, FULLWIDTH ASCII in U+FF01-U+FF5E). Are %s"foo" and %s"foo" the same? How about %s"·˙•․‥…‧"? %s"µ" and %s"μ"? And the bajillion different dashes?

Then there is the issue that even if a code point is objectively, graphically distinct in this version of Unicode, some future version may assign a code point to a character that commonly looks exactly the same as an existing character.

Responding to your point, defining %su"" would mean %s"" is undefined for Unicode, which avoids the temptation of %i"" (or of a bare "", i.e., the traditional approach).

Perhaps, if a need develops for a case-insensitive version in a protocol, %iu could take a parameter that indicates the language tailoring, such as %iu[tr]"çilek". (But, I suppose, one could make the converse argument that %i[tr]"çilek" would be a natural evolution.)

Those are a couple of arguments. I am happy to go with whatever (rough) consensus emerges, however.

Regards,

Sean
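The code point ambiguities listed above can be checked mechanically. The following is a minimal Python sketch, not part of the draft: the helper names (`code_points`, `fits_rfc5234_char_val`, `equalities`, `default_casefold_match`) are hypothetical, the second `foo` is assumed to be FULLWIDTH ASCII (U+FF46 U+FF4F U+FF4F), and the case-insensitive check uses Python's `str.casefold`, i.e. Unicode default case folding with no Turkish tailoring.

```python
import unicodedata

# Pairs mentioned in the message: each pair may look the same (or nearly so)
# on screen, yet the two strings differ at the code point level.
# Assumption: the second "foo" in the message is taken to be FULLWIDTH ASCII.
PAIRS = [
    ("e-acute, precomposed vs decomposed", "\u00e9", "e\u0301"),
    ("ASCII vs FULLWIDTH ASCII", "foo", "\uff46\uff4f\uff4f"),
    ("MICRO SIGN vs GREEK SMALL LETTER MU", "\u00b5", "\u03bc"),
]


def code_points(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def fits_rfc5234_char_val(s: str) -> bool:
    """True if s can appear inside a plain ABNF quoted string.

    RFC 5234 limits char-val contents to %x20-21 / %x23-7E, i.e.
    printable ASCII other than the double quote.
    """
    return all(0x20 <= ord(c) <= 0x7E and c != '"' for c in s)


def equalities(a: str, b: str) -> dict:
    """Compare two strings raw, after NFC, and after NFKC normalization."""
    result = {"raw": a == b}
    for form in ("NFC", "NFKC"):
        result[form] = (
            unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        )
    return result


def default_casefold_match(a: str, b: str) -> bool:
    """Case-insensitive match using Unicode default (untailored) case folding."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    for label, a, b in PAIRS:
        print(f"{label}: [{code_points(a)}] vs [{code_points(b)}]")
        print(f"  plain ABNF char-val? {fits_rfc5234_char_val(a)} / "
              f"{fits_rfc5234_char_val(b)}")
        print(f"  equal: {equalities(a, b)}")

    # Default folding maps U+0130 (capital I with dot above) to "i" + U+0307,
    # so the Turkish uppercase spelling of "çilek" does not match...
    print(default_casefold_match("\u00c7\u0130LEK", "\u00e7ilek"))
    # ...while ASCII "I" folds to "i", although Turkish lowercases it to U+0131.
    print(default_casefold_match("\u00c7ILEK", "\u00e7ilek"))
```

Under these assumptions, every pair differs as raw code points, only the é pair is canonically equivalent (NFC), and all three pairs collapse under compatibility normalization (NFKC); none of the non-ASCII strings fits a plain RFC 5234 quoted string. The last two checks illustrate the Turkish problem: default folding fails to match ÇİLEK with çilek, yet matches ÇILEK with it.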