Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf

Paul Kyzivat <pkyzivat@alum.mit.edu> Sat, 09 July 2016 17:56 UTC

To: Sean Leonard <dev+ietf@seantek.com>
References: <20160708121202.A3E5D12D19F@ietfa.amsl.com> <43d3ffea-57de-f7ef-e740-e448564008ed@alum.mit.edu> <bf12cdc4-8d17-d6aa-cc01-5afad19127ac@seantek.com> <92276351-a21c-302c-f0c8-7b4843c9b5f7@alum.mit.edu> <0781109A-9AFB-42AA-8828-DA5CDF38C377@seantek.com> <3ce7dbb5-24d5-909e-27a2-f8447f9bf3f5@alum.mit.edu> <ABBDBEE6-C069-46A1-AE28-B05D7EA7A7AE@seantek.com>
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
Message-ID: <15f492d7-aa2f-a988-26eb-abc0c2f0fe4f@alum.mit.edu>
Date: Sat, 09 Jul 2016 13:56:03 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/abnf-discuss/wfsF3vZlKv7E7zB9Vq2XNC4hQGw>
Cc: Jonathan Hansford <jonathan@hansfords.net>, "draft-seantek-constrained-abnf@ietf.org" <draft-seantek-constrained-abnf@ietf.org>, "abnf-discuss@ietf.org" <abnf-discuss@ietf.org>
Subject: Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf

On 7/9/16 3:51 AM, Sean Leonard wrote:
>
>> On Jul 8, 2016, at 3:39 PM, Paul Kyzivat <pkyzivat@alum.mit.edu> wrote:
>>
>> [...]
>>
>> "Defined over X" could be a predicate that a verifier could decide for a given ABNF grammar. [...]
>>
>> "Defined over ASCII" and "Defined over Unicode" could also be determined by analysis of the grammar.
>
> Up to this point, that is determined by (human) analysis of the specification text. It’s worked out overall.

IIUC this can be determined mechanically. First it is necessary to
decide whether we mean "defined over Unicode code points" or "defined
over Unicode characters". I guess defining over characters is a moving
target, while code points are (relatively) stable.

And I think for ABNF, defining over code points is good enough. If you
know a grammar is defined over code points, then I/O processes that
convert to/from UTF-8, etc. are valid to use.
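
To make "determined mechanically" concrete, here is a rough sketch (mine,
not from either draft) of one way a checker might decide it: collect every
numeric terminal in the grammar and look at the largest value. The names
terminal_values and defined_over are made up for illustration. It assumes
quoted strings are ASCII-only, as in RFC 5234, and it does not skip
comments or quoted strings while scanning, which a real tool would need to do.

```python
import re

# Matches ABNF numeric terminals such as %x41, %x30-39, %d13.10, %b1010.
NUM_VAL = re.compile(r"%([bdx])([0-9A-F]+(?:[-.][0-9A-F]+)*)", re.IGNORECASE)
BASES = {"b": 2, "d": 10, "x": 16}


def terminal_values(grammar):
    """Yield every integer named by a numeric terminal in the grammar text."""
    for base, body in NUM_VAL.findall(grammar):
        for part in re.split(r"[-.]", body):
            yield int(part, BASES[base.lower()])


def defined_over(grammar):
    """Classify a grammar by its largest terminal value (hypothetical helper)."""
    # With no numeric terminals, only ASCII quoted strings remain.
    top = max(terminal_values(grammar), default=0x7F)
    if top <= 0x7F:
        return "ASCII"
    if top <= 0x10FFFF:
        return "Unicode code points"
    return "neither"
```

For example, defined_over('UNICODE = %x80-10FFFF') would report "Unicode
code points". Note that this checks code points, not scalar values, so a
grammar that names surrogates (%xD800-DFFF) still passes.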

> As much as I would like to see some more Unicode action, we already have two proposals on the table that I would like to focus on and get adopted & passed, before working on new stuff. The proposals are abnf-more-core-rules and constrained-abnf.
>
> In abnf-more-core-rules, there already are mechanisms for incorporating Unicode, in the productions:
>  UNICODE
>  BEYONDASCII
>  C1
>  BEYONDC1 (to be included in next draft)
>
> I suppose the following could be added:
>  LATIN1
>  BEYONDLATIN1
>
> I submit that these mechanisms are probably “quite enough” for IETF usage. There is a strong bias in ABNF for ASCII, which basically reflects the institutional bias in the IETF for US-English and the history of the Internet’s development. I have yet to see an IETF standard that calls out a specific character beyond the ASCII range and gives it special semantics (other than NEL, and even then, NEL is not really used).

I'm not convinced. You can *already* do Unicode. All the above do is 
give you some standard symbols to use for ranges of values that are 
interesting in the context of Unicode. They don't standardize any way of 
introducing Unicode characters that don't fall into the ASCII subset. 
People will still need to enter those numerically, or define symbols for 
ones they care about.

Your draft for more core rules hasn't received a lot of support - 
perhaps because it isn't perceived to add sufficient value to be worth 
the trouble.

ISTM that an important part of "supporting" Unicode in ABNF would be 
allowing use of Unicode characters in some form of quoted string. 
(Perhaps yet another new form.) That might add perceived value to those 
who really want to define grammars that actually *use* non-ASCII
characters.

I think in turn that defining such literals only makes sense if the
grammars using them work over Unicode scalar values. (The alternative
seems to be to define those literals to expand into UTF-8, so each
character in the string might represent multiple bytes. And then you
need to know what encoding the parser is running over.)
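
A small sketch of the difference, under my own assumptions: the two helper
names below are hypothetical, and the output is written in today's numeric
terminal notation only to show what each interpretation of a literal would
have to match.

```python
def as_scalar_terminal(text):
    """Expand a literal into one numeric value per Unicode scalar value."""
    return "%x" + ".".join(f"{ord(ch):X}" for ch in text)


def as_utf8_terminal(text):
    """Expand a literal into one numeric value per UTF-8 octet."""
    return "%x" + ".".join(f"{b:02X}" for b in text.encode("utf-8"))


# U+00E9 (e with acute) is one scalar value but two UTF-8 octets.
print(as_scalar_terminal("\u00e9"))  # %xE9
print(as_utf8_terminal("\u00e9"))    # %xC3.A9
```

The first form is independent of how the input was encoded; the second
only matches if the parser is fed UTF-8 octets. (Neither helper handles an
empty string sensibly; that is left out to keep the sketch short.)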

That in turn presumes some way to get Unicode scalar values into the
parser. I think maybe that requires some sort of "framework" that
discusses input/output to the parser.
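
As a very rough illustration of what the input side of such a framework
might do (this is only my guess at the shape, and scalar_values is a
made-up name): decode the octets first, then hand the parser a sequence of
integers, so the grammar never sees the encoding.

```python
def scalar_values(octets, encoding="utf-8"):
    """Decode raw octets and return the Unicode scalar values they carry."""
    # errors="strict" rejects malformed input instead of guessing.
    return [ord(ch) for ch in octets.decode(encoding, errors="strict")]


# The same text in two encodings yields the same parser input.
utf8_input = scalar_values(b"A\xc3\xa9")
utf16_input = scalar_values("A\u00e9".encode("utf-16-le"), "utf-16-le")
```

Here both calls return the values 0x41 and 0xE9, which is the property a
grammar "defined over code points" would rely on.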

OTOH, I don't see a groundswell of users clamoring for this sort of
thing. So for now I think this ought to just be a background discussion.

	Thanks,
	Paul
