Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf

Paul Kyzivat <pkyzivat@alum.mit.edu> Fri, 08 July 2016 16:49 UTC

To: Sean Leonard <dev+ietf@seantek.com>, Jonathan Hansford <jonathan@hansfords.net>, "draft-seantek-constrained-abnf@ietf.org" <draft-seantek-constrained-abnf@ietf.org>
References: <20160708121202.A3E5D12D19F@ietfa.amsl.com> <43d3ffea-57de-f7ef-e740-e448564008ed@alum.mit.edu> <bf12cdc4-8d17-d6aa-cc01-5afad19127ac@seantek.com>
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
Message-ID: <92276351-a21c-302c-f0c8-7b4843c9b5f7@alum.mit.edu>
Date: Fri, 08 Jul 2016 12:49:04 -0400
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <bf12cdc4-8d17-d6aa-cc01-5afad19127ac@seantek.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/abnf-discuss/YLFNppg2EuQAzInC5_bm6r35te0>
Cc: "abnf-discuss@ietf.org" <abnf-discuss@ietf.org>
Subject: Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf

On 7/8/16 12:23 PM, Sean Leonard wrote:

>> I have difficulty imagining how you would fit such constraints into
>> the syntax. The syntax of ABNF is pretty nice for what it does, but it
>> doesn't leave much room for extension because we are running out of
>> special characters. (Perhaps the possibilities could be expanded by
>> using unicode characters. Imagine what could be done with emojis as
>> operator characters.)
>
> Length constraints are expressible with current ABNF. Generically,
> suppose you have an identifier comprised of ALPHAs and DIGITs:
>
> identifier = 1*(ALPHA / DIGIT)
>
> Well, if you want to limit identifier to 20 chars, then you can say max
> twenty:
>
> identifier = 1*20(ALPHA / DIGIT)
>
> As Paul says, this is a repetition constraint, and is easy to analyze
> when the repeated symbols are fixed-width. But what about variable width?
>
> Well, suppose you have Domain from RFC 5321:
>
>    Domain         = sub-domain *("." sub-domain)
>
>    sub-domain     = Let-dig [Ldh-str]
>    Let-dig        = ALPHA / DIGIT
>    Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig
>
>
>
> Now suppose you want to constrain Domain to be at most 255 chars. Such a
> constrained domain might be called DNSDomain. Then you can say:
>
>    DNSDomain ^ Domain = 1*255ASCII

Ah. Interesting!
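
A minimal sketch of what that line appears to say, assuming the "^" in the quoted rule means "must match both sides" (an intersection) and that ASCII stands for %x00-7F; neither is confirmed in this message. The regular expression is a hand translation of the RFC 5321 Domain rules quoted above, and the function names are hypothetical.

```python
import re

# sub-domain = Let-dig [Ldh-str]
# Ldh-str    = *( ALPHA / DIGIT / "-" ) Let-dig
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# Domain = sub-domain *("." sub-domain)
DOMAIN_RE = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")


def is_domain(s: str) -> bool:
    """True if s matches the Domain rule (no length limit)."""
    return DOMAIN_RE.fullmatch(s) is not None


def is_ascii_1_to_255(s: str) -> bool:
    """True if s matches 1*255ASCII, with ASCII assumed to be %x00-7F."""
    return 1 <= len(s) <= 255 and all(ord(c) <= 0x7F for c in s)


def is_dns_domain(s: str) -> bool:
    """Assumed reading of 'DNSDomain ^ Domain = 1*255ASCII': both must hold."""
    return is_domain(s) and is_ascii_1_to_255(s)
```

With this reading, a 256-character run of letters still satisfies Domain but fails DNSDomain, which is the variable-width case that a plain repetition count on Domain cannot express.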


> Regarding Paul's Unicode point: it would be nice for ABNF to have an
> extension that defines the domain of the input symbols, with specific
> options for: ASCII-only, octets, fixed-bit-width units, UTF-8,
> UTF-16LE/BE, and Unicode scalar values (0-0xD7FF, 0xE000-0x10FFFF), and
> Unicode characters (non-characters such as 0xFFFF are excluded). The
> most important of these are ASCII-only, octets, and Unicode scalar values.

Note that if you are using quoted strings in ABNF then you are 
restricted to their mapping to ASCII, which then extends obviously to 
Unicode.

I am not well informed about Unicode, but ISTM that if you are using
ABNF with it then it makes sense to define the grammar only over Unicode
scalar values, leaving the conversion of those to/from a particular
encoding like UTF-8 to a pre/post-processor. You need to be a
glutton for punishment to define your Unicode-based grammar over octets.
And is there any operational difference between defining ABNF over 
unsigned integers and defining it over Unicode scalar values?
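
A small sketch of the pre-processor idea, under the assumption that the input arrives as UTF-8 octets and the grammar is then matched over the resulting integers. It also shows one place where the two domains could differ: the scalar-value range quoted above (0-0xD7FF, 0xE000-0x10FFFF) excludes the surrogates and anything past 0x10FFFF, which plain unsigned integers would allow. The function names are hypothetical.

```python
from typing import List


def is_scalar_value(n: int) -> bool:
    """True if n is in the Unicode scalar value range quoted in the thread."""
    return 0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF


def decode_to_scalars(octets: bytes) -> List[int]:
    """Pre-processor: turn UTF-8 octets into a list of scalar values.

    Python's strict UTF-8 decoder raises UnicodeDecodeError on malformed
    input, including encoded surrogates, so every integer returned here
    is a scalar value.
    """
    return [ord(ch) for ch in octets.decode("utf-8")]
```

A grammar written over these integers never needs to mention UTF-8 lead or continuation octets; whether the surrogate gap matters in practice depends on the grammar.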

	Thanks,
	Paul