Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf
Paul Kyzivat <pkyzivat@alum.mit.edu> Fri, 08 July 2016 22:11 UTC
To: abnf-discuss@ietf.org
References: <20160708121202.A3E5D12D19F@ietfa.amsl.com> <43d3ffea-57de-f7ef-e740-e448564008ed@alum.mit.edu> <bf12cdc4-8d17-d6aa-cc01-5afad19127ac@seantek.com> <017701d1d95c$ebf89310$c3e9b930$@hansfords.net>
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
Message-ID: <960b5a55-e1ea-fe42-22b8-bcf929826c7a@alum.mit.edu>
Date: Fri, 08 Jul 2016 18:11:09 -0400
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <017701d1d95c$ebf89310$c3e9b930$@hansfords.net>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/abnf-discuss/JahZLGuJf3hyjcLYcRYsYa3NScI>
Subject: Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf
X-BeenThere: abnf-discuss@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: "General discussion about tools, activities and capabilities involving the ABNF meta-language" <abnf-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/abnf-discuss>, <mailto:abnf-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/abnf-discuss/>
List-Post: <mailto:abnf-discuss@ietf.org>
List-Help: <mailto:abnf-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/abnf-discuss>, <mailto:abnf-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jul 2016 22:11:30 -0000
On 7/8/16 5:08 PM, Jonathan Hansford wrote:
> Paul and Sean,
>
> Thanks for the replies. Life is too hectic at the moment, but I’m hoping
> I will find some time to properly read the I-D and your replies and see
> how it would all work for me. Is there a parser for constrained ABNF at
> the moment?

AFAIK there is not. As I said, the draft is a trial balloon. I think it
still needs a lot of work, not on the general concept, but on how you
link grammars together. (The use case that drives my interest is
defining extensibility hooks in a grammar when defining a standard, and
then writing another draft that adds syntactical extensions from such a
hook.)

I'm not aware of a generally available parser for case-sensitive ABNF
either. I developed extensions to Bap to do it, and made them available
to the author on GitHub, but he never took them. I saw somebody else
with another (worse) implementation, but AFAIK that one hasn't made it
into the web-based Bap linked from the IETF tools page either.

	Thanks,
	Paul

> Jonathan
>
> =O)
>
> *From:* Sean Leonard [mailto:dev+ietf@seantek.com]
> *Sent:* 08 July 2016 17:23
> *To:* Paul Kyzivat <pkyzivat@alum.mit.edu>; Jonathan Hansford
> <jonathan@hansfords.net>; draft-seantek-constrained-abnf@ietf.org
> *Cc:* abnf-discuss@ietf.org
> *Subject:* Re: Mail regarding draft-seantek-constrained-abnf
>
> +abnf-discuss@
>
> On 7/8/2016 8:06 AM, Paul Kyzivat wrote:
>
> On 7/8/16 8:11 AM, Jonathan Hansford wrote:
>
> Hi,
>
> I’ve just discovered draft-seantek-constrained-abnf-00 and have a couple
> of questions:
>
> Note that while Sean and I have been bouncing ideas around about
> this for some time, the draft is a trial balloon and the specific
> text is all Sean's. I'm not yet wild about some of the syntax, but
> it is good to have something tangible written down to talk about.
>
> So far it seems that only Sean and I are interested. If you are,
> then that would make three.
>
> Yay for three!
>
> 1) Does Constrained ABNF work with RFC 7405 “Case-Sensitive String
> Support in ABNF”? Either way, it might be worth mentioning that RFC in
> your I-D.
>
> Certainly it does. The functionality is entirely orthogonal.
>
> Yes, what Paul said.
>
> 2) Has there been any call to further constrain ABNF such that
> rules can have length constraints applied to them? I’m not sure how much
> benefit there would be for RFCs (though I can think of examples where
> identifying such a constraint within the ABNF rather than just within
> the accompanying text would be beneficial), but a Constrained ABNF
> compliant parser that included length constraints would be good.
>
> I'm not aware of any such thing. Of course it already has repetition
> constraints, but I guess that is not what you are looking for.
>
> I have difficulty imagining how you would fit such constraints into
> the syntax. The syntax of ABNF is pretty nice for what it does, but
> it doesn't leave much room for extension because we are running out
> of special characters. (Perhaps the possibilities could be expanded
> by using Unicode characters. Imagine what could be done with emojis
> as operator characters.)
>
> Length constraints are expressible with current ABNF. Generically,
> suppose you have an identifier comprised of ALPHAs and DIGITs:
>
>     identifier = 1*(ALPHA / DIGIT)
>
> Well, if you want to limit identifier to 20 chars, then you can say max
> twenty:
>
>     identifier = 1*20(ALPHA / DIGIT)
>
> As Paul says, this is a repetition constraint, and is easy to analyze
> when the repeated symbols are fixed-width. But what about variable width?
>
> Well, suppose you have Domain from RFC 5321:
>
>     Domain = sub-domain *("." sub-domain)
>     sub-domain = Let-dig [Ldh-str]
>     Let-dig = ALPHA / DIGIT
>     Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig
>
> Now suppose you want to constrain Domain to be at most 255 chars. Such a
> constrained domain might be called DNSDomain.
> Then you can say:
>
>     DNSDomain ^ Domain = 1*255ASCII
>
> ...and that's it! Basically you have to compose a production that
> evaluates to a single symbol (character/integer), where the domain of
> the symbol is any possible character that can appear in the production.
> The minimum conforming production would be:
>
>     DNSDomain ^ Domain = 1*255(ALPHA / DIGIT / "-" / ".")
>
> The maximum conforming production would be:
>
>     DNSDomain ^ Domain = 1*255(%x00-FFFFFFFFFFFFFFFFFFFFFFFFF...(infinity))
>
> Reducing the production from a non-maximum to the minimum form can be
> done programmatically. A parser would need to evaluate the
> constrained production for all "reachable" single terminal symbols
> (characters/integers that may appear at least once), which can be done
> more-or-less in O(n) time. It adds all such integers to a set, and then
> expresses the set.
>
> It can also be shown that you can integrate the constraint into the
> base production, at least for all non-recursive productions. In this
> case, you would have to break down Domain into a very, very long string
> of terminal symbol possibilities (essentially, expressing it as a huge
> finite state machine). Once you get all the possible states, you can
> limit each symbol combination so that the total combinations do not
> exceed 255. This can probably be done computationally, but the proof
> would require a lot of math that I do not care to figure out right now.
> :) And it's unclear how computationally difficult it would be to do it
> in the general case.
>
> Actually now that I think about it, if your constraint is to limit the
> total number of symbols, you can probably prove that any recursive
> expression can be expressed as a finite state machine. Therefore,
> integrating length constraints would actually be computable. It would
> also be super-ugly, but you get the point.
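The two steps described above (collecting the reachable terminal symbols, then applying a length limit on top of Domain) can be sketched in Python. This is only an illustration, not a parser for the draft: `GRAMMAR`, `reachable_terminals`, and `is_dns_domain` are hypothetical names, the rule table and the regular expression are hand translations of the RFC 5321 rules quoted above, and the sketch assumes that `DNSDomain ^ Domain = ...` means a string must match both Domain and the constraining production.

```python
import re
import string

# Hand-written model of the quoted RFC 5321 rules (an assumption of this
# sketch): each rule lists the rules it references and the literal
# characters that appear directly in its definition.
ALNUM = string.ascii_letters + string.digits  # ALPHA / DIGIT
GRAMMAR = {
    "Domain": {"rules": ["sub-domain"], "chars": "."},
    "sub-domain": {"rules": ["Let-dig", "Ldh-str"], "chars": ""},
    "Let-dig": {"rules": [], "chars": ALNUM},
    "Ldh-str": {"rules": ["Let-dig"], "chars": ALNUM + "-"},
}


def reachable_terminals(start):
    """Collect every terminal character reachable from the start rule."""
    seen, stack, chars = set(), [start], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # each rule is visited once, so the walk is linear
        seen.add(name)
        chars.update(GRAMMAR[name]["chars"])
        stack.extend(GRAMMAR[name]["rules"])
    return chars


# Domain translated by hand into a regular expression.
LET_DIG = r"[A-Za-z0-9]"
LDH_STR = r"[A-Za-z0-9-]*" + LET_DIG
SUB_DOMAIN = LET_DIG + r"(?:" + LDH_STR + r")?"
DOMAIN = re.compile(SUB_DOMAIN + r"(?:\." + SUB_DOMAIN + r")*")

# The constraining production 1*255(ALPHA / DIGIT / "-" / ".").
LIMIT = re.compile(r"[A-Za-z0-9.-]{1,255}")


def is_dns_domain(text):
    """True if text matches both Domain and the length-limited production."""
    return (DOMAIN.fullmatch(text) is not None
            and LIMIT.fullmatch(text) is not None)
```

Here `reachable_terminals("Domain")` yields the letters, digits, "-" and ".", which is the character set of the minimum conforming production, and `is_dns_domain` accepts a 255-character name while rejecting a 256-character one. Checking both patterns separately sidesteps the "huge finite state machine" construction entirely, at the cost of not producing a single integrated grammar.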
>
> ***
>
> Regarding Paul's Unicode point: it would be nice for ABNF to have an
> extension that defines the domain of the input symbols, with specific
> options for: ASCII-only, octets, fixed-bit-width units, UTF-8,
> UTF-16LE/BE, and Unicode scalar values (0-0xD7FF, 0xE000-0x10FFFF), and
> Unicode characters (non-characters such as 0xFFFF are excluded). The
> most important of these are ASCII-only, octets, and Unicode scalar values.
>
> I thought about writing such an I-D, but it would take more work than I
> have time for right now, and one would have to address a few issues.
>
> Currently the domain of ABNF is any unsigned integer, which is an
> infinite set. This is good in that you can express anything positive,
> and bad in that you can't express negative symbol productions (e.g., all
> symbols EXCEPT "Q"). Once you constrain the set to finite, you can
> express things like the regex [^Q]. This makes ABNF compilers more
> complex. If you really want to express exclusionary productions, maybe
> BNF languages aren't the right tool for the job?
>
> A compatibility problem arises when you want to reference productions
> from other specs that have different domains. Suppose that one spec is
> authored in the UTF-8 domain (see LDAP specs e.g., RFC 4512), and another
> spec is authored in the Unicode scalar value domain. This is a real
> issue that I am encountering in draft-seantek-certspec, which I
> encourage you to read...I wanted to import (reference) distinguishedName
> from LDAP / RFC 4514, but distinguishedName includes UTF-8 multibyte
> productions, and draft-seantek-certspec is authored to be at least
> potentially character set neutral. It would be okay (in
> draft-seantek-certspec) to write the string in ISO-2022-JP and then
> convert it to something else.
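The point above about exclusionary productions can be made concrete: once the domain is a finite set, "everything except Q" is just the domain with one code point removed, and it can be written back out as ordinary ABNF value ranges. The following Python sketch assumes the Unicode scalar value domain; `SCALAR_VALUES`, `exclude`, and `to_abnf` are hypothetical names, and the example uses the single code point %x51 because a quoted "Q" in ABNF is case-insensitive.

```python
# Unicode scalar values as inclusive ranges (surrogates are left out).
SCALAR_VALUES = [(0x0, 0xD7FF), (0xE000, 0x10FFFF)]


def exclude(domain, code_point):
    """Return the domain's ranges with one code point removed."""
    result = []
    for low, high in domain:
        if not low <= code_point <= high:
            result.append((low, high))  # range untouched
            continue
        if low < code_point:
            result.append((low, code_point - 1))  # part below the hole
        if code_point < high:
            result.append((code_point + 1, high))  # part above the hole
    return result


def to_abnf(ranges):
    """Format inclusive ranges as an ABNF alternation of hex values."""
    parts = []
    for low, high in ranges:
        if low == high:
            parts.append(f"%x{low:02X}")
        else:
            parts.append(f"%x{low:02X}-{high:02X}")
    return " / ".join(parts)


# Everything except %x51 ("Q"), within the scalar value domain.
not_q = to_abnf(exclude(SCALAR_VALUES, 0x51))
# not_q == "%x00-50 / %x52-D7FF / %xE000-10FFFF"
```

With an infinite domain the last range would have no upper bound, which is why the complement only becomes expressible after the domain is fixed.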
>
> Converting between domains is computable (e.g., the Unicode scalar value
> expressed as %x1F60A 😊 can be broken down into %xD83D.DE0A or
> %xF0.9F.98.8A), but the symbology for doing it would be arcane. We want to
> keep ABNF as simple as possible, and that stuff strikes me as calling
> for a complex syntax.
>
> Any proposal that involves Unicode is going to have to at least think
> about character properties and classes. See UTS #18.
>
> Both the Unicode Consortium and W3C use BNF flavors to describe Unicode
> things. As ABNF uses the same mathematical theories as their BNF
> flavors, it would be good to replicate syntax where possible.
>
> Regards,
>
> Sean
>
> Thanks,
> Paul
>
> _______________________________________________
> abnf-discuss mailing list
> abnf-discuss@ietf.org
> https://www.ietf.org/mailman/listinfo/abnf-discuss
>
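The domain conversion mentioned above (one Unicode scalar value broken down into UTF-16 code units or UTF-8 octets) can be reproduced with Python's built-in codecs. This is a sketch only; `abnf_utf8` and `abnf_utf16` are hypothetical helper names, and the output format simply mirrors the `%x` concatenation notation used in the message.

```python
def _check_scalar(scalar):
    """Reject integers that are not Unicode scalar values."""
    if not (0 <= scalar <= 0xD7FF or 0xE000 <= scalar <= 0x10FFFF):
        raise ValueError(f"not a Unicode scalar value: {scalar:#x}")


def abnf_utf8(scalar):
    """Express a scalar value as an ABNF sequence of UTF-8 octets."""
    _check_scalar(scalar)
    data = chr(scalar).encode("utf-8")
    return "%x" + ".".join(f"{octet:02X}" for octet in data)


def abnf_utf16(scalar):
    """Express a scalar value as an ABNF sequence of UTF-16 code units."""
    _check_scalar(scalar)
    data = chr(scalar).encode("utf-16-be")  # big-endian, no byte order mark
    units = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, len(data), 2)]
    return "%x" + ".".join(f"{unit:04X}" for unit in units)


# The example from the message: U+1F60A.
utf8_form = abnf_utf8(0x1F60A)    # "%xF0.9F.98.8A"
utf16_form = abnf_utf16(0x1F60A)  # "%xD83D.DE0A"
```

The computation itself is mechanical; the open question raised in the message is how a grammar would declare which of these domains a given production is written in.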