Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf
Sean Leonard <dev+ietf@seantek.com> Fri, 08 July 2016 16:23 UTC
Return-Path: <dev+ietf@seantek.com>
X-Original-To: abnf-discuss@ietfa.amsl.com
Delivered-To: abnf-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id D255012D518; Fri, 8 Jul 2016 09:23:56 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.6
X-Spam-Level:
X-Spam-Status: No, score=-2.6 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 4QBJoLKA3xkT; Fri, 8 Jul 2016 09:23:54 -0700 (PDT)
Received: from mxout-08.mxes.net (mxout-08.mxes.net [216.86.168.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 0ABCF12D7E9; Fri, 8 Jul 2016 09:23:42 -0700 (PDT)
Received: from [192.168.123.7] (unknown [75.83.2.34]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by smtp.mxes.net (Postfix) with ESMTPSA id 47CF550A85; Fri, 8 Jul 2016 12:23:41 -0400 (EDT)
To: Paul Kyzivat <pkyzivat@alum.mit.edu>, Jonathan Hansford <jonathan@hansfords.net>, "draft-seantek-constrained-abnf@ietf.org" <draft-seantek-constrained-abnf@ietf.org>
References: <20160708121202.A3E5D12D19F@ietfa.amsl.com> <43d3ffea-57de-f7ef-e740-e448564008ed@alum.mit.edu>
From: Sean Leonard <dev+ietf@seantek.com>
Message-ID: <bf12cdc4-8d17-d6aa-cc01-5afad19127ac@seantek.com>
Date: Fri, 08 Jul 2016 09:23:02 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <43d3ffea-57de-f7ef-e740-e448564008ed@alum.mit.edu>
Content-Type: multipart/alternative; boundary="------------AB5C8BEC35837BE2D0F6DA3C"
Archived-At: <https://mailarchive.ietf.org/arch/msg/abnf-discuss/WKXyIMVDrIq5vu-kOCtoRezBHvY>
Cc: "abnf-discuss@ietf.org" <abnf-discuss@ietf.org>
Subject: Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf
X-BeenThere: abnf-discuss@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: "General discussion about tools, activities and capabilities involving the ABNF meta-language" <abnf-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/abnf-discuss>, <mailto:abnf-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/abnf-discuss/>
List-Post: <mailto:abnf-discuss@ietf.org>
List-Help: <mailto:abnf-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/abnf-discuss>, <mailto:abnf-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jul 2016 16:23:57 -0000
+abnf-discuss@ On 7/8/2016 8:06 AM, Paul Kyzivat wrote: > On 7/8/16 8:11 AM, Jonathan Hansford wrote: >> Hi, >> >> >> >> I’ve just discovered draft-seantek-constrained-abnf-00 and have a couple >> of questions: > > Note that while Sean and I have been bouncing ideas around about this > for some time, the draft is a trial balloon and the specific text is > all Sean's. I'm not yet wild about some of the syntax, but it is good > to have something tangible written down to talk about. > > So far it seems that only Sean and I are interested. If you are, then > that would make three. Yay for three! > >> 1) Does Constrained ABNF work with RFC 7405 “Case-Sensitive String >> Support in ABNF”? Either way, it might be worth mentioning that RFC in >> your I-D. > > Certainly it does. The functionality is entirely orthogonal. Yes, what Paul said. > >> 2) Has there been any call to further constrain ABNF such that >> rules can have length constraints applied to them? I’m not sure how much >> benefit there would be for RFCs (though I can think of examples where >> identifying such a constraint within the ABNF rather than just within >> the accompanying text would be beneficial), but a Constrained ABNF >> compliant parser that included length constraints would be good. > > I'm not aware of any such thing. Of course it already has repetition > constraints, but I guess that is not what you are looking for. > > I have difficulty imagining how you would fit such constraints into > the syntax. The syntax of ABNF is pretty nice for what it does, but it > doesn't leave much room for extension because we are running out of > special characters. (Perhaps the possibilities could be expanded by > using unicode characters. Imagine what could be done with emojis as > operator characters.) Length constraints are expressible with current ABNF. Generically, suppose you have an identifier comprised of ALPHAs and DIGITs: identifier = 1*(ALPHA / DIGIT) Well, if you want to limit identifier to 20 chars, then you can say max twenty: identifier = 1*20(ALPHA / DIGIT) As Paul says, this is a repetition constraints, and is easy to analyze with the repeated symbols are fixed-width. But what about variable width? Well, suppose you have Domain from RFC 5321: Domain = sub-domain *("." sub-domain) sub-domain = Let-dig [Ldh-str] Let-dig = ALPHA / DIGIT Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig Now suppose you want to constrain Domain to be 255 chars. Such a constrained domain might be called DNSDomain. Then you can say: DNSDomain ^ Domain = 1*255ASCII ...and that's it! Basically you have to compose a production that evaluates to a single symbol (character/integer), where the domain of the symbol is any possible character that can appear in the production. The minimum conforming production would be: DNSDomain ^ Domain = 1*255(ALPHA / DIGIT / "-" / ".") The maximum conforming production would be: DNSDomain ^ Domain = 1*255(%x00-FFFFFFFFFFFFFFFFFFFFFFFFF...(infinity)) Reducing the production from a non-maximum to the minimum form, can be evaluated programmatically. A parser would need to evaluate the constrained production for all "reachable" single terminal symbols (characters/integers that may appear at least once), which can be done more-or-less in O(n) time. It adds all such integers to a set, and then expresses the set. It can also be shown that you can also integrate the constraint into the base production, at least for all non-recursive productions. In this case, you would have to break down Domain into a very, very long string of terminal symbol possibilities (essentially, expressing it as a huge finite state machine). Once you get all the possible states, you can limit each symbol combination so that the total combinations do not exceed 255. This can probably be done computationally, but the proof would require a lot of math that I do not care to figure out right now. :) And it's unclear how computationally difficult it would be to do it in the general case. Actually now that I think about it, if your constraint is to limit the total number of symbols, you can probably prove that any recursive expression can be expressed as a finite state machine. Therefore, integrating length constraints would actually be computable. It would also be super-ugly, but you get the point. *** Regarding Paul's Unicode point: it would be nice for ABNF to have an extension that defines the domain of the input symbols, with specific options for: ASCII-only, octets, fixed-bit-width units, UTF-8, UTF-16LE/BE, and Unicode scalar values (0-0xD7FF, 0xE000-0x10FFFF), and Unicode characters (non-characters such as 0xFFFF are excluded). The most important of these are ASCII-only, octets, and Unicode scalar values. I thought about writing such an I-D, but, it would take more work than I have time for right now, and one would have to address a few of issues. Currently the domain of ABNF is any unsigned integer, which is an infinite set. This is good in that you can express anything positive, and bad in that you can't express negative symbol productions (e.g., all symbols EXCEPT "Q"). Once you constrain the set to finite, you can express things like the regex [^Q]. This makes ABNF compilers more complex. If you really want to express exclusionary productions, maybe BNF languages aren't the right tool for the job? A compatibility problem arises when you want to reference productions from other specs, that have different domains. Suppose that one spec is authored in the UTF-8 domain (see LDAP specs e.g., RFC 4512), and other spec is authored in the Unicode scalar value domain. This is a real issue that I am encountering in draft-seantek-certspec, which I encourage you to read...I wanted to import (reference) distinguishedName from LDAP / RFC 4514, but distinguishedName includes UTF-8 multibyte productions, and draft-seantek-certspec is authored to be at least potentially character set neutral. It would be okay (in draft-seantek-certspec) to write the string in ISO-2022-JP and then convert it to something else. Converting between domains is computable (e.g., the Unicode scalar value expressed as %x1F60A 😊 can be broken down into %xD83D.DE0A or %xF0.9F.98.8A), but symbology for doing it would be arcane. We want to keep ABNF as simple as possible, and that stuff strikes me as calling for a complex syntax. Any proposal that involves Unicode is going to have to at least think about character properties and classes. See UTS #18. Both the Unicode Consortium and W3C use BNF flavors to describe Unicode things. As ABNF uses the same mathematical theories and their BNF flavors, it would be good to replicate syntax where possible. Regards, Sean > > Thanks, > Paul >
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Paul Kyzivat
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Sean Leonard
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Sean Leonard
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Sean Leonard
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Paul Kyzivat
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Paul Kyzivat
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Jonathan Hansford
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Sean Leonard
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Paul Kyzivat
- Re: [abnf-discuss] Mail regarding draft-seantek-c… Sean Leonard