Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf

Paul Kyzivat <pkyzivat@alum.mit.edu> Fri, 08 July 2016 22:11 UTC

To: abnf-discuss@ietf.org
References: <20160708121202.A3E5D12D19F@ietfa.amsl.com> <43d3ffea-57de-f7ef-e740-e448564008ed@alum.mit.edu> <bf12cdc4-8d17-d6aa-cc01-5afad19127ac@seantek.com> <017701d1d95c$ebf89310$c3e9b930$@hansfords.net>
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
Message-ID: <960b5a55-e1ea-fe42-22b8-bcf929826c7a@alum.mit.edu>
Date: Fri, 08 Jul 2016 18:11:09 -0400
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <017701d1d95c$ebf89310$c3e9b930$@hansfords.net>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/abnf-discuss/JahZLGuJf3hyjcLYcRYsYa3NScI>
Subject: Re: [abnf-discuss] Mail regarding draft-seantek-constrained-abnf

On 7/8/16 5:08 PM, Jonathan Hansford wrote:
> Paul and Sean,
>
>
>
> Thanks for the replies. Life is too hectic at the moment, but I’m hoping
> I will find some time to properly read the I-D and your replies and see
> how it would all work for me. Is there a parser for constrained ABNF at
> the moment?

AFAIK there is not. As I said, the draft is a trial balloon. I think it 
still needs a lot of work, not on the general concept, but on how you 
link grammars together. (The use case that drives my interest is 
defining extensibility hooks in a grammar when defining a standard, and 
then writing another draft that adds syntactical extensions from such a 
hook.)

I'm not aware of a generally available parser for case-sensitive ABNF 
either. I developed extensions to Bap to do it, and made them available 
to the author on github, but he never took them. I saw somebody else 
with another (worse) implementation, but AFAIK that one hasn't made it 
into the web-based Bap linked from the IETF tools page either.

	Thanks,
	Paul

> Jonathan
>
>
>
> =O)
>
>
>
> *From:*Sean Leonard [mailto:dev+ietf@seantek.com]
> *Sent:* 08 July 2016 17:23
> *To:* Paul Kyzivat <pkyzivat@alum.mit.edu>; Jonathan Hansford
> <jonathan@hansfords.net>; draft-seantek-constrained-abnf@ietf.org
> *Cc:* abnf-discuss@ietf.org
> *Subject:* Re: Mail regarding draft-seantek-constrained-abnf
>
>
>
> +abnf-discuss@
>
> On 7/8/2016 8:06 AM, Paul Kyzivat wrote:
>
>     On 7/8/16 8:11 AM, Jonathan Hansford wrote:
>
>         Hi,
>
>
>
>         I’ve just discovered draft-seantek-constrained-abnf-00 and have a
>         couple of questions:
>
>
>     Note that while Sean and I have been bouncing ideas around about
>     this for some time, the draft is a trial balloon and the specific
>     text is all Sean's. I'm not yet wild about some of the syntax, but
>     it is good to have something tangible written down to talk about.
>
>     So far it seems that only Sean and I are interested. If you are,
>     then that would make three.
>
>
> Yay for three!
>
>
>
>
>         1)      Does Constrained ABNF work with RFC 7405 “Case-Sensitive
>         String Support in ABNF”? Either way, it might be worth mentioning
>         that RFC in your I-D.
>
>
>     Certainly it does. The functionality is entirely orthogonal.
>
>
> Yes, what Paul said.
>
>
>
>
>         2)      Has there been any call to further constrain ABNF such that
>         rules can have length constraints applied to them? I’m not sure how
>         much benefit there would be for RFCs (though I can think of examples
>         where identifying such a constraint within the ABNF rather than just
>         within the accompanying text would be beneficial), but a Constrained
>         ABNF compliant parser that included length constraints would be good.
>
>
>     I'm not aware of any such thing. Of course it already has repetition
>     constraints, but I guess that is not what you are looking for.
>
>     I have difficulty imagining how you would fit such constraints into
>     the syntax. The syntax of ABNF is pretty nice for what it does, but
>     it doesn't leave much room for extension because we are running out
>     of special characters. (Perhaps the possibilities could be expanded
>     by using unicode characters. Imagine what could be done with emojis
>     as operator characters.)
>
>
> Length constraints are expressible with current ABNF. Generically,
> suppose you have an identifier comprised of ALPHAs and DIGITs:
>
> identifier = 1*(ALPHA / DIGIT)
>
> Well, if you want to limit identifier to 20 chars, then you can say max
> twenty:
>
> identifier = 1*20(ALPHA / DIGIT)
>
> As Paul says, this is a repetition constraint, and is easy to analyze
> when the repeated symbols are fixed-width. But what about variable width?
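To illustrate the repetition constraint above, here is a minimal Python sketch of a matcher for identifier = 1*20(ALPHA / DIGIT). The regular expression is a hand translation and the function name is_identifier is hypothetical; it assumes ALPHA and DIGIT as defined in the RFC 5234 core rules.

```python
import re

# identifier = 1*20(ALPHA / DIGIT)
# ALPHA is A-Z / a-z and DIGIT is 0-9 in the RFC 5234 core rules.
IDENTIFIER = re.compile(r"[A-Za-z0-9]{1,20}")


def is_identifier(text: str) -> bool:
    """Return True if text is 1 to 20 ASCII letters or digits."""
    return IDENTIFIER.fullmatch(text) is not None
```

Because every repeated symbol is exactly one character wide, the repetition bound and the length bound coincide here, which is why this case is easy.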
>
> Well, suppose you have Domain from RFC 5321:
>
>
>
>    Domain         = sub-domain *("." sub-domain)
>
>    sub-domain     = Let-dig [Ldh-str]
>
>    Let-dig        = ALPHA / DIGIT
>
>    Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig
>
>
>
> Now suppose you want to constrain Domain to be at most 255 chars. Such a
> constrained domain might be called DNSDomain. Then you can say:
>
>
>    DNSDomain ^ Domain = 1*255ASCII
>
>
> ...and that's it! Basically you have to compose a production that
> evaluates to a single symbol (character/integer), where the domain of
> the symbol is any possible character that can appear in the production.
> The minimum conforming production would be:
>
>
>    DNSDomain ^ Domain = 1*255(ALPHA / DIGIT / "-" / ".")
>
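One way to read the DNSDomain productions above is as an intersection: a string is a DNSDomain only if it matches Domain and also matches the constraining production. The Python sketch below works under that assumption (it is an interpretation of the text, not a statement of what the draft specifies). The regular expressions are hand translations of the quoted RFC 5321 rules, and is_dns_domain is a hypothetical name.

```python
import re

# Hand translation of the RFC 5321 productions quoted above:
#   sub-domain = Let-dig [Ldh-str]
#   Ldh-str    = *( ALPHA / DIGIT / "-" ) Let-dig
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# Domain = sub-domain *("." sub-domain)
DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")
# Constraining production: 1*255(ALPHA / DIGIT / "-" / ".")
CONSTRAINT = re.compile(r"[A-Za-z0-9.-]{1,255}")


def is_dns_domain(text: str) -> bool:
    """Assumed semantics: text must match both the constraint and Domain."""
    if CONSTRAINT.fullmatch(text) is None:
        return False
    return DOMAIN.fullmatch(text) is not None
```

The constraint is checked first only because it is the cheaper test; the result does not depend on the order.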
>
> The maximum conforming production would be:
>
>
>    DNSDomain ^ Domain = 1*255(%x00-FFFFFFFFFFFFFFFFFFFFFFFFF...(infinity))
>
>
> Reducing the production from a non-maximum form to the minimum form can be
> evaluated programmatically. A parser would need to evaluate the
> constrained production for all "reachable" single terminal symbols
> (characters/integers that may appear at least once), which can be done
> more-or-less in O(n) time. It adds all such integers to a set, and then
> expresses the set.
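The "reachable terminals" idea above can be sketched in Python as a walk over the rule graph. The grammar encoding below is a deliberately simplified assumption: each rule is just a list of the rule references and terminal ranges that appear in it, ignoring alternation, concatenation, and repetition. That is enough for reachability as long as no element is unsatisfiable (for example, a 0*0 repetition). All names are hypothetical.

```python
def reachable_terminals(grammar, start):
    """Collect every terminal value reachable from the start rule."""
    seen_rules = set()
    terminals = set()
    stack = [start]
    while stack:
        rule = stack.pop()
        if rule in seen_rules:
            continue  # each rule is expanded once, so recursion terminates
        seen_rules.add(rule)
        for item in grammar[rule]:
            if item[0] == "rule":
                stack.append(item[1])
            else:
                _, lo, hi = item
                terminals.update(range(lo, hi + 1))
    return terminals


def to_ranges(values):
    """Collapse a set of integers into sorted, inclusive (lo, hi) ranges."""
    ranges = []
    for value in sorted(values):
        if ranges and value == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], value)
        else:
            ranges.append((value, value))
    return ranges


# Simplified encoding of the Domain productions quoted earlier.
GRAMMAR = {
    "Domain": [("rule", "sub-domain"), ("range", 0x2E, 0x2E)],
    "sub-domain": [("rule", "Let-dig"), ("rule", "Ldh-str")],
    "Let-dig": [("rule", "ALPHA"), ("rule", "DIGIT")],
    "Ldh-str": [("rule", "ALPHA"), ("rule", "DIGIT"),
                ("range", 0x2D, 0x2D), ("rule", "Let-dig")],
    "ALPHA": [("range", 0x41, 0x5A), ("range", 0x61, 0x7A)],
    "DIGIT": [("range", 0x30, 0x39)],
}

MINIMAL = to_ranges(reachable_terminals(GRAMMAR, "Domain"))
```

For this grammar, MINIMAL covers "-" and "." (adjacent code points, so they merge into one range), the digits, and the upper- and lower-case letters, which matches the minimum conforming production given above.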
>
> It can also be shown that you can integrate the constraint into the
> base production, at least for all non-recursive productions. In this
> case, you would have to break down Domain into a very, very long string
> of terminal symbol possibilities (essentially, expressing it as a huge
> finite state machine). Once you get all the possible states, you can
> limit each symbol combination so that the total combinations do not
> exceed 255. This can probably be done computationally, but the proof
> would require a lot of math that I do not care to figure out right now.
> :) And it's unclear how computationally difficult it would be to do it
> in the general case.
>
> Actually now that I think about it, if your constraint is to limit the
> total number of symbols, you can probably prove that any recursive
> expression can be expressed as a finite state machine. Therefore,
> integrating length constraints would actually be computable. It would
> also be super-ugly, but you get the point.
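As a rough illustration of integrating a length limit into a finite state machine, the Python sketch below pairs a hand-built three-state DFA for Domain with a character counter, so each combined state is (DFA state, characters consumed). The DFA, the state names, and the function names are assumptions made for this example. It covers only the non-recursive Domain case and is not a proof of the general claim about recursive expressions.

```python
import string

LETDIG = set(string.ascii_letters + string.digits)
DFA_STATES = ("start", "label", "hyphen")
ACCEPTING = {"label"}

# "start":  a Let-dig must come next (beginning of a sub-domain)
# "label":  last character was a Let-dig (a sub-domain may end here)
# "hyphen": last character was "-" (a sub-domain may not end here)
TRANSITIONS = {
    ("start", "letdig"): "label",
    ("label", "letdig"): "label",
    ("label", "hyphen"): "hyphen",
    ("label", "dot"): "start",
    ("hyphen", "letdig"): "label",
    ("hyphen", "hyphen"): "hyphen",
}


def classify(ch):
    """Map a character to the input class used by the DFA, or None."""
    if ch in LETDIG:
        return "letdig"
    if ch == "-":
        return "hyphen"
    if ch == ".":
        return "dot"
    return None


def accepts(text, max_len=255):
    """Run the product machine: Domain DFA state plus a length counter."""
    state, count = "start", 0
    for ch in text:
        nxt = TRANSITIONS.get((state, classify(ch)))
        if nxt is None or count == max_len:
            return False  # no transition, or the counter is exhausted
        state, count = nxt, count + 1
    return state in ACCEPTING


def max_state_count(max_len=255):
    """Upper bound on product states: DFA states times counter values."""
    return len(DFA_STATES) * (max_len + 1)
```

The product machine has at most max_state_count() states, so it stays finite; writing it back out as ABNF is where the ugliness would come in.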
>
> ***
>
> Regarding Paul's Unicode point: it would be nice for ABNF to have an
> extension that defines the domain of the input symbols, with specific
> options for: ASCII-only, octets, fixed-bit-width units, UTF-8,
> UTF-16LE/BE, Unicode scalar values (0-0xD7FF, 0xE000-0x10FFFF), and
> Unicode characters (non-characters such as 0xFFFF are excluded). The
> most important of these are ASCII-only, octets, and Unicode scalar values.
>
> I thought about writing such an I-D, but it would take more work than I
> have time for right now, and one would have to address a few issues.
>
> Currently the domain of ABNF is any unsigned integer, which is an
> infinite set. This is good in that you can express anything positive,
> and bad in that you can't express negative symbol productions (e.g., all
> symbols EXCEPT "Q"). Once you constrain the set to finite, you can
> express things like the regex [^Q]. This makes ABNF compilers more
> complex. If you really want to express exclusionary productions, maybe
> BNF languages aren't the right tool for the job?
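To show why a finite domain makes exclusion expressible, here is a small Python sketch that turns "everything except these values" into ordinary ABNF value ranges. The 7-bit ASCII domain (0x00-0x7F) is an assumption chosen for the example, and complement and to_abnf are hypothetical helper names.

```python
def complement(excluded, lo=0x00, hi=0x7F):
    """Return inclusive (lo, hi) ranges covering the domain minus excluded."""
    ranges = []
    start = lo
    for value in sorted(v for v in set(excluded) if lo <= v <= hi):
        if value > start:
            ranges.append((start, value - 1))
        start = value + 1
    if start <= hi:
        ranges.append((start, hi))
    return ranges


def to_abnf(ranges):
    """Render ranges as an ABNF alternation of %x values."""
    parts = []
    for a, b in ranges:
        parts.append(f"%x{a:02X}" if a == b else f"%x{a:02X}-{b:02X}")
    return " / ".join(parts)


# The regex [^Q] over the assumed ASCII domain; "Q" is %x51.
NOT_Q = to_abnf(complement({0x51}))
```

With this domain, NOT_Q is "%x00-50 / %x52-7F". Over the unbounded integer domain the second range would have no upper end, which is the problem described above.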
>
> A compatibility problem arises when you want to reference productions
> from other specs that have different domains. Suppose that one spec is
> authored in the UTF-8 domain (see LDAP specs, e.g., RFC 4512), and another
> spec is authored in the Unicode scalar value domain. This is a real
> issue that I am encountering in draft-seantek-certspec, which I
> encourage you to read...I wanted to import (reference) distinguishedName
> from LDAP / RFC 4514, but distinguishedName includes UTF-8 multibyte
> productions, and draft-seantek-certspec is authored to be at least
> potentially character set neutral. It would be okay (in
> draft-seantek-certspec) to write the string in ISO-2022-JP and then
> convert it to something else.
>
> Converting between domains is computable (e.g., the Unicode scalar value
> expressed as %x1F60A 😊 can be broken down into %xD83D.DE0A or
> %xF0.9F.98.8A), but the symbology for doing it would be arcane. We want to
> keep ABNF as simple as possible, and that stuff strikes me as calling
> for a complex syntax.
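The domain conversion in the example above can be reproduced with Python's built-in codecs. This sketch takes a Unicode scalar value and prints the equivalent ABNF concatenation in the UTF-16 code unit domain and in the UTF-8 octet domain; the function names are hypothetical.

```python
def abnf_concat(units, width):
    """Render integers as an ABNF %x concatenation, e.g. %xD83D.DE0A."""
    return "%x" + ".".join(f"{u:0{width}X}" for u in units)


def scalar_to_utf16(cp):
    """Express a Unicode scalar value as UTF-16 code units."""
    data = chr(cp).encode("utf-16-be")
    units = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, len(data), 2)]
    return abnf_concat(units, 4)


def scalar_to_utf8(cp):
    """Express a Unicode scalar value as UTF-8 octets."""
    return abnf_concat(chr(cp).encode("utf-8"), 2)
```

For %x1F60A these return "%xD83D.DE0A" and "%xF0.9F.98.8A", matching the values in the text. Converting a single value is the easy part; rewriting whole productions across domains is where the syntax would get complicated.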
>
> Any proposal that involves Unicode is going to have to at least think
> about character properties and classes. See UTS #18.
>
> Both the Unicode Consortium and W3C use BNF flavors to describe Unicode
> things. As ABNF uses the same mathematical theories as their BNF
> flavors, it would be good to replicate syntax where possible.
>
> Regards,
>
> Sean
>
>
>
>         Thanks,
>         Paul
>
>
>
>
>
>