Re: [abnf-discuss] defining "compatible" extensions

Paul Kyzivat <pkyzivat@alum.mit.edu> Tue, 08 July 2014 20:10 UTC

Message-ID: <53BC5030.6090201@alum.mit.edu>
Date: Tue, 08 Jul 2014 16:10:24 -0400
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
To: Ned Freed <ned.freed@mrochek.com>, Barry Leiba <barryleiba@computer.org>
References: <53BAB55A.2090008@alum.mit.edu> <CAC4RtVBnjxi5Q-0WMd2J9Hm8oct+agc8h2V=koSJnYNrpzZ_Ag@mail.gmail.com> <01P9XJXGHZLS0049PU@mauve.mrochek.com>
In-Reply-To: <01P9XJXGHZLS0049PU@mauve.mrochek.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/abnf-discuss/XFZ4yTbFRJG-HGAWMgAoxTKaFiA
Cc: "abnf-discuss@ietf.org" <abnf-discuss@ietf.org>
Subject: Re: [abnf-discuss] defining "compatible" extensions

On 7/8/14 1:02 PM, Ned Freed wrote:
>> The main issue I have whenever this sort of thing comes up is that ABNF
>> is there to specify syntax, not semantics.  The ABNF in 4566 correctly
>> says that the syntax of an att-name is that it's a token.  The
>> specification itself -- the rest of it, beyond the ABNF -- is there to
>> tell us what values to expect there, what to do with them, and how to
>> define extensions.
>
> I could not agree more. As things edge away from syntax and closer to
> semantics, ABNF stops being the right tool for the job.
>
> For heaven's sake, ABNF can't even handle things like complex ordering or "at
> most two of" that really can be viewed as part of the syntax. And people expect
> it to handle even higher level constructs that overlap existing syntax in
> complex ways? Please.

I agree that some things are better not done in ABNF.

OTOH, designing syntaxes to support backward compatible extensions in 
the future, and then actually defining those extensions in the future, 
is a very common thing. Having some tools and/or processes for doing so 
would be a very good thing. And perhaps some additional syntax support 
would help. Or maybe not. I'm open minded.

SDP is pretty crufty - it is hard to imagine a more poorly designed 
syntax for the purposes it now serves. (It has been pushed vastly beyond 
its originally intended purposes.) So we are left trying to keep it 
limping along, and keep the specifications comprehensible, while 
retaining backward compatibility.

We can, in principle, change the way the syntax is specified to make it 
more consistent and understandable, as long as the revised version 
matches exactly the same set of inputs. And in fact that is what I hope 
we do, within limits. But what we do to the base spec must not render 
any of the existing extensions invalid.
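
One inexpensive way to gain confidence that a restructured rule still
matches the same inputs is to run the old and new versions over a corpus
of valid and invalid samples and compare the verdicts. The sketch below
is only illustrative: the two regular expressions stand in for a rule
before and after a rewrite, and the names OLD_RULE, NEW_RULE and
disagreements are hypothetical, not taken from RFC 4566. Agreement on a
corpus is evidence of equivalence, not a proof.

```python
import re

# Hypothetical stand-ins for one rule before and after a rewrite.
# Both are meant to accept a decimal integer with no leading zero.
OLD_RULE = re.compile(r"(?:[1-9]|[1-9][0-9]+)")
NEW_RULE = re.compile(r"[1-9][0-9]*")


def disagreements(samples):
    """Return the samples on which the two rules give different verdicts."""
    return [
        s for s in samples
        if bool(OLD_RULE.fullmatch(s)) != bool(NEW_RULE.fullmatch(s))
    ]


# A small hand-written corpus; a real check would also replay inputs
# taken from the existing extension specifications.
corpus = ["1", "42", "007", "", "9000", "4a", "0"]
print(disagreements(corpus))
```

An empty list means the two versions agreed on every sample; any entry
is an input that the rewrite would accept or reject differently.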

SDP has many extensions. If I count right there are 51 different RFCs 
that define extension attribute values. (There are also lots of 
extensions to other parts of SDP.)

The *right* thing to do for SDP is to deprecate it and introduce a 
better structured alternative. This was already tried, many years ago, 
but failed. People worked on SDPng for a long time. It was XML-based, 
and I think it was pretty reasonable. It didn't fail because SDPng was 
bad, but simply because it wasn't SDP. SDP is embedded in SIP and H.323 
and RTSP, and in countless deployments of those. There was no plausible
way, and no incentive, to accomplish the migration.

> Which is not to say these sorts of things can't be handled cleanly. A good
> example is RFC 5322 section 3.6.
>
> But complex overlapped syntaxes? My response to that is, "Just say no!"
>
> There was a bunch of this junk in RFC 1341 and RFC 1521, the early versions of
> MIME. When I did the RFC 2045/2046 revision, I summarily deleted all of it. And
> AFAIK it has not been missed. Indeed, if the number and type of questions I
> received was any indication it has resulted in less confusion overall.
>
> This doesn't mean you can't specify higher level constraints on top of existing
> syntax. You can, you just don't use ABNF to do it. A good example where this
> was done are the various extensions to Sieve. Sieve has a simple stable
> syntax specified in RFC 5228. Many extensions have been defined, most recently
> draft-ietf-appsawg-sieve-duplicate-09, but a different format is used to
> specify the syntactic constraints on extensions. The one in the most recent
> draft is:
>
> Usage: "duplicate" [":handle" <handle: string>]
>                        [":header" <header-name: string> /
>                            ":uniqueid" <value: string>]
>                        [":seconds" <timeout: number>] [":last"]

I'm not familiar with that. Is the syntax for the above *defined*, or is 
it left to the reader to figure out? I have used similar syntaxes in the 
past, in software documentation. It is good enough for an informal 
description, but I wouldn't use it for a standard without a formal 
definition of the grammar.

One of the approaches SDP uses for defining attributes looks a bit like 
this. But it leaves a lot to hand waving.

> There are also limited - very limited IMO - cases where overlapped syntaxes are
> OK. Although it's fairly rare, this sometimes happens with media type
> parameters. I don't have a problem with media type parameter syntax being
> specified in ABNF. But we're talking about adding constraints to a single token,
> not a complex overlay.
>
> And finally, there are cases where fully extensible syntaxes make sense, at
> least in the context of previous design decisions. The obvious example of this
> is IMAP, where extensions add commands and specify their syntax when they do.
> This seems to have worked out OK. (Personally, I'm not entirely happy with the
> approach taken in the base specification to syntax specification, but it's very
> small beer compared to other issues in the IMAP space.)
>
>> That many specs enumerate all the tokens that are valid at the time of
>> their writing isn't really relevant to this, as I see it.  Personally,
>> I think we should stop doing that *unless* we want to define something
>> that intentionally has no extensibility.  To me, this makes sense:
>
>>     florb-value = "true" / "false"
>
>> ...while this does not:
>
>>     florb-value = "true" / "false" / florb-ext
>>     florb-ext = token
>
> I agree; this gets close to the line, if not edging over it.

I understand. But it is widely used because it is *easy*, while still 
leaving a path for future extension. This technique is used *widely* in 
the ABNF of SIP.

We aren't contemplating a bis for SIP any time soon. If we were, then I
would be looking for something better.

>> The first is clearly saying that *syntactically*, there are only two
>> things that can appear in a florb-value.  You can safely write a
>> parser that looks for those and throws a syntax error if it sees
>> anything else.
>
>> What on Earth is the second saying, syntactically?  I'd better write
>> my parser to parse it as a token.  I presume there's something else in
>> the text that tells me what to do with "true" and "false"
>> semantically, and that explains the extensibility.  What's the point
>> of having that in the *syntax*?

Yes, there typically is text describing the semantics of true and false, 
and saying that anything else that matches is to be ignored unless 
defined by an extension supported by the application doing the parsing. 
And also accompanied by IANA considerations that set up a registry
of florb-values.
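
A minimal sketch of that receiver behaviour, under stated assumptions:
the character class is modeled on the token-char rule of RFC 4566 and
should be checked against the RFC before being relied on, and the names
parse_florb, BASE_VALUES and extensions are hypothetical. The base
literals are compared case-insensitively because quoted strings in ABNF
are case-insensitive; how registered extension values should be compared
is left as an assumption (exact match here).

```python
import re

# Token character set modeled on token-char in RFC 4566 (an assumption;
# verify the ranges against the RFC).
TOKEN = re.compile(
    r"[\x21\x23-\x27\x2A-\x2B\x2D-\x2E\x30-\x39\x41-\x5A\x5E-\x7E]+"
)

# Values whose semantics are defined by the (hypothetical) base spec.
BASE_VALUES = {"true": True, "false": False}

# Hypothetical registry: value -> handler supplied by a supported extension.
extensions = {}


def parse_florb(text):
    """Classify a florb-value as base, extension, or ignored.

    Raises ValueError if the text is not syntactically a token.
    """
    if not TOKEN.fullmatch(text):
        raise ValueError(f"not a token: {text!r}")
    lowered = text.lower()  # ABNF quoted literals are case-insensitive
    if lowered in BASE_VALUES:
        return ("base", BASE_VALUES[lowered])
    if text in extensions:
        return ("extension", extensions[text](text))
    # Syntactically valid but unknown: ignore rather than reject.
    return ("ignored", None)


extensions["maybe"] = lambda value: "tri-state"
print(parse_florb("TRUE"))
print(parse_florb("maybe"))
print(parse_florb("later"))
```

Note that the parser itself only ever needs the "token" alternative; the
"true" and "false" branches carry semantics, not additional syntax.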

>> This sort of thing is also fine, as I see it:
>
>>     florb-value = token ; must be a registered item, as
>>                         ; defined in Section 3.2.1

Yes, this works. It especially works in this case, if this is just an 
enumeration of values. It is far less workable if instead of "token" it 
is "byte-string", and individual values have complex sub-syntax.

An advantage of having the known alternatives shown in the ABNF is that
it is quick to look up the base alternatives - they show up right there, 
without need to consult the text. But you still have to consult the 
registry for other values.

I realize this is a weak argument. The strongest argument is that it is
a technique that is widely used.

>> Here we're using a comment in the syntax to point the reader to the
>> section that gives the semantics and explains the valid value.
>> References are good.
>
>> Twisting syntax specification around to try to make it go beyond
>> syntax is not good.
>
>> Clearly, opinions differ on this... but there's mine.
>
> The "code" I've "run" in this space says pretty clearly that complex
> overlapped syntaxes cause more problems than they solve. And I really
> don't think adding intersection capabilities to ABNF is a solution.

Let me give you another part of this story.

When defining the syntax of an extension, it is highly desirable to 
reuse rules that are defined in the base specification. E.g., SDP has 
definitions of:

media
fmt
proto
port
unicast-address
FQDN
token
integer
...

To define extensions that are stylistically consistent with the rest of 
SDP you really want to reuse these.

In specifications we typically just say something like:

token = <from RFC4566>

(Or something even less formal.)

But if you are reviewing that specification and want to formally verify 
the ABNF of the extension syntax then you must pull the full ABNF for
4566 and merge it with the syntax you are verifying.

This is true whether the new definition is formally linked to the old 
one (via =/) or not.
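
The merge-then-verify step can be sketched as follows. This is a rough
illustration, not a real ABNF tool: it assumes one rule per line (no
continuation lines), no ";" inside quoted strings, and it lets a later
"=" definition silently replace an earlier one. The grammar fragments
and the names merge and references are hypothetical and only loosely
shaped like SDP.

```python
import re

RULE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*)\s*(=/|=)\s*(.*)$")
NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")
# Core rules of RFC 5234 Appendix B; they need no local definition.
CORE = {"alpha", "bit", "char", "cr", "crlf", "ctl", "digit", "dquote",
        "hexdig", "htab", "lf", "lwsp", "octet", "sp", "vchar", "wsp"}


def references(body):
    """Rule names used in a rule body (literals and prose removed)."""
    body = re.sub(r'"[^"]*"|<[^>]*>|%[bdxBDX][0-9A-Fa-f.\-]+', " ", body)
    return {name.lower() for name in NAME.findall(body)}


def merge(*grammars):
    """Merge grammar texts in order; return (rules, unresolved names)."""
    rules, problems = {}, set()
    for text in grammars:
        for raw in text.splitlines():
            match = RULE.match(raw.split(";", 1)[0].strip())
            if not match:
                continue
            name, op, body = match.group(1).lower(), match.group(2), match.group(3)
            if op == "=":
                rules[name] = body
            elif name in rules:
                rules[name] += " / " + body  # incremental alternative
            else:
                problems.add(name)  # "=/" on a rule with no base definition
    for body in rules.values():
        problems |= {r for r in references(body)
                     if r not in rules and r not in CORE}
    return rules, problems


BASE = """
token = 1*token-char
token-char = ALPHA / DIGIT / "-"
att-field = token
"""
EXT = """
att-field =/ "florb"
florb-attr = "florb:" florb-value
florb-value = "true" / "false" / token
"""

print(sorted(merge(EXT)[1]))        # extension checked on its own
print(sorted(merge(BASE, EXT)[1]))  # extension merged with the base
```

Checked on its own, the extension leaves att-field and token unresolved;
merged with the base fragment, nothing is left unresolved.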

	Thanks,
	Paul
