Re: [abnf-discuss] defining "compatible" extensions

> On 7/8/14 1:02 PM, Ned Freed wrote:
> >> The main issue I have whenever this sort of thing comes up is that ABNF
> >> is there to specify syntax, not semantics.  The ABNF in 4566 correctly
> >> says that the syntax of an att-name is that it's a token.  The
> >> specification itself -- the rest of it, beyond the ABNF -- is there to
> >> tell us what values to expect there, what to do with them, and how to
> >> define extensions.
> >
> > I could not agree more. As things edge away from syntax and closer to
> > semantics, ABNF stops being the right tool for the job.
> >
> > For heaven's sake, ABNF can't even handle things like complex ordering or "at
> > most two of" that really can be viewed as part of the syntax. And people expect
> > it to handle even higher level constructs that overlap existing syntax in
> > complex ways? Please.

> I agree that some things are better not done in ABNF.

> OTOH, designing syntaxes to support backward compatible extensions in
> the future, and then actually defining those extensions in the future,
> is a very common thing. Having some tools and/or processes for doing so
> would be a very good thing. And perhaps some additional syntax support
> would help. Or maybe not. I'm open minded.

Which is why both Sieve and IMAP, among others, employ such tools. In the case
of IMAP, ABNF provides the necessary capabilities. In the case of Sieve,
something different was needed and was informally defined.
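
To illustrate the IMAP style with a minimal sketch (the rule names below are
made up for illustration and are not taken from any IMAP specification), an
extension can use the incremental alternatives operator "=/" from RFC 5234 to
add a command to a rule the base specification already defines:

    ; base specification
    command     = "NOOP" / "LOGOUT"

    ; extension document, adding one alternative to the same rule
    command     =/ "FLORB" SP florb-arg
    florb-arg   = 1*DIGIT

Taken together, the two documents are equivalent to:

    command     = "NOOP" / "LOGOUT" / "FLORB" SP florb-arg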

> SDP is pretty crufty - it is hard to imagine a more poorly designed
> syntax for the purposes it now serves. (It has been pushed vastly beyond
> its originally intended purposes.) So we are left trying to keep it
> limping along, and keep the specifications comprehensible, while
> retaining backward compatibility.

> We can, in principle, change the way the syntax is specified to make it
> more consistent and understandable, as long as the revised version
> matches exactly the same set of inputs. And in fact that is what I hope
> we do, within limits. But what we do to the base spec must not render
> any of the existing extensions invalid.

> SDP has many extensions. If I count right there are 51 different RFCs
> that define extension attribute values. (There are also lots of
> extensions to other parts of SDP.)

I have some familiarity with SDP as a result of having to deal with media type
issues in the SDP context. Given what I've seen and what I've read, I'd have to
call SDP an anomalous case. I cannot think of anything else remotely comparable
to the mess that is SDP.

It might, and I emphasize might, make sense to try and come up with a better
way to describe SDP and its many extensions. But I seriously doubt that a
solution for SDP, assuming one can be found, would be a useful thing for other
protocols.

> The *right* thing to do for SDP is to deprecate it and introduce a
> better structured alternative. This was already tried, many years ago,
> but failed. People worked on SDPng for a long time. It was XML-based,
> and I think it was pretty reasonable. It didn't fail because SDPng was
> bad, but simply because it wasn't SDP. SDP is embedded in SIP and H.323
> and RTSP, and the countless deployments of that. There was no plausible
> way, and no incentive, to accomplish the migration.

Sure. And there was a somewhat comparable situation in email. RFC 822 defined
an overly general two-level syntax, using ABNF extended in prose in a couple of
key ways, that many people found hard to deal with. A large amount of effort
went into cleaning this up, resulting in the syntax first laid out in RFC
2822. The new syntax is done as a single level in ABNF, segregated into
current and obsolete parts, with only a few aspects remaining as prose.

ABNF provided most of what was needed in this particular case, but not
everything. But the lesson here is that the tools were less important than the
willingness to sit down and do a bunch of very hard work. Pete Resnick did that
work, and the result is the much cleaner specifications we have now.

> > Which is not to say these sorts of things can't be handled cleanly. A good
> > example is RFC 5322 section 3.6.
> >
> > But complex overlapped syntaxes? My response to that is, "Just say no!"
> >
> > There was a bunch of this junk in RFC 1341 and RFC 1521, the early versions of
> > MIME. When I did the RFC 2045/2046 revision, I summarily deleted all of it. And
> > AFAIK it has not been missed. Indeed, if the number and type of questions I
> > received was any indication it has resulted in less confusion overall.
> >
> > This doesn't mean you can't specify higher level constraints on top of existing
> > syntax. You can, you just don't use ABNF to do it. A good example of where
> > this was done is the various extensions to Sieve. Sieve has a simple stable
> > syntax specified in RFC 5228. Many extensions have been defined, most recently
> > draft-ietf-appsawg-sieve-duplicate-09, but a different format is used to
> > specify the syntactic constraints on extensions. The one in the most recent
> > draft is:
> >
> > Usage: "duplicate" [":handle" <handle: string>]
> >                        [":header" <header-name: string> /
> >                            ":uniqueid" <value: string>]
> >                        [":seconds" <timeout: number>] [":last"]

> I'm not familiar with that. Is the syntax for the above *defined*, or is
> it left to the reader to figure out?

It's defined, albeit somewhat informally, not that there's much involved in
doing it. Two paragraphs along with three references to other sections
sufficed. (RFC 5228 section 1.1) And I've yet to hear anyone say it's in any
way confusing.

> I have used similar syntaxes in the
> past, in software documentation. It is good enough for an informal
> description, but I wouldn't use it for a standard without a formal
> definition of the grammar.

Lots of running code says that's unnecessary.

> One of the approaches SDP uses for defining attributes looks a bit like
> this. But it leaves a lot to hand waving.

Again, I caution against considering SDP as a basis for designing a general
solution in this space.

> > There are also limited - very limited IMO - cases where overlapped syntaxes are
> > OK. Although it's fairly rare, this sometimes happens with media type
> > parameters. I don't have a problem with media type parameter syntax being
> > specified in ABNF. But we're talking about adding constraints to a single token,
> > not a complex overlay.
> >
> > And finally, there are cases where fully extensible syntaxes make sense, at
> > least in the context of previous design decisions. The obvious example of this
> > is IMAP, where extensions add commands and specify their syntax when they do.
> > This seems to have worked out OK. (Personally, I'm not entirely happy with the
> > approach taken in the base specification to syntax specification, but it's very
> > small beer compared to other issues in the IMAP space.)
> >
> >> That many specs enumerate all the tokens that are valid at the time of
> >> their writing isn't really relevant to this, as I see it.  Personally,
> >> I think we should stop doing that *unless* we want to define something
> >> that intentionally has no extensibility.  To me, this makes sense:
> >
> >>     florb-value = "true" / "false"
> >
> >> ...while this does not:
> >
> >>     florb-value = "true" / "false" / florb-ext
> >>     florb-ext = token
> >
> > I agree; this gets close to the line if not edging over it.

> I understand. But it is widely used because it is *easy*, while still
> leaving a path for future extension. This technique is used *widely* in
> the ABNF of SIP.

It's used in MIME as well. I really wanted to get rid of it but felt it
represented too much of a change.

More generally, just because it's widely used doesn't make it a good idea. It
isn't. And having worked on specifications that use it, it's anything but easy.
"Lazy" is more like it.

> We aren't contemplating a bis for sip any time soon. If we were, then I
> would be looking for something better.

> >> The first is clearly saying that *syntactically*, there are only two
> >> things that can appear in a florb-value.  You can safely write a
> >> parser that looks for those and throws a syntax error if it sees
> >> anything else.
> >
> >> What on Earth is the second saying, syntactically?  I'd better write
> >> my parser to parse it as a token.  I presume there's something else in
> >> the text that tells me what to do with "true" and "false"
> >> semantically, and that explains the extensibility.  What's the point
> >> of having that in the *syntax*?

> Yes, there typically is text describing the semantics of true and false,
> and saying that anything else that matches is to be ignored unless
> defined by an extension supported by the application doing the parsing.
> And also accompanied by IANA considerations that set up a registry of
> florb-values.

> >> This sort of thing is also fine, as I see it:
> >
> >>     florb-value = token ; must be a registered item, as
> >>                         ; defined in Section 3.2.1

> Yes, this works. It especially works in this case, if this is just an
> enumeration of values. It is far less workable if instead of "token" it
> is "byte-string", and individual values have complex sub-syntax.

> An advantage of having the known alternatives shown in the abnf is that
> it is quick to look up the base alternatives - they show up right there,
> without need to consult the text. But you still have to consult the
> registry for other values.

> I realize this is a weak argument. The strongest argument is that it is
> a technique that is widely used.

Actually, it's no argument at all. If you want the current list of values
in the ABNF, that's what comments are for. All the advantages of proximity
without messing up the grammar.
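
A sketch of that approach, reusing the made-up florb-value rule from earlier
in this thread (the section number is equally hypothetical):

    florb-value = token
                  ; Values registered at the time of writing:
                  ;   "true", "false"
                  ; Further values may be registered later; see
                  ; Section 3.2.1 for the registry.

The rule itself says only that a florb-value is a token, which is all a parser
needs to know; the comment keeps the known values next to the rule for readers.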

I actually had a version of MIME at one point that did things that way, but
ended up backing the change out.

> >> Here we're using a comment in the syntax to point the reader to the
> >> section that gives the semantics and explains the valid value.
> >> References are good.
> >
> >> Twisting syntax specification around to try to make it go beyond
> >> syntax is not good.
> >
> >> Clearly, opinions differ on this... but there's mine.
> >
> > The "code" I've "run" in this space says pretty clearly that complex
> > overlapped syntaxes cause more problems than they solve. And I really
> > don't think adding intersection capabilities to ABNF is a solution.

> Let me give you another part of this story.

> When defining the syntax of an extension, it is highly desirable to
> reuse rules that are defined in the base specification. E.g., SDP has
> definitions of:

> media
> fmt
> proto
> port
> unicast-address
> FQDN
> token
> integer
> ...

> To define extensions that are stylistically consistent with the rest of
> SDP you really want to reuse these.

> In specifications we typically just say something like:

> token = <from RFC4566>

> (Or something even less formal.)

> But if you are reviewing that specification and want to formally verify
> the ABNF of the extension syntax then you must pull the full abnf for
> 4566 and merge it with the syntax you are verifying.

> This is true whether the new definition is formally linked to the old
> one (via =/) or not.

This is an entirely separate problem IMO. We could easily define an import
mechanism for ABNF that allows for importing from either RFCs or drafts. Do
that and there should be no problem performing more complete checks.
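
No such mechanism exists in RFC 5234. Purely as a hypothetical sketch, with
invented notation carried in a comment so that the text remains valid ABNF,
and with a made-up florb-attr rule, it might look something like:

    ; import token, port from RFC 4566      (hypothetical directive)
    florb-attr  = "florb:" token SP port

A checker that understood the directive would pull the named rules from the
referenced document before validating florb-attr. One that did not would see
only a comment and report token and port as undefined, which is where we are
today.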

What you're talking about here is more than this - you want a way to specify
that these entities are combined in a fashion that's consistent with the base
specification. I have no problem with doing that, but I remain to be convinced
that it makes sense to define a general mechanism for doing it.

				Ned
