Re: [abnf-discuss] defining "compatible" extensions

inline

On 7/8/14 8:42 PM, Ned Freed wrote:

>> OTOH, designing syntaxes to support backward compatible extensions in
>> the future, and then actually defining those extensions in the future,
>> is a very common thing. Having some tools and/or processes for doing so
>> would be a very good thing. And perhaps some additional syntax support
>> would help. Or maybe not. I'm open minded.
>
> Which is why both Sieve and IMAP, among others, employ such tools. In the
> case of IMAP, ABNF provides the necessary capabilities. In the case of
> Sieve, something different was needed and was informally defined.

I'll take your word for it. Part of the problem is that there isn't 
broad knowledge of such things, by the people who could make use of 
them, at the time when it would make a difference.

(Typically there is a window of opportunity when something new is being 
defined and the pattern for extensibility is established. After that it 
is much more painful to change.)

It is probably unrealistic to expect that we could educate all the 
potential authors in advance.

This might be something that ought to be caught during IESG review, or 
by some more specialized review analogous to a security review. (E.g., a 
"syntax review".)

>> SDP is pretty crufty - it is hard to imagine a more poorly designed
>> syntax for the purposes it now serves. (It has been pushed vastly beyond
>> its originally intended purposes.) So we are left trying to keep it
>> limping along, and keep the specifications comprehensible, while
>> retaining backward compatibility.
>
>> We can, in principle, change the way the syntax is specified to make it
>> more consistent and understandable, as long as the revised version
>> matches exactly the same set of inputs. And in fact that is what I hope
>> we do, within limits. But what we do to the base spec must not render
>> any of the existing extensions invalid.
>
>> SDP has many extensions. If I count right there are 51 different RFCs
>> that define extension attribute values. (There are also lots of
>> extensions to other parts of SDP.)
>
> I have some familiarity with SDP as a result of having to deal with media
> type issues in the SDP context. Given what I've seen and what I've read,
> I'd have to call SDP an anomalous case. I cannot think of anything else
> remotely comparable to the mess that is SDP.

I'll have to agree with you there.

But SIP has issues too. It is not as awful a syntax as SDP, but it is 
also frequently extended by separate drafts, using the techniques we 
have been discussing here.

> It might, and I emphasize might, make sense to try and come up with a
> better way to describe SDP and its many extensions.

IMO the biggest problem is not so much to come up with a "better" way, 
but rather to come up with a *consistent* and well defined way.

The way used in 4566 itself is something like you described for Sieve -
the syntax of individual attributes is independent of the overall ABNF 
syntax of SDP, and doesn't use ABNF. But what it does use is too 
informal - it is ambiguous.

There has been an informal consensus that the attribute values should be 
defined using ABNF. But how that should be done - whether it is linked 
in to the general SDP syntax or not, and how - is not consistently 
agreed to.

> But I seriously doubt that a
> solution for SDP, assuming one can be found, would be a useful thing for
> other protocols.

Perhaps. Hard to say until it is done.

>> The *right* thing to do for SDP is to deprecate it and introduce a
>> better structured alternative. This was already tried, many years ago,
>> but failed. People worked on SDPng for a long time. It was XML-based,
>> and I think it was pretty reasonable. It didn't fail because SDPng was
>> bad, but simply because it wasn't SDP. SDP is embedded in SIP and H.323
>> and RTSP, and the countless deployments of that. There was no plausible
>> way, and no incentive, to accomplish the migration.
>
> Sure. And there was a somewhat comparable situation in email. RFC 822
> defined an overly general two-level syntax, using ABNF extended in prose
> in a couple of key ways, that many people found hard to deal with. A large
> amount of effort went into cleaning this up, resulting in the syntax first
> laid out in RFC 2822. The new syntax is done as a single level in ABNF,
> segregated into current and obsolete parts, with only a few aspects
> remaining as prose.
>
> ABNF provided most of what was needed in this particular case, but not
> everything. But the lesson here is that the tools were less important than
> the willingness to sit down and do a bunch of very hard work. Pete Resnick
> did that work, and the result is the much cleaner specifications we have
> now.

As I noted, there was a fairly large effort on SDPng that failed. There 
is now a bis in progress. If it doesn't get done as part of that then it 
is unlikely to get done at all. Most people are more concerned with not
rocking the boat than with doing radical cleanup. I'm really the only one
who is advocating for fixing this at all. And I have little motivation 
other than my own sense of cleanliness. I have a lot of other things 
that are more important right now. And I won't be compensated for time 
spent on it.

So I'm looking for something relatively simple that will improve the 
situation, even if it doesn't make it perfect. ("Clean" and "perfect" 
are words that don't apply to SDP.)

>> > Which is not to say these sorts of things can't be handled cleanly. A
>> > good example is RFC 5322 section 3.6.
>> >
>> > But complex overlapped syntaxes? My response to that is, "Just say no!"
>> >
>> > There was a bunch of this junk in RFC 1341 and RFC 1521, the early
>> > versions of MIME. When I did the RFC 2045/2046 revision, I summarily
>> > deleted all of it. And AFAIK it has not been missed. Indeed, if the
>> > number and type of questions I received was any indication, it has
>> > resulted in less confusion overall.
>> >
>> > This doesn't mean you can't specify higher level constraints on top of
>> > existing syntax. You can, you just don't use ABNF to do it. A good
>> > example where this was done is the various extensions to Sieve. Sieve
>> > has a simple stable syntax specified in RFC 5228. Many extensions have
>> > been defined, most recently draft-ietf-appsawg-sieve-duplicate-09, but
>> > a different format is used to specify the syntactic constraints on
>> > extensions. The one in the most recent draft is:
>> >
>> > Usage: "duplicate" [":handle" <handle: string>]
>> >                        [":header" <header-name: string> /
>> >                            ":uniqueid" <value: string>]
>> >                        [":seconds" <timeout: number>] [":last"]
>
>> I'm not familiar with that. Is the syntax for the above *defined*, or is
>> it left to the reader to figure out?
>
> It's defined, albeit somewhat informally, not that there's much involved
> in doing it. Two paragraphs along with three references to other sections
> sufficed. (RFC 5228 section 1.1) And I've yet to hear anyone say it's in
> any way confusing.

OK. It is closer to a "traditional" approach to language specification, 
separating the lexical syntax from the language syntax, and using 
differing specification techniques for each.

It is a proven approach, but not one that can be retrofitted to 
situations where there is no separate lexical syntax.

>> I have used similar syntaxes in the
>> past, in software documentation. It is good enough for an informal
>> description, but I wouldn't use it for a standard without a formal
>> definition of the grammar.
>
> Lots of running code says that's unnecessary.

It appears that the "syntax" definition via "Usage" is formal enough 
given the foundation of lexical syntax and the general defined syntax of 
a Command.
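
To make that concrete for myself, here is a rough Python sketch of checking
an already-lexed argument list against the "duplicate" Usage line quoted
above. The table, the EXCLUSIVE list, and the function name are my own
hypothetical inventions, derived by hand from the Usage text. It assumes the
lexer has already classified each argument as a tag, string, or number, and
that tagged arguments may appear in any order. It is an illustration only,
not anything taken from the Sieve specifications.

```python
# Hypothetical table, hand-derived from the quoted Usage line:
# tag -> kind of value that must follow it (None means the tag stands alone).
DUPLICATE_TAGS = {
    ":handle": "string",
    ":header": "string",
    ":uniqueid": "string",
    ":seconds": "number",
    ":last": None,
}

# The Usage line offers ":header" / ":uniqueid" as alternatives.
EXCLUSIVE = [{":header", ":uniqueid"}]


def check_duplicate_args(args):
    """args: list of (kind, value) pairs as a lexer might produce them."""
    seen = set()
    i = 0
    while i < len(args):
        kind, value = args[i]
        if kind != "tag" or value not in DUPLICATE_TAGS:
            return False          # unknown tag or stray value
        if value in seen:
            return False          # each optional tag at most once
        seen.add(value)
        wanted = DUPLICATE_TAGS[value]
        i += 1
        if wanted is not None:
            if i >= len(args) or args[i][0] != wanted:
                return False      # missing or wrongly typed tag value
            i += 1
    # Reject combinations the Usage line lists as alternatives.
    return not any(len(group & seen) > 1 for group in EXCLUSIVE)


ok = check_duplicate_args([
    ("tag", ":handle"), ("string", "x"),
    ("tag", ":seconds"), ("number", 60),
    ("tag", ":last"),
])
```

The point is only that, once the lexical layer is fixed, the Usage line
carries enough information to drive a mechanical check like this.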

>> One of the approaches SDP uses for defining attributes looks a bit like
>> this. But it leaves a lot to hand waving.
>
> Again, I caution against considering SDP as a basis for designing a general
> solution in this space.
>
>> > There are also limited - very limited IMO - cases where overlapped
>> > syntaxes are OK. Although it's fairly rare, this sometimes happens with
>> > media type parameters. I don't have a problem with media type parameter
>> > syntax being specified in ABNF. But we're talking about adding
>> > constraints to a single token, not a complex overlay.
>> >
>> > And finally, there are cases where fully extensible syntaxes make
>> > sense, at least in the context of previous design decisions. The
>> > obvious example of this is IMAP, where extensions add commands and
>> > specify their syntax when they do. This seems to have worked out OK.
>> > (Personally, I'm not entirely happy with the approach taken in the base
>> > specification to syntax specification, but it's very small beer
>> > compared to other issues in the IMAP space.)
>> >
>> >> That many specs enumerate all the tokens that are valid at the time of
>> >> their writing isn't really relevant to this, as I see it.  Personally,
>> >> I think we should stop doing that *unless* we want to define something
>> >> that intentionally has no extensibility.  To me, this makes sense:
>> >
>> >>     florb-value = "true" / "false"
>> >
>> >> ...while this does not:
>> >
>> >>     florb-value = "true" / "false" / florb-ext
>> >>     florb-ext = token
>> >
>> > I agree; this gets close to the line if not edging over it.
>
>> I understand. But it is widely used because it is *easy*, while still
>> leaving a path for future extension. This technique is used *widely* in
>> the ABNF of SIP.
>
> It's used in MIME as well. I really wanted to get rid of it but felt it
> represented too much of a change.

It would be hard to eliminate it from SIP. I don't think it is used in 
4566 (at least not for attributes). But people still define attributes 
in extensions as if it were. I can't get rid of it for those already 
published, but maybe for future ones.

> More generally, just because it's widely used doesn't make it a good idea.
> It isn't. And having worked on specifications that use it, it's anything
> but easy. "Lazy" is more like it.
>
>> We aren't contemplating a bis for SIP any time soon. If we were, then I
>> would be looking for something better.
>
>> >> The first is clearly saying that *syntactically*, there are only two
>> >> things that can appear in a florb-value.  You can safely write a
>> >> parser that looks for those and throws a syntax error if it sees
>> >> anything else.
>> >
>> >> What on Earth is the second saying, syntactically?  I'd better write
>> >> my parser to parse it as a token.  I presume there's something else in
>> >> the text that tells me what to do with "true" and "false"
>> >> semantically, and that explains the extensibility.  What's the point
>> >> of having that in the *syntax*?
>
>> Yes, there typically is text describing the semantics of true and false,
>> and saying that anything else that matches is to be ignored unless
>> defined by an extension supported by the application doing the parsing.
>> And also accompanied by IANA considerations that set up a registry of
>> florb-values.
>
>> >> This sort of thing is also fine, as I see it:
>> >
>> >>     florb-value = token ; must be a registered item, as
>> >>                         ; defined in Section 3.2.1
>
>> Yes, this works. It especially works in this case, if this is just an
>> enumeration of values. It is far less workable if instead of "token" it
>> is "byte-string", and individual values have complex sub-syntax.
>
>> An advantage of having the known alternatives shown in the abnf is that
>> it is quick to look up the base alternatives - they show up right there,
>> without need to consult the text. But you still have to consult the
>> registry for other values.
>
>> I realize this is a weak argument. The strongest argument is that it is
>> a technique that is widely used.
>
> Actually, it's no argument at all. If you want the current list of values
> in the ABNF, that's what comments are for. All the advantages of proximity
> without messing up the grammar.

Fair enough.
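
For what it's worth, the split being argued for here (the grammar only says
"token", and the set of meaningful values lives in a registry consulted
separately) might look roughly like this in a parser. This is a sketch
under my own assumptions: the TOKEN pattern is a simplified stand-in rather
than the exact token rule of any RFC, and REGISTRY is a hypothetical
snapshot of whatever the IANA registry would hold.

```python
import re

# Simplified stand-in for a "token" rule; not copied from any specification.
TOKEN = re.compile(r"[A-Za-z0-9!#$%&'*+.^_`{|}~-]+")

# Hypothetical snapshot of the registered florb-values.
REGISTRY = {"true", "false"}


def parse_florb_value(text):
    """Syntax only: accept anything that is a token, reject everything else."""
    if not TOKEN.fullmatch(text):
        raise ValueError("syntax error: florb-value must be a token")
    return text


def is_registered(value):
    """Semantics: a separate check against the registry, outside the grammar."""
    return value in REGISTRY


value = parse_florb_value("maybe")   # syntactically fine
known = is_registered(value)         # but not a registered value
```

With this arrangement an unregistered token is not a syntax error; what to
do with it (ignore it, warn, etc.) is left to the prose of the spec.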

> I actually had a version of MIME at one point that did things that way, but
> ended up backing the change out.
>
>> >> Here we're using a comment in the syntax to point the reader to the
>> >> section that gives the semantics and explains the valid value.
>> >> References are good.
>> >
>> >> Twisting syntax specification around to try to make it go beyond
>> >> syntax is not good.
>> >
>> >> Clearly, opinions differ on this... but there's mine.
>> >
>> > The "code" I've "run" in this space says pretty clearly that complex
>> > overlapped syntaxes cause more problems than they solve. And I really
>> > don't think adding intersection capabilities to ABNF is a solution.
>
>> Let me give you another part of this story.
>
>> When defining the syntax of an extension, it is highly desirable to
>> reuse rules that are defined in the base specification. E.g., SDP has
>> definitions of:
>
>> media
>> fmt
>> proto
>> port
>> unicast-address
>> FQDN
>> token
>> integer
>> ...
>
>> To define extensions that are stylistically consistent with the rest of
>> SDP you really want to reuse these.
>
>> In specifications we typically just say something like:
>
>> token = <from RFC4566>
>
>> (Or something even less formal.)
>
>> But if you are reviewing that specification and want to formally verify
>> the ABNF of the extension syntax then you must pull the full abnf for
>> 4566 and merge it with the syntax you are verifying.
>
>> This is true whether the new definition is formally linked to the old
>> one (via =/) or not.
>
> This is an entirely separate problem IMO.

Agreed.

> We could easily define an import
> mechanism for ABNF that allows for importing from either RFCs or drafts. Do
> that and there should be no problem performing more complete checks.

I started to work on this a year or so ago. But there wasn't a lot of
interest and I got busy and dropped it.
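
As a rough illustration of the kind of check an import mechanism would
enable, here is a Python sketch that reports which rule names an extension
grammar references but does not define, given a base grammar to import
from. Everything here is hypothetical: the dict-of-strings representation,
the function names, and the sample rules. The name scan is deliberately
crude (it just strips quoted strings, prose-vals, comments, and numeric
values before looking for names).

```python
import re

# Core rules from the ABNF specification, which never need importing.
CORE_RULES = {
    "ALPHA", "BIT", "CHAR", "CR", "CRLF", "CTL", "DIGIT", "DQUOTE",
    "HEXDIG", "HTAB", "LF", "LWSP", "OCTET", "SP", "VCHAR", "WSP",
}


def referenced_names(definition):
    """Crudely collect the rule names used on the right-hand side of a rule."""
    stripped = re.sub(
        r'"[^"]*"|<[^>]*>|;.*|%[bdx][0-9A-Fa-f.-]+', " ", definition
    )
    return set(re.findall(r"[A-Za-z][A-Za-z0-9-]*", stripped))


def unresolved(extension, base):
    """Names used by the extension that neither grammar (nor core) defines."""
    # ABNF rule names are case-insensitive, so compare in lower case.
    known = {n.lower() for n in extension}
    known |= {n.lower() for n in base}
    known |= {n.lower() for n in CORE_RULES}
    missing = set()
    for definition in extension.values():
        for name in referenced_names(definition):
            if name.lower() not in known:
                missing.add(name)
    return missing


# Hypothetical extension and base grammars, rule name -> definition text.
extension = {"florb-attr": '"florb:" token SP port'}
base = {"token": "1*(token-char)", "port": "1*DIGIT"}

without_import = unresolved(extension, {})    # names left dangling
with_import = unresolved(extension, base)     # nothing left dangling
```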

A hard part is that for it to be valuable enough to deploy it should be 
possible to import abnf for already published RFCs that weren't 
formatted to support it. Finding a heuristic to extract just the ABNF 
from a txt RFC is hard and unreliable.
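
To show why I call it unreliable, here is a deliberately naive Python
sketch of such a heuristic: treat any indented line of the form
"name = something" as the start of an ABNF rule. The regular expression,
the function name, and the sample text are all made up for illustration
and are not taken from any real RFC.

```python
import re

# Naive heuristic: an indented rule name followed by "=" or "=/".
RULE_START = re.compile(r"^\s+([A-Za-z][A-Za-z0-9-]*)\s*=/?\s*\S")


def extract_rule_names(text):
    """Return the names on lines that look like the start of an ABNF rule."""
    names = []
    for line in text.splitlines():
        m = RULE_START.match(line)
        if m:
            names.append(m.group(1))
    return names


# Made-up text: one prose sentence that happens to contain "port = ...",
# followed by two genuine rules.
sample = """\
   The port is given in decimal.  In this example the value of
   port = 5004 is used by both endpoints.

      port = 1*DIGIT
      token = 1*(token-char)
"""

names = extract_rule_names(sample)
```

Here the prose line beginning "port = 5004" is picked up as if it were a
rule, so "port" is reported twice. Tightening the pattern fixes this case
and breaks some other one, which has been my experience in general.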

Also, people use ABNF in many ways in drafts. Sometimes it is all 
together in one place. Sometimes bits and pieces of it are interleaved 
with the text. Sometimes both - interleaved bits plus a consolidated 
syntax. And sometimes there are multiple independent abnf syntaxes in 
the same draft.

It would be easier to define this going forward, along with the 
formalizing of the xml draft format. We can identify abnf artwork in the
xml. But we probably also need extensions to abnf itself - at least an
include, and I think name scoping and naming of groupings.
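
By contrast, if the ABNF is marked as such in the xml source, pulling it
out is nearly trivial. A minimal Python sketch, assuming the draft marks
its grammar with an artwork element carrying type="abnf"; the sample
document below is made up, and real drafts may tag their ABNF differently
or not at all.

```python
import xml.etree.ElementTree as ET


def abnf_blocks(xml_text):
    """Return the text of every artwork element marked type="abnf"."""
    root = ET.fromstring(xml_text)
    return [
        el.text or ""
        for el in root.iter("artwork")
        if el.get("type") == "abnf"
    ]


# Made-up draft source with one ABNF figure and one unrelated diagram.
draft = """<rfc>
  <middle>
    <section title="Syntax">
      <figure><artwork type="abnf">florb-value = "true" / "false"</artwork></figure>
      <figure><artwork>+---+ a diagram, not ABNF +---+</artwork></figure>
    </section>
  </middle>
</rfc>"""

blocks = abnf_blocks(draft)
```

This only finds the grammar. It does nothing about scoping or about
several independent grammars living in one draft, which is where the
extensions to abnf itself would come in.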

That is doable. But it will be a slow start to get into use.

> What you're talking about here is more than this - you want a way to
> specify that these entities are combined in a fashion that's consistent
> with the base specification. I have no problem with doing that, but I
> remain to be convinced that it makes sense to define a general mechanism
> for doing it.

Doing so would allow the established extension style (that you don't 
like) to be formalized and made rigorous.

Is that better than convincing people to use some other style? I don't know.

	Thanks,
	Paul
