Re: ABNF sets and sequences

Keith Moore <moore@cs.utk.edu> Mon, 10 July 2000 04:39 UTC

Received: from cs.utk.edu (CS.UTK.EDU [128.169.94.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id AAA16259 for <drums-archive@odin.ietf.org>; Mon, 10 Jul 2000 00:39:55 -0400 (EDT)
Received: from localhost (daemon@localhost) by cs.utk.edu with SMTP (cf v2.9s-UTK) id AAA22146; Mon, 10 Jul 2000 00:39:36 -0400 (EDT)
Received: by cs.utk.edu (bulk_mailer v1.13); Mon, 10 Jul 2000 00:39:34 -0400
Received: by cs.utk.edu (cf v2.9s-UTK) id AAA22121; Mon, 10 Jul 2000 00:39:33 -0400 (EDT)
Received: from astro.cs.utk.edu (marvin@localhost) by cs.utk.edu with ESMTP (cf v2.9s-UTK) id AAA22101; Mon, 10 Jul 2000 00:39:32 -0400 (EDT)
Received: from astro.cs.utk.edu (128.169.93.168 -> ASTRO.CS.UTK.EDU) by cs.utk.edu (smtpshim v1.0); Mon, 10 Jul 2000 00:39:32 -0400
Received: from astro.cs.utk.edu (LOCALHOST [127.0.0.1]) by astro.cs.utk.edu (cf 8.9.3) with ESMTP id AAA05463; Mon, 10 Jul 2000 00:39:30 -0400 (EDT)
Message-Id: <200007100439.AAA05463@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: Dave Crocker <dcrocker@brandenburg.com>
cc: Keith Moore <moore@cs.utk.edu>, drums@cs.utk.edu
Subject: Re: ABNF sets and sequences
In-reply-to: Your message of "Mon, 10 Jul 2000 11:42:21 +0900." <4.3.2.20000709223011.00bfebf0@mail.bayarea.net>
Date: Mon, 10 Jul 2000 00:39:30 -0400
Sender: moore@cs.utk.edu
List-Unsubscribe: <mailto:drums-request@cs.utk.edu?Subject=unsubscribe>

> At 12:56 AM 7/9/00 -0400, Keith Moore wrote:
> >having this notation will not significantly simplify or clarify
> >the specification for Internet mail.
> 
> 1.  The construct that is at issue cannot reasonably be specified in 
> current ABNF.

correct.  languages like ABNF have limitations.  they don't specify
semantics either, but that doesn't prevent them from being useful.

> 2.  The construct occurs regularly, such as for 822-like headers, labeled 
> parameters in Received headers, etc.  This is not a small or occasional issue.

the construct you suggest will not allow ABNF to specify the rules for
822 headers with any greater deal of precision.

you're trying to make a grammar do a job that it cannot do well.
language specialists have understood for years that replacement grammars
have inherent limitations - they cannot fully describe arbitrary languages.
such limitations are well documented in the literature.  this doesn't 
prevent grammars from being useful as part of specification languages, 
but it's a mistake to try to get them to do things that they cannot do well.

for instance -and this is a fairly classic example - most programming
languages require that a variable be declared before it can be used.
it turns out that (if there are arbitrary number of possible variable
names) it is impossible to write a replacement grammar that enforces 
this rule.

the problem you are trying to solve with ABNF is essentialy the same problem 
if the list of valid tokens that can appear "at most once" is extensible.

now, can you design a notation that says "you can specify each of these
tokens at most once"?  of course you can.  but it's not a simple change
to ABNF - it doesn't mesh well with ABNF-style notation.  

> 3.  Putting the qualifier in the prose is a good way a) to get missed, or 
> b) not to get specified.

understood.  but trying to do it in ABNF is an even worse fit.

> Absence of this, for in RFC 1894, is what re-triggered my interest in this 
> topic.  One might argue that misunderstanding this relaxation of the 
> apparent syntactic requirement for ordering is "merely" a matter of 
> acculturation -- one must be fully plugged in to the protocol development 
> community.  That is a requirement that has scaling problems.

in hindsight, 822 should have used 

* ( thing )

thing = (thing1 / thing2 / thing3 )

rather than

[ thing1 ] [ thing2 ][ thing3 ]

822 tried to specify things with ABNF that ABNF (and any replacement grammar)
really cannot specify well.

> While few implementation errors can be documented by this unfair 
> requirement, growth of the Internet should encourage us to make common 
> constructs easier to specify and harder to miss.

I'm all for trying to reduce the number of misunderstandings in a 
specification, but we cannot do this if we misunderstand the limitations
of our specification language.

> >Now it may be that there is some other justification for this notation.  I 
> >just don't know what it is..
> 
> The working group felt it was worth adding earlier.  We just couldn't 
> figure out how.  I now believe that was because we tied it to alternation 
> (a / b) rather than concatenation (a b) and repetition, where it belongs.

I now believe that it's because you are trying to do something with ABNF 
that it cannot do well.

if you really want to specify this more formally than in prose, you need
a different kind of notation than a replacement grammar.

> In any event, the point you raise reduces to whether the community thinks 
> it is useful to have the construct in ABNF.  I do; you don't.  Others will 
> decide.

what seems useful may still not be feasible.
a perpetual motion machine would presumably be useful also.

> > > >2. unlike the rest of ABNF, such a syntax would not easily be
> > > >translated into something that a parser generator could
> > >
> > > ASN.1 parsers handle it just fine.
> >
> >I fail to see how this is relevant.
> 
> See below.
> 
> 
> >ASN.1 is a language for describing the structure of data
> 
> as is ABNF.

yes, but you were responding to my comment that the ABNF extension
you proposed would not be easily translated to a parser generator.
so your comment about ASN.1 in that context was a red herring.

> >(BER maps that structure onto a presentation), but ASN.1
> >is not a language for writing a parser for some already-defined
> >presentation of data.  You can write an RFC 822 parser in yacc,
> 
> yacc is the programming language, not ASN.1.

yacc is a programming language for writing parsers.
you can more-or-less translate ABNF into yacc, or similar languages.
you cannot translate ABNF into ASN.1
 
> >but you can't write one in ASN.1
> 
> asymmetric comparison:
> 
>          <822 headers, or whatever> : abnf : yacc
> 
> should be compared with:
> 
>          <822> : asn.1 : <???>

doesn't make sense.  you can't describe 822 with asn.1

> >ASN.1 is not widely regarded as a particularly good notation for
> >describing protocols.   one of ASN.1's many mistakes is trying
> 
> It is used.  It works.  Some like it; some don't.  That's all fine, but 
> entirely irrelevant.
> 
> You made a specific claim of non-implementability.  I cited an existence 
> proof to contradict your specific claim.

uh, no.  you cited a red herring.


> > > >3. notation that isn't used very often actually hinders readability
> > >
> > > The motivation for this is that it IS needed with regularity, namely for
> > > every list of labeled parameters.  Such lists are ALWAYS 
> > unordered.  That's
> > > the reason for the labels.
> >
> >in many of these cases (e.g. any time the list of labels is extensible)
> >it's a bad idea to specify the parameter labels as literals anyway.
> >
> >in other words, it's bad style to write
> >
> >         list = *( "thing1" / "thing2" / "thing3" / extension-thing )
> 
> You keep using the alternation construct.

yes, because that's the best way to describe this in a grammar.

> 
> Let's get specific about how the construct probably should be used:
> 
> RFC 1894:
> 
> >2.2 Per-Message DSN Fields
> >
> >    Some fields of a DSN apply to all of the delivery attempts described
> >    by that DSN.  These fields may appear at most once in any DSN.  These
> >    fields are used to correlate the DSN with the original message
> >    transaction and to provide additional information which may be useful
> >    to gateways.
> >
> >      per-message-fields =
> 
>          SET (
> 
> >           [ original-envelope-id-field CRLF ]
> >           reporting-mta-field CRLF
> >           [ dsn-gateway-field CRLF ]
> >           [ received-from-mta-field CRLF ]
> >           [ arrival-date-field CRLF ]
> >           *( extension-field CRLF )
> 
>          )


no, this doesn't work, because you want to be able to have more than
one extension-field, and you want to be able to freely mix these
fields in any order with the other fields.  

you're notation tries to change the meaning of * and [ ] within the SET
construct.  this is a lot more confusing than just writing
"can occur at most once" in prose.

> and in draft-ietf-drums-msg-fmt-07.txt:
> 
> >name-val-list   =       [CFWS] [name-val-pair *(CFWS name-val-pair)]
> 
> becomes
> 
> >name-val-list   =       [CFWS]  SET [name-val-pair *(CFWS name-val-pair)]
> 
> 
> and
> 
> >fields          =       *(trace
> >                           *(resent-date /
> >                            resent-from /
> >                            resent-sender /
> >                            resent-to /
> >                            resent-cc /
> >                            resent-bcc /
> >                            resent-id))
> >                         *(orig-date /
> >                         from /
> >                         sender /
> >                         reply-to /
> >                         to /
> >                         cc /
> >                         bcc /
> >                         message-id /
> >                         in-reply-to /
> >                         references /
> >                         subject /
> >                         comments /
> >                         keywords /
> >                         optional-field)
> 
> becomes something like:
> 
> >fields          =
> 
>                  SET (
> >                           *(trace
> >                           *(resent-date /
> >                            resent-from /
> >                            resent-sender /
> >                            resent-to /
> >                            resent-cc /
> >                            resent-bcc /
> >                            resent-id))
>                  )
>                  SET (
> >                         *(orig-date /
> >                         from /
> >                         sender /
> >                         reply-to /
> >                         to /
> >                         cc /
> >                         bcc /
> >                         message-id /
> >                         in-reply-to /
> >                         references /
> >                         subject /
> >                         comments /
> >                         keywords /
> >                         optional-field)
>                  )
> 

you're still enforcing ordering here that shouldn't be there; you're 
insisting that trace and resent fields occur before other fields.
(or did you intend this?  it's not consistent with current usage)

and the above notation would still allow arbitrary numbers of
each field to appear - because SET  ( *(foo) )  means that
*(foo) can occur at most once - but *(foo) means that foo can 
occur zero or more times.  so SET ( *(foo) ) is equivalent to *(foo). 

so no, in addition to this notation being incompatible with rewrite
grammars, it doesn't do what you think it does either.

> 
> (small point of curiosity, for Pete or whoever:  the current syntax for 
> <message> seems to mean that message headers are either fields or 
> obs-fields, but not a combination?)
> 
> Anyhow, folks, I would appreciate some technical feedback on this. 

you've gotten some.  

Keith