Language grammar opinions

"Clive D.W. Feather", Mon, 08 January 2007 16:10 UTC

Unlike my previous message, this one gives my opinions on the proposed
language grammar. They are mostly based on the comments in the grammar.

I prefer to list the possible declaration types explicitly in the grammar,
so I would choose the commented-out version of "declaration".

There's a parsability problem. Consider the sequence:

    alpha, bravo, charlie, delta, echo, foxtrot, golf, hotel, india,
    juliet, kilo, lima, mike, november, oscar, papa, quebec, romeo,
    sierra, tango, uniform, victor, whisky, x-ray, yankee: zulu

Only after you see the token following "zulu" (a semicolon, a comma, or an
arrow) do you know whether this is a declaration with an unknown type or a
transition. This produces a shift/reduce conflict in yacc and similar parser
generators. To resolve it, I would suggest reversing the order of
declarations so that they *begin* with the type:

    STATE : alpha, bravo, charlie ;

I'd also make the various types reserved words, so that you can't have a
state called "message" or an action called "state".
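
A minimal Python sketch of the difference, assuming a toy tokenizer and a
hypothetical keyword set (STATE, MESSAGE, ACTION; the proposal's actual list
may differ). With the type last, the statement kind is known only after the
whole name list, the colon, the following word and one more token have been
read; with a reserved type keyword first, one token decides.

```python
import re

# Hypothetical set of declaration keywords; the real proposal may use others.
DECL_KEYWORDS = {"STATE", "MESSAGE", "ACTION"}

# Toy tokenizer: arrows, identifiers (ALPHA *(["-"] 1*ALNUM)) and punctuation.
# Whitespace and any unrecognised characters are silently skipped.
TOKEN_RE = re.compile(r"->|[A-Za-z](?:-?[A-Za-z0-9])*|[,:;=]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def classify_type_last(tokens):
    """Type-last syntax: 'a, b : zulu ;' (declaration) vs 'a, b : zulu -> c ;'.

    Returns (kind, tokens_read). The deciding token is the one after the
    word that follows the colon, so tokens_read grows with the name list.
    """
    colon = tokens.index(":")
    deciding = tokens[colon + 2]
    kind = "declaration" if deciding == ";" else "transition"
    return kind, colon + 3


def classify_type_first(tokens):
    """Type-first syntax: a reserved keyword in first position decides at once."""
    # Assumption: keywords are matched case-insensitively.
    kind = "declaration" if tokens[0].upper() in DECL_KEYWORDS else "transition"
    return kind, 1


def is_valid_name(name):
    """Reserved words (compared case-insensitively here) cannot be used as names."""
    return name.upper() not in DECL_KEYWORDS


print(classify_type_last(tokenize("alpha, bravo, charlie, delta : zulu ;")))
# ('declaration', 10)
print(classify_type_first(tokenize("STATE : alpha, bravo, charlie ;")))
# ('declaration', 1)
```

The second number is the point: with the type last it grows with the length
of the name list, far beyond the single token of lookahead that yacc has,
whereas the type-first form is settled by its first token.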

In fact, looking further, I suggest that assignments are really a form of
declaration, and that both can be written the same way:

    Title = "My example (version 2)";
    STATE = alpha, bravo, charlie;
    Initial = alpha;

and be treated as the same concept in the grammar. Using "=" will eliminate
some parsing problems.
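
With assignments and declarations sharing the "Name = value, value, ... ;"
form, one rule can handle all three lines above. A minimal Python sketch,
assuming values are either identifiers or double-quoted strings (the quoted
character set follows the dquote-chars rule suggested later in this
message); tokenize and parse_assignments are hypothetical names, not part of
the proposal.

```python
import re

# One token at a time: a quoted string (printable ASCII except '"'),
# an identifier (ALPHA *(["-"] 1*ALNUM)), or one of "=", ",", ";".
TOKEN_RE = re.compile(r'\s*(?:("[ !#-~]*")|([A-Za-z](?:-?[A-Za-z0-9])*)|([=,;]))')


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        # Exactly one alternative matched; lastindex is its group number.
        tokens.append(m.group(m.lastindex))
        pos = m.end()
    return tokens


def parse_assignments(text):
    """Parse 'Name = v1, v2, ... ;' statements into {Name: [v1, v2, ...]}."""
    tokens = tokenize(text)
    result, i = {}, 0
    while i < len(tokens):
        name = tokens[i]
        if i + 1 >= len(tokens) or tokens[i + 1] != "=":
            raise SyntaxError(f"expected '=' after {name!r}")
        i += 2
        values = []
        while True:
            if i + 1 >= len(tokens):
                raise SyntaxError(f"unterminated statement for {name!r}")
            value, sep = tokens[i], tokens[i + 1]
            if value in ("=", ",", ";"):
                raise SyntaxError(f"missing value in {name!r}")
            # Strip the surrounding quotes from string values.
            values.append(value[1:-1] if value.startswith('"') else value)
            i += 2
            if sep == ";":
                break
            if sep != ",":
                raise SyntaxError(f"expected ',' or ';' after {value!r}")
        result[name] = values
    return result
```

Because every statement starts with "Name =", the parser never has to guess
what kind of statement it is reading; the name decides, and a reserved word
such as STATE can be checked afterwards.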

Identifiers like foo----bar: I know this looks strange, and it's possible
to write a syntax that forbids it:

    identifier = ALPHA *(["-"] 1*ALNUM)

but I'm not convinced that it's worth it. Note that ABNF itself allows both
double dashes and trailing dashes. However, I think we could usefully avoid
allowing "x----->x---" to be valid.
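
A minimal Python sketch comparing the two identifier rules, assuming ALNUM
means ALPHA / DIGIT. The permissive regex mirrors ABNF's own rulename; the
strict one is the rule above. The tokenize_arrow helper is hypothetical: it
shows that with the strict rule "->" separates names unambiguously and
"x----->x---" is simply rejected.

```python
import re

# Permissive rule, like ABNF's own rulename: ALPHA *(ALPHA / DIGIT / "-")
PERMISSIVE_ID = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# Stricter rule: ALPHA *(["-"] 1*ALNUM), with ALNUM = ALPHA / DIGIT.
# Written as (-?ALNUM)* to avoid nested quantifiers; it accepts the same strings.
STRICT_ID = re.compile(r"[A-Za-z](?:-?[A-Za-z0-9])*")

# Tokens for a hypothetical "name->name" transition: "->" or a strict identifier.
TOKEN_RE = re.compile(r"\s*(->|[A-Za-z](?:-?[A-Za-z0-9])*)")


def is_identifier(name, strict=True):
    pattern = STRICT_ID if strict else PERMISSIVE_ID
    return pattern.fullmatch(name) is not None


def tokenize_arrow(text):
    """Split e.g. 'foo-bar->baz' into ['foo-bar', '->', 'baz']; raise on leftovers."""
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens
```

Under the strict rule an identifier never ends in a dash, so a dash followed
by ">" can only start an arrow, and a run of stray dashes is an error rather
than something the reader has to puzzle out.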

[Oops, an omission in my last message: I never defined ALNUM. It is
ALPHA / DIGIT.]

I see no benefit in restricting the characters in double-quoted names. At
the very least, why allow comma and semicolon but not full stop and colon?
Why not allow parentheses? I would simply define it as:

    dquote-chars = SP / %x21 / %x23-7E
        ; Space and all printable characters except double quote
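
A minimal Python sketch of that character set, assuming a quoted name is
written as DQUOTE *dquote-chars DQUOTE (that surrounding rule, and the
function names, are illustrative assumptions).

```python
def is_dquote_char(c: str) -> bool:
    """SP / %x21 / %x23-7E: printable ASCII (and space) except the double quote."""
    code = ord(c)
    return code == 0x20 or code == 0x21 or 0x23 <= code <= 0x7E


def is_quoted_name(text: str) -> bool:
    """Assumed rule: DQUOTE *dquote-chars DQUOTE."""
    return (
        len(text) >= 2
        and text[0] == '"'
        and text[-1] == '"'
        and all(is_dquote_char(c) for c in text[1:-1])
    )
```

Full stops, colons and parentheses are then accepted just like commas and
semicolons; only the double quote itself and control characters such as TAB
are excluded.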

Clive D.W. Feather  | Tel:    +44 20 8495 6138
Internet Expert     | Fax:    +44 870 051 9937
Demon Internet      | Mobile: +44 7973 377646
THUS plc            |

Cosmogol mailing list