Re: [abnf-discuss] [art] [Technical Errata Reported] RFC7601 (5435)

"John R. Levine" <> Mon, 23 July 2018 14:49 UTC

On Mon, 23 Jul 2018, Peter Occil wrote:
> You have pointed to an unexpected weakness of ABNF.  As a result, it's 
> unfortunate that many ABNF productions in RFCs can't generally be used 
> as is in order to build a parser, since ABNF itself has no strong 
> concept of matching.

Of course they can, if you have adequate parser tools.  I shamelessly 
recommend "flex & bison", published by O'Reilly.  See the discussion of 
GLR parsers on pages 230-231.  Personally I'd refactor them to LL(1) 
or LR(1) but tastes differ.

More to the point, the IETF has used ABNF in its current form for over 20 
years, and it's not going to change now.  Its purpose is to specify valid 
syntax, even if the grammar parses that syntax ambiguously.  So please do 
send errata like the one on Received-SPF, where the ABNF didn't match the 
actual syntax, but if it's correct but ambiguous, that's not a bug.

Also, having written my share of compilers (try fitting a Fortran-77 front 
end into 48KB on a PDP-11) I can assure you that I never was able to build 
a parser from an unmodified reference grammar.  You always need to 
refactor it.

> For example, RFC 5234 sec. 3.2 is silent on the order in which
> alternative productions are to be matched (although this silence can be
> useful, I admit, in order for the incremental alternatives feature,
> sec. 3.3, to work).  The production "received-token" in RFC 5322 is
> illustrative:
>   received-token  =   word / angle-addr / addr-spec / domain
> Both "word" and "domain" include the production "atom".  Thus, whether a
> received-token parser parses an "atom" as a "word" or as a "domain"
> depends on the context in which the "atom" appears, the kind of parser
> (greedy or not), or whether earlier alternatives take precedence over
> later alternatives or vice versa, none of which is defined in RFC 5234.
> "Obviously", if the "atom" is part of a domain, then the received-token
> should be the whole "domain" regardless of the details just mentioned.
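
To make the received-token case concrete, here is a rough Python sketch. 
It is an illustration only, not a real RFC 5322 parser: the regular 
expressions are simplified stand-ins for the productions (no CFWS, no 
quoted-string, no domain-literal, no obsolete forms), and the names 
all_matches and first_prefix_match are made up for this example.  It 
shows that the set of valid strings is the same either way, while the 
label a parser attaches depends on the matching strategy it picks.

```python
import re

# Simplified stand-ins for RFC 5322 productions (assumptions: no CFWS,
# no quoted-string, no domain-literal, no obsolete syntax).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
ATOM = rf"{ATEXT}+"
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"

# Same order as:  received-token = word / angle-addr / addr-spec / domain
ALTERNATIVES = [
    ("word", ATOM),
    ("angle-addr", rf"<{DOT_ATOM}@{DOT_ATOM}>"),
    ("addr-spec", rf"{DOT_ATOM}@{DOT_ATOM}"),
    ("domain", DOT_ATOM),
]


def all_matches(text):
    """Return the name of every alternative that matches all of text."""
    return [name for name, pat in ALTERNATIVES if re.fullmatch(pat, text)]


def first_prefix_match(text):
    """Ordered choice: take the first alternative that matches a prefix."""
    for name, pat in ALTERNATIVES:
        m = re.match(pat, text)
        if m:
            return name, m.group(0)
    return None


if __name__ == "__main__":
    for token in ("example", "mail.example.org", "user@example.org"):
        print(token, all_matches(token), first_prefix_match(token))
```

In this sketch a bare atom such as "example" is accepted as both a 
"word" and a "domain", so the string is valid but the parse is ambiguous. 
With "mail.example.org", a first-match-on-prefix strategy stops at the 
"word" alternative after consuming only "mail", whereas matching the 
whole token leaves "domain" as the only candidate.  That is the sort of 
refactoring decision the ABNF leaves to the implementer.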