[rfc-i] v3imp #8 Fragment tagging on sourcecode

pkyzivat at alum.mit.edu (Paul Kyzivat) Wed, 28 January 2015 14:56 UTC

From: "pkyzivat at alum.mit.edu"
Date: Wed, 28 Jan 2015 09:56:31 -0500
Subject: [rfc-i] v3imp #8 Fragment tagging on sourcecode
In-Reply-To: <54C870B5.7000205@seantek.com>
References: <54C20F92.4090400@seantek.com> <54C232FC.1000604@gmx.de> <54C275BC.1040905@alum.mit.edu> <20150123175511.GI2350@localhost> <54C28E3F.4040901@alum.mit.edu> <E378C876-5217-4274-86B6-1DBFB653DE24@vpnc.org> <54C29891.6040101@alum.mit.edu> <54C3576A.9030206@greenbytes.de> <54C3BE06.8010707@alum.mit.edu> <54C3C6A3.6080003@seantek.com> <54C3CF7F.6090901@seantek.com> <54C4AFF1.6030608@gmx.de> <54C7FAD7.7040500@alum.mit.edu> <54C870B5.7000205@seantek.com>
Message-ID: <54C8F89F.9080208@alum.mit.edu>

On 1/28/15 12:16 AM, Sean Leonard wrote:
> Overall I still stand by my proposition that the RFC is the module for
> ABNF purposes. Honestly it just makes things a lot simpler. To the
> extent that you need to split things inside the RFC, you can refer to
> specific sections. Specific comments below.
>
> With regard to import rules, I can concur with Julian's comment that
> ABNF should not be extended.
>
> Instead, I propose the following *informal* definition, which is based
> on Section 2.7 in RFC 7230 (and is repeated by Julian below, in RFC 7231):
>
> localrule = <foreignrule, see [RFCXXXX], Section Y.Z>

Claiming that this is not an extension is a distinction without a 
difference. If it is well defined enough that a tool can follow it to 
obtain the needed definitions then it *is* an extension. We are only 
arguing about the syntax. (And I haven't proposed a syntax.)

This also raises the question of exactly what is imported. As I mentioned
later, there are options, and each has an impact. So that also needs to
be specified for tools to do the right thing.
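
To make "well defined enough that a tool can follow it" concrete, here is a
rough Python sketch of how a tool might recognize the informal form. It is
only an illustration: the regular expression, the function name
parse_import, and the field names it returns are assumptions made for this
example, not anything agreed in this thread or defined by RFC 5234.

```python
import re

# Hypothetical pattern for the informal convention
#   localrule = <foreignrule, see [RFCXXXX], Section Y.Z>
IMPORT_RE = re.compile(
    r"""^\s*(?P<local>[A-Za-z][A-Za-z0-9-]*)\s*=\s*
        <\s*(?P<foreign>[A-Za-z][A-Za-z0-9-]*)\s*,\s*see\s*
        \[(?P<doc>[^\]]+)\]\s*,\s*Section\s+
        (?P<section>[0-9A-Za-z.]+?)\s*>\s*$""",
    re.VERBOSE,
)


def parse_import(line):
    """Return the parts of an informal import directive, or None."""
    m = IMPORT_RE.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_import("uri-host = <host, see [RFC3986], Section 3.2.2>"))
    print(parse_import('absolute-path = 1*( "/" segment )'))
```

Note that the sketch only tells the tool *which* rule is referenced and
where. It says nothing about what text is pulled in, which is the open
question.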

> For example, the text in Section 2.7 of RFC 7230 says:
>
> ...
>
>     paths that begin with "//".)  A "partial-URI" rule is defined for
>     protocol elements that can contain a relative URI but not a fragment
>     component.
>
>       URI-reference = <URI-reference, see [RFC3986], Section 4.1>
>       absolute-URI  = <absolute-URI, see [RFC3986], Section 4.3>
>       relative-part = <relative-part, see [RFC3986], Section 4.2>
>       scheme        = <scheme, see [RFC3986], Section 3.1>
>       authority     = <authority, see [RFC3986], Section 3.2>
>       uri-host      = <host, see [RFC3986], Section 3.2.2>
>       port          = <port, see [RFC3986], Section 3.2.3>
>       path-abempty  = <path-abempty, see [RFC3986], Section 3.3>
>       segment       = <segment, see [RFC3986], Section 3.3>
>       query         = <query, see [RFC3986], Section 3.4>
>       fragment      = <fragment, see [RFC3986], Section 3.5>
>
>       absolute-path = 1*( "/" segment )
>       partial-URI   = relative-part [ "?" query ]
> ...
>
>
>
> Any ABNF analyzer can be "smart" enough to recognize when the stuff in <>
> is formatted as <foreignrule, see [RFCXXXX], Section Y.Z>. That looks
> like an informal import directive to me.
>
> Moreover, we need to distinguish between an ABNF compiler, and an ABNF
> validator. I think that Paul is thinking of some kind of ABNF compiler,
> to compile to some other computer language. But all that is
> needed/helpful for RFC publication purposes is an ABNF validator.

My primary concern is for a validator. But I do think that definitions 
sufficient for a thorough validator will also enable extraction for use 
by a compiler.

> All an ABNF validator needs to do is make sure that all rules are
> composed of valid ABNF primitives, or other rules that decompose into
> valid ABNF primitives. ABNF primitives are:
> 1. literals, such as %d13 and %x0D (which, incidentally, are equivalent)
> 2. rules assumed to exist (i.e., RFC 5234 Appendix B)
> 3. rules defined by <>

I disagree. It is also important to check that all references to rules 
are satisfied, and that there are no conflicts between rules.
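
The two checks I have in mind could be sketched roughly as below. This is
a simplified Python illustration, not how bap or any existing tool works:
it assumes one rule per line, ignores continuation lines, and the names
parse_rules, references and undefined are invented for the example.

```python
import re

# One rule per line: name, "=" or "=/", right-hand side (assumed layout).
RULE_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z0-9-]*)\s*(?P<op>=/?)\s*(?P<rhs>.*)$"
)
# Things that are not rule references: quoted strings, prose values,
# numeric values, and comments.
NOISE_RE = re.compile(r'"[^"]*"|<[^>]*>|%[bdxBDX][0-9A-Fa-f.-]+|;.*')
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def parse_rules(text):
    """Return (rules, duplicates); rule names are lowercased."""
    rules, duplicates = {}, set()
    for line in text.splitlines():
        m = RULE_RE.match(line)
        if not m:
            continue
        name = m.group("name").lower()
        if m.group("op") == "=/":      # incremental alternative
            rules[name] = rules.get(name, "") + " / " + m.group("rhs")
        elif name in rules:            # second plain definition: conflict
            duplicates.add(name)
        else:
            rules[name] = m.group("rhs")
    return rules, duplicates


def references(rhs):
    """Rule names mentioned on a right-hand side, lowercased."""
    return {n.lower() for n in NAME_RE.findall(NOISE_RE.sub(" ", rhs))}


def undefined(rules, predefined=()):
    """References that no rule (or predefined name) satisfies."""
    known = set(rules) | {p.lower() for p in predefined}
    used = {ref for rhs in rules.values() for ref in references(rhs)}
    return used - known
```

Passing the core rule names as `predefined` corresponds to treating
Appendix B as already available. Everything else that shows up in
`undefined` has to come from somewhere, which is the import problem.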

> Such a validator is quite easy to program, and doesn't need to import
> anything from other RFCs.

Again I disagree. As I've noted before, in RAI there is a custom of 
having drafts that extend the ABNF of other drafts. I've used bap to 
verify ABNF numerous times, and I've had to do a lot of manual 
preprocessing and importing/merging to thoroughly verify that all is 
well. That is all error prone, and inclines people to not bother with 
actually doing the verification.

> On 1/27/2015 12:53 PM, Paul Kyzivat wrote:
>> On 1/25/15 3:57 AM, Julian Reschke wrote:
>>> On 2015-01-24 17:59, Sean Leonard wrote:
>>>> On 1/24/2015 8:21 AM, Sean Leonard wrote:
>>>>> First of all there is no such thing as "ABNF modules" yet--only ABNF
>>>>> grammar (combined with specification text). I recognize this
>>>>> conversation is trending to creating them.
>>>>> Providing different definitions of the same rule in the same RFC is
>>>>> reckless
>>>>
>>>> The more I thought about this, the more I would like to propose that
>>>> the RFC itself be the unit of analysis (i.e., "module").
>>>> ...
>>>
>>>
>>> I agree that it's good to formalize this somewhat, but I'm not convinced
>>> updating/extending RFC 5234 is a good idea.
>>>
>>> For instance, in the HTTP specs we use prose rules with a well-defined
>>> syntax:
>>>
>>> <http://greenbytes.de/tech/webdav/rfc7231.html#imported.abnf>
>>>
>>> This might be enough for automated checkers to do the right thing.
>>
>> I agree that we need to be careful not to extend ABNF too much, making
>> it more difficult. OTOH, the people who use ABNF are not, for the most
>> part, stupid. (Does ABNF need to be understandable to someone who
>> doesn't know at least one real programming language?)
>
> I don't find this relevant to the analysis.

Perhaps not. But it speaks to how "simple" ABNF syntax needs to be.

(It doesn't need to be simple enough for politicians to understand.)

>> The use of some symbols defined in another draft presents a
>> particularly interesting issue:
>>
>> To verify the ABNF that uses the symbol, you need to import at least
>> the rule defining the symbol in question. But that rule may well refer
>> to other rules in the referenced document. Should you:
>>
>> - selectively import rules that are needed, one by one, until there
>>   are no more undefined symbols?
>>
>> - OR, simply import the full set of rules from the referenced document?
>
> Neither. See how it works in RFC 7230 and RFC 7231. {localrule =
> <foreignrule, ...>}. The rule name of the foreign rule is irrelevant
> since a local rule is defined with the standard syntax.

How do you assume this "import" to work? It could be equivalent to:

localrule = foreignrule
<insert text of foreignrule>

OR

localrule = <insert right hand side of foreignrule>

Either way, if the right hand side of the foreign rule contains 
references to other rules there is more to do.
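
Spelled out as a small Python sketch, the two readings differ in which
names end up in the importing grammar. The function names and the
dict-of-strings representation of a grammar are assumptions for
illustration only.

```python
def import_as_alias(local, foreign, foreign_rules):
    """First reading: localrule = foreignrule, plus foreignrule's text."""
    return {local: foreign, foreign: foreign_rules[foreign]}


def import_inline(local, foreign, foreign_rules):
    """Second reading: localrule = <right-hand side of foreignrule>."""
    return {local: foreign_rules[foreign]}


if __name__ == "__main__":
    rfc3986 = {"host": "IP-literal / IPv4address / reg-name"}
    print(import_as_alias("uri-host", "host", rfc3986))
    print(import_inline("uri-host", "host", rfc3986))
```

The first reading also introduces the foreign name ("host" here) into the
local grammar, the second does not. In both cases IP-literal, IPv4address
and reg-name remain unresolved.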

One possibility would be to import some or all of the referenced foreign 
ABNF text but rewrite it, replacing all the rulenames with obfuscated 
names that don't conflict. But that is complex, and has its own 
consequences.
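
A rough Python sketch of that renaming, using a fixed prefix rather than
truly obfuscated names. The function name rename_rules and the prefix are
hypothetical, and the tokenizer is deliberately simplistic.

```python
import re

# Quoted strings, prose values and numeric values are matched first so
# that they pass through untouched; only bare rule names are candidates.
TOKEN_RE = re.compile(
    r'"[^"]*"|<[^>]*>|%[bdxBDX][0-9A-Fa-f.-]+|[A-Za-z][A-Za-z0-9-]*'
)


def rename_rules(rules, prefix):
    """Prefix every rule defined in `rules`, in definitions and references.

    Names not defined in `rules` (e.g. core rules) are left alone.
    """
    defined = {name.lower() for name in rules}

    def fix(match):
        token = match.group(0)
        return prefix + token if token.lower() in defined else token

    return {prefix + name: TOKEN_RE.sub(fix, rhs) for name, rhs in rules.items()}


if __name__ == "__main__":
    imported = {"host": 'reg-name / "host"', "reg-name": "*ALPHA"}
    print(rename_rules(imported, "rfc3986-"))
```

One of the consequences is visible right away: the imported rules no
longer carry the names that the prose of the referenced document uses for
them.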

When I have verified ABNF in the past, I have either imported *all* the ABNF 
from the referenced document, or else I have imported rule definitions 
one by one until I no longer have unsatisfied references. Or, in the 
case of core rules, I have simply imported all the core rules and 
nothing else from 5234.
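
The one-by-one approach is essentially a transitive closure over rule
references. A minimal Python sketch, assuming the foreign grammar is
available as a dict of lowercased rule names to right-hand-side strings
(import_closure and references are names made up for this example):

```python
import re

NOISE_RE = re.compile(r'"[^"]*"|<[^>]*>|%[bdxBDX][0-9A-Fa-f.-]+|;.*')
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def references(rhs):
    """Rule names mentioned on a right-hand side, lowercased."""
    return {n.lower() for n in NAME_RE.findall(NOISE_RE.sub(" ", rhs))}


def import_closure(needed, foreign_rules):
    """Import rules one by one until nothing more can be satisfied.

    Returns (imported, missing): the rules pulled in, and the names the
    foreign document does not define itself.
    """
    imported, missing = {}, set()
    pending = [name.lower() for name in needed]
    while pending:
        name = pending.pop()
        if name in imported or name in missing:
            continue
        if name not in foreign_rules:
            missing.add(name)
            continue
        imported[name] = foreign_rules[name]
        pending.extend(references(imported[name]))
    return imported, missing
```

Whatever ends up in `missing` must be satisfied some other way, for
example by the core rules. Any name in `imported` that the importing
document also defines is a potential conflict.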

>> Either way, there may then be conflicts between rules defined in the
>> new document and those imported from the old document. The potential
>> is greater if you have imported all the ABNF from the referenced
>> document.
>
> See above; not an issue.

Disagree - see above.

>> And this of course depends a bit on whether the ABNF in the referenced
>> document was intended to be one "module" or not.
>>
>> RFC5234 is itself an interesting case study. It includes:
>
> RFC 5234 is not interesting because it is defining "itself". It is not
> appropriate to view all of the so-called "definitions" in RFC 5234 as
> actual instances of ABNF. For example, Section 3.7:
>
> ***
>
>
>       3.7 <http://tools.ietf.org/html/rfc5234#section-3.7>. Specific
>       Repetition:
>
> nRule
>
>     A rule of the form:
>
>           <n>element
>
>     is equivalent to
>
>           <n>*<n>element
>
>     That is, exactly <n> occurrences of <element>.  Thus, 2DIGIT is a
>     2-digit number, and 3ALPHA is a string of three alphabetic
>     characters.
>
>
> ***
>
> Clearly, neither {<n>element} nor {<n>*<n>element} is intended to be
> interpreted as ABNF as-is. Marking them as <artwork type="abnf"> would
> just be incorrect.

Yes, some are not ABNF. But some are.

I agree that fragments that are not ABNF should not be tagged as type 
ABNF. (But they might be tagged as ABNF-fragment or some such.)

> Actually I think that they should be:
>
> <t>A rule of the form:<br/>
> <tt xml:space="preserve">    &lt;n&gt;element</tt><br/>
> is equivalent to<br/>
> <tt xml:space="preserve">    &lt;n&gt;*&lt;n&gt;element</tt><br/>
> That is, exactly <tt>&lt;n&gt;</tt>
> occurrences of <tt>&lt;element&gt;</tt>. Thus, <tt>2DIGIT</tt> is a
>     2-digit number, and <tt>3ALPHA</tt> is a string of three alphabetic
>     characters.</t>
>
>
> The text is crystal clear that the examples {<n>element} and
> {<n>*<n>element} are treated as quotations, which are nouns for
> grammatical purposes. I.e., the entire Section 3.7 comprises *ONE*
> paragraph. Splitting these sample verbatim text elements out into
> <figure><artwork> blocks is ludicrous. There you have it: yet another
> use case in support of Improvement #1 (fine control over spaces and line
> breaks).

I'm sure there are many ways this RFC *could* have been formatted.

If we agree on how fragmenting and importing work, then it would be good 
to revise 5234 to be consistent with that.

> If you actually go through RFC 5234 piece by piece, you will see that
> there is no conflict between rule names in Section 4 (ABNF Definition of
> ABNF) and Appendix B (Core ABNF of ABNF). But anyway, as I have already
> argued, future RFCs should consider Appendix B names already
> pre-defined, and therefore should not have any need to import RFC 5234
> parts anyway.

Agreed, there is no conflict *there*.

The question comes for those who want to reuse the core rules. Must 
users of those rules also refrain from using the names of rules in 
section 4 for their own purpose?

(I guess you don't think importing ever causes name conflicts. But until 
you explain how that can work I won't agree.)

	Thanks,
	Paul

> Respectfully submitted,
>
> Sean
>
>>
>> - a set of "Core Rules" in Appendix B. This could be viewed as one
>>   ABNF "module".
>> - a complete ABNF definition of ABNF. This could also be viewed as
>>   a separate ABNF "module", but it informally indicates that it
>>   depends on (imports) the Core Rules.
>> - ABNF fragments interspersed with text, duplicating rules in
>>   both of the above.
>>
>> *Many* uses of ABNF reuse rules defined in the Core Rules. When doing
>> so, it would probably be fine to import the full set of Core Rules,
>> but it would probably be inappropriate to also import the rules
>> defining the ABNF of ABNF, and it certainly would be inappropriate to
>> also import all the fragments.
>>
>> IMO it would make sense to introduce enough new syntax to ABNF to
>> define named modules, and to specify the import of specific named
>> modules from an external document.
>>
>>     Thanks,
>>     Paul
>> _______________________________________________
>> rfc-interest mailing list
>> rfc-interest at rfc-editor.org
>> https://www.rfc-editor.org/mailman/listinfo/rfc-interest