Re: [abnf-discuss] ABNF colloquialism for end-of-line

Matthew Kerwin <matthew@kerwin.net.au> Thu, 16 November 2017 02:18 UTC

In-Reply-To: <2329FFC4-0791-4B70-A8D2-E6FC33F6C483@seantek.com>
References: <97E6D6C0-7010-46D6-8641-670F10A2504C@seantek.com> <3fbd228d-c6cf-be73-c7f2-f6b15979b852@gmail.com> <2329FFC4-0791-4B70-A8D2-E6FC33F6C483@seantek.com>
From: Matthew Kerwin <matthew@kerwin.net.au>
Date: Thu, 16 Nov 2017 12:18:04 +1000
Message-ID: <CACweHNAs+HioKXokQa9vx4rSsK7QHHGkMmpD3_DSctLPuhdJAA@mail.gmail.com>
To: Sean Leonard <dev+ietf@seantek.com>
Cc: Dave Crocker <dcrocker@gmail.com>, ABNF-Discuss <abnf-discuss@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/abnf-discuss/P5GTRpcwd4Hf44-a7u2EeuTUewE>
Subject: Re: [abnf-discuss] ABNF colloquialism for end-of-line
List-Id: "General discussion about tools, activities and capabilities involving the ABNF meta-language" <abnf-discuss.ietf.org>

Hi Sean,

[trimming a bunch]

On 16 November 2017 at 09:35, Sean Leonard <dev+ietf@seantek.com> wrote:
>
>> On Nov 15, 2017, at 11:37 PM, Dave Crocker <dcrocker@gmail.com> wrote:
>>
>> But I gather you /do/ want a rule that covers both CRLF and LF.
>>
>> So define one.
>>
>> For example:
>>
>>   cbor-nl:  crlf / lf
>>
>> Is there some problem with using this?
>
> My opinion is that newline === CRLF in IETF Internet standards, and since ABNF is an Internet standard, the concept of newline === CRLF. In a local operating environment, one can convert CRLF to whatever else. For example, the C standard library treats \n (aka LF) as the end-of-line marker, and converts appropriately on ingress/egress. Ditto for HTML/XML. (This is congruent with your first paragraph on simplicity.)
>
> I understand why people want to define EOL = CRLF / LF; however, ABNF producers such as abnfgen will (probably) randomly output sample data with mixed line endings. The author’s intent of EOL = CRLF / LF is to say that a format (rather than a wire protocol) ought to accept either CRLF or LF as line endings in a file, but it has the effect of saying that CRLF/LF can be randomly interspersed in the same PDU. (And, in so doing, it privileges LF but denies CR, or NEL, or LS / PS, which don’t have anything better or worse for them going when it comes to line endings.)
>

[aside]
... except that NEL, LS and PS don't have 7-bit ASCII codes (and NEL's
byte, 0x85, collides with U+2026 HORIZONTAL ELLIPSIS in Windows-1252).
[/aside]
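To make the aside concrete: the byte 0x85 means different things in
different single-byte charsets, and some libraries already treat NEL, LS
and PS as line breaks. A quick check using only the Python standard
library (the codecs and `str.splitlines()` behaviour shown are Python's,
not anything ABNF defines):

~~~python
# NEL is U+0085. As a single byte, 0x85 means NEL in ISO-8859-1 (latin-1)
# but U+2026 HORIZONTAL ELLIPSIS in Windows-1252.
raw = b"\x85"
as_latin1 = raw.decode("latin-1")
as_cp1252 = raw.decode("cp1252")
print(hex(ord(as_latin1)))  # 0x85
print(hex(ord(as_cp1252)))  # 0x2026

# None of NEL, LS (U+2028) or PS (U+2029) fits in 7-bit ASCII.
print(all(ord(ch) > 0x7F for ch in "\x85\u2028\u2029"))  # True

# Python's str.splitlines() treats NEL, LS and PS as line boundaries,
# alongside CR, LF and CRLF.
text = "a\r\nb\nc\rd\x85e\u2028f\u2029g"
print(text.splitlines())  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
~~~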

>
> In my own RFC 7468, I said:
>  eol        = CRLF / CR / LF
>  eolWSP     = WSP / CR / LF
>  W          = WSP / CR / LF / %x0B / %x0C ; whitespace
>
> and while I don’t view that as a mistake necessarily, I’m just pointing out that probably a better way to handle the issue would have been to say that CRLF *is* the end-of-line marker, and that when ingressing/egressing data, to treat whatever the local convention is, as if it’s CRLF in the ABNF.
>

An "end-of-line marker" is different from a "newline character", so if
you want a "best" colloquialism I'd say some variation of "newline" is
the way to go.  But there's no standard definition of "newline" in the
core ABNF specs (RFC 5234 defines CR, LF and CRLF as core rules, but no
NEWLINE), because there's no standard definition outside of ABNF.

If authors want to write rules for parsing a byte sequence into
tokens, defined in ABNF, and they want to allow Windows, UNIX or
Apple][ newlines, I'm completely with your and Dave's `CRLF / LF /
CR`-type definitions.  If they want to reuse the same ABNF for
*generating* byte sequences, and want to mandate/suggest/implore that
implementers use one or the other as the canonical form, can they not
do that by talking to the human reader?

~~~
   foobar         = freb *( blat ) foobar-newline
   foobar-newline = ( CRLF / LF / CR )  ; when parsing liberally
                  / CRLF                ; when generating conservatively
~~~
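A minimal Python sketch of what an implementer might write from that
rule, assuming the text has already been decoded to a `str`. The names
`parse_lines` and `generate` are hypothetical, not from any spec:

~~~python
import re

# Hypothetical helpers for the foobar-newline rule: accept CRLF, LF or CR
# when parsing; always emit CRLF when generating.

# Python's regex alternation is leftmost-first, so CRLF must be tried
# before CR, otherwise "\r\n" would read as a CR ending plus an empty line.
_NEWLINE = re.compile(r"\r\n|\n|\r")


def parse_lines(data: str) -> list[str]:
    """Split liberally on CRLF, LF or CR.

    A trailing newline ends the last line rather than starting an empty one.
    """
    parts = _NEWLINE.split(data)
    if parts and parts[-1] == "":
        parts.pop()
    return parts


def generate(lines: list[str]) -> str:
    """Generate conservatively: terminate every line with CRLF."""
    return "".join(line + "\r\n" for line in lines)


lines = parse_lines("one\r\ntwo\nthree\rfour\n")
print(lines)                  # ['one', 'two', 'three', 'four']
print(repr(generate(lines)))  # 'one\r\ntwo\r\nthree\r\nfour\r\n'
~~~

Round-tripping mixed-ending input through both functions gives uniform
CRLF output, which is the "liberal in, conservative out" split the
comments in the rule describe.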

I think that's the point you're making... but I don't quite get why
there's any source for confusion.  As a wild stab, is it because of an
over-reliance on tools to turn ABNF into running code?
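If it is the tools, here's a rough illustration of the abnfgen concern:
a generator that picks an alternative of a hypothetical `EOL = CRLF / LF`
rule independently at each occurrence can emit documents that mix the
two. This Python sketch (a made-up model, not abnfgen's actual algorithm)
just enumerates every such choice for a three-line document:

~~~python
from itertools import product

# Hypothetical model: EOL = CRLF / LF, chosen independently per line.
EOL_ALTERNATIVES = ["\r\n", "\n"]


def all_documents(lines):
    """Yield every document a per-line choice of EOL could produce."""
    for endings in product(EOL_ALTERNATIVES, repeat=len(lines)):
        yield "".join(line + eol for line, eol in zip(lines, endings))


def is_mixed(doc):
    """True if doc contains both CRLF and bare LF line endings."""
    crlf = doc.count("\r\n")
    bare_lf = doc.count("\n") - crlf
    return crlf > 0 and bare_lf > 0


docs = list(all_documents(["a", "b", "c"]))
mixed = [d for d in docs if is_mixed(d)]
print(len(docs), len(mixed))  # 8 6
~~~

Six of the eight outputs mix CRLF and LF. If the intent is "either, but
consistently", that choice has to be made once per document (or stated
in prose) rather than once per line.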

>
> One reason to have a uniform ABNF for line endings (i.e., CRLF) is inter-specification reuse. For example, one protocol such as URI [RFC3986] has ABNF productions that are reused throughout many other specifications, including specs that are CRLF oriented (mail...) and specs that are just LF oriented (HTML). URIs do not include line breaks, but if you want to reuse a production that includes line breaks in one RFC, but the line ending specs don’t match, then it will be a problem.
>

[aside]
HTML isn't particularly LF-oriented, unless you consider &#10; vs
&#13; a significant skew.

<https://www.w3.org/TR/html5/syntax.html#newlines>:
~~~
Newlines in HTML may be represented either as "CR" (U+000D)
characters, "LF" (U+000A) characters, or pairs of "CR" (U+000D), "LF"
(U+000A) characters in that order.
Where character references are allowed, a character reference of a
"LF" (U+000A) character (but not a "CR" (U+000D) character) also
represents a newline.
~~~
[/aside]
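Since all three forms count as a newline, a consumer can collapse them
into one internal form. A minimal Python sketch, assuming LF is the form
you want (`normalize_newlines_to_lf` is a made-up name); HTML's own
input-stream preprocessing does essentially the same thing, dropping an
LF that immediately follows a CR and then turning remaining CRs into LFs:

~~~python
def normalize_newlines_to_lf(text: str) -> str:
    """Collapse CRLF pairs and lone CRs to LF.

    CRLF is replaced first; doing CR first would turn each CRLF into two LFs.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_newlines_to_lf("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
~~~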

But yeah, I think there's a good reason CRLF is called "CRLF".
There's nothing stopping someone from minting a compelling and useful
"NEWLINE" ABNF rule that ends up being widely adopted, aside from
natural forces.

Cheers
-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/