Re: [dispatch] [media-types] 3rd WGLC - draft-ietf-dispatch-javascript-mjs - deadline 10th May

John C Klensin <john-ietf@jck.com> Wed, 19 May 2021 18:40 UTC

Date: Wed, 19 May 2021 14:40:21 -0400
From: John C Klensin <john-ietf@jck.com>
To: John Levine <johnl@taugh.com>, media-types@ietf.org, dispatch@ietf.org
Message-ID: <DD466305C0BB2D9F6E4DF210@PSB>
In-Reply-To: <20210519164447.ECC5B82C752@ary.qy>
References: <20210519164447.ECC5B82C752@ary.qy>


--On Wednesday, May 19, 2021 12:44 -0400 John Levine
<johnl@taugh.com> wrote:

> It appears that Martin J. Dürst <duerst@it.aoyama.ac.jp> said:
>>> Note that this language is not new to this draft; it's
>>> inherited from RFC4329 which this document supersedes.
>> 
>> It is indeed somewhat difficult to understand for people who
>> are not familiar with the BOM and autodetection procedures.
>> 
>> I'd suggest changing:
>> 
>> If this step determines the character
>> encoding scheme, the octet sequence representing the Unicode
>> encoding form signature MUST be ignored when decoding the
>> binary source text.
>> 
>> to something such as:
>> 
>> If this step determines the character
>> encoding scheme, the octet sequence representing the Unicode
>> encoding form signature is not part of the actually decoded
>> binary source text.
> 
> While I agree with your concern, and have given up trying to
> figure out how the Javascript world got so broken that it
> cannot keep its encodings straight without having to sniff the
> contents of every message, I don't think we should change the
> language. It's been there for a long time, everyone who needs
> to understand it knows what it means, and if we change it,
> we'll be in for a round of "why did they change it and what do
> I have to do different."

John,

While I mostly agree -- Martin is right, but it is not clear
whether making this correct would cause more harm by creating
confusion than it would be worth -- there is a third (and more
drastic) way to look at this.  Content-sniffing and heuristics,
rather than properly marking up text and strict observance of
media types, ultimately just lead to other problems down the
line.  The specifics are different, but a review of why we wrote
RFC 6082, deprecating the Unicode language tagging of RFC 2482,
is relevant here.  Sniffing, simplistic tagging, and so on do
not work reliably unless assumptions are made about assorted
external details and those assumptions are actually correct.
The crude example in this case is that sniffing for Unicode
encoding types works if the underlying CCS is guaranteed to be
Unicode.  If it is something else, all bets are off.
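
As a rough illustration of that last point, here is a minimal
sketch in Python.  It is not the procedure from the draft or from
RFC 4329: the function name `decode_source`, its `declared`
parameter, and the order in which signatures are checked are all
assumptions made for this example.  It sniffs for a Unicode
signature, drops it, and decodes the rest; the final lines show a
Latin-1 payload whose first three octets merely look like a UTF-8
signature.

```python
import codecs

# Signatures this sketch looks for, in the order it checks them.
# The list and its order are assumptions of the example.
_SIGNATURES = (
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"), # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"), # FF FE
)


def decode_source(octets: bytes, declared: str = "utf-8") -> str:
    """Decode source text, preferring a leading signature over `declared`."""
    for signature, encoding in _SIGNATURES:
        if octets.startswith(signature):
            # The signature octets are not part of the decoded text.
            return octets[len(signature):].decode(encoding)
    # No signature found: fall back to the externally declared charset.
    return octets.decode(declared)


# Works when the underlying coded character set really is Unicode.
utf8_text = decode_source(codecs.BOM_UTF8 + b"let a;")

# Latin-1 octets that happen to start with EF BB BF: sniffing
# silently discards three legitimate characters.
latin1_octets = b"\xef\xbb\xbfx"
sniffed = decode_source(latin1_octets, declared="latin-1")
honest = latin1_octets.decode("latin-1")
```

In the second case the sniffing decoder returns one character
where the labelled charset would have produced four, which is the
"all bets are off" situation described above.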

I could make comments about brokenness even stronger than yours
about the choice of file name extensions to determine details of
data types.  I thought the Internet community gave up on that
decades ago.

I've been trying to ignore this thread, but this exchange is
sort of a last straw and it seems it is time to raise more basic
questions.

So some suggestions wrt this document if it is really to go
forward in the IETF:

(1) Martin's suggestion or something like it is an improvement,
but the reason for the change should be clearly explained.  The
style of explanations in Appendix B, many of which are of the
form "We changed X" or "Updated Y to make it better", is
inadequate in that regard without an explanation of why the
change was made and/or what its implications are.  Otherwise, while some
are more obvious than others, many of them have the potential to
create the type of confusion you identify.

(2) To be clear, similar comments apply to almost every bullet
point in Appendix B: the changes require explanation.

(3) If this document, as suggested by both discussions in this
thread and the "... updates ... to reflect existing usage on the
Internet" introductory text, is the result of the IETF's
opinion of actual practice in the javascript community, practice
that has diverged enough from what RFC 4329 specified to justify
such a revision, then it is an Informational spec expressing our
summary of what is happening elsewhere and clearly not under our
control.  In such a spec, the use of normative BCP 14 language is
entirely inappropriate, both conceptually and because the very
need for the document is evidence that the relevant
implementation and operational communities are not paying
attention.  Phrases like "MUST foo" should be replaced by ones
more like (relying on RFC 8174) "you must do X because anything
else will lead to <bad thing>".

And, if that is too much work or this is the opinion of the
authors or the javascript community about common practice and
not the IETF, then I recommend either of two other options:

(i) Rewrite the introductory material to indicate that it is the
opinion of the authors and anything they cite.  Then Dispatch
this to the tender mercies of the ISE because, if the IETF can
merely report on actions of others rather than influencing
results, that is not a useful business for us to be in.  If the
IESG then wants to obsolete 4329 and point to the new document
for explanation, nothing prevents that, especially given that
both would be Informational rather than standards track.

or

(ii) Let ECMA/TC39 adopt this document, review, and publish it
as part of the ECMAScript collection.  Then produce a very short
RFC that obsoletes 4329, points to the ECMA document, and adds
whatever information about the reasons for the change seem
helpful.

Grump.
    john