Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

John C Klensin <john-ietf@jck.com> Sat, 09 May 2020 04:01 UTC

Date: Sat, 09 May 2020 00:01:42 -0400
From: John C Klensin <john-ietf@jck.com>
To: Barry Leiba <barryleiba@computer.org>, John Levine <johnl@taugh.com>
cc: i18ndir@ietf.org
Message-ID: <6F916805FF734CB450A3C724@PSB>
In-Reply-To: <791ca602-758e-cb0f-a1a4-8fb6b74a8b61@outer-planes.net>
References: <158896904545.17044.5288882047334991439@ietfa.amsl.com> <CALaySJ+CRJumYtDCxvGsSwzanz4y=7icuqd+toc0wMivf-mJGg@mail.gmail.com> <CAD7Fb3diej1-3fAgqZsS_E9wOs1KC=OwVWbvxV5mVjOdQEQm5g@mail.gmail.com> <791ca602-758e-cb0f-a1a4-8fb6b74a8b61@outer-planes.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/bWoDQBu-U30rV6m8Ov4zhldYrr8>
Subject: Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

Hi.

Given that I felt a bit attacked for stepping in earlier
(although I'm obviously pleased that John incorporated some of
my comments in his review), a few comments that can be passed
along or not as people prefer....

--On Friday, May 8, 2020 18:00 -0600 "Matthew A. Miller"
<linuxwolf+ietf@outer-planes.net> wrote:

> Thank you very much for the review, John.  This is very
> helpful, and will work on changes to resolve the concerns.
> 
> Regarding the encoding:
> 
> UTF-8 is strongly encouraged, and required for a source to be
> a module.  Preferring BOM sniffing over a charset declaration
> is the result of general purpose browsers having to deal with
> a myriad of misconfigured servers; this being the defense
> mechanism that permitted the best interoperability.
> 
> The part about "longest matching octet sequence" is a missed
> remnant from the original document which included UTF-32BE/LE
> entries.  As UTF-32BE/LE is no longer found in the wild*,
> those entries were removed but I missed cleaning up that
> sentence.

An observation with the understanding that I'm not advocating
for UTF-32, in either ordering, and especially not on the wire.
First, until Unicode deprecates it (which, AFAIK, they haven't
done), "no longer found in the wild" is a rather bold statement
(analogies to the recent IETF list discussion about POP2 come to
mind) and rather different from "no current or recent version of
any browser with which the authors are familiar accepts and
supports UTF-32 encodings" (which is what "Web browsers haven't
accepted UTF-32 encodings of JavaScript for quite a long while"
almost certainly means).  I have no information about
JavaScript specifically, but I'm aware of several systems that
still use UTF-32 for internal processing because, if one does
not have storage space limitations, there are significant
advantages to having exactly the same number of octets (four)
for each code point regardless of where it falls in the Unicode
tables, and hence no need to pack and unpack UTF-8 or to be
aware of and process surrogates.
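
As a rough illustration of the fixed-width point (a sketch only;
the sample string and the helper names are made up for this
example and come from neither the draft nor any implementation),
the octets used per code point in each encoding form can be
compared in Python:

```python
# Arbitrary sample: U+0061, U+00E9, U+20AC, U+1F600 (outside the BMP).
text = "a\u00e9\u20ac\U0001f600"

def octets_per_code_point(s, encoding):
    """Return the number of octets each code point of s occupies."""
    return [len(ch.encode(encoding)) for ch in s]

utf8_widths = octets_per_code_point(text, "utf-8")       # [1, 2, 3, 4]
utf16_widths = octets_per_code_point(text, "utf-16-be")  # [2, 2, 2, 4]
utf32_widths = octets_per_code_point(text, "utf-32-be")  # [4, 4, 4, 4]

def nth_code_point_utf32be(data, n):
    """Fetch code point n by plain offset arithmetic (4 octets each)."""
    return data[4 * n:4 * n + 4].decode("utf-32-be")

last = nth_code_point_utf32be(text.encode("utf-32-be"), 3)
```

The last UTF-16 entry is 4 because U+1F600 needs a surrogate
pair.  With UTF-32 the nth code point sits at octet offset 4*n,
so no scanning or surrogate handling is needed.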

More important, my understanding of the general practices in the
IETF for many years is that we don't deprecate features that
were once accepted and used (even if not "for quite a long
while") by simply removing them and assuming that readers will
observe their absence and make the appropriate inferences,
whatever those might be.  I see nothing in either the body of
the document or Appendix B that makes it explicit that support
for UTF-32 has been dropped.  I think such a comment should be
added and explained, even if the explanation is that no one
significant is accepting it any more and it is therefore
NOT RECOMMENDED and being dropped from the spec.
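
For context on why the "longest matching octet sequence" wording
quoted above mattered only while UTF-32 entries were present:
the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM
(FF FE).  The following is a minimal sketch, not the draft's
algorithm; the table and both function names are hypothetical,
and only the BOM constants come from Python's standard codecs
module.

```python
import codecs

# Hypothetical BOM table that still includes the UTF-32 entries.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
]

def sniff_first_match(data):
    """Return the first table entry whose BOM prefixes data."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

def sniff_longest_match(data):
    """Try longer BOMs first, so UTF-32LE wins over UTF-16LE."""
    for bom, name in sorted(BOMS, key=lambda e: len(e[0]), reverse=True):
        if data.startswith(bom):
            return name
    return None

sample = codecs.BOM_UTF32_LE + "a".encode("utf-32-le")
first = sniff_first_match(sample)      # "utf-16-le" (wrong)
longest = sniff_longest_match(sample)  # "utf-32-le"
```

With only the UTF-8 and UTF-16 entries left, no BOM is a prefix
of another, so the match order no longer matters.  Even with the
longest match, FF FE 00 00 could also be a UTF-16LE BOM followed
by U+0000, so the sniff remains a heuristic.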


 
> Implementations today do ignore the BOM once the character
> encoding is worked out, so this was portion from Section 4.2
> was kept to reflect that.  I'm not sure what changes to make
> here regarding that, although the ending "to source text" is
> another missed remnant from my editing that will be removed.
> 
> Section 4.2 still includes step #3 to deal with the (in
> practice quite common) case of a missing BOM and the media
> type missing a charset parameter.  There are also too many
> servers that set this to "ISO-8859-1" without otherwise
> examining the sources being served. We'll make it clearer this
> is a default/fallback case.

Noting (again) that, absent a BOM (or even with one), UTF-8
cannot be reliably distinguished from ISO-8859-X (for any
registered or unregistered value of X), I don't know what the
default/fallback case actually is or, more generally, what the
above paragraph means.  Unless I have misunderstood something
important, the reality is that, if there is anywhere on the
Internet that a web browser or server (including decades-old
embedded servers) treats ISO/IEC 8859-1 as either the default
or a legitimate choice, then there are two possibilities:
accurate labeling of the charset in use or use of heuristics
that, by their nature and the nature of possible CCSs (not just
encoding schemes), may fail.  That calls for at least a health
warning in the document, not proceeding as if the heuristics
are foolproof.
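
A small Python sketch of that ambiguity, under the assumption
that the heuristic in question is "does it decode as UTF-8?"
(the function name and sample strings are hypothetical and are
not taken from the draft or from any browser):

```python
def looks_like_utf8(data):
    """Heuristic: report True if the octets decode as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# UTF-8 octets for "cafe" with e-acute (U+00E9).
utf8_octets = "caf\u00e9".encode("utf-8")          # b"caf\xc3\xa9"

# Every octet value is assigned in ISO-8859-1, so this never fails;
# it silently yields mojibake instead of an error.
misread = utf8_octets.decode("iso-8859-1")         # "caf\u00c3\u00a9"

# Genuine ISO-8859-1 text that happens to be valid UTF-8 as well.
latin1_octets = "\u00c3\u00a9".encode("iso-8859-1")  # b"\xc3\xa9"
false_positive = looks_like_utf8(latin1_octets)      # True

# ISO-8859-1 text with a lone 0xE9 is at least rejected as UTF-8.
rejected = looks_like_utf8("caf\u00e9".encode("iso-8859-1"))  # False
```

The check can rule UTF-8 out, but it can never confirm that a
given octet stream was not intended as ISO-8859-1.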

best,
   john

p.s. Noting a recent unpleasant discussion on the IETF list and
that the review is being handled as an individual one rather
than as a conclusion by the directorate, if any ideas or text
are taken from this or my prior comments on the draft, I expect
that, consistent with BCP 78, those comments will be attributed
so they can be properly acknowledged in the I-D.