Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

Asmus Freytag <asmusf@ix.netcom.com> Sat, 09 May 2020 04:21 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/kZxCpdWplFwMUcdIEuYHVyxbHhg>

On 5/8/2020 9:01 PM, John C Klensin wrote:
> Hi.
>
> Given that I felt a bit attacked for stepping in earlier
> (although I'm obviously pleased that John incorporated some of
> my comments in his review), a few comments that can be passed
> along or not as people prefer....
>
> --On Friday, May 8, 2020 18:00 -0600 "Matthew A. Miller"
> <linuxwolf+ietf@outer-planes.net> wrote:
>
>> Thank you very much for the review, John.  This is very
>> helpful, and we will work on changes to resolve the concerns.
>>
>> Regarding the encoding:
>>
>> UTF-8 is strongly encouraged, and required for a source to be
>> a module.  Preferring BOM sniffing over a charset declaration
>> is the result of general purpose browsers having to deal with
>> a myriad of misconfigured servers; this being the defense
>> mechanism that permitted the best interoperability.
>>
>> The part about "longest matching octet sequence" is a missed
>> remnant from the original document which included UTF-32BE/LE
>> entries.  As UTF-32BE/LE is no longer found in the wild*,
>> those entries were removed but I missed cleaning up that
>> sentence.
> An observation with the understanding that I'm not advocating
> for UTF-32, in either ordering, and especially not on the wire.
> First, until Unicode deprecates it (which, AFAIK, they haven't
> done), "no longer found in the wild" is a rather bold statement
> (analogies to the recent IETF list discussion about POP2 come to
> mind) and rather different from "no current or recent version of
> any browser with which the authors are familiar accepts and
> supports UTF-32 encodings" (which is what "Web browsers haven't
> accepted UTF-32 encodings of JavaScript for quite a long while"
> almost certainly means).

There's no reason for Unicode to deprecate UTF-32.

Nobody really advocates it as an encoding form for transmission, but
Unicode's UTFs are not defined as file formats; treating them as such is
an essential misunderstanding. Any time a per-character API accepts or
returns a 32-bit value that covers the entire code space, that format is
effectively UTF-32. Deprecating UTF-32 would force all those APIs to work
on strings, which is not necessarily an improvement.
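To make the point about per-character APIs concrete, here is a minimal Python sketch (the helper `code_points` is hypothetical, used only for illustration): an API that returns one 32-bit integer per code point yields exactly the UTF-32 code units, while UTF-8 and UTF-16 need a variable number of units per code point.

```python
import struct


def code_points(text: str) -> list:
    """Hypothetical per-character API: one integer per Unicode code point."""
    return [ord(ch) for ch in text]


# BMP characters plus one supplementary-plane character (U+1F600).
sample = "A\u00e9\u4e2d\U0001F600"
units = code_points(sample)

# Packing those integers as 4-byte little-endian values is exactly UTF-32LE.
packed = struct.pack(f"<{len(units)}I", *units)
assert packed == sample.encode("utf-32-le")

# UTF-8 and UTF-16 are variable-width; UTF-32 is always 4 bytes per code point.
for ch in sample:
    print(f"U+{ord(ch):04X}: utf-8={len(ch.encode('utf-8'))} bytes, "
          f"utf-16={len(ch.encode('utf-16-le')) // 2} units, "
          f"utf-32={len(ch.encode('utf-32-le'))} bytes")
```

Nothing here is a file or wire format; the fixed-width integers are simply what a per-code-point interface naturally exposes.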

Whether it is advisable to use UTF-32 as an internal string format
(rather than the more widely implemented UTF-16) is a question on which
Unicode, correctly, remains agnostic.

It is not Unicode but standards such as those of the W3C and IETF that
are supposed to define transmission and file formats. The UTFs defined
by Unicode are just the building blocks.

>   I have no information about
> Javascript specifically, but I'm aware of several systems that
> still use UTF-32 for internal processing because, if one does
> not have storage space limitations, there are significant
> advantages of having exactly the same number of octets (four)
> for each code point regardless of where it falls in the Unicode
> tables and hence without the need to pack and unpack UTF-8 or be
> aware of and process surrogates.

That's independent of the definition of a programming language. How you
implement such a language on top of a "system" that works natively in
UTF-32 is a matter for the implementer of that programming language.
JavaScript has every right to define whether UTF-32 is an acceptable
source code format (and to deprecate it for that purpose).
>
> More important, my understanding of the general practices in the
> IETF for many years is that we don't deprecate features that may
> once have been accepted and used (even if not "for quite a long
> while") by simply removing them and assuming that readers will
> observe their absence and make the appropriate inferences,
> whatever those might be.  I see nothing in the body of
> the document or in Appendix B that makes it explicit that
> support for UTF-32 has been dropped.   I think such a comment
> should be added and explained, even if the explanation is that
> no one significant is accepting it any more and it is therefore
> NOT RECOMMENDED and being dropped from the spec.
>
If that understanding is correct, I agree with the conclusion.

>   
>> Implementations today do ignore the BOM once the character
>> encoding is worked out, so this portion of Section 4.2
>> was kept to reflect that.  I'm not sure what changes to make
>> here regarding that, although the ending "to source text" is
>> another missed remnant from my editing that will be removed.
>>
>> Section 4.2 still includes step #3 to deal with the (in
>> practice quite common) case of a missing BOM and the media
>> type missing a charset parameter.  There are also too many
>> servers that set this to "ISO-8859-1" without otherwise
>> examining the sources being served. We'll make it clearer this
>> is a default/fallback case.
> Noting (again) that, absent a BOM (or even with one) UTF-8
> cannot be reliably distinguished from ISO-8859-X (for any
> registered or unregistered value of X), I don't know what the
> default/fallback case actually is or, more generally, what the
> above paragraph means.  Unless I have misunderstood something
> important, the reality is that, if there is anywhere on the
> Internet that a web browser or server (including decades-old
> embedded servers) treat ISO/IEC 8859-1 as either the default or
> legitimate, then there are two possibilities: accurate labeling
> of the charset in use or use of heuristics that, by their nature
> and the nature of possible CCSs (not just encoding schemes), may
> fail.   That calls for at least a health warning in the
> document, not proceeding as if the heuristics are foolproof.

A move to the 21st century would mean making UTF-8 the default (in the
absence of another declaration). If that can't be done because of
pervasive legacy, then that needs to be stated and justified.
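John's point that UTF-8 cannot be reliably distinguished from ISO-8859-x, which is part of why an explicit default matters, can be shown in a few lines of Python. This is only an illustration using Python's built-in codecs, not a description of what any browser does:

```python
# Every byte value is a valid ISO-8859-1 character, so decoding UTF-8 bytes
# as Latin-1 never raises an error; it silently yields mojibake.
utf8_bytes = "naïve café".encode("utf-8")
latin1_text = utf8_bytes.decode("iso-8859-1")   # succeeds, but wrong
utf8_text = utf8_bytes.decode("utf-8")          # succeeds, and correct
print(latin1_text)  # naÃ¯ve cafÃ©
print(utf8_text)    # naïve café

# The reverse direction often fails loudly...
try:
    "café".encode("iso-8859-1").decode("utf-8")
except UnicodeDecodeError:
    print("not well-formed UTF-8")

# ...but not always: some Latin-1 text is also well-formed UTF-8, so a
# "try UTF-8, then fall back to Latin-1" heuristic can guess wrong.
ambiguous = "Ã©".encode("iso-8859-1")
print(ambiguous.decode("utf-8"))       # é
print(ambiguous.decode("iso-8859-1"))  # Ã©
```

Neither decoding "fails" in the ambiguous case, so no amount of error checking alone tells the receiver which charset was intended.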

If a BOM is present and disagrees with the declaration, that should be 
an error condition. Hopefully, there isn't pervasive legacy to the 
contrary.
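A hypothetical sketch of the ordering discussed in this thread: a BOM wins, then the media type's charset parameter, then UTF-8 as the proposed default, with a BOM that contradicts an explicit charset treated as an error. The names `determine_encoding` and `CharsetConflict` are invented for illustration, the results are Python codec names rather than IANA charset labels, and this reflects the proposals in this message, not the draft's current text:

```python
import codecs
from typing import Optional

# BOMs recognized by this sketch. With UTF-32 dropped, none is a prefix of
# another, so "longest matching octet sequence" no longer matters.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


class CharsetConflict(ValueError):
    """Hypothetical error: the BOM contradicts the declared charset."""


def _compatible(declared: str, bom_encoding: str) -> bool:
    # A generic "utf-16" label is consistent with either UTF-16 byte order.
    if declared == "utf-16":
        return bom_encoding.startswith("utf-16")
    return declared == bom_encoding


def determine_encoding(body: bytes, charset_param: Optional[str]) -> str:
    """Hypothetical: BOM first, then the charset parameter, then UTF-8."""
    # codecs.lookup normalizes aliases (e.g. "UTF8" -> "utf-8") and raises
    # LookupError for labels Python does not recognize.
    declared = codecs.lookup(charset_param).name if charset_param else None
    for bom, name in BOMS:
        if body.startswith(bom):
            if declared is not None and not _compatible(declared, name):
                raise CharsetConflict(f"BOM indicates {name}, charset says {declared}")
            return name
    # No BOM: honor the declaration if present, otherwise default to UTF-8.
    return declared if declared is not None else "utf-8"
```

For example, a body beginning with EF BB BF but served with `charset=ISO-8859-1` raises `CharsetConflict` instead of being silently decoded one way or the other. The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), which is why a longest-match rule was needed while UTF-32 was still listed.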

It looks like in source code a proper BOM is always treated as whitespace
and ignored; if so, that could be mentioned as the reason why it's not
necessary to actively skip it.
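ECMAScript's WhiteSpace production lists U+FEFF (ZWNBSP) alongside TAB, VT, FF and the Unicode "Zs" space separators, which appears to support this. A minimal Python sketch (the helper names are hypothetical, and line terminators, which ECMAScript handles as a separate production, are deliberately left out):

```python
import unicodedata


def is_js_whitespace(ch: str) -> bool:
    # TAB, VT, FF, ZWNBSP (U+FEFF), plus any "Zs" code point (SP, NBSP, ...).
    return ch in "\t\v\f\ufeff" or unicodedata.category(ch) == "Zs"


def skip_leading_whitespace(source: str) -> str:
    i = 0
    while i < len(source) and is_js_whitespace(source[i]):
        i += 1
    return source[i:]


raw = b"\xef\xbb\xbfconst x = 1;"
text = raw.decode("utf-8")  # Python's plain "utf-8" codec keeps U+FEFF
assert text[0] == "\ufeff"
print(skip_leading_whitespace(text))  # const x = 1;
```

If the decoder leaves the BOM in place, a lexer following this rule skips it like any other whitespace, so no separate BOM-stripping step is needed for correctness.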


Another thing: in our discussion we mentioned the need to move the
description of the charset handling out of the Security Considerations
section. I don't see that mentioned.

>
> best,
>     john
>
> p.s. Noting a recent unpleasant discussion on the IETF list and
> that the review is being handled as an individual one rather
> than as a conclusion by the directorate, if any ideas or text
> are taken from this or my prior comments on the draft, I expect
> that, consistent with BCP 78, those comments will be attributed
> so they can be properly acknowledged in the I-D.
>