Re: [I18ndir] Review volunteer needed (Fwd: [dispatch] WGLC of draft-ietf-dispatch-javascript-mjs-07)

Asmus Freytag <asmusf@ix.netcom.com> Thu, 30 April 2020 20:33 UTC

To: i18ndir@ietf.org
References: <20200430014516.01551188B50A@ary.qy> <33a39102-0385-e235-1cdc-57cf6dad4f4b@ix.netcom.com> <7AD06F46449F354499AC2E24@PSB> <ACB0D0AB-2271-409D-A9A1-DFFD5A1AEE93@episteme.net> <alpine.OSX.2.22.407.2004301241440.26342@ary.qy> <8CE808C7-DF4F-45A9-9C17-2D82A8B78A9E@episteme.net> <477C5A18357719590D6336D9@PSB>
From: Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <fe750148-c879-32d0-48d7-153596f63cd9@ix.netcom.com>
Date: Thu, 30 Apr 2020 13:33:09 -0700
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0
MIME-Version: 1.0
In-Reply-To: <477C5A18357719590D6336D9@PSB>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/6PRJWpy-Mb_DGKZVx2g7LJVHBjo>
Subject: Re: [I18ndir] Review volunteer needed (Fwd: [dispatch] WGLC of draft-ietf-dispatch-javascript-mjs-07)

I'm in broad-strokes agreement with John K on the recommendations
he suggests to the WG (found under (b)) and on the other feedback,
found mostly in item (6).

(Unfortunately, these are buried in the meta-discussion of how we
got there, which we should spare the WG.)

As an aside, he's correct that in this case all four of us came at
this from slightly different directions, and to positive effect.

As for the other John (L), I sympathize with his comment on (3), but
it's outside my direct expertise.

His comment on (1) supports the recommendation to separate
the specifications for Script and Modules.

A./

On 4/30/2020 1:11 PM, John R Levine wrote:

>> (1) We all interpreted it as "check the 'charset' parameter
>> first".  The question was what was to be done next.
>
> Not really.  It says:
>
>    The charset parameter is only used when processing a Script goal
>    source; Module goal sources MUST always be processed as UTF-8.
>
> i.e., if it ends with .mjs, it's UTF-8.
>
>> (2) At least two of us expressed concern about the use of a file
>> name suffix as a classifier. ...
>
>> (3) Other than the statement "Source text is expected to be in
>> Unicode Normalization Form C", there is apparently no
>> requirement that the underlying CCS be Unicode. ...
>
> If there were some politically viable way to say it, I'd say to throw 
> the entire draft away other than the IANA registrations and replace it 
> with something written by someone who understands MIME and isn't 
> totally embedded in the html/javascript/UCS world.


On 4/30/2020 12:47 PM, John C Klensin wrote:
> Pete,
>
> I do not have time (especially today or tomorrow) to engage with
> this further or try to explain again, but parts of John's
> summary are, I believe, inconsistent with how Patrik, Asmus, and
> I interpreted the document.  Specifically,
>
> (1) We all interpreted it as "check the 'charset' parameter
> first".  The question was what was to be done next.
>
> (2) At least two of us expressed concern about the use of a file
> name suffix as a classifier.  Even if that is a carry-forward
> from 4329, it is a major step back from the reason why both
> email and the web adopted media type labeling and, if we are
> counting deployment and running code, I think it is safe to
> suggest that those two applications (or, if you prefer, sets of
> applications) are somewhat more broadly deployed than these
> so-called "scripting media types".
>
> (3) Other than the statement "Source text is expected to be in
> Unicode Normalization Form C", there is apparently no
> requirement that the underlying CCS be Unicode.  The statement
> "Implementations are required to support the UTF-8 character
> encoding scheme" does not impose that requirement either, it
> just makes UTF-8 support mandatory to implement... if it does
> that, because normative requirements of that type are normally
> not buried in Security Considerations sections.  This I-D claims
> authority for insisting on NFC from the Security Considerations
> Section of RFC 3629, but that Section does not discuss
> normalization forms at all: instead, it discusses "the same
> thing" issue and then says that the problems are "amenable to
> solutions" based on normalization (not just NFC) and UAX15.
> Even if that was good advice based on our best understanding in
> 2003, it certainly is neither necessary nor sufficient now.
> More generally, the current conventional wisdom (or, if you
> will, "best practice") is that, except where special
> circumstances apply (as they do with IDNA) normalization should
> occur if needed at processing (especially comparison) time, not
> at storage or transmission time.  Yet this document specifies
> NFC, and does so without explanation other than, as discussed
> above, blaming RFC 3629, which is not only not guilty but is not
> applicable to any charset other than UTF-8.
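A minimal Python sketch of the "normalize at processing (comparison) time, not at storage or transmission time" practice described above, using the standard unicodedata module. The helper name same_text is hypothetical. It only illustrates the general practice: ECMAScript itself compares identifiers by code point and does not treat canonically equivalent spellings as equal.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings for canonical equivalence.

    The inputs are kept exactly as received (nothing is normalized at
    storage or transmission time); NFC is applied only to temporary
    copies made for this comparison.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)           # False: stored forms differ
print(same_text(precomposed, decomposed))  # True: equal when compared
```

Either NFC or NFD would serve for this equality test; the point is only where in the pipeline normalization happens.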
>
> And, more broadly and probably more important:
>
> (5) It is not an i18n issue specifically, but as a co-author of
> RFC 6838 / BCP 13, I find it deeply troubling that this document
> is put forward as a media type registration when what it appears
> to do is to (i) allow the charset parameter but make it optional
> and specifically provide that it is to be ignored if present for
> Module Goals (Section 4.1) and then provide that, even for
> Script Goals, it can be ignored if heuristics are applied that
> suggest the charset (and encoding form) in use is something
> else.  In particular, one can conform to the SHOULD in that
> Section by specifying, for example, 'charset="ISO-8859-6"' and,
> since the document doesn't specify what to do with Script goal
> charsets one does not recognize, support for that drags in all
> of the bidi and troublesome sequence issues associated with
> Arabic without any of the support that Unicode and the documents
> surrounding it provide.
>
> (6) The text and section organization that I described as
> "convoluted" is very troublesome.  It was bad in 4329; the
> changes in the current I-D certainly do not fix the problem
> and may make it worse.  When four of us (Patrik, Asmus, John
> Levine, and I), each with considerable experience in reading
> technical specifications, can read the same spec and come to
> four different conclusions as to what it says to do, that is a
> deep, fundamental problem independent of any details.
>
> What is even more troublesome is that they could rather easily
> dig themselves out of most (sadly, not all) of this mess by, as
> Asmus more or less put it, joining the 21st century.  For
> example, well-placed requirements that state clearly:
>
> (i) For Module goal sources, the information MUST be in Unicode,
> using encoding form UTF-8.  A charset parameter SHOULD NOT be
> specified; if it is, its value MUST be "UTF-8".
>
> (ii) For Script goal sources, a charset parameter MUST be
> specified and MUST be one of "UTF-8", "UTF-16BE", or "UTF-16LE".
> If it is omitted, the receiving system MAY dig itself into as
> deep a hole as it prefers, possibly using BOM heuristics if
> there is an explicit "MUST use Unicode" requirement for Script
> goals.
>
> and getting all requirements on the spec itself moved out of
> the Security Considerations section and stated as requirements,
> without relying on requirements or recommendations of documents
> like RFC 3629 that are somewhat outdated and/or don't say what
> the I-D claims or implies that they say and/or are not
> applicable to encoding forms (and non-Unicode CCSs) that the I-D
> allows.
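Requirements (i) and (ii) as proposed above are simple enough to check mechanically. The Python sketch below is hypothetical: the function name and the goal strings "module" and "script" are assumptions, not anything defined in the I-D. It reports which of the proposed requirements a goal/charset pair would violate, treating charset names case-insensitively as MIME does, and does not model the receiver's fallback when a Script charset is missing.

```python
from typing import List, Optional

# Charsets that proposed requirement (ii) would allow for Script goals.
SCRIPT_CHARSETS = {"utf-8", "utf-16be", "utf-16le"}


def check_charset(goal: str, charset: Optional[str]) -> List[str]:
    """Return the proposed requirements that a (goal, charset) pair violates.

    goal is "module" or "script" (hypothetical labels); charset is the
    media type's charset parameter, or None if it is absent.
    """
    cs = charset.lower() if charset is not None else None
    problems: List[str] = []
    if goal == "module":
        # (i): content MUST be UTF-8; a charset SHOULD NOT be given,
        # and if it is given it MUST be "UTF-8".
        if cs is not None:
            problems.append("SHOULD NOT: charset specified for a Module goal")
            if cs != "utf-8":
                problems.append("MUST: Module goal charset must be UTF-8")
    elif goal == "script":
        # (ii): a charset MUST be given and MUST be one of the three.
        if cs is None:
            problems.append("MUST: charset missing for a Script goal")
        elif cs not in SCRIPT_CHARSETS:
            problems.append("MUST: Script charset not UTF-8/UTF-16BE/UTF-16LE")
    else:
        raise ValueError(f"unknown goal: {goal!r}")
    return problems
```

For example, check_charset("script", "ISO-8859-6") returns a single MUST violation, which is exactly the case that the current text lets through.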
>
> _Recommendations_
>
> (a) We have said in multiple places, most recently in what is
> now RFC 8753, that this i18n stuff requires a collaborative
> effort by people whose expertise comes from a variety of
> different perspectives.  The comments from Patrik, Asmus, John
> Levine, and myself illustrate the reasons for that.  So either
> no review should ever go out unless it reflects multiple
> sets of eyes and consensus (at least among those who were
> willing to look), or it should bear a much stronger disclaimer
> than is typical for "area review team" review assignments.  The
> latter might say something like "while this review was assigned
> to me by the i18n directorate, it represents my opinion only and
> not consensus among the experts who make up that directorate,
> even consensus that my summary of their discussion is accurate".
>
>
> Consider what might happen in this case without one or the
> other.  A review goes off that talks about the concerns of the
> directorate and John's summary of those concerns ("We understand
> it to say...").  The WG addresses those issues and the document
> goes to IETF LC.  Some of Patrik, Asmus, and me (and maybe
> others) respond to the IETF LC pointing out the issues raised in
> our earlier notes and above, strongly suggesting that the WG
> should have known about most of this, that they are depending on
> documents that don't say what the WG claims they say and that
> violate the letter and spirit of assorted RFCs.  We point out to
> IANA that this document is not a proper Media Type registration
> and that 4329 wasn't either.  The WG responds with dismay
> because all of this is new to them.  And the ART ADs (who I
> believe are on this list) end up with egg on their faces as does
> the whole directorate and its leadership.
>
> (b) Let's respond to the WG with the issue I think those of us
> who have looked at the document are all agreed about: it is
> _really_ hard to figure out just what the document specifies and
> hence to comment on it in an authoritative way.  If they are
> assuming Unicode, they need to make that a requirement, not hope
> the reader figures it out.  The notorious Section 4 may need to
> be split up into separate subsections for Module goals and
> Script goals or otherwise structured to be sure it is clear what
> one is to do in each case and with and without charset
> parameters.  And probably (less important for this iteration
> since no one else mentioned it, but I predict an extra iteration
> if it is not done), normative requirements on the spec must not
> appear only in the Security Considerations section, and they had
> better check the applicability of their references.  Only when
> they fix enough of those things that we can all agree about what
> the document says are they going to get a review of substantive
> i18n issues.
>
> Disgustedly,
>     john
>
>
> --On Thursday, April 30, 2020 12:30 -0500 Pete Resnick
> <resnick@episteme.net> wrote:
>
>> On 30 Apr 2020, at 12:22, John R Levine wrote:
>>
>>>> the WG to take some action? If I don't hear from anyone,
>>>> I'll start accosting people privately.
>>> Nooo, not the Private Accosting.
>> Obviously you have never experienced my full-out private
>> accosting. :-)
>>
>>> Summary:
>>>
>>> The i18n directorate has some concerns about character set
>>> handling in draft-ietf-dispatch-javascript-mjs-07.
>>>
>>> We understand it to say that if a JavaScript MIME element
>>> does not have a name that ends with .mjs, a consumer ignores
>>> the declared charset and looks at the first few bytes of the
>>> content for a byte order mark (BOM).  If it finds one, it
>>> uses the charset implied by the BOM, which can be UTF-16BE,
>>> UTF-16LE, or UTF-8.  If there's no BOM, it uses the declared
>>> charset unless there isn't one, in which case it defaults to
>>> UTF-8.
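To make the reading summarized above concrete, here is a Python sketch of it. It illustrates that paragraph, not the I-D's normative text; the function names are hypothetical, and "Module goal" is approximated by the .mjs suffix exactly as the summary does. Stripping a leading U+FEFF after decoding follows the later remark that the BOM must be removed; applying it in every case is an assumption.

```python
import codecs
from typing import Optional

# BOMs named in the summary, mapped to Python codec names.  The 3-byte
# UTF-8 BOM cannot be confused with the 2-byte UTF-16 BOMs.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def encoding_as_summarized(name: str, body: bytes, declared: Optional[str]) -> str:
    if name.endswith(".mjs"):
        return "utf-8"  # Module goal: always UTF-8, label ignored
    for bom, enc in BOMS:
        if body.startswith(bom):
            return enc  # a BOM wins over the declared charset
    return declared or "utf-8"  # then the label, then the UTF-8 default


def decode_as_summarized(name: str, body: bytes, declared: Optional[str]) -> str:
    text = body.decode(encoding_as_summarized(name, body, declared))
    # These codecs keep the BOM as U+FEFF, so remove it explicitly.
    return text[1:] if text.startswith("\ufeff") else text
```

Note that a body labelled charset=utf-8 but starting with FF FE is decoded as UTF-16LE: the label loses, which is the behaviour the next paragraph questions.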
>>>
>>> We are unaware of any other MIME type that uses this sort of
>>> trick to work around mislabelled content, and are concerned
>>> that it leads to failures in general MIME code that doesn't
>>> handle this special case.  We also don't know how important
>>> the workaround is in practice, e.g., how many MIME producers
>>> still mislabel UTF-16 as UTF-8 or vice versa.
>>>
>>> For better interoperation it could say something like:
>>> producers MUST put the correct charset on any media (same as
>>> any other media type), and consumers SHOULD use the
>>> declared charset but MAY do the BOM trick for backward
>>> compatibility in certain cases.
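The alternative suggested above mainly changes which signal wins by default. In this hypothetical Python sketch, the legacy_bom_override flag stands in for the unspecified "certain cases", and the fallback for unlabelled content (BOM, then UTF-8) is borrowed from the summarized draft behaviour; both are assumptions.

```python
import codecs
from typing import Optional

BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def bom_encoding(body: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None."""
    for bom, enc in BOMS:
        if body.startswith(bom):
            return enc
    return None


def choose_encoding(body: bytes, declared: Optional[str],
                    legacy_bom_override: bool = False) -> str:
    sniffed = bom_encoding(body)
    if legacy_bom_override and sniffed is not None:
        # MAY: the BOM trick, only when a caller opts in for compatibility.
        return sniffed
    if declared:
        # SHOULD: trust the label, as with any other media type.
        return declared
    # Unlabelled content already breaks the producer MUST; fall back
    # to the BOM, then to UTF-8.
    return sniffed or "utf-8"
```

With the override off, a generic MIME library that simply honours the charset parameter behaves correctly without any JavaScript-specific code.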
>>>
>>> It also says the BOM must be removed from the decoded text.
>>> That's confusing, since ECMAScript treats a BOM as white space,
>>> which would be harmless at the start of a block of code.
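The point about the BOM can be checked directly: whether it is stripped at decode time or left in place, an ECMAScript tokenizer reaches the same first token. A Python sketch; the helper names are hypothetical, and the white-space set follows the ECMAScript WhiteSpace production (TAB, VT, FF, SP, NBSP, ZWNBSP, plus other Zs code points).

```python
import unicodedata


def is_es_whitespace(ch: str) -> bool:
    """ECMAScript WhiteSpace: TAB, VT, FF, SP, NBSP, ZWNBSP (U+FEFF),
    and any other code point in general category Zs."""
    return ch in "\t\v\f \u00a0\ufeff" or unicodedata.category(ch) == "Zs"


def skip_es_whitespace(src: str) -> str:
    """Drop leading ECMAScript white space, as a tokenizer would.

    Python's str.isspace() does not cover U+FEFF, hence the explicit set.
    """
    i = 0
    while i < len(src) and is_es_whitespace(src[i]):
        i += 1
    return src[i:]


raw = b"\xef\xbb\xbf" + b"var x = 1;"   # UTF-8 BOM followed by code
kept = raw.decode("utf-8")              # BOM survives as a leading U+FEFF
stripped = raw.decode("utf-8-sig")      # utf-8-sig skips one leading BOM

print(skip_es_whitespace(kept) == skip_es_whitespace(stripped))  # True
```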
>> Thanks for taking up the pen John. If folks think something
>> needs to be elaborated or added, or if you have some
>> wordsmithing, do speak up.
>>
>> I'll check with Barry whether he wants this on the official
>> review form. If so, I'll assign the review to you in the
>> datatracker. Otherwise, you can just email the dispatch list
>> and sign it "John, stuckee for the directorate" or some such.
>>
>> pr
>> -- 
>> Pete Resnick https://www.episteme.net/
>> All connections to the world are tenuous at best
>