Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

"Matthew A. Miller" <linuxwolf+ietf@outer-planes.net> Sat, 09 May 2020 00:00 UTC

To: Myles Borins <mylesborins@google.com>, Barry Leiba <barryleiba@computer.org>
Cc: John Levine <johnl@taugh.com>, DISPATCH WG <dispatch@ietf.org>, draft-ietf-dispatch-javascript-mjs.all@ietf.org, i18ndir@ietf.org
References: <158896904545.17044.5288882047334991439@ietfa.amsl.com> <CALaySJ+CRJumYtDCxvGsSwzanz4y=7icuqd+toc0wMivf-mJGg@mail.gmail.com> <CAD7Fb3diej1-3fAgqZsS_E9wOs1KC=OwVWbvxV5mVjOdQEQm5g@mail.gmail.com>
From: "Matthew A. Miller" <linuxwolf+ietf@outer-planes.net>
Message-ID: <791ca602-758e-cb0f-a1a4-8fb6b74a8b61@outer-planes.net>
Date: Fri, 8 May 2020 18:00:00 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.8.0
MIME-Version: 1.0
In-Reply-To: <CAD7Fb3diej1-3fAgqZsS_E9wOs1KC=OwVWbvxV5mVjOdQEQm5g@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/Xs0ady1-JOcZd_9HXrevDNsa7No>
Subject: Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

Thank you very much for the review, John.  This is very helpful, and I
will work on changes to resolve the concerns.

Regarding the encoding:

UTF-8 is strongly encouraged, and required for a source to be a module.
Preferring BOM sniffing over a charset declaration is the result of
general-purpose browsers having to deal with a myriad of misconfigured
servers; it was the defense mechanism that permitted the best
interoperability.

The part about "longest matching octet sequence" is a missed remnant
from the original document which included UTF-32BE/LE entries.  As
UTF-32BE/LE is no longer found in the wild*, those entries were removed
but I missed cleaning up that sentence.
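
To illustrate why "longest matching" only mattered while UTF-32 was in
the table, here is a rough sketch (not text from the draft).  The names
BOMS and sniff_bom are made up for this example, and the UTF-32 rows
are included only to show the overlap with UTF-16LE:

```python
# Hypothetical BOM table for illustration; the current draft keeps only
# the UTF-8 and UTF-16 rows. The UTF-32 rows show the historical overlap.
BOMS = {
    "utf-8": b"\xef\xbb\xbf",
    "utf-16-be": b"\xfe\xff",
    "utf-16-le": b"\xff\xfe",
    "utf-32-be": b"\x00\x00\xfe\xff",
    "utf-32-le": b"\xff\xfe\x00\x00",
}


def sniff_bom(data, candidates=BOMS):
    """Return (encoding, bom_length) for the longest BOM prefixing data.

    Returns (None, 0) when no complete BOM is present; a partial BOM
    such as EF BB 20 never matches.
    """
    matches = [
        (name, len(sig))
        for name, sig in candidates.items()
        if data.startswith(sig)
    ]
    if not matches:
        return None, 0
    # FF FE 00 00 matches both UTF-16LE and UTF-32LE; the longer one wins.
    return max(matches, key=lambda m: m[1])
```

With the UTF-32 rows gone, no remaining signature is a prefix of
another, so a plain prefix test gives the same answer.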

Implementations today do ignore the BOM once the character encoding is
worked out, so this portion of Section 4.2 was kept to reflect
that.  I'm not sure what changes to make here, although
the trailing "to source text" is another missed remnant from my editing
that will be removed.
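
As a small illustration of what "ignoring the BOM" amounts to (my own
sketch, with a made-up helper name, not wording from the draft): the
BOM octets are dropped before decoding, so U+FEFF never reaches the
parser.

```python
def decode_dropping_bom(data, encoding, bom):
    """Decode data as `encoding`, discarding a leading `bom` if present.

    Hypothetical helper: the caller is assumed to have already worked
    out the encoding and which BOM octets go with it.
    """
    if data.startswith(bom):
        data = data[len(bom):]
    return data.decode(encoding)


# Python's plain "utf-8" codec keeps the BOM as U+FEFF, so the explicit
# strip above is what makes the difference.
with_bom = b"\xef\xbb\xbflet x = 1;"
kept = with_bom.decode("utf-8")
dropped = decode_dropping_bom(with_bom, "utf-8", b"\xef\xbb\xbf")
```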

Section 4.2 still includes step #3 to deal with the (in practice quite
common) case of a missing BOM and a media type with no charset
parameter.  There are also too many servers that set the parameter to
"ISO-8859-1" without otherwise examining the sources being served.
We'll make it clearer that this is a default/fallback case.
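
For what it's worth, the three steps as I read them could be sketched
roughly like this.  This is only my reading of Section 4.2 for
non-module sources, not normative text; decode_script and its
parameters are names invented for the example, and "supported" is
approximated by whatever codecs the Python runtime knows about:

```python
import codecs
from typing import Optional

# Step 1 table: BOM octets and the encoding each one signals.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def decode_script(data: bytes, declared_charset: Optional[str] = None) -> str:
    """Hypothetical sketch of the fallback order for classic scripts."""
    # Step 1: a BOM overrides any declared charset, and is then dropped.
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # Step 2: no BOM, so use the charset parameter if it names an
    # encoding this consumer supports.
    if declared_charset:
        try:
            codecs.lookup(declared_charset)
        except LookupError:
            pass  # unsupported label: fall through to step 3
        else:
            return data.decode(declared_charset)
    # Step 3: default/fallback case, no BOM and no usable charset.
    return data.decode("utf-8")
```

Decoding errors are left to raise (Python's strict mode); substituting
U+FFFD instead would be the other option the draft allows.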


- m&m

Matthew A. Miller

* Web browsers haven't accepted UTF-32 encodings of JavaScript for quite
a long while.

On 20/05/08 17:24, Myles Borins wrote:
> Regarding the mime type "text/javascript"
> 
> The HTML Specification is rather explicit
> <https://html.spec.whatwg.org/multipage/scripting.html#scriptingLanguages>
> 
>     Servers should use text/javascript for JavaScript resources. Servers
>     should not use other JavaScript MIME types for JavaScript resources,
>     and must not use non-JavaScript MIME types.
> 
> 
> Browsers validate that source text being loaded is delivered with the
> "text/javascript" mime type and fail if any other mime type is used.
> There is no in-band way to determine the goal of a Module, and Node.js
> needed a signal to be able to determine how to interpret source text.
> This was the motivation behind creating .mjs.
> 
> The problem this created was that webservers, not knowing the extension,
> were not serving the text/javascript mimetype and browsers were then in
> turn failing to load them as modules. This was a pretty awful developer
> experience, especially for folks experimenting with a newer technology
> they didn't entirely understand to begin with.
> 
> With this context in mind I don't think it is really worth going and
> making a distinct mimetype for modules, at least not one that would be
> strictly enforced.  At the very least this probably shouldn't be
> pursued without broader socialization and buy-in from browser vendors.
> 
> My gut is that we are best to use this proposal to document the existing
> status quo... what people are using and is working, rather than
> introduce new constraints that do not have any existing implementation
> or adoption.
> 
> I'll let folks more familiar with your other notes field those questions.
> 
> On Fri, May 8, 2020 at 7:19 PM Barry Leiba <barryleiba@computer.org>
> wrote:
> 
>     Thanks, John, for taking the time on this.  It’s a really helpful
>     review, and appreciated.
> 
>     Barry
> 
>     On Fri, May 8, 2020 at 4:17 PM John Levine via Datatracker
>     <noreply@ietf.org> wrote:
> 
>         Reviewer: John Levine
>         Review result: Ready with Issues
> 
>         This is my take on issues with this document mostly from my personal
>         review but also after some discussion we've had on the i18ndir list.
> 
>         Some parts of this draft are quite hard to follow, so I'm giving
>         my understanding of the parts I'm commenting on in case I got
>         them wrong.  I realize that a lot of this is unchanged from 4329,
>         which we should have reviewed more carefully 15 years ago.
> 
>         Section 4 on Encoding: I believe it says that the preferred
>         encoding for all JavaScript is UTF-8, but some sources use other
>         encodings and sometimes mislabel them.  So for anything that you
>         don't know is a module, you have to sniff the contents to see if
>         it starts with a BOM, and if so, use the BOM's encoding and
>         delete the BOM.  If the BOM uses an encoding the consumer doesn't
>         support, fail.  If there's no BOM, use the declared character
>         set, or if it's one the consumer doesn't understand, treat it as
>         UTF-8 anyway.
> 
>         Step 1 says "The longest matching octet sequence determines the
>         encoding." which I don't understand, since none of the encodings
>         overlap.  Does that mean it should interpret a partial BOM, e.g.,
>         EF BB 20 for UTF-8?  Also, why is the BOM deleted?  ECMAScript
>         says a BOM is a space so it should be harmless.
> 
>         While I understand that there is a lot of history here, I'm
>         wondering if the range of mislabeling is really as extreme as
>         this implies.  Is there, say, text labelled Shift-JIS which is
>         really UTF-8 or UTF-16?  If the mislabelled stuff is consistently
>         mislabelled as one of UTF-8/16/16BE/16LE perhaps it could say to
>         try the BOM trick on those encodings and fail otherwise.
> 
>         I don't understand step 3, "The character encoding scheme is
>         determined to be UTF-8."  How can it be determined to be UTF-8 other
>         than by steps 1 and 2?  Or is it saying that if the declared charset
>         is one the consumer doesn't understand such as KOI8-U, assume it's
>         UTF-8 anyway?
> 
>         I'd suggest rewriting the section to make it clearer that if it's
>         not a module, you look for a BOM, use its encoding if you find
>         one, and (I think) otherwise use the declared encoding.
> 
>         Section 4.3 on error handling: I think it says that if there's a
>         byte sequence that isn't a valid code point in the current
>         encoding, it can fail or it can turn the bytes into Unicode
>         replacement characters, but MUST NOT try anything else.  I agree
>         with this advice but again it could be clearer.
> 
>         Section 3 on Modules: I believe it says that JS scripts and
>         modules have different syntax but you can't easily tell them
>         apart by inspection.  (The term "goal" is familiar since I used
>         to write books about compiler tools, and I realize it's what the
>         ECMAScript spec uses, but it's confusing if you're not a
>         programming language expert.  How about just saying that scripts
>         and modules have different syntax?)
> 
>         Hence some software uses a .mjs filename as a hint that something
>         is a module.  Again I realize that there is a bunch of existing
>         code but this is not great MIME practice.  If the difference
>         matters, it's worth providing a new MIME type such as
>         text/jsmodule, which could have consistently accurate content
>         encodings.  It would coexist with all of the other old MIME types
>         and the .mjs hints.  Since this draft deprecates a bunch of
>         existing types and de-deprecates another, this seems as good a
>         time as any to do it.
> 
>         I also wonder whether it's worth making a distinction in MIME
>         processing between modules and scripts.  Would there be any harm in
>         saying to sniff everything for a BOM?  If a .mjs file turns out to
>         have a UTF-16 BOM, it's wrong, but is it likely to be anything other
>         than a javascript module in UTF-16?
> 
> 
>