[I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

John Levine via Datatracker <noreply@ietf.org> Fri, 08 May 2020 20:17 UTC

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: John Levine via Datatracker <noreply@ietf.org>
To: i18ndir@ietf.org
Cc: dispatch@ietf.org, draft-ietf-dispatch-javascript-mjs.all@ietf.org
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <158896904545.17044.5288882047334991439@ietfa.amsl.com>
Reply-To: John Levine <johnl@taugh.com>
Date: Fri, 08 May 2020 13:17:25 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/3EXRWEXYDvHcWw1XsiXXs5luiFE>
Subject: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

Reviewer: John Levine
Review result: Ready with Issues

This is my take on issues with this document mostly from my personal
review but also after some discussion we've had on the i18ndir list.

Some parts of this draft are quite hard to follow, so I'm giving my
understanding of the parts I'm commenting on in case I got them wrong.
I realize that a lot of this is unchanged from RFC 4329, which we should
have reviewed more carefully 15 years ago.

Section 4 on Encoding: I believe it says that the preferred encoding
for all JavaScript is UTF-8, but some sources use other encodings and
sometimes mislabel them.  So for anything that you don't know is a
module, you have to sniff the contents to see if it starts with a BOM,
and if so, use the BOM's encoding and delete the BOM.  If the BOM uses
an encoding the consumer doesn't support, fail.  If there's no BOM,
use the declared character set, or if it's one the consumer doesn't
understand, treat it as UTF-8 anyway.
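
As a minimal sketch of that reading, and not the draft's normative
algorithm, the steps might look like this in Python.  The BOM table is
limited to the encodings named in this review, and the names BOMS,
is_supported and decode_script are hypothetical, chosen only for
illustration.

```python
import codecs

# Assumed BOM table: only the encodings named in this review.
# The draft's own table may differ.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def is_supported(name):
    """Return True if this Python build has a codec with the given name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def decode_script(data, declared=None):
    """Decode bytes of something not known to be a module."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            if not is_supported(encoding):
                # BOM names an encoding the consumer doesn't support: fail.
                raise ValueError("unsupported BOM encoding: " + encoding)
            # Use the BOM's encoding and delete the BOM.
            return data[len(bom):].decode(encoding)
    # No BOM: use the declared charset if the consumer understands it,
    # otherwise treat the content as UTF-8 anyway.
    if declared and is_supported(declared):
        return data.decode(declared)
    return data.decode("utf-8")
```

In this sketch a BOM always wins over the declared charset, so content
labelled Shift-JIS that starts with FF FE is decoded as UTF-16LE.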

Step 1 says "The longest matching octet sequence determines the
encoding," which I don't understand, since none of the encodings
overlap.  Does that mean it should interpret a partial BOM, e.g.,
EF BB 20 for UTF-8?  Also, why is the BOM deleted?  ECMAScript says a
BOM is a space so it should be harmless.
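
To make the overlap question concrete, here is a small Python check.
It is only a sketch: the table is assumed to hold just the three BOMs
for the encodings named in this review, and overlapping_pairs and
match_bom are hypothetical helper names.

```python
import codecs

# Assumed BOM table: only the three encodings this review names.
BOMS = {
    codecs.BOM_UTF8: "utf-8",          # EF BB BF
    codecs.BOM_UTF16_BE: "utf-16-be",  # FE FF
    codecs.BOM_UTF16_LE: "utf-16-le",  # FF FE
}


def overlapping_pairs(boms):
    """Return pairs (a, b) where BOM a is a proper prefix of BOM b."""
    return [(a, b) for a in boms for b in boms
            if a != b and b.startswith(a)]


def match_bom(data, boms=BOMS):
    """Return the encoding of the longest BOM that prefixes data, or None."""
    hits = [bom for bom in boms if data.startswith(bom)]
    return boms[max(hits, key=len)] if hits else None
```

With this table, overlapping_pairs(BOMS) is empty and the partial
sequence EF BB 20 matches nothing.  A longest-match rule would only
make a difference for a table that also listed a longer BOM sharing a
prefix with a shorter one, such as the UTF-32LE BOM FF FE 00 00
alongside FF FE (a hypothetical table here).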

While I understand that there is a lot of history here, I'm wondering
if the range of mislabeling is really as extreme as this implies.  Is
there, say, text labelled Shift-JIS which is really UTF-8 or UTF-16?
If the mislabelled stuff is consistently mislabelled as one of
UTF-8/16/16BE/16LE, perhaps it could say to try the BOM trick on those
encodings and fail otherwise.

I don't understand step 3, "The character encoding scheme is
determined to be UTF-8."  How can it be determined to be UTF-8 other
than by steps 1 and 2?  Or is it saying that if the declared charset
is one the consumer doesn't understand, such as KOI8-U, it should
assume it's UTF-8 anyway?

I'd suggest rewriting the section to make it clearer that if it's not
a module, you look for a BOM, use its encoding if you find one, and (I
think) otherwise use the declared encoding.

Section 4.3 on error handling: I think it says that if there's a byte
sequence that isn't a valid code point in the current encoding, it can
fail or it can turn the bytes into Unicode replacement characters, but
MUST NOT try anything else.  I agree with this advice but again it
could be clearer.
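
The two permitted behaviours, as I read them, map onto Python's
standard decoder error modes.  This is a sketch only; the function
name decode_checked and its on_error parameter are hypothetical.

```python
def decode_checked(data, encoding="utf-8", on_error="fail"):
    """Decode data, either failing or substituting U+FFFD on bad bytes.

    on_error="fail" raises UnicodeDecodeError on an invalid sequence;
    on_error="replace" turns each invalid sequence into U+FFFD.
    No other recovery (guessing another encoding, dropping bytes) is tried.
    """
    if on_error not in ("fail", "replace"):
        raise ValueError("on_error must be 'fail' or 'replace'")
    errors = "strict" if on_error == "fail" else "replace"
    return data.decode(encoding, errors=errors)
```

For example, the bytes 61 FF 62 are not valid UTF-8; in "replace" mode
they decode to "a", U+FFFD, "b", and in "fail" mode the call raises.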

Section 3 on Modules: I believe it says that JS scripts and modules
have different syntax but you can't easily tell them apart by
inspection.  (The term "goal" is familiar since I used to write books
about compiler tools, and I realize it's what the ECMAScript spec
uses, but it's confusing if you're not a programming language expert.
How about just saying that scripts and modules have different syntax?)

Hence some software uses a .mjs filename as a hint that something is a
module.  Again I realize that there is a bunch of existing code but
this is not great MIME practice.  If the difference matters, it's
worth providing a new MIME type such as text/jsmodule, which could
have consistently accurate content encodings.  It would coexist with
all of the other old MIME types and the .mjs hints. Since this draft
deprecates a bunch of existing types and de-deprecates another, this
seems as good a time as any to do it.

I also wonder whether it's worth making a distinction in MIME
processing between modules and scripts.  Would there be any harm in
saying to sniff everything for a BOM?  If a .mjs file turns out to
have a UTF-16 BOM, it's wrong, but is it likely to be anything other
than a JavaScript module in UTF-16?
