Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

Barry Leiba <> Fri, 08 May 2020 23:19 UTC

From: Barry Leiba <>
Date: Fri, 8 May 2020 19:18:48 -0400
To: John Levine <>

Thanks, John, for taking the time on this.  It’s a really helpful review,
and appreciated.


On Fri, May 8, 2020 at 4:17 PM John Levine via Datatracker wrote:

> Reviewer: John Levine
> Review result: Ready with Issues
> This is my take on issues with this document mostly from my personal
> review but also after some discussion we've had on the i18ndir list.
> Some parts of this draft are quite hard to follow, so I'm giving my
> understanding of the parts I'm commenting on in case I got them wrong.
> I realize that a lot of this is unchanged from RFC 4329, which we should
> have reviewed more carefully 15 years ago.
> Section 4 on Encoding: I believe it says that the preferred encoding
> for all JavaScript is UTF-8, but some sources use other encodings and
> sometimes mislabel them.  So for anything that you don't know is a
> module, you have to sniff the contents to see if it starts with a BOM,
> and if so, use the BOM's encoding and delete the BOM.  If the BOM uses
> an encoding the consumer doesn't support, fail.  If there's no BOM,
> use the declared character set, or if it's one the consumer doesn't
> understand, treat it as UTF-8 anyway.
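
The procedure as the reviewer reads it can be sketched in a few lines of Python. This is only an illustration of that reading, not the draft's normative algorithm: the function names `pick_encoding` and `is_supported` are hypothetical, and "the consumer supports this encoding" is approximated by whether Python's codec registry knows the name.

```python
import codecs
from typing import Optional, Tuple

# BOM signatures for the three Unicode encoding schemes named in the review.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def is_supported(name: str) -> bool:
    """Assumption: 'supported' means Python has a codec registered for the name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def pick_encoding(data: bytes, declared: Optional[str]) -> Tuple[str, bytes]:
    """Return (encoding, payload) for non-module script content."""
    # Step 1: a BOM, if present, overrides the declared charset and is removed.
    for bom, name in BOMS:
        if data.startswith(bom):
            if not is_supported(name):
                raise ValueError("BOM indicates unsupported encoding: " + name)
            return name, data[len(bom):]
    # Step 2: no BOM, so use the declared charset if the consumer knows it.
    if declared and is_supported(declared):
        return declared, data
    # Step 3: otherwise fall back to UTF-8.
    return "utf-8", data
```

For example, content labelled Shift-JIS that starts with EF BB BF would be handled as UTF-8 with the three BOM octets dropped, while content with no BOM and an unrecognized label would be treated as UTF-8 unchanged.
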
> Step 1 says "The longest matching octet sequence determines the encoding."
> which I don't understand, since none of the encodings overlap.  Does that
> mean it should interpret a partial BOM, e.g., EF BB 20 for UTF-8? Also,
> why is the BOM deleted?  ECMAScript says a BOM is a space so it should be
> harmless.
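
The "none of the encodings overlap" observation can be checked mechanically. The sketch below assumes the signature table contains only the UTF-8, UTF-16BE, and UTF-16LE BOMs; the helper names are hypothetical. It also shows the one situation where a longest-match rule would matter: a table that additionally listed the UTF-32LE signature (FF FE 00 00), which begins with the UTF-16LE one. Whether the draft's table has such an entry is not stated in this message.

```python
import codecs
from itertools import permutations

BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
}


def overlapping_pairs(boms):
    """List (shorter, longer) name pairs where one BOM is a prefix of another."""
    return [(a, b) for a, b in permutations(boms, 2)
            if boms[b].startswith(boms[a])]


def match_bom(data, boms=BOMS):
    """Return the name of the longest BOM that data starts with, or None."""
    hits = [name for name, bom in boms.items() if data.startswith(bom)]
    return max(hits, key=lambda name: len(boms[name]), default=None)


# A hypothetical larger table, only to illustrate when longest-match matters.
WITH_UTF32 = dict(BOMS, **{"utf-32-le": codecs.BOM_UTF32_LE})
```

With the three-entry table there are no overlapping pairs, and a partial signature such as EF BB 20 matches nothing, so under this reading a partial BOM would not be interpreted at all.
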
> While I understand that there is a lot of history here, I'm wondering if
> the range of mislabeling is really as extreme as this implies.  Is there,
> say, text labelled Shift-JIS which is really UTF-8 or UTF-16? If the
> mislabelled stuff is consistently mislabelled as one of UTF-8/16/16BE/16LE
> perhaps it could say to try the BOM trick on those encodings and fail
> otherwise.
> I don't understand step 3, "The character encoding scheme is
> determined to be UTF-8."  How can it be determined to be UTF-8 other
> than by steps 1 and 2?  Or is it saying that if the declared charset
> is one the consumer doesn't understand, such as KOI8-U, assume it's
> UTF-8 anyway?
> I'd suggest rewriting the section to make it clearer that if it's not
> a module, you look for a BOM, use its encoding if you find one, and (I
> think) otherwise use the declared encoding.
> Section 4.3 on error handling: I think it says that if there's a byte
> sequence that isn't a valid code point in the current encoding, it can
> fail or it can turn the bytes into Unicode replacement characters, but
> MUST NOT try anything else.  I agree with this advice but again it
> could be clearer.
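
The two permitted behaviours described here map onto the standard error modes of Python's `bytes.decode`. The wrapper below is a hypothetical illustration of that choice, not text from the draft: either decoding fails outright, or each undecodable sequence becomes U+FFFD REPLACEMENT CHARACTER, and no other recovery is attempted.

```python
def decode_source(data: bytes, encoding: str, fail_on_error: bool) -> str:
    """Decode script source, either failing or substituting U+FFFD on bad input."""
    # "strict" raises UnicodeDecodeError; "replace" emits U+FFFD instead.
    mode = "strict" if fail_on_error else "replace"
    return data.decode(encoding, errors=mode)


# 0xFF can never appear in well-formed UTF-8.
bad = b"ok \xff end"
lenient = decode_source(bad, "utf-8", fail_on_error=False)
```

Here `lenient` is `"ok \ufffd end"`, and calling `decode_source(bad, "utf-8", fail_on_error=True)` raises `UnicodeDecodeError`. Guessing a different encoding after a failure would be the "anything else" that the section rules out.
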
> Section 3 on Modules: I believe it says that JS scripts and modules have
> different syntax but you can't easily tell them apart by inspection.
> (The term "goal" is familiar since I used to write books about compiler
> tools, and I realize it's what the ECMAScript spec uses, but it's
> confusing if you're not a programming language expert.  How about just
> saying that scripts and modules have different syntax?)
> Hence some software uses a .mjs filename as a hint that something is a
> module.  Again I realize that there is a bunch of existing code but
> this is not great MIME practice.  If the difference matters, it's
> worth providing a new MIME type such as text/jsmodule, which could
> have consistently accurate content encodings.  It would coexist with
> all of the other old MIME types and the .mjs hints. Since this draft
> deprecates a bunch of existing types and de-deprecates another, this
> seems as good a time as any to do it.
> I also wonder whether it's worth making a distinction in MIME
> processing between modules and scripts.  Would there be any harm in
> saying to sniff everything for a BOM?  If a .mjs file turns out to
> have a UTF-16 BOM, it's wrong, but is it likely to be anything other
> than a JavaScript module in UTF-16?