Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

Barry Leiba <barryleiba@computer.org> Fri, 08 May 2020 23:19 UTC

From: Barry Leiba <barryleiba@computer.org>
Date: Fri, 8 May 2020 19:18:48 -0400
Message-ID: <CALaySJ+CRJumYtDCxvGsSwzanz4y=7icuqd+toc0wMivf-mJGg@mail.gmail.com>
To: John Levine <johnl@taugh.com>
Cc: dispatch@ietf.org, draft-ietf-dispatch-javascript-mjs.all@ietf.org, i18ndir@ietf.org
Content-Type: multipart/alternative; boundary="000000000000db446305a52b3889"
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/6B9UQoupdXSV1-zftln0sbQKWwU>
Subject: Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

Thanks, John, for taking the time on this.  It’s a really helpful review,
and appreciated.

Barry

On Fri, May 8, 2020 at 4:17 PM John Levine via Datatracker <noreply@ietf.org>
wrote:

> Reviewer: John Levine
> Review result: Ready with Issues
>
> This is my take on issues with this document mostly from my personal
> review but also after some discussion we've had on the i18ndir list.
>
> Some parts of this draft are quite hard to follow, so I'm giving my
> understanding of the parts I'm commenting on in case I got them wrong.
> I realize that a lot of this is unchanged from RFC 4329, which we should
> have reviewed more carefully 15 years ago.
>
> Section 4 on Encoding: I believe it says that the preferred encoding
> for all JavaScript is UTF-8, but some sources use other encodings and
> sometimes mislabel them.  So for anything that you don't know is a
> module, you have to sniff the contents to see if it starts with a BOM,
> and if so, use the BOM's encoding and delete the BOM.  If the BOM uses
> an encoding the consumer doesn't support, fail.  If there's no BOM,
> use the declared character set, or if it's one the consumer doesn't
> understand, treat it as UTF-8 anyway.
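
A minimal Python sketch of the procedure as the paragraph above reads it, for
non-module scripts only. It is an illustration of that reading, not the
draft's normative algorithm. The names `choose_encoding` and `is_supported`
are hypothetical, and "supported" is assumed to mean "Python has a codec for
it".

```python
import codecs

# The three byte order marks discussed in the review.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def is_supported(name):
    """Assumption: an encoding is 'supported' if Python can look up a codec."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def choose_encoding(data, declared=None, supported=is_supported):
    """Return (encoding, payload) for the raw bytes of a script."""
    # Step 1: a BOM wins over any declared charset, and is stripped.
    for bom, encoding in BOMS:
        if data.startswith(bom):
            if not supported(encoding):
                raise ValueError("BOM indicates unsupported encoding: " + encoding)
            return encoding, data[len(bom):]
    # Step 2: no BOM, so use the declared charset if the consumer knows it.
    if declared and supported(declared):
        return declared, data
    # Step 3: otherwise treat the content as UTF-8 anyway.
    return "utf-8", data
```

With this reading, content labelled Shift-JIS that starts with EF BB BF is
decoded as UTF-8, and an unrecognized charset label falls through to UTF-8.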
>
> Step 1 says "The longest matching octet sequence determines the encoding."
> which I don't understand, since none of the encodings overlap.  Does that
> mean it should interpret a partial BOM, e.g., EF BB 20 for UTF-8? Also,
> why is the BOM deleted?  ECMAScript says a BOM is a space so it should be
> harmless.
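
The overlap question can be checked mechanically. The Python sketch below
assumes the draft's table lists only the UTF-8, UTF-16LE and UTF-16BE
signatures. The helper names are hypothetical. It also shows, purely for
comparison, that adding the UTF-32LE signature (FF FE 00 00) would create
an overlap with UTF-16LE, which is the kind of case where a "longest match"
rule would matter.

```python
import codecs
from itertools import permutations

# Assumed contents of the draft's signature table.
SIGNATURES = {
    "UTF-8": codecs.BOM_UTF8,         # EF BB BF
    "UTF-16LE": codecs.BOM_UTF16_LE,  # FF FE
    "UTF-16BE": codecs.BOM_UTF16_BE,  # FE FF
}


def prefix_overlaps(table):
    """Return (shorter, longer) name pairs where one signature prefixes another."""
    return [(a, b) for a, b in permutations(table, 2)
            if table[b].startswith(table[a])]


def matches_any(data, table):
    """Return the names of all signatures that the data starts with."""
    return [name for name, sig in table.items() if data.startswith(sig)]


# No signature in the three-entry table is a prefix of another.
overlaps = prefix_overlaps(SIGNATURES)

# A partial UTF-8 BOM (EF BB 20) matches nothing under exact matching.
partial_hits = matches_any(bytes([0xEF, 0xBB, 0x20]), SIGNATURES)

# For comparison only: UTF-32LE is not assumed to be in the draft's table.
with_utf32 = {**SIGNATURES, "UTF-32LE": codecs.BOM_UTF32_LE}  # FF FE 00 00
utf32_overlaps = prefix_overlaps(with_utf32)
```

Here `overlaps` and `partial_hits` are both empty, while `utf32_overlaps`
contains the single pair ("UTF-16LE", "UTF-32LE").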
>
> While I understand that there is a lot of history here, I'm wondering if
> the range of mislabeling is really as extreme as this implies.  Is there,
> say, text labelled Shift-JIS which is really UTF-8 or UTF-16? If the
> mislabelled stuff is consistently mislabelled as one of UTF-8/16/16BE/16LE
> perhaps it could say to try the BOM trick on those encodings and fail
> otherwise.
>
> I don't understand step 3, "The character encoding scheme is
> determined to be UTF-8."  How can it be determined to be UTF-8 other
> than by steps 1 and 2?  Or is it saying that if the declared charset
> is one the consumer doesn't understand such as KOI8-U, assume it's
> UTF-8 anyway?
>
> I'd suggest rewriting the section to make it clearer that if it's not
> a module, you look for a BOM, use its encoding if you find one, and (I
> think) otherwise use the declared encoding.
>
> Section 4.3 on error handling: I think it says that if there's a byte
> sequence that isn't a valid code point in the current encoding, it can
> fail or it can turn the bytes into Unicode replacement characters, but
> MUST NOT try anything else.  I agree with this advice but again it
> could be clearer.
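
The two permitted behaviours described above map onto Python's standard
decoder error modes, which gives a compact way to illustrate them. This is a
sketch of the reading in the paragraph above, not of the draft's text. The
function name `decode_source` and its `lenient` flag are hypothetical.

```python
def decode_source(data, encoding, lenient=False):
    """Decode script bytes, either failing or substituting U+FFFD.

    lenient=False: invalid byte sequences raise UnicodeDecodeError (fail).
    lenient=True:  invalid byte sequences become U+FFFD REPLACEMENT CHARACTER.
    No other recovery (guessing another encoding, dropping bytes) is attempted.
    """
    return data.decode(encoding, errors="replace" if lenient else "strict")


# 0xFF can never appear in well-formed UTF-8.
bad = b"ok \xff"
replaced = decode_source(bad, "utf-8", lenient=True)
```

Calling `decode_source(bad, "utf-8")` without `lenient=True` raises
`UnicodeDecodeError` instead of returning a string.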
>
> Section 3 on Modules: I believe it says that JS scripts and modules have
> different syntax but you can't easily tell them apart by inspection.
> (The term "goal" is familiar since I used to write books about compiler
> tools, and I realize it's what the ECMAScript spec uses, but it's
> confusing if you're not a programming language expert.  How about just
> saying that scripts and modules have different syntax?)
>
> Hence some software uses a .mjs filename as a hint that something is a
> module.  Again I realize that there is a bunch of existing code but
> this is not great MIME practice.  If the difference matters, it's
> worth providing a new MIME type such as text/jsmodule, which could
> have consistently accurate content encodings.  It would coexist with
> all of the other old MIME types and the .mjs hints. Since this draft
> deprecates a bunch of existing types and de-deprecates another, this
> seems as good a time as any to do it.
>
> I also wonder whether it's worth making a distinction in MIME
> processing between modules and scripts.  Would there be any harm in
> saying to sniff everything for a BOM?  If a .mjs file turns out to
> have a UTF-16 BOM, it's wrong, but is it likely to be anything other
> than a JavaScript module in UTF-16?