Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

Myles Borins <mylesborins@google.com> Fri, 08 May 2020 23:25 UTC

From: Myles Borins <mylesborins@google.com>
Date: Fri, 8 May 2020 19:24:46 -0400
Message-ID: <CAD7Fb3diej1-3fAgqZsS_E9wOs1KC=OwVWbvxV5mVjOdQEQm5g@mail.gmail.com>
To: Barry Leiba <barryleiba@computer.org>
Cc: John Levine <johnl@taugh.com>, DISPATCH WG <dispatch@ietf.org>, draft-ietf-dispatch-javascript-mjs.all@ietf.org, i18ndir@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/dXYmjEUGiHEFc0sGR2YfJgJWLKo>
Subject: Re: [I18ndir] I18ndir early review of draft-ietf-dispatch-javascript-mjs-07

Regarding the MIME type "text/javascript":

The HTML Specification is rather explicit
<https://html.spec.whatwg.org/multipage/scripting.html#scriptingLanguages>

Servers should use text/javascript for JavaScript resources. Servers should
not use other JavaScript MIME types for JavaScript resources, and must not
use non-JavaScript MIME types.


Browsers validate that source text being loaded is delivered with the
"text/javascript" MIME type and fail if any other MIME type is used.
There is no in-band way to determine the goal of a Module, and Node.js
needed a signal to be able to determine how to interpret source text. This
was the motivation behind creating .mjs.
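
As a rough illustration of an out-of-band signal, here is a minimal
sketch that picks a parse goal from the file extension alone. The
helper name parseGoalForFile is hypothetical, and the rule is
deliberately simplified: it is not Node.js's actual resolution logic,
which also takes other signals into account.

```javascript
// Hypothetical helper (not a Node.js API): choose how to interpret
// source text using only the file extension as the signal.
function parseGoalForFile(filename) {
  // Assumption: ".mjs" always means the Module goal.
  if (filename.toLowerCase().endsWith(".mjs")) {
    return "module";
  }
  // Assumption for this sketch: everything else is treated as a script.
  return "script";
}

console.log(parseGoalForFile("lib/util.mjs"));
console.log(parseGoalForFile("index.js"));
```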

The problem this created was that web servers, not recognizing the .mjs
extension, were not serving these files with the text/javascript MIME type,
and browsers were in turn failing to load them as modules. This was a
pretty awful developer experience, especially for folks experimenting with
a newer technology they didn't entirely understand to begin with.
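
The fix on the server side is small. The following is a minimal
sketch, assuming a static file server that chooses Content-Type from
an extension table; the names MIME_TYPES and contentTypeFor and the
fallback type are made up for illustration and do not come from any
particular server.

```javascript
// Hypothetical extension-to-MIME table for a static file server.
// The important entry is ".mjs", which must map to text/javascript.
const MIME_TYPES = new Map([
  [".js", "text/javascript"],
  [".mjs", "text/javascript"],
  [".html", "text/html"],
]);

// Return the Content-Type for a request path, falling back to a
// generic binary type when the extension is unknown (an assumption
// of this sketch; real servers differ in their defaults).
function contentTypeFor(path) {
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot).toLowerCase();
  return MIME_TYPES.get(ext) ?? "application/octet-stream";
}

console.log(contentTypeFor("/static/app.mjs"));
```

Without the ".mjs" entry, the lookup falls through to the generic
type, which is the failure described above.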

With this context in mind, I don't think it is really worth making a
distinct MIME type for modules, at least not one that would be strictly
enforced. At the very least, this probably shouldn't be pursued without
broader socialization and buy-in from browser vendors.

My gut feeling is that we are best off using this proposal to document the
existing status quo (what people are using and what is working) rather than
introducing new constraints that do not have any existing implementation or
adoption.

I'll let folks more familiar with your other notes field those questions.

On Fri, May 8, 2020 at 7:19 PM Barry Leiba <barryleiba@computer.org> wrote:

> Thanks, John, for taking the time on this.  It’s a really helpful review,
> and appreciated.
>
> Barry
>
> On Fri, May 8, 2020 at 4:17 PM John Levine via Datatracker <
> noreply@ietf.org> wrote:
>
>> Reviewer: John Levine
>> Review result: Ready with Issues
>>
>> This is my take on issues with this document mostly from my personal
>> review but also after some discussion we've had on the i18ndir list.
>>
>> Some parts of this draft are quite hard to follow, so I'm giving my
>> understanding of the parts I'm commenting on in case I got them wrong.
>> I realize that a lot of this is unchanged from 4329, which we should
>> have reviewed more carefully 15 years ago.
>>
>> Section 4 on Encoding: I believe it says that the preferred encoding
>> for all javascript is UTF-8, but some sources use other encodings and
>> sometimes mislabel them.  So for anything that you don't know is a
>> module, you have to sniff the contents to see if it starts with a BOM,
>> and if so, use the BOM's encoding and delete the BOM.  If the BOM uses
>> an encoding the consumer doesn't support, fail.  If there's no BOM,
>> use the declared character set, or if it's one the consumer doesn't
>> understand, treat it as UTF-8 anyway.
>>
>> Step 1 says "The longest matching octet sequence determines the
>> encoding."
>> which I don't understand, since none of the encodings overlap.  Does that
>> mean it should interpret a partial BOM, e.g., EF BB 20 for UTF-8? Also,
>> why is the BOM deleted?  ECMAscript says a BOM is a space so it should be
>> harmless.
>>
>> While I understand that there is a lot of history here, I'm wondering if
>> the range of mislabeling is really as extreme as this implies.  Is there,
>> say, text labelled Shift-JIS which is really UTF-8 or UTF-16? If the
>> mislabelled stuff is consistently mislabelled as one of UTF-8/16/16BE/16LE
>> perhaps it could say to try the BOM trick on those encodings and fail
>> otherwise.
>>
>> I don't understand step 3, "The character encoding scheme is
>> determined to be UTF-8."  How can it be determined to be UTF-8 other
>> than by steps 1 and 2?  Or is it saying that if the declared charset
>> is one the consumer doesn't understand such as KOI8-U, assume it's
>> UTF-8 anyway?
>>
>> I'd suggest rewriting the section to make it clearer that if it's not
>> a module, you look for a BOM, use its encoding if you find one, and (I
>> think) otherwise use the declared encoding.
>>
>> Section 4.3 on error handling: I think it says that if there's a byte
>> sequence that isn't a valid code point in the current encoding, it can
>> fail or it can turn the bytes into Unicode replacement characters, but
>> MUST NOT try anything else.  I agree with this advice but again it
>> could be clearer.
>>
>> Section 3 on Modules: I believe it says that JS scripts and modules have
>> different syntax but you can't easily tell them apart by inspection.
>> (The term "goal" is familiar since I used to write books about compiler
>> tools, and I realize it's what the ECMAscript spec uses, but it's
>> confusing if you're not a programming language expert.  How about just
>> saying that scripts and modules have different syntax?)
>>
>> Hence some software uses a .mjs filename as a hint that something is a
>> module.  Again I realize that there is a bunch of existing code but
>> this is not great MIME practice.  If the difference matters, it's
>> worth providing a new MIME type such as text/jsmodule, which could
>> have consistently accurate content encodings.  It would coexist with
>> all of the other old MIME types and the .mjs hints. Since this draft
>> deprecates a bunch of existing types and de-deprecates another, this
>> seems as good a time as any to do it.
>>
>> I also wonder whether it's worth making a distinction in MIME
>> processing between modules and scripts.  Would there be any harm in
>> saying to sniff everything for a BOM?  If a .mjs file turns out to
>> have a UTF-16 BOM, it's wrong, but is it likely to be anything other
>> than a javascript module in UTF-16?