Re: [I18ndir] Review volunteer needed (Fwd: [dispatch] WGLC of draft-ietf-dispatch-javascript-mjs-07)

Asmus Freytag <> Thu, 30 April 2020 02:38 UTC


On 4/29/2020 6:45 PM, John Levine wrote:
> It looks like step 1 is saying that if the text starts with a BOM, you
> ignore the declared charset and sniff the BOM instead, which sounds to
> me like an ancient workaround that is perhaps no longer needed.

If the data start with a BOM, you do want to strip it; you never want to
keep it (the chance that a legitimate text starts with a BOM that has an
actual function in the text is negligible).

If you have data that carry a UTF-16 BOM, they cannot be UTF-8, no matter
what the charset declaration says (neither FE nor FF can occur as a byte
in UTF-8).
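
As a quick check of that claim, here is a minimal Python sketch. The helper name utf8_byte_values is made up for illustration; it simply encodes every Unicode scalar value and records which byte values ever show up.

```python
def utf8_byte_values() -> set:
    """Collect every byte value that occurs in the UTF-8 encoding
    of any Unicode scalar value (hypothetical helper, for illustration)."""
    seen = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values and cannot be encoded
        seen.update(chr(cp).encode("utf-8"))
    return seen


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    seen = utf8_byte_values()
    print(0xFE in seen, 0xFF in seen)
    # Data that start with either UTF-16 BOM are not well-formed UTF-8.
    print(decodes_as_utf8(b"\xff\xfeh\x00i\x00"), decodes_as_utf8(b"\xfe\xff\x00h\x00i"))
```

Both lines should print "False False": the two bytes of a UTF-16 BOM never appear in UTF-8, so data carrying one fail a strict UTF-8 decode regardless of the declared charset.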

Therefore, you always want to look for all three BOMs: UTF-8, UTF-16BE,
and UTF-16LE.

If it's UTF-8, you confirm you have the right character set and remove it.

If it's one of the UTF-16 ones, you switch the charset to UTF-16 (or the
variant with the proper endianness) and remove it. (Or reject the input?)
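
A minimal Python sketch of that check, under stated assumptions: the function name resolve_charset and its return shape are made up for illustration and are not taken from the draft, and it takes the "switch the charset" option rather than rejecting the input. It handles only the three BOMs discussed here (a UTF-32LE signature also begins with FF FE and is not considered).

```python
import codecs

# The three signatures to look for, each paired with the charset it implies.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def resolve_charset(data: bytes, declared: str):
    """Return (charset, payload) for the given bytes.

    If the data start with one of the three BOMs, the BOM is stripped and
    the charset it implies overrides the declared one. Otherwise the
    declared charset and the data are returned unchanged.
    """
    for bom, charset in _BOMS:
        if data.startswith(bom):
            return charset, data[len(bom):]
    return declared, data


if __name__ == "__main__":
    # Declared as UTF-8, but carrying a UTF-16LE BOM.
    charset, payload = resolve_charset(b"\xff\xfeh\x00i\x00", "utf-8")
    print(charset, payload.decode(charset))
```

A stricter consumer could raise an error instead of switching when the detected charset disagrees with the declared one; that is the "reject the input" alternative.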

> Given that they are deprecating all of the existing javascript media
> types and reviving text/javascript which 4329 declared obsolete, this
> might be a good time to say if you're going to use our lovely new
> (old) media type, declare the correct character set so consumers can
> believe it and stop doing byte sniffing kludges.

There are two issues here. One centers around the fact that BOMs are 
"invisible". You can work hard at avoiding them, but they may be added 
by some "helpful" tool.

The other is that they serve as a useful consistency check when present.

You may write the spec to reject data with an initial BOM, but then you'd
still need to check for them. You definitely don't want to admit data
without checking. Since you already need to check for them, you are
better off giving clear instructions on how to make use of the fact that
you've detected one.