Re: [Json] [secdir] secdir review of draft-ietf-jsonbis-rfc7159bis-03

Nico Williams <nico@cryptonector.com> Mon, 13 March 2017 18:37 UTC

Date: Mon, 13 Mar 2017 13:36:29 -0500
From: Nico Williams <nico@cryptonector.com>
To: Martin =?iso-8859-1?Q?J=2E_D=FCrst?= <duerst@it.aoyama.ac.jp>
Message-ID: <20170313183628.GE543@localhost>
References: <otwresf20y4vnpmoboqqjnux.1489359742487@email.android.com> <0d3258fa-0f9d-cc5d-06d7-fcba943349ad@gmx.de> <f63c6a4a-dfbb-e03a-ea1e-38002f81ced8@it.aoyama.ac.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <f63c6a4a-dfbb-e03a-ea1e-38002f81ced8@it.aoyama.ac.jp>
User-Agent: Mutt/1.5.24 (2015-08-30)
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/hvFubxAFGL8pPjfmALiuTxvT_Kk>
X-Mailman-Approved-At: Mon, 13 Mar 2017 13:35:46 -0700
Cc: draft-ietf-jsonbis-rfc7159bis.all@ietf.org, John Cowan <cowan@ccil.org>, Ned Freed <ned.freed@mrochek.com>, ietf@ietf.org, Peter Cordell <petejson@codalogic.com>, secdir@ietf.org, Julian Reschke <julian.reschke@gmx.de>, "json@ietf.org" <json@ietf.org>, Elwyn Davies <elwynd@dial.pipex.com>
Subject: Re: [Json] [secdir] secdir review of draft-ietf-jsonbis-rfc7159bis-03

On Mon, Mar 13, 2017 at 04:51:58PM +0900, Martin J. Dürst wrote:
> My personal opinion is that we could try to fix this by changing the
> following:
> 
> >>>>
>    JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32 [UNICODE]
>    (Section 3).  The default encoding is UTF-8, and JSON texts that are
>    encoded in UTF-8 are interoperable in the sense that they will be
>    read successfully by the maximum number of implementations; there are
>    many implementations that cannot successfully read texts in other
>    encodings (such as UTF-16 and UTF-32).
> >>>>
> 
> to something like the following:
> 
> >>>>
>    JSON text SHOULD be encoded in UTF-8 [UNICODE]
>    (Section 3).  JSON texts that are
>    encoded in UTF-8 are interoperable in the sense that they will be
>    read successfully by the maximum number of implementations.
> 
>    There are
>    many implementations that cannot successfully read texts in other
>    encodings (such as UTF-16 and UTF-32). JSON text MAY be encoded in
>    UTF-16 or UTF-32 [UNICODE] (Section 3) if the sender is sure that
>    the intended recipients can read them.
> >>>>
> 
> That should then go together with a MIME registration that only lists UTF-8.

+1.

I would restore the text from an older draft about how counting the
number of NULs in the first four bytes can be used to determine which of
UTF-8, UTF-16, or UTF-32 the text is encoded in, modified to avoid all
talk of BOMs (which had no consensus):

   Implementors MAY count the number of ASCII NULs in the first four
   bytes of any JSON text to detect which of UTF-8, UTF-16, or UTF-32
   the text is encoded in:

    - if the count is zero, then the text is encoded in UTF-8
    - if the count is one or two, then the text is encoded in UTF-16
    - if the count is three, then the text is encoded in UTF-32

   This results from a) JSON texts having to start with an ASCII
   character, b) no unescaped NULs being allowed in JSON strings, and c)
   any type being allowed at the top-level, thus the first character may
   be a double-quote and the second may be any permissible, unescaped
   Unicode codepoint.  An ASCII character requires one NUL-valued byte
   in UTF-16 encoding, three in UTF-32, and none in UTF-8; in UTF-16 the
   second character may or may not contribute one more NUL-valued byte
   within the first four bytes.

Note the "MAY".  The idea is that we should be sending (and storing)
JSON texts encoded in UTF-8 per the text I +1'ed above, so we shouldn't
bother recommending or requiring the use of this UTF detection scheme.
But there's no harm in describing it.

See https://www.ietf.org/mail-archive/web/json/current/msg02053.html
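
For illustration only, the NUL-counting rule in the proposed text above
could be sketched as follows.  The function name and the returned labels
are my own assumptions, not anything from the draft, and the sketch
assumes a BOM-less, otherwise valid JSON text:

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the UTF of a BOM-less JSON text (hypothetical helper).

    Counts the NUL-valued bytes among the first four bytes, as the
    proposed text describes.
    """
    nul_count = data[:4].count(0)  # bytes.count accepts an int byte value
    if nul_count == 0:
        return "utf-8"
    if nul_count in (1, 2):
        return "utf-16"
    if nul_count == 3:
        return "utf-32"
    # Four leading NULs cannot start a valid JSON text in any of the UTFs.
    raise ValueError("not a valid JSON text: four leading NUL bytes")


if __name__ == "__main__":
    # The "-le"/"-be" codecs are used so that Python does not emit a BOM.
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        print(codec, "->", detect_json_encoding('{"a": 1}'.encode(codec)))
```

Note that this only identifies the UTF, not the byte order.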

The encoding detection algorithm can be optimized a bit, since if neither
the first nor the second byte is NUL-valued then the encoding must be
UTF-8.
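
A minimal sketch of that shortcut, assuming a BOM-less and otherwise
valid JSON text (the function name is hypothetical):

```python
def is_utf8_json(data: bytes) -> bool:
    """Return True if the first two bytes rule out UTF-16 and UTF-32.

    Every UTF-16 or UTF-32 encoding of a JSON text has a NUL-valued
    byte in the first or second position, because the first character
    is ASCII.
    """
    return 0 not in data[:2]


if __name__ == "__main__":
    text = '["x"]'
    print(is_utf8_json(text.encode("utf-8")))      # UTF-8 input
    print(is_utf8_json(text.encode("utf-16-be")))  # UTF-16 input, no BOM
```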

The algorithm can also be combined with an understanding of byte
ordering to detect encoding by looking only at the first two bytes in
some cases, but whatever.

ASIDE:

  I don't see any reason not to also include text discussing byte-order
  detection in the UTF-16 and UTF-32 cases, but I don't care very much
  about it, and given the prior lack of consensus for it, there's no
  need to relitigate that point now.  Also, it's pretty obvious how to
  detect byte-order anyways, at least given the text on how to detect
  encoding: just looking at the first two bytes will tell you what the
  byte order is!

  If the first byte is zero-valued, then the byte order is big-endian
  (this includes the case where both the first and second bytes are
  zero-valued, which is big-endian UTF-32); if the first is not
  zero-valued but the second is, then the byte order is little-endian;
  if neither the first nor the second byte is zero-valued, then the
  encoding is UTF-8 and byte order is irrelevant.
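
  As a rough sketch of the byte-order rule just described (the function
  name and return values are assumptions made for illustration, and the
  input is assumed to be a BOM-less, otherwise valid JSON text):

```python
from typing import Optional


def detect_json_byte_order(data: bytes) -> Optional[str]:
    """Return "big", "little", or None (UTF-8, byte order irrelevant).

    Looks only at the first two bytes of the text.
    """
    if len(data) < 2:
        # A one-byte text can only be UTF-8.
        return None
    if data[0] == 0:
        # 00 xx (UTF-16BE) or 00 00 (UTF-32BE)
        return "big"
    if data[1] == 0:
        # xx 00 (UTF-16LE or UTF-32LE)
        return "little"
    return None


if __name__ == "__main__":
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        print(codec, "->", detect_json_byte_order('"hi"'.encode(codec)))
```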

Cheers,

Nico
--