Re: [Json] In "praise" of UTF-16

Nico Williams <nico@cryptonector.com> Mon, 02 September 2019 21:17 UTC

Date: Mon, 02 Sep 2019 16:17:45 -0500
From: Nico Williams <nico@cryptonector.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: Anders Rundgren <anders.rundgren.net@gmail.com>, "json@ietf.org" <json@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/xswU6MHTP3LKjjA53SrhV2bXcMQ>

On Mon, Sep 02, 2019 at 10:59:59PM +0200, Carsten Bormann wrote:
> Defining a deterministic encoding scheme (“canonicalization”) for JSON
> in 2019 that needs a detour through UTF-16-land looks like a cruel
> joke.  [...]

Is that what this is about?!

Oh no.

Every JSON implementor needs to deal with UTF-8, but not every JSON
implementor needs to deal with UTF-16.

Speaking as a maintainer and developer of one popular JSON
implementation that only supports UTF-8, I'm loath to have to add
support for transcoding to UTF-16 code units for canonicalization
purposes.
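
To make the cost concrete, here is a minimal sketch in Python, assuming
the canonicalization step in question is ordering map keys by their
UTF-16 code units.  The helper names (utf16_key, utf8_key) and the
sample keys are made up for illustration; this is not taken from any
draft or implementation.  It shows that the two orderings agree for
ASCII keys but diverge once a key outside the BMP meets a BMP key
above the surrogate range:

```python
def utf16_key(s):
    # Hypothetical helper: the string's UTF-16 code units as integers,
    # which is what a UTF-8-only implementation would have to compute.
    b = s.encode("utf-16-be")
    return [int.from_bytes(b[i:i + 2], "big") for i in range(0, len(b), 2)]


def utf8_key(s):
    # Hypothetical helper: raw UTF-8 bytes; comparing these gives the
    # same order as comparing Unicode code points.
    return s.encode("utf-8")


# Illustrative keys: U+1F600 (outside the BMP), U+FB33 (BMP, above the
# surrogate range), and plain ASCII "a".
keys = ["\U0001F600", "\uFB33", "a"]

by_utf16 = sorted(keys, key=utf16_key)
by_utf8 = sorted(keys, key=utf8_key)

# U+1F600 becomes the surrogate pair D83D DE00, which sorts below FB33.
print([hex(u) for u in utf16_key("\U0001F600")])
print(by_utf16 == by_utf8)

# ASCII-only keys come out the same either way.
ascii_keys = ["b", "a", "B"]
print(sorted(ascii_keys, key=utf16_key) == sorted(ascii_keys, key=utf8_key))
```

The ASCII case is why a shortcut would appear to work until someone
uses a key outside the BMP.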

I understand that for many implementors the use of UTF-16 for this would
be far, _far_ more convenient than UTF-8, and that were I one of them,
I'd probably be on the other side.  For this reason I think it would be
fair for either camp to end up on the rough side of consensus, and if it
be my side so be it.

But first we should find what the consensus is.  Has there been a
consensus call on this?

Whatever we do here will make some unhappy.  Like endianness, we can
have some notion of purity (network byte order) and yet often prefer
that which is most widely implemented in hardware (little endian byte
order).  In Internet standards we _tend_ to prefer purity.

> [...].  If you only ever care about Java and its contemporaries, it may
> actually seem practical to you.  Given that most map keys will be
> ASCII anyway, people will certainly find shortcuts (also known as
> sleeping interoperability problems) around the issue.  A protocol
> designer would not touch this with a 16-foot pole.

+1

Nico
--