Re: [Json] In "praise" of UTF-16

Nico Williams <nico@cryptonector.com> Mon, 02 September 2019 23:13 UTC

From: Nico Williams <nico@cryptonector.com>
To: Joe Hildebrand <hildjj@cursive.net>
Cc: Carsten Bormann <cabo@tzi.org>, Anders Rundgren <anders.rundgren.net@gmail.com>, "json@ietf.org" <json@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/rL3b_3yYg7K1CbvVbDyF7bSiLAY>

On Mon, Sep 02, 2019 at 03:56:09PM -0600, Joe Hildebrand wrote:
> I agree with Carsten. The use of utf-16 here is just one more reason to
> think this is a bad idea.

Well, yes, there's that too.

We all know that canonicalization should just not be necessary.  In
practice, avoiding canonicalization requires additional functionality
somewhere.

For example, the Heimdal ASN.1 compiler will (for types where it's
requested) include a _save field in the emitted structures that has a
copy of the sub-string of the octet string being decoded that
corresponds to that structure.  E.g., you can request that the
TBSCertificate structure include a _save field whose value will be the
original encoding of a tbsCertificate field of a Certificate.  This
makes it trivial to validate the signature in a Certificate.

I'm starting to think that JSON parsers are going to need a similar
feature...
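
To make that concrete, here is a rough sketch in Python of what a
_save-like feature could look like for JSON.  It is only an
illustration, not how Heimdal or any existing JSON parser does it.  The
function name parse_members_with_raw and the "payload"/"sig" member
names are made up for the example.  It uses the standard library's
json.JSONDecoder.raw_decode, which reports where each value ends, so the
exact original text of each top-level member value can be kept next to
the decoded value.

```python
import json

_WS = " \t\n\r"  # the four whitespace characters JSON allows between tokens


def _skip_ws(text, i):
    """Return the index of the first non-whitespace character at or after i."""
    while i < len(text) and text[i] in _WS:
        i += 1
    return i


def parse_members_with_raw(text):
    """Parse a top-level JSON object into {name: (value, raw_text)}.

    raw_text is the exact substring of the input that encoded the value,
    playing the role of the _save field described above.  Hypothetical
    helper; anything after the closing '}' is ignored in this sketch.
    """
    decoder = json.JSONDecoder()
    i = _skip_ws(text, 0)
    if text[i:i + 1] != "{":
        raise ValueError("expected a JSON object")
    i = _skip_ws(text, i + 1)
    members = {}
    if text[i:i + 1] == "}":
        return members
    while True:
        # raw_decode returns the decoded value and the index just past it.
        key, i = decoder.raw_decode(text, i)
        if not isinstance(key, str):
            raise ValueError("member names must be strings")
        i = _skip_ws(text, i)
        if text[i:i + 1] != ":":
            raise ValueError("expected ':' after member name")
        start = _skip_ws(text, i + 1)
        value, i = decoder.raw_decode(text, start)
        # Keep the original encoding alongside the decoded value.
        members[key] = (value, text[start:i])
        i = _skip_ws(text, i)
        if text[i:i + 1] == ",":
            i = _skip_ws(text, i + 1)
        elif text[i:i + 1] == "}":
            return members
        else:
            raise ValueError("expected ',' or '}'")
```

With something like this, a verifier could check a signature over the
saved text of the "payload" member as it was received (encoded back to
bytes in whatever encoding the message arrived in), instead of
re-serializing the decoded value and hoping the result matches what the
signer produced.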

Nico
--