Re: [Json] In "praise" of UTF-16

Rob Sayre <sayrer@gmail.com> Mon, 02 September 2019 23:30 UTC

From: Rob Sayre <sayrer@gmail.com>
Date: Mon, 02 Sep 2019 16:30:10 -0700
Message-ID: <CAChr6SwLw9srC-9jNMp8frNbr9gSrTDDY8p-Nv9PTgQhHmTjnQ@mail.gmail.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: Nico Williams <nico@cryptonector.com>, Anders Rundgren <anders.rundgren.net@gmail.com>, "json@ietf.org" <json@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/tp8BABuSPnjIi6PWW6yQaEUnGh8>

On Mon, Sep 2, 2019 at 2:49 PM Carsten Bormann <cabo@tzi.org> wrote:

> On Sep 2, 2019, at 23:40, Nico Williams <nico@cryptonector.com> wrote:
> >
> > Yes, I'm aware.  It's not that bad.  It's not that bad for the other
> > camp either (since they must already have UTF-{8, 16} transliteration).
>
> It’s trivial for the UTF-16 side because JSON needs conversion to UTF-8
> already.
> Only if you then want to carry around the JSON as UTF-16 inside your
> program (which appears to be something that some Java people like) the
> whole thing becomes ugly.
>

Doesn't this argument miss the escaping syntax that JSON requires?

'To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a 12-character sequence, encoding
the UTF-16 surrogate pair.  So, for example, a string containing only the G
clef character (U+1D11E) may be represented as "\uD834\uDD1E".'

<https://www.rfc-editor.org/rfc/rfc8259.txt>, section 7.
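
To make the point concrete, here is a minimal Python sketch of the arithmetic behind that 12-character sequence. The function name json_escape_non_bmp is my own, for illustration only; it is not from the RFC or from any library. It only assumes the standard UTF-16 surrogate-pair computation, so even a producer that works purely in UTF-8 has to do this UTF-16 step when it chooses to escape a character outside the BMP.

```python
import json


def json_escape_non_bmp(code_point: int) -> str:
    """Return the 12-character JSON escape for a code point above U+FFFF.

    Hypothetical helper for illustration; it applies the standard UTF-16
    surrogate-pair computation.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a code point outside the Basic Multilingual Plane")
    offset = code_point - 0x10000          # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return "\\u{:04X}\\u{:04X}".format(high, low)


# The G clef example from RFC 8259, section 7.
escaped = json_escape_non_bmp(0x1D11E)
print(escaped)

# A JSON parser has to recombine the pair into one code point.
decoded = json.loads('"' + escaped + '"')
print(len(decoded), hex(ord(decoded)))
```

For U+1D11E this yields the same pair the RFC quotes, and json.loads turns the two escapes back into a single character. Python's own json.dumps emits the same pair in lowercase hex when ensure_ascii is left at its default.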

thanks,
Rob