Re: [Json] In "praise" of UTF-16

Tim Bray <tbray@textuality.com> Sat, 31 August 2019 18:21 UTC

From: Tim Bray <tbray@textuality.com>
Date: Sat, 31 Aug 2019 11:21:09 -0700
Message-ID: <CAHBU6iu3YT6M1bcZAvCVcs7vW+Hkx30=dqiCpS8KiQPB2ihxrA@mail.gmail.com>
To: Anders Rundgren <anders.rundgren.net@gmail.com>
Cc: "json@ietf.org" <json@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/6VJlTUpKUCXU6gKN0eg3i-GADtw>
Subject: Re: [Json] In "praise" of UTF-16

It's incorrect to say that nobody complains about UTF-16; I do so all the
time. But it doesn’t do any good.

I don't see any problem with using UTF-16 this way.

On Sat, Aug 31, 2019 at 8:39 AM Anders Rundgren <anders.rundgren.net@gmail.com> wrote:

> Hi JSON experts,
> Pardon the subject line, I'm by no means a UTF-16 aficionado.
>
> That UTF-16 has been deprecated by the industry at large for EXTERNAL
> representation of textual data is completely understandable.
>
> However, an I-D dealing with canonical JSON serialization that is currently
> in the IETF ISE queue got criticism for using UTF-16 encoding INTERNALLY for
> sorting properties/keys.
> I don't see why, since the only purpose of the sorting is to create a
> defined order. It is true that sorting on UTF-8 or UTF-32 would give a
> different result, but for the stated purpose that is of no importance.
>
> In addition, JSON itself also depends on UTF-16 encoding for \uhhhh
> constants and AFAIK nobody has complained about that.
> Example: A smiley Emoji has the Unicode value U+1F600 but would in a JSON
> escape sequence be represented as \ud83d\ude00
>
> The reason for preferring UTF-16 in this particular case is simply that
> JavaScript, Windows and Java use UTF-16 as internal representation.  That's
> obviously a slight platform bias but my Go and Python implementations
> show that the UTF-16 requirement in practice is a non-issue.
>
> According to the Unicode standard, UTF-16 belongs to the set of supported,
> fully interchangeable encodings.
>
> WDYT?
>
> thanx,
> Anders
> https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-06
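For anyone following along, here is a minimal Python sketch of the two points in the quoted message: the surrogate-pair form of a \uhhhh escape, and how sorting keys by UTF-16 code units can differ from sorting by code point (which matches UTF-8 byte order). The helper names surrogate_pair and utf16_key are made up for illustration and are not taken from the draft; the sample keys are likewise my own.

```python
import json


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP have surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_key(s: str) -> bytes:
    """Sort key that orders strings by their UTF-16 code units.

    Big-endian UTF-16 bytes compare the same way as the sequence of
    16-bit code units, so plain bytes comparison is enough here.
    """
    return s.encode("utf-16-be")


# U+1F600 becomes the pair D83D DE00, as in the quoted example.
smiley_pair = surrogate_pair(0x1F600)

# Python's json module emits the same pair when escaping non-ASCII text.
smiley_json = json.dumps("\U0001F600")

# Illustrative keys: U+FB33 sits in the BMP above the surrogate range,
# U+1F600 sits outside the BMP.
keys = ["\ufb33", "\U0001F600"]

by_code_point = sorted(keys)                 # code point order, same as UTF-8 byte order
by_utf16_units = sorted(keys, key=utf16_key)  # UTF-16 code unit order
```

With these two keys the orderings disagree: by code point U+FB33 comes first, while by UTF-16 code units U+1F600 comes first because its leading surrogate 0xD83D is smaller than 0xFB33. Either order is well defined, which is all a canonicalization scheme needs.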