Re: [Json] In "praise" of UTF-16

Anders Rundgren <anders.rundgren.net@gmail.com> Mon, 02 September 2019 19:48 UTC

To: Tim Bray <tbray@textuality.com>, John Levine <johnl@taugh.com>
Cc: "json@ietf.org" <json@ietf.org>
From: Anders Rundgren <anders.rundgren.net@gmail.com>
Date: Mon, 02 Sep 2019 21:48:19 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/nr9NCIPFwV3_IIwiJfLpNOZLYkU>
Subject: Re: [Json] In "praise" of UTF-16

Thanx John and Tim, your input was much appreciated!

Anders

On 2019-08-31 20:21, Tim Bray wrote:
> It's incorrect to say that nobody complains about UTF-16; I do so all the time. But it doesn't do any good.
> 
> I don't see any problem with using UTF-16 this way.
> 
> On Sat, Aug 31, 2019 at 8:39 AM Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
> 
>     Hi JSON experts,
>     Pardon the subject line; I'm by no means a UTF-16 aficionado.
> 
>     That UTF-16 has been deprecated by the industry at large for EXTERNAL representation of textual data is completely understandable.
> 
>     However, an I-D dealing with canonical JSON serialization, currently in the IETF ISE queue, has been criticized for using UTF-16 encoding INTERNALLY for sorting properties/keys.
>     I don't see why, since the only purpose of the sorting is to create a defined order. It is true that sorting on UTF-8 or UTF-32 would in some cases give a different order, but for the stated purpose that is of no importance.
> 
>     In addition, JSON itself depends on UTF-16 encoding for \uhhhh escape sequences, and AFAIK nobody has complained about that.
>     Example: the smiley emoji U+1F600 would be represented in a JSON escape sequence as the surrogate pair \ud83d\ude00
> 
>     The reason for preferring UTF-16 in this particular case is simply that JavaScript, Windows and Java use UTF-16 as their internal representation. That's obviously a slight platform bias, but my Go and Python implementations show that the UTF-16 requirement is a non-issue in practice.
> 
>     According to the Unicode Standard, UTF-16 is one of the supported, fully interchangeable encoding forms.
> 
>     WDYT?
> 
>     thanx,
>     Anders
>     https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-06
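The key-ordering argument in the quoted message can be illustrated with a minimal Python sketch. It assumes, as the canonicalization draft does, that property names are ordered by their UTF-16 code units, and contrasts that with plain code-point order (which is also what sorting on UTF-8 bytes or UTF-32 code units yields). The helper names sort_utf16 and sort_code_points are hypothetical and are not taken from the draft or its reference implementations.

```python
def utf16_key(s: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as the sequence of
    # 16-bit code units, so they can serve directly as a sort key.
    return s.encode("utf-16-be")


def sort_utf16(keys):
    """Order property names by UTF-16 code units (hypothetical helper)."""
    return sorted(keys, key=utf16_key)


def sort_code_points(keys):
    """Order by Unicode code points; same result as UTF-8 or UTF-32 order."""
    return sorted(keys)


# U+FB01 (LATIN SMALL LIGATURE FI) lies in U+E000..U+FFFF,
# U+1F600 needs a surrogate pair (0xD83D 0xDE00) in UTF-16.
keys = ["\ufb01", "\U0001F600", "a"]
print(ascii(sort_utf16(keys)))        # ['a', '\U0001f600', '\ufb01']
print(ascii(sort_code_points(keys)))  # ['a', '\ufb01', '\U0001f600']
```

Both orders are fully deterministic. They disagree only when a character above U+FFFF is compared with one in the range U+E000 to U+FFFF, because surrogate code units (0xD800 to 0xDFFF) sort below 0xE000 in UTF-16. For the purpose of producing a defined order, either choice works.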
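The \uhhhh example in the quoted message (U+1F600 written as \ud83d\ude00) follows from the UTF-16 surrogate-pair algorithm. The minimal Python sketch below derives the escape by hand; json_escape_code_point is a hypothetical helper, and Python's standard json module, which escapes non-ASCII characters by default (ensure_ascii=True), is used only to cross-check the result.

```python
import json


def json_escape_code_point(cp: int) -> str:
    """Return the JSON \\uhhhh escape(s) for one Unicode scalar value (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp < 0x10000:
        # BMP character: one 16-bit code unit, one escape.
        return "\\u%04x" % cp
    v = cp - 0x10000              # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)     # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)    # low (trail) surrogate: bottom 10 bits
    return "\\u%04x\\u%04x" % (high, low)


escape = json_escape_code_point(0x1F600)
print(escape)                                          # \ud83d\ude00
print(json.dumps("\U0001F600"))                        # "\ud83d\ude00"
print(json.loads('"' + escape + '"') == "\U0001F600")  # True
```

Decoding reverses the process: a JSON parser that sees a high-surrogate escape followed by a low-surrogate escape combines them back into the single code point U+1F600, as the json.loads check shows.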