Re: [Json] Proposal for strings/Unicode text

Norbert Lindenberg <ietf@lindenbergsoftware.com> Tue, 18 June 2013 07:44 UTC

From: Norbert Lindenberg <ietf@lindenbergsoftware.com>
In-Reply-To: <20130613181955.GH29284@mercury.ccil.org>
Date: Tue, 18 Jun 2013 00:43:56 -0700
Message-Id: <C926DA54-728E-40B2-B26F-CCCCB3418F42@lindenbergsoftware.com>
References: <CAHBU6ivNjMUwN2Hsn-E8FKxjqXS6b4qz=_MeeaHahWBWqG_Hgg@mail.gmail.com> <ED62F638-C0C4-411D-BA5B-EB9BA71EDB75@lindenbergsoftware.com> <20130613003213.GA26989@mercury.ccil.org> <jr5jr85h6pig2cr9id5hf1eh586g0u09i7@hive.bjoern.hoehrmann.de> <20130613121620.GB11739@mercury.ccil.org> <CAHBU6ismp6HZqUQOgDnjBRYtC5jFCzhTB3RFG8Ms7qohz+w1eg@mail.gmail.com> <20130613181955.GH29284@mercury.ccil.org>
To: John Cowan <cowan@mercury.ccil.org>
Cc: Norbert Lindenberg <ietf@lindenbergsoftware.com>, Bjoern Hoehrmann <derhoermi@gmx.net>, Tim Bray <tbray@textuality.com>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Proposal for strings/Unicode text
List-Id: "JavaScript Object Notation \(JSON\) WG mailing list" <json.ietf.org>

On Jun 13, 2013, at 11:19 , John Cowan wrote:

> Tim Bray scripsit:
> 
>> This will break lots of things, not just UTF-8 decoders (most of which,
>> I bet, will never actually notice).  -T
> 
> Modern ones that pay attention to spoofing most definitely will.

I did a bit of testing. In the current versions of the major desktop browsers (Firefox, Chrome, Safari, Internet Explorer, Opera), the UTF-8 decoders used for external scripts and XMLHttpRequest replace CESU-8-encoded surrogate code points with sequences of replacement characters (U+FFFD) of varying lengths. In the current version of Node.js, the UTF-8 decoder used for external scripts passes them through, so a sequence of CESU-8-encoded surrogate code points can turn into a valid supplementary character in UTF-16.
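To make the two behaviours concrete, here is a small Python sketch. It is not the code of any browser or of Node.js. It encodes U+1F600 as CESU-8, i.e. as the UTF-16 surrogate pair D83D DE00 with each surrogate written as a three-byte UTF-8-style sequence, and then decodes those bytes two ways: strictly, replacing ill-formed sequences with U+FFFD (roughly what the browsers do, though Python's count of replacement characters need not match any browser's), and leniently, passing the surrogates through and pairing them as UTF-16 code units (roughly what Node.js does). The function names are hypothetical, and Python's "surrogatepass" error handler merely stands in for a lenient decoder.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: each supplementary character is split into a
    UTF-16 surrogate pair, and each surrogate is written as a 3-byte
    UTF-8-style sequence. BMP characters are encoded exactly as in UTF-8."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for s in (high, low):
                # 'surrogatepass' lets Python emit the 3-byte form of a surrogate,
                # which strict UTF-8 forbids.
                out += chr(s).encode("utf-8", errors="surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def strict_decode(data: bytes) -> str:
    """Strict UTF-8 decoding: ill-formed sequences, including encoded
    surrogates, become U+FFFD REPLACEMENT CHARACTER."""
    return data.decode("utf-8", errors="replace")


def lenient_decode(data: bytes) -> str:
    """Hypothetical lenient decoder: pass encoded surrogates through, then
    reinterpret the result as UTF-16 code units, so an adjacent high/low
    surrogate pair collapses into one supplementary character.
    (A lone, unpaired surrogate makes the UTF-16 step raise UnicodeDecodeError.)"""
    units = data.decode("utf-8", errors="surrogatepass")
    return units.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    ch = "\U0001F600"  # GRINNING FACE, a supplementary character
    cesu = cesu8_encode(ch)
    print(ch.encode("utf-8").hex(" "))  # f0 9f 98 80
    print(cesu.hex(" "))                # ed a0 bd ed b8 80
    print(strict_decode(cesu))          # only U+FFFD characters
    print(lenient_decode(cesu) == ch)   # True
```

The contrast is the point of the measurement above: the strict path never yields the original character, while the lenient path silently reconstructs it from bytes that are not well-formed UTF-8.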

Norbert