[Json] Unpaired surrogates in JSON strings

John Cowan <cowan@mercury.ccil.org> Wed, 05 June 2013 16:22 UTC

RFC 4627 Section 1 says:

    A string is a sequence of zero or more Unicode characters.

However, the grammar of strings permits things like "foo\uDC00bar",
whose \uDC00 escape denotes a lone low surrogate and so does not
correspond to any Unicode character.  Permitting such escapes provides
backward compatibility with JavaScript, where a string is not a sequence
of characters but a sequence of UTF-16 code units.
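
To make the problem concrete, here is a small sketch using Python's
standard json module (my choice of illustration, not something the RFC
mentions).  It assumes CPython's decoder, which accepts the escape and
yields a string holding a lone surrogate code point that cannot then be
encoded as UTF-8:

```python
import json

# The JSON text "foo\uDC00bar" conforms to the RFC 4627 string grammar.
text = '"foo\\uDC00bar"'

# CPython's decoder accepts it and returns a 7-element string whose
# fourth element is the lone low surrogate U+DC00.
value = json.loads(text)
lone = value[3]

# A lone surrogate is not a Unicode scalar value, so strict UTF-8
# encoding of the parsed string fails.
try:
    value.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

Other parsers may behave differently (for example by substituting
U+FFFD or by rejecting the text), which is exactly the interoperability
question.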

If Section 1 is normative, then there is a contradiction with Section 4,
which says:

    A JSON parser MUST accept all texts that conform to the JSON
    grammar.

In my view, JSONbis processors should be REQUIRED to produce only strings
that conform to Section 1.
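
One possible reading of such a requirement, sketched in Python under
the assumption that a processor rejects the text outright (replacing
the offending escape with U+FFFD would be another policy).  The names
strict_loads and has_lone_surrogate are hypothetical helpers, not part
of any library:

```python
import json

def has_lone_surrogate(s: str) -> bool:
    # json.loads combines a valid \uD8xx\uDCxx escape pair into a single
    # code point, so any surrogate left in the result is unpaired.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)

def check(value) -> None:
    # Walk the parsed value and reject any string (including object
    # keys) that is not a sequence of Unicode characters.
    if isinstance(value, str):
        if has_lone_surrogate(value):
            raise ValueError("string contains an unpaired surrogate")
    elif isinstance(value, list):
        for item in value:
            check(item)
    elif isinstance(value, dict):
        for key, item in value.items():
            check(key)
            check(item)

def strict_loads(text: str):
    # Hypothetical strict wrapper: parse, then enforce Section 1.
    value = json.loads(text)
    check(value)
    return value
```

With this sketch, strict_loads('"\\uD83D\\uDE00"') returns a one-character
string, while strict_loads('"foo\\uDC00bar"') raises ValueError.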

-- 
And it was said that ever after, if any                 John Cowan
man looked in that Stone, unless he had a               cowan@ccil.org
great strength of will to turn it to other              http://ccil.org/~cowan
purpose, he saw only two aged hands withering
in flame.   --"The Pyre of Denethor"