Re: [Json] Unpaired surrogates in JSON strings

Tim Bray <tbray@textuality.com> Thu, 06 June 2013 01:05 UTC

In-Reply-To: <83728898-9A2D-4758-9C06-1157E2954CCB@vpnc.org>
References: <A723FC6ECC552A4D8C8249D9E07425A70FC2C12D@xmb-rcd-x10.cisco.com> <83728898-9A2D-4758-9C06-1157E2954CCB@vpnc.org>
Date: Wed, 05 Jun 2013 18:05:25 -0700
Message-ID: <CAHBU6isNhsqUJ7x-0ttAPacA2f+sS99tsGONpMspcrWyqtSLUg@mail.gmail.com>
From: Tim Bray <tbray@textuality.com>
To: Paul Hoffman <paul.hoffman@vpnc.org>
Cc: "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Unpaired surrogates in JSON strings

On Wed, Jun 5, 2013 at 5:58 PM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:

> > Escape sequences between \uD800 and \uDFFF SHOULD be generated only as
> > valid UTF-16 surrogate pairs
>

Turn it into MUST. The intent of RFC 4627 is perfectly clear, even if the
BNF is buggy. I don’t think this rewrite should allow things that the
previous spec didn’t. -T
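
To make the generator-side MUST concrete, here is a minimal sketch in Python of an escaper that only ever emits valid pairs. The function name escape_code_point is hypothetical and appears in no spec or library; it takes one Unicode code point, refuses a lone surrogate, and splits anything above U+FFFF into a high and a low surrogate escape.

```python
def escape_code_point(cp: int) -> str:
    """Return the JSON \\uXXXX escape(s) for a single Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    if 0xD800 <= cp <= 0xDFFF:
        # A surrogate on its own is not a character, so a strict
        # generator has nothing valid to emit for it.
        raise ValueError("lone surrogate U+%04X cannot be escaped" % cp)
    if cp <= 0xFFFF:
        return "\\u%04X" % cp
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return "\\u%04X\\u%04X" % (high, low)
```

For U+1D11E this yields \uD834\uDD1E, the same two code units that show up, unpaired or reversed, in the invalid examples quoted below.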



> > (this SHOULD is only to allow backward compatibility).  When
> > encountering an invalid surrogate pair (such as "foo\uD834bar" or
> > "\uDD1E\uD834"), parsers MAY either throw an error (taking the risk
> > of some backward incompatibility with old generators) or MAY ignore
> > the sequence.
>
> Alternate proposal:
>
> Code points between U+D800 and U+DFFF SHOULD be generated only as
> valid UTF-16 surrogate pairs; this SHOULD is only to allow backward
> compatibility with applications that ignored the restriction that
> strings consist of Unicode characters. A parser that encounters
> an invalid surrogate pair (such as "foo\uD834bar" or "\uDD1E\uD834")
> SHOULD throw an error because the string does not consist of characters;
> it might ignore the errant code points, but at the risk of allowing
> strings that other parsers would find illegal.
>
> --Paul Hoffman
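
For the parser side, the check under discussion can be sketched as follows. This is only an illustration in Python, not text from any draft: find_unpaired_surrogates is a hypothetical name, and it assumes it is handed the raw contents of a JSON string (the characters between the quotes, escapes still undecoded). It looks only at \uXXXX escapes and does not consider surrogates that arrive unescaped in the text.

```python
import re
from typing import List

# Match any backslash escape, so that an escaped backslash ("\\")
# followed by "uD834" is not mistaken for a \u escape.
_ESCAPE = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|.)', re.DOTALL)


def find_unpaired_surrogates(body: str) -> List[int]:
    """Return offsets of \\uXXXX surrogate escapes that are not part of a
    valid high-then-low pair. Hypothetical helper for illustration only."""
    units = [(m.start(), m.end(), int(m.group(1), 16))
             for m in _ESCAPE.finditer(body) if m.group(1)]
    bad = []
    i = 0
    while i < len(units):
        start, end, unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: valid only if a low surrogate escape
            # begins exactly where this escape ends.
            nxt = units[i + 1] if i + 1 < len(units) else None
            if nxt and nxt[0] == end and 0xDC00 <= nxt[2] <= 0xDFFF:
                i += 2
                continue
            bad.append(start)
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no high surrogate before it.
            bad.append(start)
        i += 1
    return bad
```

A parser that follows the stricter reading would raise an error whenever the returned list is non-empty; one that follows the lenient reading could use the offsets to drop or replace the errant escapes.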