Re: [Json] I-JSON Topic #3: Unicode

Tim Bray <tbray@textuality.com> Tue, 29 April 2014 17:22 UTC

Actually, the statement currently is

<<<<
thus, "\uDEAD" is always illegal.
>>>>

And that is in fact correct if you look closely at the quotation marks; 23
pedantry points to ME!  Having said that, it would be improved by saying that
"\uDEAD" is always illegal because it is an unpaired surrogate, while
"\uD800\uDEAD" is legal.
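A minimal sketch of how a receiver might check for this. It assumes Python's standard json module, which merges a valid "\uD8xx\uDCxx" escape pair into a single supplementary code point but passes a lone surrogate escape through unchanged. Any surrogate code point still present after decoding was therefore unpaired. The function name has_unpaired_surrogate is hypothetical and is not from the draft.

```python
import json

def has_unpaired_surrogate(json_text: str) -> bool:
    """Hypothetical helper: True if any decoded string (value or member name)
    still contains a surrogate code point U+D800..U+DFFF.

    Python's json decoder joins a valid high+low escape pair into one
    supplementary character, so a surrogate left over here was unpaired.
    """
    def walk(value) -> bool:
        if isinstance(value, str):
            return any(0xD800 <= ord(ch) <= 0xDFFF for ch in value)
        if isinstance(value, list):
            return any(walk(item) for item in value)
        if isinstance(value, dict):
            return any(walk(k) or walk(v) for k, v in value.items())
        return False  # numbers, booleans, null

    return walk(json.loads(json_text))

print(has_unpaired_surrogate('"\\uDEAD"'))         # True: lone low surrogate
print(has_unpaired_surrogate('"\\uD800\\uDEAD"'))  # False: a proper pair
```

Checking after decoding avoids re-implementing JSON escape parsing (for example, telling "\\uDEAD" apart from "\uDEAD"), at the cost of relying on this particular decoder's pairing behavior.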


On Tue, Apr 29, 2014 at 10:16 AM, John Cowan <cowan@mercury.ccil.org> wrote:

> > draft-i-json-01 excludes the use of, and I quote, “Surrogates or
> > Noncharacters”.   Is that the right use of Unicode nomenclature?  This
> > really matters and I think it’s OK now, but first-class Unicode lawyering
> > is required here.
>
> It should say "code points which represent unpaired surrogates or
> noncharacters."  The statement "\uDEAD is always illegal" is incorrect;
> it is entirely legal in the sequence "\uD800\uDEAD", which represents
> U+102AD CARIAN LETTER T.  Rather it should say that "\uDEAD" is illegal
> unless preceded by an escape between "\uD800" and "\uDBFF" inclusive.
>
> --
> John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
> I come from under the hill, and under the hills and over the hills my paths
> led. And through the air. I am he that walks unseen.  I am the clue-finder,
> the web-cutter, the stinging fly. I was chosen for the lucky number.
>  --Bilbo
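Cowan's arithmetic can be checked with the standard UTF-16 decoding formula: code point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00), where high must lie in D800..DBFF and low in DC00..DFFF. The sketch below is an illustration, not text from the draft; both function names are hypothetical. It also includes a test for Unicode noncharacters (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes), the other category the draft excludes.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Hypothetical helper: decode a UTF-16 surrogate pair into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high (lead) surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low (trail) surrogate: {low:#06x}")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def is_noncharacter(cp: int) -> bool:
    """Hypothetical helper: True for the 66 Unicode noncharacters.

    Assumes cp <= 0x10FFFF. The (cp & 0xFFFE) == 0xFFFE test matches
    U+xxFFFE and U+xxFFFF in every plane.
    """
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

cp = combine_surrogates(0xD800, 0xDEAD)
print(f"U+{cp:04X}")        # U+102AD
print(is_noncharacter(cp))  # False
```

Passing 0xDEAD as the first argument raises ValueError, which mirrors the rule that "\uDEAD" is legal only when immediately preceded by an escape in the \uD800..\uDBFF range.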