Re: [Json] I-JSON Topic #3: Unicode

John Cowan <cowan@mercury.ccil.org> Tue, 29 April 2014 17:16 UTC

Date: Tue, 29 Apr 2014 13:16:13 -0400
From: John Cowan <cowan@mercury.ccil.org>
To: json@ietf.org
Message-ID: <20140429171613.GZ11962@mercury.ccil.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: John Cowan <cowan@ccil.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/qAXBAIba5CbgXFaPbxyOyyVv_bI
Subject: Re: [Json] I-JSON Topic #3: Unicode

> draft-i-json-01 excludes the use of, and I quote, “Surrogates or
> Noncharacters”.   Is that the right use of Unicode nomenclature?  This
> really matters and I think it’s OK now, but first-class Unicode lawyering
> is required here.

It should say "code points which represent unpaired surrogates or
noncharacters."  The statement "\uDEAD is always illegal" is incorrect;
it is entirely legal in the sequence "\uD800\uDEAD", which represents
U+102AD CARIAN LETTER T.  Rather, it should say that "\uDEAD" is illegal
unless immediately preceded by an escape in the range "\uD800" through
"\uDBFF" inclusive.
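
To make the pairing rule concrete, here is a rough Python sketch of such a
check.  It is only an illustration, not text proposed for the draft: the
function names are made up, it looks only at \uXXXX escapes in the raw body
of a JSON string (literal surrogates and noncharacters are not examined),
and it assumes the escapes themselves are well-formed.

```python
def combine_surrogates(high, low):
    # Standard UTF-16 arithmetic: 10 bits from each half, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def find_unpaired_surrogate_escapes(body):
    # 'body' is the raw text between the quotes of a JSON string, with its
    # escapes still intact.  Returns the offsets of the surrogate escapes
    # that are not part of a high/low pair.  (Hypothetical helper.)
    unpaired = []
    pending_high = None  # offset of a high surrogate awaiting its low half
    i = 0
    while i < len(body):
        start = i
        unit = None
        if body[i] == "\\" and i + 1 < len(body):
            if body[i + 1] == "u":
                unit = int(body[i + 2:i + 6], 16)  # assumes 4 hex digits
                i += 6
            else:
                i += 2  # some other escape, e.g. an escaped backslash
        else:
            i += 1
        is_high = unit is not None and 0xD800 <= unit <= 0xDBFF
        is_low = unit is not None and 0xDC00 <= unit <= 0xDFFF
        if pending_high is not None and not is_low:
            unpaired.append(pending_high)  # high half never got its low half
        if is_low and pending_high is None:
            unpaired.append(start)  # low half with no high half before it
        pending_high = start if is_high else None
    if pending_high is not None:
        unpaired.append(pending_high)
    return unpaired


print(find_unpaired_surrogate_escapes(r"\uDEAD"))        # lone low surrogate
print(find_unpaired_surrogate_escapes(r"\uD800\uDEAD"))  # a proper pair
print(hex(combine_surrogates(0xD800, 0xDEAD)))
```

The scan walks the string rather than using a single regular expression so
that an escaped backslash followed by the letters "uDEAD" is not mistaken
for a surrogate escape.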

-- 
John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
I come from under the hill, and under the hills and over the hills my paths
led. And through the air. I am he that walks unseen.  I am the clue-finder,
the web-cutter, the stinging fly. I was chosen for the lucky number.  --Bilbo