Re: [Json] Proposed minimal change for strings

John Cowan <cowan@mercury.ccil.org> Wed, 03 July 2013 01:26 UTC

Date: Tue, 02 Jul 2013 21:26:14 -0400
From: John Cowan <cowan@mercury.ccil.org>
To: Paul Hoffman <paul.hoffman@vpnc.org>
Cc: "json@ietf.org WG" <json@ietf.org>

Paul Hoffman scripsit:

> Proposal 1 (allow all code units in their unescaped form):
>
>     A string is a sequence of zero or more Unicode code units [UNICODE].

I object to this, because "code unit" is not well defined without
mentioning its size.  I suppose it is 16-bit code units that are meant,
but if so, that must be said.  "If Parliament does not mean what it says,
it must say so."
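
To illustrate why the size matters, here is a minimal Python sketch that counts the code units of one string in each of the three Unicode encoding forms. The helper name code_unit_counts is hypothetical and used only for illustration.

```python
def code_unit_counts(s: str) -> dict:
    """Count code units of s in each Unicode encoding form.

    UTF-8 uses 8-bit units, UTF-16 uses 16-bit units, UTF-32 uses 32-bit
    units. The "-le" codecs are used so that no byte order mark is added.
    """
    return {
        "utf-8": len(s.encode("utf-8")),
        "utf-16": len(s.encode("utf-16-le")) // 2,
        "utf-32": len(s.encode("utf-32-le")) // 4,
    }


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
counts = code_unit_counts("\U0001D11E")
```

The same one-character string is four, two, or one code units long depending on the encoding form, so "a sequence of code units" does not identify a single sequence unless the unit size is stated.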

> In section 2.2 (Strings):
>   Leave the production for "unescaped" as-is.
> In section 3 (Encoding):
>   Add "Some strings, notably those that have unescaped surrogate code units
>   (value 0xD800 to 0xDFFF), cannot be encoded in UTF-8."

While this statement is true, it is not the whole truth, which is that
unescaped surrogate code units cannot be encoded in *any* encoding.
Indeed, it is useless to permit unescaped surrogate code units at all.
Therefore, I reject this proposal in toto.
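
As a rough check of this point, the following sketch tries to encode a string holding a lone surrogate with Python 3's strict UTF-8, UTF-16, and UTF-32 codecs. It shows only how one implementation behaves, not what any specification requires, and the helper name encodable is hypothetical.

```python
def encodable(s: str, encoding: str) -> bool:
    """Return True if s can be encoded with the codec's default strict handling."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A Python str may hold a lone surrogate code point, but the strict
# codecs refuse to serialize it.
lone_surrogate = "\ud800"
results = {enc: encodable(lone_surrogate, enc)
           for enc in ("utf-8", "utf-16", "utf-32")}
```

Under these assumptions every entry in results is False, while an ordinary string such as "abc" encodes in all three.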

> Proposal 2 (prohibit unescaped surrogates):
> 
> In section 1 (Introduction):
>   A string is a sequence of zero or more Unicode scalar values [UNICODE].
> In section 2.2 (Strings):
>   Change the production for "unescaped" to be:
>     unescaped = %x20-21 / %x23-5B / %x5D-D7FF / %xE000-10FFFF

I approve of this proposal.
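
For concreteness, the proposed production can be transcribed as a membership test on code points. This is only a sketch of the ABNF quoted above; the names UNESCAPED_RANGES and is_unescaped are hypothetical.

```python
# Inclusive code point ranges, copied from the proposed ABNF:
#   unescaped = %x20-21 / %x23-5B / %x5D-D7FF / %xE000-10FFFF
UNESCAPED_RANGES = (
    (0x20, 0x21),
    (0x23, 0x5B),
    (0x5D, 0xD7FF),
    (0xE000, 0x10FFFF),
)


def is_unescaped(ch: str) -> bool:
    """Return True if the single character ch may appear unescaped."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UNESCAPED_RANGES)
```

The gaps are the control characters below U+0020, the quotation mark (U+0022), the reverse solidus (U+005C), and the surrogate range U+D800 to U+DFFF, so what remains is exactly the set of Unicode scalar values that need no escape.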

-- 
John Cowan  cowan@ccil.org   http://ccil.org/~cowan
"The exception proves the rule."  Dimbulbs think: "Your counterexample proves
my theory."  Latin students think "'Probat' means 'tests': the exception puts
the rule to the proof."  But legal historians know it means "Evidence for an
exception is evidence of the existence of a rule in cases not excepted from."