Re: [Json] A possible summary of the discussion so far on code points and characters

Stephen Dolan <stephen.dolan@cl.cam.ac.uk> Sat, 08 June 2013 21:11 UTC

Date: Sat, 8 Jun 2013 22:11:26 +0100
From: Stephen Dolan <stephen.dolan@cl.cam.ac.uk>
To: R S <sayrer@gmail.com>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>

On Sat, Jun 8, 2013 at 9:52 PM, R S <sayrer@gmail.com> wrote:
> A seventh point of view, which I happen to agree with: JSON strings are a
> sequence of code units.
>
> This is similar to the definition of 'source text' in ECMAScript:
>
> "ECMAScript source text is assumed to be a sequence of 16-bit code units for
> the purposes of this specification. Such a source text may include sequences
> of 16-bit code units that are not valid UTF-16 character encodings."
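The distinction the quoted passage draws can be made concrete. In the sketch below (Python's standard `json` module is an illustrative choice; the thread names no implementation), an escaped surrogate pair decodes to a single code point, while a lone `\ud800` escape survives as an unpaired surrogate: a legal 16-bit code unit that is not valid UTF-16 and has no UTF-8 encoding. The helper `has_lone_surrogate` is hypothetical.

```python
import json


def has_lone_surrogate(s: str) -> bool:
    """Hypothetical helper: True if s still contains a surrogate code point.

    json.loads joins a valid escaped surrogate pair into one code point,
    so any surrogate left in the result was unpaired in the JSON text.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


paired = json.loads('"\\ud83d\\ude00"')  # escaped pair for U+1F600
lone = json.loads('"\\ud800"')           # high surrogate with no partner

print(len(paired), has_lone_surrogate(paired))  # 1 False
print(len(lone), has_lone_surrogate(lone))      # 1 True

# An unpaired surrogate has no well-formed UTF-8 encoding.
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate: not encodable as UTF-8")
```

Whether a JSON parser should accept the second input at all is exactly the question under discussion; this decoder happens to accept it.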

That's a very out-of-context quote. The linked document states:

"ECMAScript source text is represented as a sequence of characters in
the Unicode character encoding, version 3.0 or later."

It then gives your quote, and states "If an actual source text is
encoded in a form other than 16-bit code units it must be processed as
if it was first convert [sic] to UTF-16". It seems like UTF-16 is a
convenient way to frame the document, rather than a requirement of the
specification.
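The "processed as if it was first converted to UTF-16" rule can be sketched as follows, assuming UTF-8 input and using only Python's built-in codecs; the helper name `utf16_code_units` is hypothetical. The same text is three code points but four UTF-16 code units, because U+1F600 lies outside the Basic Multilingual Plane and needs a surrogate pair.

```python
from __future__ import annotations


def utf16_code_units(source: bytes, encoding: str = "utf-8") -> list[int]:
    """Hypothetical helper: decode source text, then view it as UTF-16 code units."""
    text = source.decode(encoding)    # bytes -> Unicode code points
    data = text.encode("utf-16-le")   # little-endian UTF-16, no BOM
    # Group the bytes into 16-bit code units.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


src = "a\u00e9\U0001F600".encode("utf-8")  # 'a', 'é', and U+1F600
units = utf16_code_units(src)
print(len(src.decode("utf-8")))  # 3 code points
print([hex(u) for u in units])   # ['0x61', '0xe9', '0xd83d', '0xde00']
```

Note that a UTF-8 source can never produce an unpaired surrogate this way, since well-formed UTF-8 cannot encode one.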

Stephen