Re: [Json] A possible summary of the discussion so far on code points and characters
R S <sayrer@gmail.com> Sun, 09 June 2013 00:24 UTC
From: R S <sayrer@gmail.com>
To: Stephen Dolan <stephen.dolan@cl.cam.ac.uk>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] A possible summary of the discussion so far on code points
and characters

On Sat, Jun 8, 2013 at 2:11 PM, Stephen Dolan <stephen.dolan@cl.cam.ac.uk> wrote:

> On Sat, Jun 8, 2013 at 9:52 PM, R S <sayrer@gmail.com> wrote:
> > A seventh point of view, which I happen to agree with: JSON strings are a
> > sequence of code units.
> >
> > This is similar to the definition of 'source text' in ECMAScript:
> >
> > "ECMAScript source text is assumed to be a sequence of 16-bit code units for
> > the purposes of this specification. Such a source text may include sequences
> > of 16-bit code units that are not valid UTF-16 character encodings."
>
> That's a very out-of-context quote. The linked document states:
>
> "ECMAScript source text is represented as a sequence of characters in
> the Unicode character encoding, version 3.0 or later."
>
> It then gives your quote, and states "If an actual source text is
> encoded in a form other than 16-bit code units it must be processed as
> if it was first convert [sic] to UTF-16". It seems like UTF-16 is a
> convenient way to frame the document, rather than a requirement of the
> specification.

It's a requirement. Here are some additional references:

<http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings&rev=1305822947>
<https://mail.mozilla.org/pipermail/es-discuss/2011-May/014337.html>

The paragraph following the one I cited:

'Throughout the rest of this document, the phrase “code unit” and the word “character” will be used to refer to a 16-bit unsigned value used to represent a single 16-bit unit of text. The phrase “Unicode character” will be used to refer to the abstract linguistic or typographical unit represented by a single Unicode scalar value (which may be longer than 16 bits and thus may be represented by more than one code unit). The phrase “code point” refers to such a Unicode scalar value. “Unicode character” only refers to entities represented by single Unicode scalar values: the components of a combining character sequence are still individual “Unicode characters,” even though a user might think of the whole sequence as a single character.'

<http://es5.github.io/x6.html>

- Rob
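
To make the code unit versus code point distinction in the quoted paragraph concrete, here is a minimal sketch. Python is used purely for illustration and is not mentioned in the thread; the helper `utf16_code_units` is a hypothetical name introduced here. Note that a Python `str` is a sequence of code points, so the sketch encodes to UTF-16 explicitly to expose the 16-bit code units an ECMAScript string would hold.

```python
import json
from typing import List


def utf16_code_units(s: str) -> List[int]:
    """Return the 16-bit code units of s when encoded as UTF-16.

    Hypothetical helper for illustration only. The 'surrogatepass'
    error handler lets unpaired surrogates through instead of raising.
    """
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


# A JSON string whose escapes form a valid surrogate pair (U+1F600).
paired = json.loads('"\\ud83d\\ude00"')

# A JSON string holding only the high half of that pair. It is a
# perfectly good sequence of code units, but not valid UTF-16.
lone = json.loads('"\\ud83d"')

# One code point, but two code units.
print(len(paired), [hex(u) for u in utf16_code_units(paired)])

# One code unit that corresponds to no Unicode scalar value.
print(len(lone), [hex(u) for u in utf16_code_units(lone)])

# The unpaired surrogate cannot be encoded as UTF-8.
try:
    lone.encode("utf-8")
    lone_is_encodable = True
except UnicodeEncodeError:
    lone_is_encodable = False
print(lone_is_encodable)
```

Under the "sequence of code units" reading, both strings are acceptable JSON string values. Under a "sequence of Unicode characters" reading, only the first one is.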