Re: [Json] A possible summary of the discussion so far on code points and characters

R S <> Sun, 09 June 2013 00:24 UTC

Date: Sat, 8 Jun 2013 17:23:57 -0700
From: R S <>
To: Stephen Dolan <>
Cc: Paul Hoffman <>, "" <>
Subject: Re: [Json] A possible summary of the discussion so far on code points and characters

On Sat, Jun 8, 2013 at 2:11 PM, Stephen Dolan <> wrote:

> On Sat, Jun 8, 2013 at 9:52 PM, R S <> wrote:
> > A seventh point of view, which I happen to agree with: JSON strings are a
> > sequence of code units.
> >
> > This is similar to the definition of 'source text' in ECMAScript:
> >
> > "ECMAScript source text is assumed to be a sequence of 16-bit code units
> > for the purposes of this specification. Such a source text may include
> > sequences of 16-bit code units that are not valid UTF-16 character
> > encodings."
>
> That's a very out-of-context quote. The linked document states:
> "ECMAScript source text is represented as a sequence of characters in
> the Unicode character encoding, version 3.0 or later."
> It then gives your quote, and states "If an actual source text is
> encoded in a form other than 16-bit code units it must be processed as
> if it was first convert [sic] to UTF-16". It seems like UTF-16 is a
> convenient way to frame the document, rather than a requirement of the
> specification.

It's a requirement. Here are some additional references:


The paragraph following the one I cited:

'Throughout the rest of this document, the phrase “code unit” and the word
“character” will be used to refer to a 16-bit unsigned value used to
represent a single 16-bit unit of text. The phrase “Unicode character” will
be used to refer to the abstract linguistic or typographical unit
represented by a single Unicode scalar value (which may be longer than 16
bits and thus may be represented by more than one code unit). The phrase
“code point” refers to such a Unicode scalar value. “Unicode character”
only refers to entities represented by single Unicode scalar values: the
components of a combining character sequence are still individual “Unicode
characters,” even though a user might think of the whole sequence as a
single character.' <>
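
To make the code unit / code point distinction concrete, here is a minimal
Python sketch. It is not taken from the ECMAScript text; the helper name
utf16_code_units is made up for illustration, and it relies on CPython's json
module accepting an unpaired surrogate escape, which other parsers may reject.

```python
import json


def utf16_code_units(s):
    """Return the 16-bit code units s would occupy in UTF-16 (hypothetical helper)."""
    # "surrogatepass" lets unpaired surrogates through instead of raising.
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# U+1F600 is a single code point ("Unicode character" in the quoted wording)
# that needs two 16-bit code units.
smiley = json.loads('"\\ud83d\\ude00"')
smiley_units = utf16_code_units(smiley)

# A JSON string holding only the first half of that pair: a perfectly good
# sequence of code units, but not a valid UTF-16 encoding of any character.
lone = json.loads('"\\ud83d"')
lone_units = utf16_code_units(lone)

print(len(smiley), [hex(u) for u in smiley_units])
print(len(lone), [hex(u) for u in lone_units])
```

Python strings are indexed by code point, so len(smiley) is 1 here, whereas
ECMAScript, which counts code units, reports a length of 2 for the same string.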

- Rob