Re: [Json] Proposal for strings/Unicode text
Tim Bray <tbray@textuality.com> Wed, 12 June 2013 22:54 UTC
In-Reply-To: <ED62F638-C0C4-411D-BA5B-EB9BA71EDB75@lindenbergsoftware.com>
References: <CAHBU6ivNjMUwN2Hsn-E8FKxjqXS6b4qz=_MeeaHahWBWqG_Hgg@mail.gmail.com> <ED62F638-C0C4-411D-BA5B-EB9BA71EDB75@lindenbergsoftware.com>
Date: Wed, 12 Jun 2013 15:54:25 -0700
Message-ID: <CAHBU6ivTQL__=5puCxs_d+eQvBVvW4LvO1g_0q4V8bp0nq4JZA@mail.gmail.com>
From: Tim Bray <tbray@textuality.com>
To: Norbert Lindenberg <ietf@lindenbergsoftware.com>
Cc: "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Proposal for strings/Unicode text
On Wed, Jun 12, 2013 at 3:46 PM, Norbert Lindenberg <ietf@lindenbergsoftware.com> wrote:

> The JSON RFC seems to use Unicode character names, in this case
> "reverse solidus".

We’re not allowed to change JSON, but we are allowed to improve the spec. The term “solidus” is obscure and off-putting.

>> Strings begin and end with quotation marks. They are intended to
>> contain sequences of Unicode characters; note, however, that the ABNF
>> in this section allows the inclusion of 16-bit quantities in ways which
>> can never be useful for representing characters and is likely to cause
>> breakage in software designed to process Unicode text.
>
> This warning is too vague to be useful. Which specific risks do you think
> need to be discussed here? Also, the ABNF doesn't do anything specifically
> for 16-bit quantities, as far as I can see.

Agreed; I introduced a specific example.

>> 16-bit quantities (normally Unicode characters from the Basic
>> Multilingual Plane, U+0000 through U+FFFF) may be “escaped”, or
>> represented as a six-character sequence: a backslash (U+005C REVERSE
>> SOLIDUS), followed by the lowercase letter u, followed by four
>> hexadecimal digits that encode the character's code point. The
>> hexadecimal letters A through F can be upper or lower case. So, for
>> example, a string containing only a single backslash may be represented
>> as "\u005C".
>
> These escape sequences aren't about 16-bit quantities; they represent
> Unicode BMP code points.

They also represent surrogates. Thus they combine fish and bicycles, and I think the only accurate way to describe what you can escape is “16-bit quantities”.

> If we discuss these escape sequences in prose, then the sequences for BMP
> and supplementary characters need to be discussed together so that it's
> clear what's a surrogate pair and how unpaired surrogates are handled.

Hm? I see no point in replicating the discussion of surrogates in Unicode, and there is nothing useful to say about the handling of unpaired surrogates.
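To make the “16-bit quantities” point concrete, here is a small sketch using Python's standard json module. Python is an assumed choice for illustration only; the thread names no implementation, and the helper encodes_as_utf8 is hypothetical. It shows a BMP escape, a supplementary character written as a surrogate pair, and an unpaired surrogate that this particular parser accepts but that cannot then be encoded as UTF-8. Other parsers may reject or replace lone surrogates instead.

```python
import json


def encodes_as_utf8(s: str) -> bool:
    """Hypothetical helper: True if s is valid Unicode text (no lone surrogates)."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# A BMP character as a six-character escape: backslash, 'u', four hex digits.
backslash = json.loads('"\\u005C"')
assert backslash == "\\"

# The hex digits A through F may be upper or lower case.
assert json.loads('"\\u005c"') == backslash

# A supplementary character (U+1F600) needs two escapes forming a surrogate
# pair; this parser combines them into a single code point.
emoji = json.loads('"\\uD83D\\uDE00"')
assert emoji == "\U0001F600"
assert encodes_as_utf8(emoji)

# The grammar also admits an unpaired surrogate, which is not a character.
# Python's json module accepts it and returns a one-code-unit string...
lone = json.loads('"\\uD800"')
assert len(lone) == 1 and ord(lone) == 0xD800

# ...which then breaks code that expects Unicode text, e.g. UTF-8 encoding.
assert not encodes_as_utf8(lone)
```

The escape syntax alone cannot distinguish a surrogate half from a character, which is why the same six-character form covers both.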
>> Alternatively, there are two-character escape sequence representations
>> of some popular characters. So, for example, a string containing only a
>> single backslash may be represented more compactly as "\\".
>
> I'd assume that this is not about popularity, but about the need to
> represent control characters and characters that are also used within the
> JSON syntax. I don't see two-character escapes for "e" or "的".

I tried to stay close to the language of the original.

> Norbert
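The two-character escapes can be checked against their six-character equivalents with a short sketch, again assuming Python's standard json module purely for illustration. The table lists the short escapes in the JSON grammar (quotation mark, backslash, solidus, and five control characters); TWO_CHAR_ESCAPES is a hypothetical name.

```python
import json

# Two-character escapes in the JSON grammar and the characters they denote:
# quotation mark, backslash and solidus, plus five control characters.
TWO_CHAR_ESCAPES = {
    '\\"': '"',
    "\\\\": "\\",
    "\\/": "/",
    "\\b": "\b",
    "\\f": "\f",
    "\\n": "\n",
    "\\r": "\r",
    "\\t": "\t",
}

# Each short escape decodes to the same character as its six-character \u form.
for escape, char in TWO_CHAR_ESCAPES.items():
    long_form = "\\u%04X" % ord(char)
    assert json.loads('"' + escape + '"') == char
    assert json.loads('"' + long_form + '"') == char

# Python's serializer picks the compact form for a lone backslash.
assert json.dumps("\\") == '"\\\\"'

# Ordinary characters such as "e" or "的" have no short escape: the serializer
# emits them literally or falls back to the \u form.
print(json.dumps("e\u7684", ensure_ascii=False))  # "e的"
print(json.dumps("e\u7684"))                       # "e\u7684"
```

Whether a serializer emits the short or the long form is its own choice; both decode to the same string.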