[Json] Proposal for strings/Unicode text
Tim Bray <tbray@textuality.com> Wed, 12 June 2013 18:47 UTC
Date: Wed, 12 Jun 2013 11:46:54 -0700
Message-ID: <CAHBU6ivNjMUwN2Hsn-E8FKxjqXS6b4qz=_MeeaHahWBWqG_Hgg@mail.gmail.com>
From: Tim Bray <tbray@textuality.com>
To: "json@ietf.org" <json@ietf.org>
Subject: [Json] Proposal for strings/Unicode text
Rationale:

- emphasize the important fact that Strings are *intended* for Unicode characters
- document the important fact that the rules allow horrible Unicode practices
- say “backslash” instead of “reverse solidus” :)

In section 1, introduction

Before: A string is a sequence of zero or more Unicode characters [UNICODE].

After: A string is intended to contain sequences of zero or more Unicode characters [UNICODE 6.2].

Rewrite section 2.5 as follows:

Strings begin and end with quotation marks. They are intended to contain sequences of Unicode characters. Note, however, that the ABNF in this section allows the inclusion of 16-bit quantities in ways which can never be useful for representing characters and are likely to cause breakage in software designed to process Unicode text.

The ABNF allows the use of many Unicode code points that could be used in future to represent Unicode characters, but have not yet been assigned. Therefore, this specification should not need revision as the Unicode character repertoire continues to grow.

16-bit quantities (normally Unicode characters from the Basic Multilingual Plane, U+0000 through U+FFFF) may be “escaped”, or represented as a six-character sequence: a backslash (U+005C REVERSE SOLIDUS), followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A through F can be upper or lower case. So, for example, a string containing only a single backslash may be represented as "\u005C".

Alternatively, there are two-character sequence escape representations of some popular characters. So, for example, a string containing only a single backslash may be represented more compactly as "\\".

To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only U+1D11E G CLEF may be represented as "\uD834\uDD1E".
=== insert ABNF here ====
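
To make the escaping rules above concrete, here is a rough Python sketch. It is only an illustration, not proposed spec text: the helper name escape_code_point is made up for this example, and the last few lines assume the behaviour of CPython's standard json module, which accepts an unpaired surrogate escape. It shows the six-character form, the twelve-character surrogate-pair form, and one of the "16-bit quantities" the ABNF permits that cannot represent a character.

```python
import json


def escape_code_point(cp: int) -> str:
    # Hypothetical helper: build the backslash-u escape for one code point.
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: one six-character sequence.
        return "\\u%04X" % cp
    # Outside the BMP: split into a UTF-16 surrogate pair,
    # giving a twelve-character sequence.
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)
    low = 0xDC00 + (v & 0x3FF)
    return "\\u%04X\\u%04X" % (high, low)


backslash = escape_code_point(0x5C)    # \u005C
g_clef = escape_code_point(0x1D11E)    # \uD834\uDD1E

# The long and short escapes for a backslash decode to the same string,
# and hex digits may be upper or lower case.
same = json.loads('"\\u005c"') == json.loads('"\\\\"') == "\\"

# The surrogate pair decodes back to the single character U+1D11E.
round_trip = json.loads('"' + g_clef + '"') == "\U0001D11E"

# A lone surrogate is allowed by the grammar but is not a character;
# CPython's json module accepts it, and UTF-8 encoding then fails.
lone = json.loads('"\\uD834"')
try:
    lone.encode("utf-8")
    lone_is_encodable = True
except UnicodeEncodeError:
    lone_is_encodable = False
```

The lone-surrogate case at the end is the kind of breakage the second paragraph of the proposed 2.5 text warns about: the string parses, but software that expects Unicode text chokes on it later.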