Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

"HANSEN, TONY L" <tony@att.com> Tue, 21 March 2017 04:20 UTC

From: "HANSEN, TONY L" <tony@att.com>
To: "json@ietf.org" <json@ietf.org>
CC: The IESG <iesg@ietf.org>
Thread-Topic: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"
Thread-Index: AQHSoB2hkOiDU0h/uUO6PkcNIhnyU6GcKDiAgAASAQCAAfYDAIAAhFmA
Date: Tue, 21 Mar 2017 04:20:02 +0000
Message-ID: <D89BCFAA-B81F-4EEB-8B3A-180BAAB9D16C@att.com>
References: <1fb5849e-8dbf-835d-65b7-2403686248f9@outer-planes.net> <0E32A94D-CE12-4F52-9ED6-8743C49751B4@vpnc.org> <4d2f0fb3-a729-0c17-2394-bc1e005dd612@gmx.de> <d09f9a59-2411-45a0-470c-ea95072fe4fd@outer-planes.net> <dad91b19-e774-e239-36d2-9d086cca8e0d@gmx.de> <ac432615-ee84-3cdf-6b37-480626bd18c1@gmx.de> <804f9930-26a5-a565-0607-452b386cfeb5@outer-planes.net>
In-Reply-To: <804f9930-26a5-a565-0607-452b386cfeb5@outer-planes.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/Dmoqnjg7cRJJGm2ZoUKUgkFdmXo>
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

I like most of what Matt has below. However, I would prefer that the statement about inspecting the first octets for nulls be more explicit. I think it was Julian who posted a nice little chart the other day on how to determine which UTF-16/32 variant is present based on the null pattern. If you don’t want it in the main text, then at least put it into an appendix. Otherwise, the code will be re-created many times, and probably often incorrectly.
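
To make the point concrete, here is a minimal sketch of the kind of detection routine the proposed paragraph implies. It is an illustration only, not text proposed for the draft. It assumes the input carries no byte order mark and that the first character is ASCII, and it follows the null-pattern idea from Section 3 of RFC 4627 (00 00 00 xx for UTF-32BE, xx 00 00 00 for UTF-32LE, 00 xx for UTF-16BE, xx 00 for UTF-16LE, anything else UTF-8). The function name detect_json_encoding is hypothetical.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from the null pattern of its first octets.

    Assumptions (not from the draft): no byte order mark is present, and the
    first character is ASCII (code point <= 127), as the proposed text states.
    Returns a Python codec name.
    """
    head = data[:4]
    if len(head) == 4:
        # Three nulls next to one non-null octet: one ASCII character in UTF-32.
        if head[:3] == b"\x00\x00\x00" and head[3] != 0:
            return "utf-32-be"
        if head[0] != 0 and head[1:] == b"\x00\x00\x00":
            return "utf-32-le"
    if len(head) >= 2:
        # One null next to one non-null octet: one ASCII character in UTF-16.
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be"
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le"
    # No leading null pattern: treat the text as UTF-8, the default encoding.
    return "utf-8"


# Round-trip check: encode one JSON text in each variant and detect it again.
text = '{"name": "caf\u00e9"}'
for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    encoded = text.encode(codec)
    assert detect_json_encoding(encoded) == codec
```

The UTF-32 checks come first because a UTF-32LE text also begins with the xx 00 pattern that would otherwise be taken for UTF-16LE. Only the first character is assumed to be ASCII, so a text such as a bare number or a string whose second character is non-ASCII is still classified correctly.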

	Tony Hansen

On 3/20/17, 12:26 PM, "json on behalf of Matthew A. Miller" <json-bounces@ietf.org on behalf of linuxwolf+ietf@outer-planes.net> wrote:

    Thank you for the suggested changes, Julian.  To consolidate the
    changes, I believe the following is your suggested text for all of
    Section 8.1:
    
    """
    JSON text MUST be encoded in UTF-8, UTF-16, or UTF-32 (Section 3 of
    [UNICODE]).  The default encoding is UTF-8, and JSON texts that are
    encoded in UTF-8 are interoperable in the sense that they will be
    read successfully by the maximum number of implementations; there are
    many implementations that cannot successfully read texts in other
    encodings (such as UTF-16 and UTF-32).  Text encoded in character
    encodings other than UTF-8, UTF-16, or UTF-32 cannot be used with
    the media type "application/json".
    
    Implementations MUST NOT add a byte order mark (U+FEFF) to the
    beginning of a JSON text.  In the interests of interoperability,
    implementations that parse JSON texts MAY ignore the presence of a
    byte order mark rather than treating it as an error.
    
    Recipients that wish to support Unicode encodings other than UTF-8
    can do this using a detection mechanism that is based on the fact
    that the first character will always have a Unicode code point less
    than or equal to 127; thus the UTF-16/32 variants can be detected by
    inspecting the first octets for nulls.
    """
    
    Does the working group object to this change?
    
    
    - m&m
    
    Matthew A. Miller
    JSONbis Chair
    
    On 17/03/19 04:29, Julian Reschke wrote:
    > ...and here is a concrete proposal:
    > 
    > Original text:
    > 
    >> 8.1.  Character Encoding
    >>
    >>    JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32 [UNICODE]
    >>    (Section 3).  The default encoding is UTF-8, and JSON texts that are
    > 
    > Change 1:
    > 
    > Say "MUST" instead of "SHALL", as it's the more common form of
    > expressing this requirement.
    > 
    > Change 2:
    > 
    > Replace "[UNICODE] (Section 3)" by "Section 3 of [UNICODE]".
    > 
    > That said, this citation isn't as stable as it should be, as [UNICODE]
    > refers to <http://www.unicode.org/versions/latest/> and unless I'm
    > missing something, there's no guarantee that future versions will have
    > the relevant bits in Section 3.
    > 
    >>    encoded in UTF-8 are interoperable in the sense that they will be
    >>    read successfully by the maximum number of implementations; there are
    >>    many implementations that cannot successfully read texts in other
    >>    encodings (such as UTF-16 and UTF-32).
    > 
    > Change 3:
    > 
    > Add "Text encoded in character encodings other than UTF-8, UTF-16, or
    > UTF-32 cannot be used with the media type "application/json"."
    > 
    > (this explains the implications of the SHALL/MUST)
    > 
    > 
    >>    Implementations MUST NOT add a byte order mark (U+FEFF) to the
    >>    beginning of a JSON text.  In the interests of interoperability,
    >>    implementations that parse JSON texts MAY ignore the presence of a
    >>    byte order mark rather than treating it as an error.
    > 
    > 
    > Finally, change 4:
    > 
    > Add a new paragraph:
    > 
    > "Recipients that wish to support Unicode encodings other than UTF-8 can
    > do this using a detection mechanism that is based on the fact that the
    > first character will always have a Unicode code point less than or equal to
    > 127; thus the UTF-16/32 variants can be detected by inspecting the first
    > octets for nulls."
    > 
    > 
    > I believe none of these changes affects anything normative, but that
    > they absolutely clarify the spec. In particular, having them in the spec
    > would have avoided this whole discussion we just had.
    > 
    > Best regards, Julian