[Json] FW: Call for Consensus: Proposed Text for "8.1 Character Encoding"

"Manger, James" <James.H.Manger@team.telstra.com> Mon, 27 March 2017 00:30 UTC

From: "Manger, James" <James.H.Manger@team.telstra.com>
To: "json@ietf.org" <json@ietf.org>
Thread-Topic: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"
Date: Mon, 27 Mar 2017 00:30:02 +0000
Message-ID: <SYXPR01MB16158E1A21189A2E51E2E1DDE5330@SYXPR01MB1615.ausprd01.prod.outlook.com>
References: <1fb5849e-8dbf-835d-65b7-2403686248f9@outer-planes.net> <0E32A94D-CE12-4F52-9ED6-8743C49751B4@vpnc.org> <4d2f0fb3-a729-0c17-2394-bc1e005dd612@gmx.de> <d09f9a59-2411-45a0-470c-ea95072fe4fd@outer-planes.net> <dad91b19-e774-e239-36d2-9d086cca8e0d@gmx.de> <ac432615-ee84-3cdf-6b37-480626bd18c1@gmx.de> <804f9930-26a5-a565-0607-452b386cfeb5@outer-planes.net> <D89BCFAA-B81F-4EEB-8B3A-180BAAB9D16C@att.com> <e69d7c21-85cb-45f4-c0c2-34c624e63049@outer-planes.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/1oXPWrecgRSClPze4lcapcRqsGw>
Subject: [Json] FW: Call for Consensus: Proposed Text for "8.1 Character Encoding"

> Recipients that wish to support Unicode encodings other than UTF-8
> can do this using a detection mechanism

Not quite. The detection only distinguishes among UTF-8, UTF-16, and UTF-32; it does not identify "encodings other than UTF-8" in general.

> that is based on the fact
> that the first character will always have a Unicode code point
> greater than 0

Not quite. It is based on the fact that the (unescaped) character U+0000 cannot appear *anywhere* in valid JSON, and all valid first characters have code points less than 128.
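
A rough Python sketch of those two facts, for illustration only. The list of possible first characters is written out by hand from the JSON grammar (whitespace, '{', '[', '"', '-', digits, and the first letters of true/false/null), and the helper names are made up for this example. It uses Python's standard json module as a stand-in for "a JSON parser".

```python
import json

# Characters that can begin a JSON text: insignificant whitespace,
# '{', '[', '"', '-', a digit, or the first letter of true/false/null.
# (Hand-written list, an assumption for illustration.)
VALID_FIRST_CHARS = ' \t\n\r{["-0123456789tfn'


def max_first_code_point() -> int:
    """Largest code point among the possible first characters."""
    return max(ord(c) for c in VALID_FIRST_CHARS)


def accepts(text: str) -> bool:
    """Return True if Python's json module parses `text` without error."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Every possible first character is ASCII, so its code point is below 128.
assert max_first_code_point() < 128

# An unescaped U+0000 is rejected, even inside a string ...
assert not accepts('"\x00"')
# ... while the escaped form \u0000 is accepted.
assert accepts('"\\u0000"')
```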

> by inspecting the first octets for nulls.

Might as well be explicit that inspecting up to the first 4 octets is sufficient. "null" is a special term in JSON, so reusing it for a 0x00 octet isn't ideal.
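
To make the "first 4 octets" point concrete, here is a minimal Python sketch of such a detection. It assumes the input is a valid JSON text with no byte order mark; the function name is hypothetical, and the returned strings are Python codec names. The patterns in the comments use xx for a non-zero octet.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess UTF-8/16/32 from the zero octets among the first 4 octets.

    Assumes `data` is a valid JSON text without a byte order mark.
    """
    head = data[:4]
    if len(head) >= 4:
        if head[:3] == b"\x00\x00\x00":
            return "utf-32-be"  # 00 00 00 xx
        if head[1:] == b"\x00\x00\x00":
            return "utf-32-le"  # xx 00 00 00
    if len(head) >= 2:
        if head[0] == 0:
            return "utf-16-be"  # 00 xx
        if head[1] == 0:
            return "utf-16-le"  # xx 00
    return "utf-8"              # no zero octet in the first two positions
```

The UTF-32 checks come first because a UTF-32 text also has a zero octet within its first 2 octets; only the third and fourth octets separate it from UTF-16.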

> Text encoded in character encodings other than UTF-8,
> UTF-16, or UTF-32 cannot be used with the media type "application/json".

The section starts with "MUST be encoded in UTF-8, UTF-16, or UTF-32", so this sentence adds nothing. It just slightly muddies the water. Drop it.



I'll go for option 3) MUST encode as UTF-8 where the media type is 'application/json'.
Suggested text:

"""
  8.1. Character Encoding

  When exchanged as bytes, JSON text MUST be encoded in UTF-8.

  Earlier editions also allowed UTF-16 and UTF-32 encodings. However, those
  choices do not produce interoperable JSON as they are not supported by
  many implementations. UTF-16 and UTF-32 encodings of JSON can be
  distinguished from a UTF-8 encoding by the presence of an octet with
  value 0 within the first 2 octets, since valid first characters all have
  code points less than 128 and unescaped U+0000 characters are not allowed
  anywhere.

  <paragraph on the byte order mark>
"""
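
As a sketch of how a UTF-8-only recipient might apply the suggested text, assuming Python and no byte order mark (the BOM paragraph would cover that case separately). Both function names are hypothetical.

```python
import json


def looks_like_utf16_or_utf32(data: bytes) -> bool:
    """True if an octet with value 0 occurs within the first 2 octets."""
    return 0 in data[:2]


def load_utf8_json(data: bytes):
    """Parse JSON bytes, accepting UTF-8 only (hypothetical helper)."""
    if looks_like_utf16_or_utf32(data):
        raise ValueError("JSON text appears to be UTF-16 or UTF-32, not UTF-8")
    # bytes.decode() is strict by default, so malformed UTF-8 raises
    # UnicodeDecodeError instead of being silently replaced.
    return json.loads(data.decode("utf-8"))
```

The zero-octet check is only there to give a clearer error message; a strict UTF-8 decode followed by a JSON parse would reject such input anyway, since an unescaped U+0000 is never valid.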

--
James Manger


-----Original Message-----
From: json [mailto:json-bounces@ietf.org] On Behalf Of Matthew A. Miller
Sent: Friday, 24 March 2017 8:38 AM
Cc: json@ietf.org
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Our sponsoring AD and I have been discussing this option.  Alexey is
willing to go through another IETF last call if we can get consensus.

To start, to what degree is UTF-8 encouraged?

0) don't encourage UTF-8 more than already is
1) SHOULD encode as UTF-8 for all usages
2) SHOULD encode as UTF-8 where the media type is 'application/json'
3) MUST encode as UTF-8 where the media type is 'application/json'
4) other -- please specify


-----Original Message-----
From: json [mailto:json-bounces@ietf.org] On Behalf Of Matthew A. Miller
Sent: Friday, 24 March 2017 1:41 AM
To: json@ietf.org
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Hello JSONbis,

It looks like we have consensus for the following text for all of
Section 8.1:

"""
JSON text MUST be encoded in UTF-8, UTF-16, or UTF-32 Section 3 of
[UNICODE].  The default encoding is UTF-8, and JSON texts that are
encoded in UTF-8 are interoperable in the sense that they will be
read successfully by the maximum number of implementations; there are
many implementations that cannot successfully read texts encoded in
UTF-16 or UTF-32. Text encoded in character encodings other than UTF-8,
UTF-16, or UTF-32 cannot be used with the media type "application/json".

Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a JSON text.  In the interests of interoperability,
implementations that parse JSON texts MAY ignore the presence of a
byte order mark rather than treating it as an error.

Recipients that wish to support Unicode encodings other than UTF-8
can do this using a detection mechanism that is based on the fact
that the first character will always have a Unicode code point
greater than 0 and less than 128, thus the UTF-16/32 variants can
be detected by inspecting the first octets for nulls.
"""

Please speak now if you have any objections.

Thank you all,

--
- m&m

Matthew A. Miller
JSONbis Chair