Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

"Matthew A. Miller" <linuxwolf+ietf@outer-planes.net> Mon, 20 March 2017 16:26 UTC

Sender: Matthew Miller <linuxwolf@outer-planes.net>
To: Julian Reschke <julian.reschke@gmx.de>, "json@ietf.org" <json@ietf.org>
Cc: The IESG <iesg@ietf.org>
References: <1fb5849e-8dbf-835d-65b7-2403686248f9@outer-planes.net> <0E32A94D-CE12-4F52-9ED6-8743C49751B4@vpnc.org> <4d2f0fb3-a729-0c17-2394-bc1e005dd612@gmx.de> <d09f9a59-2411-45a0-470c-ea95072fe4fd@outer-planes.net> <dad91b19-e774-e239-36d2-9d086cca8e0d@gmx.de> <ac432615-ee84-3cdf-6b37-480626bd18c1@gmx.de>
From: "Matthew A. Miller" <linuxwolf+ietf@outer-planes.net>
Message-ID: <804f9930-26a5-a565-0607-452b386cfeb5@outer-planes.net>
Date: Mon, 20 Mar 2017 10:26:20 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.0
MIME-Version: 1.0
In-Reply-To: <ac432615-ee84-3cdf-6b37-480626bd18c1@gmx.de>
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="HtUvPglAb1StOSRo7i7X66880bAHT0a5I"
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/Voi2X-4LHj8WjQHm_8uCC4fT2L8>
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Thank you for the suggested changes, Julian.  To consolidate the
changes, I believe the following is your suggested text for all of
Section 8.1:

"""
JSON text MUST be encoded in UTF-8, UTF-16, or UTF-32 (Section 3 of
[UNICODE]).  The default encoding is UTF-8, and JSON texts that are
encoded in UTF-8 are interoperable in the sense that they will be
read successfully by the maximum number of implementations; there are
many implementations that cannot successfully read texts in other
encodings (such as UTF-16 and UTF-32).  Text encoded in character
encodings other than UTF-8, UTF-16, or UTF-32 cannot be used with
the media type "application/json".

Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a JSON text.  In the interests of interoperability,
implementations that parse JSON texts MAY ignore the presence of a
byte order mark rather than treating it as an error.

Recipients that wish to support Unicode encodings other than UTF-8
can do this using a detection mechanism that is based on the fact
that the first character will always have a Unicode code point less
than or equal to 127; thus the UTF-16/32 variants can be detected by
inspecting the first octets for nulls.
"""

Does the working group object to this change?
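
For anyone who wants to see the detection mechanism from the last
paragraph concretely, here is a minimal sketch in Python. It is an
illustration only, not proposed spec text. The function name
detect_json_encoding is hypothetical, and the sketch assumes the input
carries no byte order mark and begins with a character whose code
point is at most 127, as the paragraph describes. The octet patterns
are the familiar ones from Section 3 of RFC 4627, adjusted so that a
text consisting of a single character (such as "1") is still handled.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a BOM-less JSON text from its first octets.

    Hypothetical helper: relies on the first character being <= U+007F,
    so any null octets near the start reveal the UTF-16/32 variant.
    """
    head = data[:4]
    if len(head) >= 4:
        # 00 00 00 xx -> UTF-32, big-endian
        if head[:3] == b"\x00\x00\x00" and head[3] != 0:
            return "utf-32-be"
        # xx 00 00 00 -> UTF-32, little-endian
        if head[0] != 0 and head[1:4] == b"\x00\x00\x00":
            return "utf-32-le"
    if len(head) >= 2:
        # 00 xx -> UTF-16, big-endian
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be"
        # xx 00 -> UTF-16, little-endian
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le"
    # No nulls in the leading octets: treat as UTF-8 (the default).
    return "utf-8"
```

A recipient could then decode with data.decode(detect_json_encoding(data))
before handing the text to its parser. The UTF-32 checks come first
because a UTF-32LE text also matches the "xx 00" UTF-16LE pattern.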


- m&m

Matthew A. Miller
JSONbis Chair

On 17/03/19 04:29, Julian Reschke wrote:
> ...and here is a concrete proposal:
> 
> Original text:
> 
>> 8.1.  Character Encoding
>>
>>    JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32 [UNICODE]
>>    (Section 3).  The default encoding is UTF-8, and JSON texts that are
> 
> Change 1:
> 
> Say "MUST" instead of "SHALL", as it's the more common form of
> expressing this requirement.
> 
> Change 2:
> 
> Replace "[UNICODE] (Section 3)" by "Section 3 of [UNICODE]".
> 
> That said, this citation isn't as stable as it should be, as [UNICODE]
> refers to <http://www.unicode.org/versions/latest/> and unless I'm
> missing something, there's no guarantee that future versions will have
> the relevant bits in Section 3.
> 
>>    encoded in UTF-8 are interoperable in the sense that they will be
>>    read successfully by the maximum number of implementations; there are
>>    many implementations that cannot successfully read texts in other
>>    encodings (such as UTF-16 and UTF-32).
> 
> Change 3:
> 
> Add: "Text encoded in character encodings other than UTF-8, UTF-16, or
> UTF-32 cannot be used with the media type 'application/json'."
> 
> (this explains the implications of the SHALL/MUST)
> 
> 
>>    Implementations MUST NOT add a byte order mark (U+FEFF) to the
>>    beginning of a JSON text.  In the interests of interoperability,
>>    implementations that parse JSON texts MAY ignore the presence of a
>>    byte order mark rather than treating it as an error.
> 
> 
> Finally, change 4:
> 
> Add a new paragraph:
> 
> "Recipients that wish to support Unicode encodings other than UTF-8 can
> do this using a detection mechanism that is based on the fact that the
> first character will always have a Unicode code point less than or equal
> to 127; thus the UTF-16/32 variants can be detected by inspecting the
> first octets for nulls."
> 
> 
> I believe none of these changes affects anything normative, but that
> they absolutely clarify the spec. In particular, having them in the spec
> would have avoided this whole discussion we just had.
> 
> Best regards, Julian