Re: [Json] Wording on encoding; removing the table

"Pete Cordell" <petejson@codalogic.com> Sat, 23 November 2013 09:51 UTC


I believe we do at least have consensus that this is a contentious issue
and that there is a lot of confusion around it.  In the interests of
interoperability, I therefore think it is inappropriate to stay silent on
all of these issues, and I propose text along the following lines:

    JSON text is a sequence of Unicode code points.  The transfer encoding
    used to represent the characters on-the-wire is beyond the scope
    of this document.  It is therefore up to the specifications that
    reference this document to specify whether JSON messages will be
    transferred using UTF-8 (recommended), UTF-16 and/or UTF-32
    (discouraged), and whether preceding BOMs must be present,
    must not be present or are optional.

    If multiple encodings are permitted, implementers may choose to
    auto-detect a message's encoding by exploiting the fact that the
    first character of a JSON text must be in the ASCII character
    range and use the following table to deduce the active encoding:

           xx xx -- --  UTF-8
           xx 00 xx --  UTF-16LE
           xx 00 00 xx  UTF-16LE
           xx 00 00 00  UTF-32LE
           00 xx -- --  UTF-16BE
           00 00 -- --  UTF-32BE

I don't think that's a lot of text with which to describe the issues here, 
and I'm sure Tim (or someone else) can make it even snappier and more 
accurate.
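
To show how the proposed table might be applied, here is a minimal Python
sketch.  It assumes the input carries no BOM and that the first character
is in the ASCII range, as the proposed text states.  The function name
detect_json_encoding is hypothetical and is not taken from any draft or
library.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a BOM-less JSON text from its first octets.

    Sketch of the proposed table: the first character is assumed to be
    ASCII, so the position of null octets among the first four octets
    identifies the encoding.
    """
    head = data[:4]

    if len(head) >= 2 and head[0] == 0:
        # 00 00 -- --  UTF-32BE
        # 00 xx -- --  UTF-16BE
        return "utf-32-be" if head[1] == 0 else "utf-16-be"

    if len(head) >= 2 and head[1] == 0:
        # xx 00 00 00  UTF-32LE
        if len(head) == 4 and head[2] == 0 and head[3] == 0:
            return "utf-32-le"
        # xx 00 xx --  and  xx 00 00 xx  are both UTF-16LE
        return "utf-16-le"

    # xx xx -- --  UTF-8 (also the fallback when fewer than two octets
    # are available, which is an assumption of this sketch)
    return "utf-8"
```

For example, detect_json_encoding('{"a": 1}'.encode("utf-16-le")) returns
"utf-16-le", and the returned name can be passed straight to bytes.decode().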

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com
----- Original Message ----- 
From: "Paul Hoffman" <paul.hoffman@vpnc.org>
To: "JSON WG" <json@ietf.org>
Sent: Friday, November 22, 2013 10:36 PM
Subject: [Json] Wording on encoding; removing the table


> <hat on>
>
> Please note that the chairs tried to find some consensus in the BOM
> discussion and found none. Given that, and given that the current table is
> now wrong, our proposal is to remove it, not try to doctor it.
>
> Current Section 8.1:
>
>   JSON text SHALL be encoded in Unicode.  The default encoding is
>   UTF-8.
>
>   Since the first two characters of a JSON text will always be ASCII
>   characters [RFC0020], it is possible to determine whether an octet
>   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>   at the pattern of nulls in the first four octets.
>
>   00 00 00 xx  UTF-32BE
>   00 xx 00 xx  UTF-16BE
>   xx 00 00 00  UTF-32LE
>   xx 00 xx 00  UTF-16LE
>   xx xx xx xx  UTF-8
>
> Proposed replacement:
>
>   The default encoding for JSON transmitted over the Internet is UTF-8.
>   Transmitting JSON using other encodings may not be interoperable
>   unless the receiving system definitively knows the encoding.
>
> Does anyone have a technical objection to the proposed replacement? If so,
> please state the error and (hopefully) a correction.
>
> --Matt Miller and Paul Hoffman