Re: [Json] Complete section 3 proposal

"Pete Cordell" <petejson@codalogic.com> Wed, 19 June 2013 13:26 UTC

Original Message From: "Joe Hildebrand (jhildebr)"
>    Since the first code point of JSON text will always be an ASCII
> character [RFC0020], it is possible to determine whether an octet stream
> is UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE by looking at the
> pattern of nulls in the first four octets of a stream.  In the following
> table "00" corresponds to an octet with value zero, "xx" corresponds to an
> octet known to be non-zero, and "??" corresponds to an octet that is not
> checked.
>
>    00 00 00 xx  UTF-32BE
>    00 xx ?? xx  UTF-16BE
>    xx 00 00 00  UTF-32LE
>    xx 00 xx ??  UTF-16LE
>    xx xx ?? ??  UTF-8
>
> Note: streams less than four octets long are not UTF-32BE or UTF-32LE, and
> streams less than two octets long are UTF-8.

To nitpick, I think that in the UTF-16BE case, once you've seen 00 xx you
don't need any more octets, so its line should be:

    00 xx ?? ??  UTF-16BE


Or is the idea to include some redundancy in the detection?
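
To make the comparison concrete, here is a minimal Python sketch of the
detection with the relaxed UTF-16BE row. The function name
detect_json_encoding and the None result for patterns the table does not
list are assumptions of this sketch, not part of the proposal. It also
assumes that, once the two UTF-32 patterns have been ruled out, both
UTF-16 rows can be decided from the first two octets alone.

```python
from typing import Optional


def detect_json_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding of a JSON text from the nulls in its first octets.

    Hypothetical helper: returns an encoding name, or None when the pattern
    of nulls matches no row of the table.
    """
    head = data[:4]

    # Per the note: streams shorter than two octets are treated as UTF-8.
    if len(head) < 2:
        return "UTF-8"

    # The UTF-32 rows need all four octets, so check them first.
    if len(head) == 4:
        if head[:3] == b"\x00\x00\x00" and head[3] != 0:
            return "UTF-32BE"   # 00 00 00 xx
        if head[0] != 0 and head[1:] == b"\x00\x00\x00":
            return "UTF-32LE"   # xx 00 00 00

    # With UTF-32 ruled out, the first two octets are enough.
    if head[0] == 0 and head[1] != 0:
        return "UTF-16BE"       # 00 xx ?? ??  (relaxed row)
    if head[0] != 0 and head[1] == 0:
        return "UTF-16LE"       # xx 00 (third octet not checked here)
    if head[0] != 0 and head[1] != 0:
        return "UTF-8"          # xx xx ?? ??

    return None                 # e.g. 00 00 xx ??: not in the table
```

For example, detect_json_encoding('[1]'.encode('utf-16-be')) sees the
octets 00 5B 00 31 and settles on UTF-16BE after the first two, without
looking at the fourth octet.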

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com