Re: [Json] Allow any JSON value at the top level - Encoding detection

"Pete Cordell" <petejson@codalogic.com> Fri, 07 June 2013 08:22 UTC

Original Message From: "Manger, James H"
>> Adjust the UTF-16 patterns to:
>>
>>     00 xx xx xx  UTF-16BE
>>     xx 00 xx xx  UTF-16LE
>
> Add those patterns, don't replace the existing ones. The table becomes:
>
>   00 00 00 xx  UTF-32BE
>   00 xx 00 xx  UTF-16BE
>   00 xx xx xx  UTF-16BE
>   xx 00 00 00  UTF-32LE
>   xx 00 xx 00  UTF-16LE
>   xx 00 xx xx  UTF-16LE
>   xx xx xx xx  UTF-8

If xx means non-zero, then I think we also have to include the following for
texts whose second character is something like U+2C00 (00 2C in UTF-16LE):

    xx 00 00 xx  UTF-16LE

giving:

   00 00 00 xx  UTF-32BE
   00 xx 00 xx  UTF-16BE
   00 xx xx xx  UTF-16BE
   xx 00 00 00  UTF-32LE
   xx 00 00 xx  UTF-16LE
   xx 00 xx 00  UTF-16LE
   xx 00 xx xx  UTF-16LE
   xx xx xx xx  UTF-8

That can be reduced a bit if we use "--" to indicate "not-tested":

   00 00 -- --  UTF-32BE
   00 xx -- --  UTF-16BE
   xx 00 00 00  UTF-32LE
   xx 00 00 xx  UTF-16LE
   xx 00 xx --  UTF-16LE
   xx xx -- --  UTF-8
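
For illustration, the reduced table could be coded up roughly as below. This is only a sketch in Python: the function name detect_json_encoding is made up for the example, and raising an error for inputs shorter than four bytes is my assumption, since the table itself says nothing about short texts.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from its first four bytes.

    Mirrors the reduced table: "00" is a zero byte, "xx" is any non-zero
    byte, and positions marked "--" are never examined.
    """
    if len(data) < 4:
        # Assumption: the table needs four bytes; shorter input is rejected.
        raise ValueError("need at least four bytes to apply the table")
    b0, b1, b2, b3 = data[0], data[1], data[2], data[3]
    if b0 == 0:
        # 00 00 -- --  UTF-32BE
        # 00 xx -- --  UTF-16BE
        return "UTF-32BE" if b1 == 0 else "UTF-16BE"
    if b1 != 0:
        # xx xx -- --  UTF-8
        return "UTF-8"
    if b2 != 0:
        # xx 00 xx --  UTF-16LE
        return "UTF-16LE"
    # xx 00 00 00  UTF-32LE
    # xx 00 00 xx  UTF-16LE
    return "UTF-32LE" if b3 == 0 else "UTF-16LE"
```

With this, a top-level string starting with U+2C00 encoded as UTF-16LE (bytes 22 00 00 2C ...) hits the "xx 00 00 xx" row rather than falling through to a wrong answer.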


Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com