Re: [Json] secdir review of draft-ietf-jsonbis-rfc7159bis-03

Peter Cordell <petejson@codalogic.com> Sun, 12 March 2017 09:06 UTC

To: Ned Freed <ned.freed@mrochek.com>, Julian Reschke <julian.reschke@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/_RlFqQBQhAEgoFg64Ql5JcVhaaE>
Cc: draft-ietf-jsonbis-rfc7159bis.all@ietf.org, John Cowan <cowan@ccil.org>, ietf@ietf.org, secdir@ietf.org, "json@ietf.org" <json@ietf.org>, Benjamin Kaduk <kaduk@mit.edu>

On 11/03/2017 15:41, Ned Freed wrote:
>> On 2017-03-11 03:08, John Cowan wrote:
>> >
>> > On Thu, Mar 9, 2017 at 12:53 AM, Benjamin Kaduk <kaduk@mit.edu>
>> > wrote:
>> >
>> >     If that's what's supposed to happen, it should probably be more
>> >     clear, yes.  (But aren't there texts that have valid
>> >     interpretations in multiple encodings?)
>> >
>> >
>> > Not if the content is well-formed JSON and the only possible encodings
>> > are UTF-8, UTF-16, and UTF-32.  It suffices to examine the first four
>> > bytes of the input.  If there are no NUL bytes in the first four bytes,
>> > it is UTF-8; if there are two NUL bytes, it is UTF-16; if there are
>> > three NUL bytes, it is UTF-32.  This works because the grammar requires
>> > the first character to be in the ASCII repertoire, and the NUL
>> > *character* (U+0000) is not allowed at all.
>
>> Good explanation. Maybe the spec should include it.
>
> +1
>
> This exact issue just came up in a media type review, where someone
> specified a charset parameter because they weren't aware of this algorithm.
>
> It would be very helpful to have this text in the RFC.


It would need slightly more detail, though, to take endianness into
account in the case of UTF-16 and UTF-32.

The XML spec may offer some example text:

https://www.w3.org/TR/2008/REC-xml-20081126/#sec-guessing
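
To make the endianness point concrete, here is a minimal sketch of the
NUL-byte check extended to tell the byte orders apart. It is only an
illustration, not proposed spec text: the function name is hypothetical,
the returned labels are Python codec names, and it assumes the input has
no byte order mark, is well-formed JSON, and is in one of UTF-8, UTF-16
or UTF-32. It relies only on the two facts given above: the first
character is ASCII and U+0000 never appears.

```python
def guess_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from its leading bytes.

    Hypothetical helper for illustration. Assumes no BOM, well-formed
    JSON, and that the encoding is UTF-8, UTF-16 or UTF-32.
    """
    head = data[:4]

    # UTF-32: the first (ASCII) character takes four bytes, three of
    # which are NUL. Where the non-NUL byte sits gives the byte order.
    if len(head) == 4:
        if head[:3] == b"\x00\x00\x00" and head[3] != 0:
            return "utf-32-be"
        if head[0] != 0 and head[1:] == b"\x00\x00\x00":
            return "utf-32-le"

    # UTF-16: the first (ASCII) character takes two bytes, one of which
    # is NUL. Checking only the first pair also covers short texts and
    # texts whose second character is outside ASCII.
    if len(head) >= 2:
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be"
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le"

    # No NUL in the leading bytes: UTF-8.
    return "utf-8"


if __name__ == "__main__":
    text = '{"a": 1}'
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        print(codec, "->", guess_json_encoding(text.encode(codec)))
```

The UTF-32 test has to come first, because a little-endian UTF-32 text
also starts with a non-NUL byte followed by a NUL, just as little-endian
UTF-16 does.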

Pete Cordell
Codalogic Ltd
Read & write XML in C++, http://www.xml2cpp.com