Re: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

Patrik Fältström <paf@frobbit.se> Mon, 08 December 2014 05:42 UTC

Subject: Re: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
Mime-Version: 1.0 (Mac OS X Mail 8.1 \(1993\))
Content-Type: multipart/signed; boundary="Apple-Mail=_2BF39DFA-E834-4C16-9342-0057BA5E36FC"; protocol="application/pgp-signature"; micalg="pgp-sha1"
X-Pgp-Agent: GPGMail 2.5b3
From: Patrik Fältström <paf@frobbit.se>
In-Reply-To: <20141207210754.GA18507@mercury.ccil.org>
Date: Mon, 08 Dec 2014 06:42:32 +0100
Message-Id: <743CABB8-9BDB-4144-BD96-2D3A79BF0450@frobbit.se>
References: <CE03DB3D7B45C245BCA0D24327794936289DC7@MX104CL02.corp.emc.com> <89601952-AA04-44EE-A6DA-E76D0AB07C21@frobbit.se> <20141207180528.GA1116@mercury.ccil.org> <D4E95FE1-0C25-4541-8327-16313175F13A@frobbit.se> <20141207210754.GA18507@mercury.ccil.org>
To: John Cowan <cowan@mercury.ccil.org>
X-Mailer: Apple Mail (2.1993)
Archived-At: http://mailarchive.ietf.org/arch/msg/ietf/iQUwbtc8DkonIWwQLgnufjyAHLo
Cc: "ops-dir@ietf.org" <ops-dir@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>, "Black, David" <david.black@emc.com>, "json@ietf.org" <json@ietf.org>, "General Area Review Team (gen-art@ietf.org)" <gen-art@ietf.org>

> On 7 dec 2014, at 22:07, John Cowan <cowan@mercury.ccil.org> wrote:
> 
> Patrik Fältström scripsit:
> 
>> I.e. the way I read draft-ietf-json-text-sequence (and I might be
>> wrong), you have specific octet values that act as separators. That
>> only works if the encoding is UTF-8.
> 
> This is a binary representation which has embedded JSON texts represented
> in UTF-8.  Since the first character in a JSON text is necessarily in
> the ASCII repertoire, it is not possible to parse a UTF-16 or UTF-32
> JSON text as UTF-8 and come out with valid JSON.

My point is that if you talk about specific characters, or reference RFC 20 or whatnot, then you only get RS as the single octet 0x1E if you use UTF-8 encoding. If you use UTF-16, then RS is not one octet (0x1E), nor is RS the only character that has 0x1E as one of its octets.
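To make the octet argument concrete, here is a small sketch (illustration only, not text from the draft). U+1E00 is just one arbitrarily chosen example of a character other than RS whose UTF-16 encoding contains the octet 0x1E; the helper name contains_rs_octet is made up for this example.

```python
# Illustration only: how U+001E (RS) and one other character encode.
RS = "\u001e"

utf8_rs = RS.encode("utf-8")        # a single octet: 0x1E
utf16_rs = RS.encode("utf-16-be")   # two octets: 0x00 0x1E

# U+1E00 is a different character (chosen arbitrarily for this example).
other = "\u1e00"
utf16_other = other.encode("utf-16-be")  # 0x1E 0x00, contains the octet 0x1E
utf8_other = other.encode("utf-8")       # 0xE1 0xB8 0x80, no 0x1E octet


def contains_rs_octet(data: bytes) -> bool:
    """Return True if the octet 0x1E occurs anywhere in data."""
    return 0x1E in data


print(contains_rs_octet(utf16_other))  # the separator octet appears in UTF-16
print(contains_rs_octet(utf8_other))   # but not in the UTF-8 form
```

In UTF-8 every octet of a multi-octet sequence is 0x80 or above, so 0x1E can only ever be the encoding of U+001E itself.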

I think the problem is that I do not know what an "octet string" is. You either have UTF-8 encoded Unicode strings, or... ;-) In this case, you have a series of UTF-8 encoded Unicode strings, right? They are separated by the octet 0x1E, which happens to also be a correctly encoded Unicode character -- INFORMATION SEPARATOR TWO. This implies the whole thing is UTF-8 encoded text that is to be parsed like this:

possible-JSON = 1*(not-RS); UTF-8-encoded JSON text
 ; (as specified in RFC7159, but only UTF-8 allowed)

I.e. the blob, to be compliant with this document, MUST be UTF-8 encoded JSON.

Right?
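A minimal sketch of that reading, assuming the blob is split on the octet 0x1E and each piece must then be strictly UTF-8 encoded JSON. The function name parse_sequence and the (value, error) result shape are inventions for this example, not anything defined by the draft.

```python
import json

RS = b"\x1e"  # the separator octet discussed above


def parse_sequence(blob: bytes):
    """Split blob on RS octets; parse each piece as UTF-8 encoded JSON.

    Returns a list of (value, error) pairs; exactly one of the two is None
    unless the JSON text itself is the literal null.
    """
    results = []
    for chunk in blob.split(RS):
        if not chunk.strip():
            continue  # nothing between two separators (or before the first)
        try:
            # Decode explicitly as UTF-8. Passing bytes straight to
            # json.loads would let it auto-detect UTF-16/UTF-32 instead.
            text = chunk.decode("utf-8")
            results.append((json.loads(text), None))
        except ValueError as exc:  # covers UnicodeDecodeError, JSONDecodeError
            results.append((None, exc))
    return results


good = parse_sequence(b'\x1e{"a": 1}\n\x1e[1, 2]\n')
bad = parse_sequence(RS + '{"a": 1}'.encode("utf-16-le"))
```

Here the UTF-16 piece fails to parse as JSON once it is treated as UTF-8, so it is reported as an error rather than silently accepted.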

> However, I grant that mentioning UTF-8 only in an ABNF comment is not
> really prominent enough.  Proposed wording change:
> 
> For:
> 
>   In prose: a series of octet strings, each containing any octet other
>   than a record separator (RS) (0x1E) [RFC0020], all octet strings
>   separated from each other by RS octets.  Each octet string in the
>   sequence is to be parsed as a JSON text.
> 
> read:
> 
>   In prose: a series of octet strings, each containing any octet other
>   than a record separator (RS) (0x1E) [RFC0020], all octet strings
>   separated from each other by RS octets.  Each octet string in the
>   sequence is to be parsed as a JSON text in UTF-8 encoding.
> 
> and add a suitable reference to UTF-8.

I would say that what you have said above is:

This specifies a series of UTF-8 encoded Unicode strings, each to be interpreted as JSON text. The strings are separated by the octet 0x1E (which is the UTF-8 encoding of the Unicode character U+001E - INFORMATION SEPARATOR TWO). Because of this, that character must be escaped, for example by using \u001E notation, if it occurs in an attribute value.
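As a quick check of the escaping point, here is a sketch using Python's standard json module (one serializer among many, picked only for illustration). RFC 7159 already requires control characters U+0000 through U+001F to be escaped inside strings, so a conforming serializer should never emit a raw 0x1E octet.

```python
import json

# A string value that contains U+001E in the middle.
value = {"note": "a\u001eb"}

# Serialize, then encode as UTF-8, as it would appear inside a sequence.
encoded = json.dumps(value).encode("utf-8")

# The character is written as the six ASCII characters \u001e, so the
# separator octet does not occur inside the JSON text.
has_raw_rs = 0x1E in encoded

# Round trip: the escape decodes back to the original character.
decoded = json.loads(encoded.decode("utf-8"))
```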

>> Ok, so what you say is that a string in an attribute value in the JSON
>> blob can still start with U+FEFF?
> 
> Just so.

Good.

   Patrik