Re: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-10

Patrik Fältström <paf@frobbit.se> Sat, 13 December 2014 07:02 UTC

From: Patrik Fältström <paf@frobbit.se>
In-Reply-To: <20141212011208.GK5272@mercury.ccil.org>
Date: Sat, 13 Dec 2014 08:02:26 +0100
Message-Id: <B486D5B4-C95C-4315-9F3C-3173E9A64301@frobbit.se>
References: <CE03DB3D7B45C245BCA0D243277949362B18C7@MX104CL02.corp.emc.com> <475F8F1D-6F6A-47E3-AE60-7BDC7AB6BD66@vpnc.org> <255B9BB34FB7D647A506DC292726F6E127D5708376@WSMSG3153V.srv.dir.telstra.com> <20141212011208.GK5272@mercury.ccil.org>
To: John Cowan <cowan@mercury.ccil.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/ietf/NZoDQsdWrcjEcpwrqIHhPnDCMQ8
Cc: "ops-dir@ietf.org" <ops-dir@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>, "Black, David" <david.black@emc.com>, Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>, "Manger, James" <James.H.Manger@team.telstra.com>, "General Area Review Team (gen-art@ietf.org)" <gen-art@ietf.org>

> On 12 dec 2014, at 02:12, John Cowan <cowan@mercury.ccil.org> wrote:
> 
> Manger, James scripsit:
> 
>> How about:
>> 
>>  "A JSON text sequence consists of any number of JSON texts,
>>   each prefixed by a Record Separator (U+001E) character, and
>>   each suffixed by an End of Line (U+000A) character. It is
>>   UTF-8 encoded."
>> 
>> Say "Information Separator Two (U+001E)" if you really want to be pure.
> 
> The trouble with that is that U+001E has no official Unicode name or
> function; those come from ISO 6429, which is incorporated (in relevant
> part) into US-ASCII, which is described in RFC 20.

Although it does not have a Unicode name, the closest thing we can get is the alias, which is "INFORMATION SEPARATOR TWO":

# grep ^001E UnicodeData.txt
001E;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR TWO;;;;
#

So I suggest using that.
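
For readers who want to check this themselves, here is a minimal sketch in Python that takes apart the UnicodeData.txt record quoted above. It assumes the standard semicolon-separated layout of that file, in which the second field is the name (here only the placeholder "<control>") and the eleventh field is the old Unicode 1.0 name. The variable names are illustrative only.

```python
import unicodedata

# The UnicodeData.txt record quoted above, split on its ';' separators.
record = "001E;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR TWO;;;;"
fields = record.split(";")

code_point = int(fields[0], 16)  # 0x1E
name_field = fields[1]           # "<control>": a placeholder, not a real name
category = fields[2]             # "Cc": a control character
old_name = fields[10]            # the Unicode 1.0 name field

# Python's own Unicode database likewise has no formal name for U+001E,
# so the lookup falls back to the default value (None).
formal_name = unicodedata.name(chr(code_point), None)

print(f"U+{code_point:04X} name={name_field} category={category}")
print(f"old name: {old_name}, formal name: {formal_name}")
```

The point of the sketch is only that the string "INFORMATION SEPARATOR TWO" lives in a secondary field, not in the name field proper.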

I think it is wrong to say "Record Separator" and then still reference the Unicode tables.

Alternatively, one could just write the following (which makes it clearer how this works; this is my understanding):

> A JSON text sequence consists of any number of JSON texts, each prefixed by the character U+001E and each suffixed by U+000A. The JSON texts, as well as the whole JSON text sequence, are encoded in UTF-8, although any JSON text might be truncated and for that reason not be a valid UTF-8 sequence. Any occurrence of the UTF-8 encoding of U+001E (the byte 0x1E) is to be viewed as the byte immediately before a JSON text, and any occurrence of the byte 0x0A is to be viewed as the first byte after a complete JSON text. If the JSON text is truncated, the 0x0A byte will not be present.

I.e. the grammar is sort of (before coffee in the morning):

sequence := "" | 0x1E text sequence

text := complete-text | truncated-text

complete-text := proper-UTF8 0x0A

truncated-text := proper-UTF8 broken-UTF8

proper-UTF8 := "" | "a sequence of bytes, possible to parse as a series of UTF8 encoded Unicode characters"

broken-UTF8 := "a sequence of bytes not possible to parse as a UTF8 encoded Unicode character"
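
As a rough illustration of the grammar above, a reader of such a sequence could be sketched in Python as follows. This is a sketch under stated assumptions, not a reference implementation: the function names split_sequence and is_proper_utf8 are hypothetical, bytes before the first 0x1E are simply skipped, and a text counts as complete only when the last byte before the next 0x1E (or the end of input) is 0x0A.

```python
RS = b"\x1e"  # U+001E, the byte in front of every JSON text
LF = b"\x0a"  # U+000A, the byte after every complete JSON text


def split_sequence(data: bytes):
    """Yield (body, complete) pairs for each text in the byte string."""
    # Assumption: anything before the first 0x1E belongs to no text.
    for chunk in data.split(RS)[1:]:
        complete = chunk.endswith(LF)
        # Strip the terminating 0x0A from complete texts only.
        body = chunk[:-1] if complete else chunk
        yield body, complete


def is_proper_utf8(body: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without error."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Two complete texts followed by one cut off in the middle of a character.
sample = b'\x1e{"a":1}\n\x1e[1,2]\n\x1e{"b":"\xc3'
for body, complete in split_sequence(sample):
    print(body, complete, is_proper_utf8(body))
```

A truncated text shows up here as an element with no trailing 0x0A, and if the cut fell inside a multi-byte character it will also fail to decode as UTF-8.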

   Patrik