Re: [Json] JSON Sequence support for log files

"Manger, James" <James.H.Manger@team.telstra.com> Thu, 08 May 2014 01:45 UTC

From: "Manger, James" <James.H.Manger@team.telstra.com>
To: Nico Williams <nico@cryptonector.com>, Phillip Hallam-Baker <hallam@gmail.com>
Date: Thu, 08 May 2014 11:45:01 +1000
Archived-At: http://mailarchive.ietf.org/arch/msg/json/P6YE0QihgZ1YKHZkCNYRJbvvdDM
Cc: "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] JSON Sequence support for log files

>>> Appending to logfiles creates the need to recover from interrupted
>>> writes (i.e., corrupted entries).  Therefore I propose the following
>>> minor change to the JSON Sequence I-D:
>>>
>>> a) add ASCII RS to the whitespace rule
>>> b) RECOMMEND that logfile writers write either RS <entry> RS NL or
>>> <entry-without-internal-newlines> NL
>>>
>>> (producing entries with no internal newlines is trivial, as discussed
>>> in other posts)
>>>
>>> c) describe how to resynchronize by seeking to the next RS or NL when
>>> a JSON text fails to parse, until a text that parses is found.

>> Why can't we have } NL { as a record boundary separator?

> JSON Sequence is more general than logfiles.  As such it supports any
> type at the top-level, therefore } NL { doesn't work.
>
> For logfiles, however, it does work, as we can constrain the type used
> at the top-level in logfiles.
>
> In fact, any sequence of ( %x22 / "]" / "}" ) %x0A ( %x22 / "{" / "[" )
> works!  Thus even for logfiles we can have arrays, objects, and
> strings at the top-level.


Resyncing to a record boundary in the middle of a JSON sequence or log isn't a good reason for RS. You can sync without RS. I think the following detects a boundary between any two JSON values.

( "}" / "]" / %x22 / "e" / "l" / DIGIT ) *ws NL *ws ( "{" / "[" / %x22 / "t" / "f" / "n" / "-" / DIGIT )
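
As a rough illustration, the rule above can be written as a regular expression. This is only a sketch under some assumptions: NL is taken to be LF (%x0A), ws is taken to be JSON whitespace (space, tab, LF, CR), and the names BOUNDARY and split_values are made up for this example.

```python
import json
import re

# Assumed translation of the ABNF rule above into a Python regex.
BOUNDARY = re.compile(
    r'(?<=[}\]"el0-9])'         # last character of the previous value
    r'[ \t\r\n]*\n[ \t\r\n]*'   # *ws NL *ws
    r'(?=[{\["tfn0-9-])'        # first character of the next value
)


def split_values(text):
    """Split text at every candidate boundary and drop empty pieces."""
    return [part.strip() for part in BOUNDARY.split(text) if part.strip()]


if __name__ == "__main__":
    # The first object is pretty-printed, so it contains internal newlines
    # that must not be treated as boundaries.
    log = '{\n  "a": [\n    1,\n    2\n  ]\n}\n{"b": "x\\ny"}\n"s"\ntrue\n-3\nnull\n'
    for piece in split_values(log):
        print(json.loads(piece))
```

The newlines inside the pretty-printed object are not matched because each one is either preceded by "{", "[" or "," or followed by "]" or "}", none of which the rule allows on that side of the NL.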

RS adds complexity (e.g., extra steps to strip the RS characters before passing the text to a JSON parser).
It also causes unexpected behaviour from tools that handle text. For instance, the Chrome browser usually *displays* text files in the browser window, but if the file contains an RS character it is downloaded instead!


RS could conceivably help detect *slightly* more cases of accidentally truncated writes (a write cut off in the middle of a top-level number, or in the middle of an escape sequence inside a top-level string). That is not compelling, particularly as RS still fails to detect truncation of entire records (including their RSs), so it is useless as a proper integrity mechanism. Finally, the "RS <entry> RS NL" suggestion misuses the normal semantics of a separator by using two per entry instead of one.
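
To make the truncated-number case concrete, here is a small sketch of an NL-only reader. It assumes entries are written as <entry> NL, that a truncated write leaves no NL behind, and that the next append lands directly after the partial bytes. The recover function and the sample logs are hypothetical, not taken from the I-D.

```python
import json
import re

# Assumed regex form of the boundary rule (NL = LF, ws = space/tab/LF/CR).
BOUNDARY = re.compile(r'(?<=[}\]"el0-9])[ \t\r\n]*\n[ \t\r\n]*(?=[{\["tfn0-9-])')


def recover(log):
    """Return (values, rejected): parsed entries and chunks that failed to parse."""
    values, rejected = [], []
    for chunk in BOUNDARY.split(log):
        chunk = chunk.strip()
        if not chunk:
            continue
        try:
            values.append(json.loads(chunk))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            rejected.append(chunk)
    return values, rejected


# A write of an object cut off after '{"b": ', then a complete entry appended:
# the merged chunk does not parse, so the damage is noticed.
print(recover('{"a": 1}\n{"b": {"c": 3}\n'))

# A write of 1234 cut off after '12', then the entry 56 appended: the merged
# text 1256 is a valid number, so the damage goes unnoticed.
print(recover('{"a": 1}\n1256\n'))
```

The second log is the kind of case where a leading RS per entry would expose the damage. Neither approach notices a record that is missing entirely.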

-1 to RSs

--
James Manger