Re: [Json] JSON Log file encoding JSON-L

Phillip Hallam-Baker <hallam@gmail.com> Tue, 06 May 2014 14:54 UTC

In-Reply-To: <CAHBU6is7g6ecwupv-N7wW+VTN_bZL71zbHYj=ePq=iqmk=gUwQ@mail.gmail.com>
References: <CAMm+LwjB-51z4GoeC0riehmJg1HAddmLAyfMsVOCVM80i=RiMA@mail.gmail.com> <CAK3OfOgX06vkOS1+CJhsptMAvm+KX1HgB2=34ubxUCtHNVnx=A@mail.gmail.com> <CAHBU6is7g6ecwupv-N7wW+VTN_bZL71zbHYj=ePq=iqmk=gUwQ@mail.gmail.com>
Date: Tue, 06 May 2014 10:54:40 -0400
Message-ID: <CAMm+Lwjy4iL3h2_v5v=No5UaQNNWe5dsTUZhP3wYLjVPshVUcA@mail.gmail.com>
From: Phillip Hallam-Baker <hallam@gmail.com>
To: Tim Bray <tbray@textuality.com>
Content-Type: text/plain; charset="UTF-8"
Archived-At: http://mailarchive.ietf.org/arch/msg/json/kEMYXtzS5fSS81OpINiNwzmGx_c
Cc: Nico Williams <nico@cryptonector.com>, JSON WG <json@ietf.org>
Subject: Re: [Json] JSON Log file encoding JSON-L

If people want line mode then they probably want something like this:
http://www.w3.org/TR/WD-logfile.html

What I don't like about the W3C log file format is that every entry
has to state every column, which makes it rather inconvenient for
tracking occasional errors. In most cases I just want to log the
source IP and the requested resource. But if something strange were
going on, I would probably want the full header set and more.
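A minimal Python sketch of the sparse alternative, assuming one compact JSON object per line: routine entries carry only a couple of fields, and an unusual one can carry a nested header set. The helper `log_entry`, the `headers` key, and the sample values are hypothetical illustrations; `c-ip` and `cs-uri` are simply W3C-style field names reused as JSON keys.

```python
import json

# Hypothetical helper: serialize one log entry as a single-line JSON text.
def log_entry(fields: dict) -> str:
    # Compact separators emit no whitespace between tokens, and json.dumps
    # escapes any newline inside a string as "\n", so the entry stays on one line.
    return json.dumps(fields, separators=(",", ":"))

# Routine request: just the source IP and the requested resource.
routine = log_entry({"c-ip": "192.0.2.10", "cs-uri": "/foo/bar.html"})

# Something strange going on: record the full header set as well.
unusual = log_entry({
    "c-ip": "192.0.2.10",
    "cs-uri": "/foo/bar.html",
    "headers": {"User-Agent": "odd/0.1", "X-Forwarded-For": "198.51.100.7"},
})

print(routine)
print(unusual)
```

A reader does not need a fixed column list; it can look up the keys it cares about (for example with `dict.get`) and ignore the rest.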

If the information is in the logs, people will develop tools to
manage it. It's not as if grep and its ilk are fixed and immutable.


One addition that I should probably make is to reference the W3C
specification as one possible source of JSON tags. That way tools can
easily support either format:

So this:

#Version: 1.0
#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
12:45:52 GET /foo/bar.html
12:57:34 GET /foo/bar.html

Would become:

{ "Version": 1.0,
  "Date": "12-Jan-1996 00:00:00"}

{"time": "00:34:23", "cs-method": "GET", "cs-uri": "/foo/bar.html"}
{"time": "12:21:16", "cs-method": "GET", "cs-uri": "/foo/bar.html"}
{"time": "12:45:52", "cs-method": "GET", "cs-uri": "/foo/bar.html"}
{"time": "12:57:34", "cs-method": "GET", "cs-uri": "/foo/bar.html"}
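The conversion above could be automated roughly as follows. This is a Python sketch, not a complete parser for the W3C format: it assumes whitespace-separated fields with no embedded spaces (quoted values are not handled), keeps directive values as strings (so `Version` becomes `"1.0"` rather than a number), and assumes, as in the W3C draft, that `-` marks an unused field, which it simply omits. The function name `w3c_to_json_lines` is hypothetical.

```python
import json

def w3c_to_json_lines(lines):
    """Convert W3C extended log lines into a list of single-line JSON texts.

    Assumptions: whitespace-separated fields, no quoted values, and "-"
    marking an unused field.
    """
    header = {}   # pending directives such as Version and Date
    fields = []   # column names from the most recent #Fields directive
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # "#Version: 1.0" -> name "Version", value "1.0"
            name, _, value = line[1:].partition(":")
            value = value.strip()
            if name == "Fields":
                fields = value.split()
            else:
                header[name] = value
            continue
        if header:
            # Emit the accumulated directives as one header object.
            out.append(json.dumps(header))
            header = {}
        values = line.split()
        # Pair each value with its column name; drop unused ("-") fields.
        entry = {k: v for k, v in zip(fields, values) if v != "-"}
        out.append(json.dumps(entry))
    if header:
        out.append(json.dumps(header))
    return out

sample = [
    "#Version: 1.0",
    "#Date: 12-Jan-1996 00:00:00",
    "#Fields: time cs-method cs-uri",
    "00:34:23 GET /foo/bar.html",
    "12:21:16 GET /foo/bar.html",
]
for text in w3c_to_json_lines(sample):
    print(text)
```

Unlike the example above, this puts the header object on a single line as well, so every JSON text in the output occupies exactly one line.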


Fun fact: the W3C log file format specifies times in GMT, not UTC, and
this was not a mistake. At the time there was a possibility that GMT
would diverge from UTC by dropping the leap-second idiocy and being
tied to TAI.
Making corrections on the order of fifteen seconds a century is really
a piffling waste of time when solar time varies by five minutes over
the course of a year.

Of course, now that the shutdown of the ITU is a medium-term
possibility, this might well be revisited.


On Tue, May 6, 2014 at 10:20 AM, Tim Bray <tbray@textuality.com> wrote:
> If you want to use line-oriented tools, don't use JSON.  The no-newlines
> rule seems really outside the spirit of JSON.
>
> On May 6, 2014 12:01 AM, "Nico Williams" <nico@cryptonector.com> wrote:
>>
>> For a variety of reasons I would prefer that a log file format based
>> on JSON simply forbid the appearance of unescaped newlines in JSON
>> texts, and otherwise be just JSON sequences:
>>
>>  - it's compatible with JSON sequences
>>  - it works with line-oriented Unix tools (e.g., wc(1), grep(1), ...)
>>  - aesthetically it's nice
>>  - it's simpler to resync: just search for the newlines (or EOF, or
>> offset 0) around entries that fail to parse
>>
>>  - it's easy to implement because most JSON encoders I've seen (all?
>> I think so) have an option for producing "compact" output,
>> usually/always meaning no whitespace will be emitted between the
>> tokens that make up a JSON text -- just use this feature and that's
>> that
>>
>>  - it even works without compact JSON encoding if you're lucky enough
>> not to have crashes and such
>>
>>    Some OSes don't guarantee that writes to files will be completed
>> when a process is SIGKILLed.  E.g., recent Linux kernels don't
>> guarantee that more than one byte will be written in that case -- this
>> was well-intentioned in that a process could start an enormous write
>> that an admin might want to stop, but how would they?  That is why
>> Linux now allows SIGKILL to terminate incomplete writes.  Therefore
>> power failures and system crashes are not the only thing to worry
>> about.
>>
>>    Even with non-compact entry encodings, one can always recover with
>> some heuristics, and at some cost.  The biggest problem here is the
>> SIGKILL problem.
>>
>> Nico
>> --
>>
>> _______________________________________________
>> json mailing list
>> json@ietf.org
>> https://www.ietf.org/mailman/listinfo/json
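Nico's compact-encoding and resync points can be sketched in Python as follows, assuming one JSON text per newline-terminated line. `json.dumps` with `separators=(",", ":")` produces compact output and always escapes newlines inside strings, so each entry occupies exactly one line; the reader skips any line that fails to parse and resumes at the next newline. The function names `encode_entry` and `read_entries` are hypothetical.

```python
import json

def encode_entry(obj) -> str:
    # Compact encoding: no whitespace between tokens, and newlines inside
    # strings are escaped, so the only raw "\n" is the terminator added here.
    return json.dumps(obj, separators=(",", ":")) + "\n"

def read_entries(data: str):
    """Yield parsed entries, skipping lines that fail to parse.

    A line damaged by a crash or a SIGKILL mid-write is discarded, and
    parsing resumes at the next newline.
    """
    for line in data.split("\n"):
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Resync: drop the damaged line and continue with the next one.
            continue

# A good entry, a truncated one (as if the writer was killed), then another good one.
log = (encode_entry({"msg": "line one\nline two"})
       + '{"msg": "trunc\n'
       + encode_entry({"msg": "ok"}))
print(list(read_entries(log)))
```

One caveat of this simple approach: if a truncated write also loses its trailing newline, the next entry is appended to the same damaged line and is discarded with it.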



-- 
Website: http://hallambaker.com/