Re: [DNSOP] New draft on representing DNS messages in JSON

Shane Kerr <shane@time-travellers.org> Thu, 21 August 2014 15:28 UTC

Date: Thu, 21 Aug 2014 17:27:57 +0200
From: Shane Kerr <shane@time-travellers.org>
To: Paul Hoffman <paul.hoffman@vpnc.org>
Message-ID: <20140821172757.504f49b7@vulcan>
In-Reply-To: <586AEB36-C10F-4E6E-AC55-BAADE7C00FD4@vpnc.org>
References: <B4ACD73A-25EF-4063-81D4-DCFE6DB78AB1@vpnc.org> <20140821102232.73071610@vulcan> <586AEB36-C10F-4E6E-AC55-BAADE7C00FD4@vpnc.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/dnsop/jmUph_n-HlDHpDuHLJUj9FfEG-0
Cc: dnsop <dnsop@ietf.org>
Subject: Re: [DNSOP] New draft on representing DNS messages in JSON

Paul,

On Thu, 21 Aug 2014 07:01:01 -0700
Paul Hoffman <paul.hoffman@vpnc.org> wrote:

> On Aug 21, 2014, at 1:22 AM, Shane Kerr <shane@time-travellers.org>
> wrote:
> 
> > * I don't like the treatment of QNAME*/hostQNAME, NAME*/hostNAME,
> > and so on. Since JSON includes encoded strings, wouldn't it make
> > more sense just to always put the QNAME in there? (Especially since
> > you'll end up with SRV queries always being encoded as they have
> > underscore characters...)
> 
> JSON requires its strings to be encoded in a particular character
> set. Given that the labels in a QNAME/NAME can be any binary cruft,
> you can't assume that every QNAME will be representable.

I think you're making it too hard. Control characters, ", and \ are
already required to be escaped. Just add a similar requirement that
octets 127 to 255 also be escaped, and we're done.
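
To make that concrete, here's a rough sketch in Python of the rule I
have in mind (my own illustration, nothing from the draft; the member
name in the output is just an example):

   # Rough sketch only: turn a DNS name's raw label octets into text
   # that is always representable inside a JSON string. Control
   # characters, '"', '\', and octets 127-255 become \uXXXX escapes;
   # everything else stays as plain ASCII.
   def name_to_json_chars(labels):
       # labels: a list of bytes objects, one per label, no length octets
       parts = []
       for label in labels:
           chars = []
           for octet in label:
               if octet < 0x20 or octet > 0x7e or octet in (0x22, 0x5c):
                   chars.append('\\u%04x' % octet)
               else:
                   chars.append(chr(octet))
           parts.append(''.join(chars))
       return '.'.join(parts)

   # An SRV-style name with one non-ASCII octet in a label:
   print('{"QNAME": "%s"}' % name_to_json_chars([b'_sip', b'_tcp', b'\xffexample', b'org']))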

> > * In general I'm not super enthusiastic about the mixing of binary
> > and formatted data - I tend to think an application will want one
> > or the other. Perhaps it makes more sense to define two formats,
> > one binary and one formatted? Or...
> 
> All fields are optional, so a profile could say "don't include these"
> or "always include those". Further, and more importantly, most RDATA
> are binary. I did not want to force implementations to use the
> presentation format for RDATA.

The problem with an "all fields are optional" approach is that it puts
all the burden on the consumer of the data, right? You literally have
no idea what to expect. (That's kind of why I proposed some sort of
schema below.)

I understand not wanting to force implementations to use the
presentation format for RDATA... OTOH it seems likely that the reason
people are putting data in JSON is so they can see what it is. We could
always try the RFC 3597 approach for an unknown RTYPE?
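
For instance (my sketch, and the member name is made up), the RFC 3597
generic form keeps unknown RDATA as plain text:

   # Rough sketch: render unknown RDATA in the RFC 3597 generic form,
   # "\# <length> <hex>", which is text and survives a JSON string.
   # The member name below is invented for illustration only.
   import json

   def rdata_to_generic(rdata):
       # rdata: the raw RDATA as bytes
       return r'\# %d %s' % (len(rdata), rdata.hex())

   print(json.dumps({"rdataTYPE1234": rdata_to_generic(bytes([0x0a, 0x00, 0x00, 0x01]))}))
   # -> {"rdataTYPE1234": "\\# 4 0a000001"}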

> > * Maybe it makes sense to define a meta-record so consumers can know
> >  what to expect? Something that lists which names will (or may)
> > appear.
> 
> That would be a JSON schema. Just using that phrase will cause
> screaming in the Apps Area. Having said that, it's perfectly
> reasonable for a profile to insist that each record have a profile
> indicator such as "Profile": "Private DNS interchange v3.1".

Screaming aside, applications will either have an implicit schema or an
explicit one. Defining the problem to be out of scope may be necessary
to get something published, but that's a symptom of IETF brokenness
IMHO, since it reduces the usefulness of any such RFC. :(
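
To show what I mean about the burden landing on the consumer, here is
a toy Python check built around the kind of profile indicator you
suggest (the expected member names are made up for the example):

   # Toy sketch: with no schema, the consumer must guard every access;
   # with an explicit profile member it can at least check up front.
   # The expected member list here is invented for illustration.
   EXPECTED = {
       "Private DNS interchange v3.1": {"QNAME", "QTYPE", "QCLASS"},
   }

   def check_record(record):
       profile = record.get("Profile")
       if profile not in EXPECTED:
           raise ValueError("unknown or missing profile: %r" % profile)
       missing = EXPECTED[profile] - record.keys()
       if missing:
           raise ValueError("missing members: %s" % sorted(missing))

   check_record({"Profile": "Private DNS interchange v3.1",
                 "QNAME": "example.org.", "QTYPE": 1, "QCLASS": 1})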
 
> > I'd be mildly curious to see a comparison of the compressed sizes of
> > JSON-formatted data (without data duplicated as binary stuff) versus
> > non-JSON-formatted data. My intuition is that compression will
> > remove most of the horrible redundancy that is involved in JSON,
> > but there's only one way to be sure. ;)
> 
> Sure. It's pretty trivial to do, for example, a CBOR format that
> follows this; there are now CBOR libraries for most popular modern
> > languages (see http://cbor.io/). If folks here want that, I can add
> it as an appendix. To be clear, however, I haven't heard anyone
> saying they want compression so badly they are willing to lose
> readability of the data.

Oh, I meant compressing with gzip or the like, not a JSON-specific
binary format like CBOR.

So the idea is:

   $ tcpdump -w somefile.pcap port 53
   $ pcap2dnsjson somefile.pcap somefile.json
   $ gzip somefile.pcap
   $ gzip somefile.json
   $ ls -l somefile.{pcap,json}.gz
   
Then compare the sizes of the compressed files.

The idea being that when moving files around via scp or rsync or
whatever they'd probably be compressed like this, and probably also for
medium-term storage. My hope is that the compressed JSON is roughly the
same size as the compressed raw pcap file, since they basically carry
the same entropy.
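
(If the shell round-trip is a hassle, the same measurement takes a few
lines of Python, reusing the file names from the sketch above:)

   # Same experiment in Python: gzip both files in memory and compare.
   # File names are the ones from the sketch above; pcap2dnsjson is
   # still the hypothetical converter.
   import gzip

   for name in ("somefile.pcap", "somefile.json"):
       with open(name, "rb") as f:
           raw = f.read()
       packed = gzip.compress(raw)
       print("%s: %d raw, %d gzipped (%.1f%%)"
             % (name, len(raw), len(packed), 100.0 * len(packed) / len(raw)))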

The reason I bring this up is to give a feel for the size cost of a
bloated text format in practice. :)

Cheers,

--
Shane