Re: [DNSOP] I-D Action: draft-ietf-dnsop-dns-capture-format-03.txt

Jim Hague <jim@sinodun.com> Wed, 05 July 2017 12:05 UTC

To: rgibson@dyn.com
References: <149907291397.4998.8059630450980375262@ietfa.amsl.com> <CAC94RYaY81Taq-iubcE+HRGGY7mLUAoLqSqFgyLWga5wCxfLSA@mail.gmail.com>
From: Jim Hague <jim@sinodun.com>
Organization: Sinodun Internet Technologies Ltd.
Cc: dnsop@ietf.org
Message-ID: <5ec26bfa-b7c9-cdcc-2594-5e2df7bec4c8@sinodun.com>
Date: Wed, 05 Jul 2017 13:05:26 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <CAC94RYaY81Taq-iubcE+HRGGY7mLUAoLqSqFgyLWga5wCxfLSA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Content-Language: en-GB
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/K7ZnFFa3tW0QDZnJXgef0-1WzOA>
Subject: Re: [DNSOP] I-D Action: draft-ietf-dnsop-dns-capture-format-03.txt

On 04/07/2017 00:22, Richard Gibson wrote:
> I looked over this draft in detail, and found a handful of ambiguous
> points ("Clarifications" and "Potentially Missing Data" below). But more
> importantly, it is very close to defining a format that could replace
> much of my organization's in-house technology. Would you consider some
> generalizations to take it over the finish line ("Extension Fields" and
> "Opt-in Lossyness")? Only the suggestions related to representing time
> and "classtype" items would change the representation of existing data
> in such a way that implementations already supporting the draft
> specification would require changes.
>
> *Clarifications*
> * Items in the "classtype" table (section 7.11) are missing data type
> documentation. Both "type" and "class" should be unsigned numbers.

Thanks. Yes, the type needs adding.

>   * And speaking of 7.11, why are CLASS/TYPE pairs represented as CBOR
> maps instead of more efficient two-item arrays? If it was an intentional
> decision for clarity, then maybe the section 7.7 block preamble
> "earliest-time" field should also be promoted to a map ("time-seconds",
> "time-useconds", "time-pseconds", mirroring Q/R items) for the same
> reason.

All the tables in the BlockTables section that are multi-valued are
maps. I did consider making the Class/Type pairs an array, but decided
to go for consistency with the other block tables containing composite
data and make it a map. Yes, two-item arrays would be more
space-efficient, saving two bytes per item. However, in the data we've
observed, the number of entries in the Class/Type table is, as you'd
expect, small, typically 15-20 entries. So we're looking at a saving of
maybe 40 bytes per block. By the time you've run through compression,
that advantage will be further eroded, so I ended up deciding that the
cost of consistency here was worth paying.
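
To illustrate the arithmetic above, here is a minimal sketch that encodes a few Class/Type entries both ways and compares the sizes. It assumes the map form uses small integer keys (0 for type, 1 for class); those key values are hypothetical and chosen only for illustration. The encoder is a hand-rolled subset of CBOR covering just unsigned integers, arrays and maps.

```python
# Minimal CBOR encoder: unsigned integers, arrays and maps only.
def cbor_head(major: int, value: int) -> bytes:
    """Encode the initial byte and argument for a CBOR major type."""
    if value < 24:
        return bytes([(major << 5) | value])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major << 5) | extra]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def cbor_encode(obj) -> bytes:
    if isinstance(obj, int):
        return cbor_head(0, obj)  # major type 0: unsigned integer
    if isinstance(obj, list):
        return cbor_head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        body = b"".join(cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
        return cbor_head(5, len(obj)) + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# Hypothetical integer map keys, for illustration only.
TYPE_KEY, CLASS_KEY = 0, 1

# (type, class) pairs: A, AAAA, MX and ANY, all in class IN.
pairs = [(1, 1), (28, 1), (15, 1), (255, 1)]

as_maps = [{TYPE_KEY: t, CLASS_KEY: c} for t, c in pairs]
as_arrays = [[t, c] for t, c in pairs]

map_size = len(cbor_encode(as_maps))
array_size = len(cbor_encode(as_arrays))
saving_per_entry = (map_size - array_size) // len(pairs)

print(f"maps: {map_size} bytes, arrays: {array_size} bytes, "
      f"saving per entry: {saving_per_entry} bytes")
```

Under these assumptions the only difference between the two forms is the two one-byte keys, so a table of 20 entries differs by 40 bytes before compression.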

We've also considered specifying an implicit Class entry of IN (i.e. if
the Class item isn't present in the map, assume IN), but since, again, the
space saving is negligible, we prefer to keep the values explicit.

Timestamps, on the other hand, I always regarded as a basic data type,
so naturally a structure. Plus, of course, there's one per
query/response item, so in a block the size savings are in the 10-15k
bytes region, which is rather more significant.

> * In "query-sig" table items (section 7.13) "transport-flags" field, the
> bit corresponding to "trailing bytes" shouldn't be limited to UDP.

Interesting point. We haven't to date observed trailing data over TCP,
but that's not to say that somebody won't try it.

> * In section 7.18, "and an unsigned key" appears to be meaningless and
> should probably be removed.

In most places where we are discussing a map, we've specified the type
of the map key in the text, though I notice we're not 100% consistent
with that.

> *Potentially Missing Data*
> * In "query-sig" table items (section 7.13), "transport-flags" should
> probably be extended to include a TLS bit (cf. RFC 7858).

Agreed. We should also look at indicators for DNS-over-HTTP,
DNS-over-QUIC and any other exotica.

> *Extension Fields*
> Of the many potentially open-ended key-value maps (file preamble, file
> preamble configuration, block preamble, block statistics, query
> signatures, Q/R data), only block statistics allows for
> "implementation-specific fields", and no further guidance is provided. I
> think all maps should allow such fields, with a recommendation that they
> use an implementation-specific prefix to avoid collisions with fields
> added by other implementations or later versions of C-DNS.

You are right that extensibility of the tables is not something we have
considered deeply up to now, and it's definitely something that should
be done. FWIW, my initial inclination is to designate as
implementation-specific all key values above a threshold that allows
plenty of growth space for standardised fields, as long as we can be
sure that generic readers can safely skip over the fields they don't
understand.

This is a topic we need to discuss and flesh out.
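
As a starting point for that discussion, the threshold idea could look something like the sketch below. Everything specific in it is an assumption: the threshold value, the integer key assigned to "earliest-time", and the function name are all hypothetical. It operates on a map that has already been decoded; since every CBOR item carries its own type and length, a decoder can step over values it does not recognise.

```python
# Hypothetical threshold: keys at or above it are implementation-specific.
IMPLEMENTATION_KEY_THRESHOLD = 1000  # assumption, not taken from the draft

# Hypothetical key assignment for one standardised field.
KNOWN_KEYS = {0: "earliest-time"}


def classify_fields(decoded_map: dict) -> tuple[dict, dict, dict]:
    """Split a decoded map into known, implementation-specific and skipped fields."""
    known, private, skipped = {}, {}, {}
    for key, value in decoded_map.items():
        if key >= IMPLEMENTATION_KEY_THRESHOLD:
            # Reserved for implementations; a generic reader may ignore these.
            private[key] = value
        elif key in KNOWN_KEYS:
            known[KNOWN_KEYS[key]] = value
        else:
            # Below the threshold but unknown: presumably a field
            # standardised in a later version, so skip it safely.
            skipped[key] = value
    return known, private, skipped


block_preamble = {0: [1499256326, 0], 7: 42, 1001: "rack-17"}
known, private, skipped = classify_fields(block_preamble)
print(known, private, skipped)
```

A generic reader would act only on `known`, while a tool from the implementation that wrote key 1001 could also interpret `private`.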

> Example use
> cases:
[...]
> * Extend the block preamble (section 7.7) to override file preamble
> fields like "host-id" and "server-addresses", enabling fleet-wide file
> merges.

I don't quite follow why you'd need to put this informational-only stuff
into the block preamble rather than the file preamble/configuration. Can
you expand on that a bit?

> *Opt-in Lossyness*
> The format is generally quite good about allowing for detail without
> requiring it. However, there are some areas where more space savings
> could be had:
> * Communicate aggregation of IP addresses into prefixes (i.e., the
> irrelevance of least-significant bits in ip-address values) with new
> "client-prefix-length-ipv4" and "client-prefix-length-ipv6" and
> "server-prefix-length-ipv4" and "server-prefix-length-ipv6" file
> preamble configuration options.
> * Communicate case-normalizing aggregation of names (e.g., transforming
> "eXaMpLe.com" into "example.com") with a new
> boolean-valued "name-normalization" file preamble configuration option.

These are items that could be addressed by implementation-specific
fields, though I do see the motivation behind wanting a standardised
representation for interchange.
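
For concreteness, the two aggregations proposed above might be applied on the writer side roughly as follows. This is a sketch only: the function names are hypothetical, and the prefix lengths stand in for whatever values the proposed configuration options would carry.

```python
import ipaddress


def aggregate_address(address: str, prefix_len_v4: int, prefix_len_v6: int) -> str:
    """Zero the least-significant bits beyond the configured prefix length."""
    ip = ipaddress.ip_address(address)
    prefix_len = prefix_len_v4 if ip.version == 4 else prefix_len_v6
    # strict=False lets the host bits be set in the input; they are masked off.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network.network_address)


def normalize_name(name: str) -> str:
    """Case-normalise a presentation-format name (assumed to be ASCII)."""
    return name.lower()


print(aggregate_address("192.0.2.77", 24, 48))             # IPv4, /24
print(aggregate_address("2001:db8:1234:5678::1", 24, 48))  # IPv6, /48
print(normalize_name("eXaMpLe.com"))
```

A reader needs the prefix lengths and the normalisation flag from the file to know that the discarded bits and the original case are not recoverable, which is the argument for recording them in a standard place.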

This raises a question about a tension between the background of C-DNS
to date and the slightly different angle you are coming from. We've been
very much focused on using C-DNS to record traffic in a form where the
packets can be recreated in wire format (i.e. as PCAP). The optional
data items mean that data may be missing from those packets, but the
core query and response will still be present.

So, to take the next item:

> * In "rr" table items (section 7.15), "ttl" should be optional to
> accommodate decrementing in recursive resolver responses.

and (skipping out of order) your final:

> * For truly customizable aggregation, I think all query signature
> (section 7.13) and Q/R (section 7.18) data item fields should be
> optional... but especially Q/R data "client-port" and "transaction-id".

moves the recording to a point where reconstructing wire format means
that the application doing the reconstruction has to not just omit
information not present in C-DNS, but must start generating values to
fill in for the missing items. This feels a bit like a step that needs
discussion; we need to think over the design from your point of view.
Possibly those fields should be optional, but with recommendations for
how to populate them when/if generating PCAP.
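
To make the concern concrete, a reconstruction tool faced with optional "client-port" and "transaction-id" fields would need something like the sketch below. The defaults chosen here (a random port from the dynamic range, a random 16-bit ID) are assumptions for illustration, not recommendations from the draft, and string keys are used only for readability.

```python
import random


def fill_for_pcap(qr_item: dict, rng: random.Random) -> tuple[dict, list]:
    """Return a copy of a Q/R item with missing fields synthesised for PCAP output.

    Also returns the names of the fields that were invented, so the caller
    can tell synthesised values apart from recorded ones.
    """
    filled = dict(qr_item)
    synthesised = []
    if "client-port" not in filled:
        # Assumption: pick from the dynamic/private port range.
        filled["client-port"] = rng.randint(49152, 65535)
        synthesised.append("client-port")
    if "transaction-id" not in filled:
        # DNS message IDs are 16 bits wide.
        filled["transaction-id"] = rng.randint(0, 0xFFFF)
        synthesised.append("transaction-id")
    return filled, synthesised


rng = random.Random(2017)  # seeded so repeated conversions are reproducible
item = {"client-port": 53124}
filled, synthesised = fill_for_pcap(item, rng)
print(filled, synthesised)
```

Recorded values pass through untouched; only the absent ones are generated, and the second return value records which those were.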

> * In Q/R data items (section 7.18) and malformed packet records (section
> 7.20), I'd like "time-useconds" broken out into "time-seconds" and
> optional "time-useconds", both for parity with block-preamble
> "earliest-time" and for space savings in applications that are content
> with second-level resolution.

time-useconds is a time offset, rather than absolute time. Splitting
into seconds and optionally useconds means that people wanting usecond
resolution must pay a size overhead in their collection files as there
will be an additional field, a byte for each query/response. If
there is a general demand for only having time resolution at the second
level, possibly there should instead be a configuration field indicating
whether the offset is in usec or sec. I see that second offsets would
provide for notable size savings.
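
A rough way to see the size difference is to compute how many bytes CBOR needs for an unsigned offset in each unit. The sketch below follows the standard CBOR unsigned-integer encoding; the example offset of 7.25 seconds into a block is made up for illustration.

```python
def cbor_uint_size(value: int) -> int:
    """Number of bytes CBOR uses to encode an unsigned integer."""
    if value < 0:
        raise ValueError("unsigned values only")
    if value < 24:
        return 1  # value fits in the initial byte
    if value < 2**8:
        return 2
    if value < 2**16:
        return 3
    if value < 2**32:
        return 5
    return 9


offset_usec = 7_250_000  # 7.25 s after the block's earliest time
offset_sec = offset_usec // 1_000_000

usec_bytes = cbor_uint_size(offset_usec)
sec_bytes = cbor_uint_size(offset_sec)
print(f"usec offset: {usec_bytes} bytes, sec offset: {sec_bytes} bytes")
```

Any microsecond offset of 65.536 ms or more needs five bytes, whereas a second-level offset stays at one byte for the first 23 seconds of a block.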

I'm interested to hear whether others also have similar use cases to the
above.

We'll be doing a 10 minute slot on C-DNS in Prague, and would welcome
discussion there.
-- 
Jim Hague - jim@sinodun.com          Never trust a computer you can't lift.