Re: [DNSOP] I-D Action: draft-ietf-dnsop-dns-capture-format-03.txt
Jim Hague <jim@sinodun.com> Wed, 05 July 2017 12:05 UTC
To: rgibson@dyn.com
References: <149907291397.4998.8059630450980375262@ietfa.amsl.com> <CAC94RYaY81Taq-iubcE+HRGGY7mLUAoLqSqFgyLWga5wCxfLSA@mail.gmail.com>
From: Jim Hague <jim@sinodun.com>
Organization: Sinodun Internet Technologies Ltd.
Cc: dnsop@ietf.org
Message-ID: <5ec26bfa-b7c9-cdcc-2594-5e2df7bec4c8@sinodun.com>
Date: Wed, 05 Jul 2017 13:05:26 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/K7ZnFFa3tW0QDZnJXgef0-1WzOA>

On 04/07/2017 00:22, Richard Gibson wrote:
> I looked over this draft in detail, and found a handful of ambiguous
> points ("Clarifications" and "Potentially Missing Data" below). But more
> importantly, it is very close to defining a format that could replace
> much of my organization's in-house technology. Would you consider some
> generalizations to take it over the finish line ("Extension Fields" and
> "Opt-in Lossyness")? Only the suggestions related to representing time
> and "classtype" items would change the representation of existing data
> in such a way that implementations already supporting the draft
> specification would require changes.
>
> *Clarifications*
> * Items in the "classtype" table (section 7.11) are missing data type
> documentation. Both "type" and "class" should be unsigned numbers.

Thanks. Yes, the type needs adding.

> * And speaking of 7.11, why are CLASS/TYPE pairs represented as CBOR
> maps instead of more efficient two-item arrays? If it was an intentional
> decision for clarity, then maybe the section 7.7 block preamble
> "earliest-time" field should also be promoted to a map ("time-seconds",
> "time-useconds", "time-pseconds", mirroring Q/R items) for the same reason.

All the tables in the BlockTables section that are multi-valued are
maps. I did consider making the Class/Type pairs an array, but decided
to go for consistency with the other block tables containing composite
data and make it a map.

Yes, two-item arrays would be more space-efficient, saving two bytes per
item. However, in the data we've observed, the number of entries in the
Class/Type table is, as you'd expect, small, typically 15-20 entries. So
we're looking at a saving of maybe 40 bytes per block. By the time
you've run through compression, that advantage will be further eroded,
so I ended up deciding that the cost of consistency here was worth paying.

We've also considered specifying an implicit Class entry of IN (i.e.
if the Class item isn't present in the map, assume IN), but as, again,
the space saving is negligible, we prefer to keep the values explicit.

Timestamps, on the other hand, I always regarded as a basic data type,
so naturally a structure. Plus, of course, there's one per
query/response item, so in a block the size savings are in the 10-15k
bytes region, which is rather more significant.

> * In "query-sig" table items (section 7.13) "transport-flags" field, the
> bit corresponding to "trailing bytes" shouldn't be limited to UDP.

Interesting point. We haven't to date observed trailing data over TCP,
but that's not to say that somebody won't try it.

> * In section 7.18, "and an unsigned key" appears to be meaningless and
> should probably be removed.

In most places where we are discussing a map, we've specified the type
of the map key in the text, though I notice we're not 100% consistent
with that.

> *Potentially Missing Data*
> * In "query-sig" table items (section 7.13), "transport-flags" should
> probably be extended to include a TLS bit (cf. RFC 7858).

Agreed. We should also look at indicators for DNS-over-HTTP,
DNS-over-QUIC and any other exotica.

> *Extension Fields*
> Of the many potentially open-ended key-value maps (file preamble, file
> preamble configuration, block preamble, block statistics, query
> signatures, Q/R data), only block statistics allows for
> "implementation-specific fields", and no further guidance is provided. I
> think all maps should allow such fields, with a recommendation that they
> use an implementation-specific prefix to avoid collisions with fields
> added by other implementations or later versions of C-DNS.

You are right that extensibility of the tables is not something we have
considered deeply up to now, and it's definitely something that should
be done.
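
As a rough check of the Class/Type size argument earlier in this reply,
the sketch below hand-encodes one entry both ways. It is a minimal CBOR
encoder covering only unsigned integers, arrays and maps, not a complete
implementation, and the integer map keys 0 and 1 are assumed purely for
illustration rather than taken from the draft.

```python
def encode_head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus argument for a major type."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 1 << 8:
        return bytes([(major << 5) | 24, n])
    if n < 1 << 16:
        return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")
    if n < 1 << 32:
        return bytes([(major << 5) | 26]) + n.to_bytes(4, "big")
    return bytes([(major << 5) | 27]) + n.to_bytes(8, "big")


def encode(obj) -> bytes:
    """Encode non-negative ints, lists and dicts (nothing else)."""
    if isinstance(obj, int) and obj >= 0:
        return encode_head(0, obj)                      # major type 0: unsigned int
    if isinstance(obj, list):
        return encode_head(4, len(obj)) + b"".join(encode(x) for x in obj)
    if isinstance(obj, dict):
        body = b"".join(encode(k) + encode(v) for k, v in obj.items())
        return encode_head(5, len(obj)) + body          # major type 5: map
    raise TypeError(f"unsupported type: {type(obj).__name__}")


TYPE_KEY, CLASS_KEY = 0, 1      # hypothetical key numbers, for illustration only

as_map = encode({TYPE_KEY: 1, CLASS_KEY: 1})    # TYPE A, CLASS IN
as_array = encode([1, 1])

print(as_map.hex(), len(as_map))        # a200010101 5
print(as_array.hex(), len(as_array))    # 820101 3

# Twenty entries, all CLASS IN, TYPE values 1..20
entries = [(t, 1) for t in range(1, 21)]
map_total = sum(len(encode({TYPE_KEY: t, CLASS_KEY: c})) for t, c in entries)
array_total = sum(len(encode([t, c])) for t, c in entries)
print(map_total - array_total)          # 40
```

The two-byte difference per entry comes entirely from the two one-byte
map keys, so it holds regardless of how large the TYPE and CLASS values
themselves are.
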
FWIW, my initial inclination is to designate as implementation-specific
all key values above a threshold that allows plenty of growth space for
standardised fields, as long as we can be sure that generic readers can
safely skip over the fields they don't understand. This is a topic we
need to discuss and flesh out.

> Example use cases:
[...]
> * Extend the block preamble (section 7.7) to override file preamble
> fields like "host-id" and "server-addresses", enabling fleet-wide file
> merges.

I don't quite follow why you'd need to put this informational-only stuff
into the block preamble rather than the file preamble/configuration. Can
you expand on that a bit?

> *Opt-in Lossyness*
> The format is generally quite good about allowing for detail without
> requiring it. However, there are some areas where more space savings
> could be had:
> * Communicate aggregation of IP addresses into prefixes (i.e., the
> irrelevance of least-significant bits in ip-address values) with new
> "client-prefix-length-ipv4" and "client-prefix-length-ipv6" and
> "server-prefix-length-ipv4" and "server-prefix-length-ipv6" file
> preamble configuration options.
> * Communicate case-normalizing aggregation of names (e.g., transforming
> "eXaMpLe.com" into "example.com <http://example.com>") with a new
> boolean-valued "name-normalization" file preamble configuration option.

These are items that could be addressed by implementation-specific
fields, though I do see the motivation behind wanting a standardised
representation for interchange.

This raises a question about a tension between the background of C-DNS
to date and the slightly different angle you are coming from. We've been
very much focused on using C-DNS to record traffic in a form where the
packets can be recreated in wire format (i.e. as PCAP). The optional
data items mean that data may be missing from those packets, but the
core query and response will still be present.
So, to take the next item:

> * In "rr" table items (section 7.15), "ttl" should be optional to
> accommodate decrementing in recursive resolver responses.

and (skipping out of order) your final:

> * For truly customizable aggregation, I think all query signature
> (section 7.13) and Q/R (section 7.18) data item fields should be
> optional... but especially Q/R data "client-port" and "transaction-id".

moves the recording to a point where reconstructing wire format means
that the application doing the reconstruction has to not just omit
information not present in C-DNS, but must start generating values to
fill in for the missing items. This feels a bit like a step that needs
discussion; we need to think over the design from your point of view.
Possibly those fields should be optional, but with recommendations for
how to populate them when/if generating PCAP.

> * In Q/R data items (section 7.18) and malformed packet records (section
> 7.20), I'd like "time-useconds" broken out into "time-seconds" and
> optional "time-useconds", both for parity with block-preamble
> "earliest-time" and for space savings in applications that are content
> with second-level resolution.

time-useconds is a time offset, rather than absolute time. Splitting
into seconds and optionally useconds means that people wanting usecond
resolution must pay a size overhead in their collection files, as there
will be an additional field, a byte for each query/response. If there is
a general demand for only having time resolution at the second level,
possibly there should instead be a configuration field indicating
whether the offset is in usec or sec. I see that second offsets would
provide for notable size savings.

I'm interested to hear whether others also have similar use cases to the
above. We'll be doing a 10 minute slot on C-DNS in Prague, and would
welcome discussion there.
--
Jim Hague - jim@sinodun.com          Never trust a computer you can't lift.
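
To put a rough number on the second-versus-microsecond offset point in
the message above, the sketch below computes how many bytes CBOR needs
for an unsigned integer of a given size. The three-second offset is a
hypothetical value chosen for illustration; real offsets depend on how
long a block spans.

```python
def cbor_uint_size(n: int) -> int:
    """Return the encoded length in bytes of a CBOR unsigned integer."""
    if n < 0:
        raise ValueError("unsigned integers only")
    if n < 24:
        return 1        # value fits in the initial byte
    if n < 1 << 8:
        return 2        # initial byte + 1-byte argument
    if n < 1 << 16:
        return 3        # initial byte + 2-byte argument
    if n < 1 << 32:
        return 5        # initial byte + 4-byte argument
    return 9            # initial byte + 8-byte argument


offset_seconds = 3      # hypothetical: a Q/R item 3 s after the block's earliest time

usec_size = cbor_uint_size(offset_seconds * 1_000_000)
sec_size = cbor_uint_size(offset_seconds)

print(usec_size)    # 5
print(sec_size)     # 1
```

Any microsecond offset of 65536 or more (about 66 ms) already needs the
five-byte form, whereas a second-resolution offset stays at one byte for
the first 23 seconds of a block.
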