Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

Sara Dickinson <sara@sinodun.com> Thu, 22 November 2018 12:01 UTC

From: Sara Dickinson <sara@sinodun.com>
Message-Id: <8538EA17-143F-4855-A658-B78701D9B37C@sinodun.com>
Date: Thu, 22 Nov 2018 12:01:00 +0000
In-Reply-To: <154258729961.2478.12875770828573692533.idtracker@ietfa.amsl.com>
Cc: The IESG <iesg@ietf.org>, draft-ietf-dnsop-dns-capture-format@ietf.org, Tim Wicinski <tjw.ietf@gmail.com>, dnsop-chairs@ietf.org, dnsop@ietf.org
To: Benjamin Kaduk <kaduk@mit.edu>
References: <154258729961.2478.12875770828573692533.idtracker@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/XKDETNXkuy7Emw-IIKfM_FVCB7I>
Subject: Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

> Begin forwarded message:
> 
> From: Benjamin Kaduk <kaduk@mit.edu>
> Subject: Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
> Date: 19 November 2018 at 00:28:19 GMT
> To: "The IESG" <iesg@ietf.org>
> Cc: draft-ietf-dnsop-dns-capture-format@ietf.org, Tim Wicinski <tjw.ietf@gmail.com>, dnsop-chairs@ietf.org, tjw.ietf@gmail.com, dnsop@ietf.org
> Resent-From: <alias-bounces@ietf.org>
> Resent-To: jad@sinodun.com, jim@sinodun.com, sara@sinodun.com, terry.manderson@icann.org, john.bond@icann.org
> 
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-dnsop-dns-capture-format-08: Discuss

This follows up on the items not addressed in our previous email.

> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> There are also a couple of fields whose semantics don't seem to be
> sufficiently well specified for a proposed-standard document, such as
> vlan-ids, generator-id, name-rdata, and ae-code.  (I understand that some
> of them are probably only going to have locally relevant semantics, but we
> should be explicit about when that's the case.)

We have addressed the specific fields mentioned here in the comments below related to each of them.

> 
> 
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> Section 2
> 
> Please consider using the RFC 8174 version of the BCP 14 boilerplate.

Yes - will replace.

> 
> Section 3
> 
>   Because of these considerations, a major factor in the design of the
>   format is minimal storage size of the capture files.
> 
> maybe "storage and transmission”?

Sure.

> 
> Section 6
> 
> In Figure 2, the Query name is marked as "(q)" (only present if there is a
> query), but the running text in Section 4 (bullet 1) says that the Question
> section from the response can be used as an identifying QNAME if there is a
> response with no corresponding query.  Am I misexpanding QNAME here, or is
> there a disagreement between these two parts of the text?  In particular, I
> do not see a part of Figure 2 that would correspond to a Question section
> in the response, given the various "(q)"/"(r)" markings.

Good spot - you are correct, this is an error in the diagram and it should read 'Query name' with no qualifier.

> 
> Section 6.2.2
> 
>   Messages with OPCODES known to the recording application but not
>   listed in the Storage Parameters are discarded (regardless of whether
>   they are malformed or not).
> 
> (Do we need to say anything that the "discarded" is only w.r.t. the capture
> process, and not meant to imply that DNS queries would not get a normal
> response?)

Suggest: "Messages with OPCODES known to the recording application but not
  listed in the Storage Parameters are discarded by the recording application
  during C-DNS capture (regardless of whether they are malformed or not)."

> 
> Section 6.2.4
> 
> Please consider using IPv6 examples, per
> https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ .

Yes - will add an IPv6 example.

> 
> Section 7.2
> 
>   o  The column T gives the CBOR data type of the item.
> 
>      *  U - Unsigned integer
> 
>      *  I - Signed integer
> 
> This is venturing a bit far from my normal area of expertise, but my
> understanding is that CBOR native major types are only provided for
> unsigned integer and negative integer, with "signed integer" being an
> abstraction at a slightly higher layer that needs to be managed in the
> application.  Do we need to add any clarifying text here or will the
> meaning be clear to the reader?

CDDL happily talks about uint and int types, but we think this might
indeed be a useful clarification to implementers. We suggest:

OLD: "* I - Signed integer"
NEW: "* I - Signed integer (i.e. CBOR unsigned or negative integer)"
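
As an illustration only (not text for the draft), the sketch below shows how a signed integer maps onto the two CBOR integer major types: non-negative values use major type 0, and negative values use major type 1 with the stored argument -1 - n. The function name `encode_cbor_int` is our own and hypothetical; a real implementation would normally use a CBOR library.

```python
def encode_cbor_int(value: int) -> bytes:
    """Encode a signed integer as a CBOR unsigned (major 0) or negative (major 1) item."""
    if value >= 0:
        major, arg = 0, value        # major type 0: unsigned integer
    else:
        major, arg = 1, -1 - value   # major type 1: negative integer, argument is -1 - n
    if arg < 24:
        # Small arguments are carried directly in the initial byte.
        return bytes([(major << 5) | arg])
    # Larger arguments follow the initial byte in 1, 2, 4 or 8 big-endian bytes.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise ValueError("integer out of range for CBOR major types 0 and 1")


print(encode_cbor_int(1000).hex())   # unsigned integer encoding
print(encode_cbor_int(-1000).hex())  # negative integer encoding
```

The application-level "signed integer" is therefore just the union of these two encodings, which is what the proposed wording states.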

> 
> Section 7.4
> 
> Should probably forward-reference section 8 for the format version numbers'
> semantics.

Yes, will do. 

> 
> Section 7.4.1.1
> 
> We should we reference the IANA registries by name for any of these fields
> (e.g., opcodes, rr-types, etc.).  (Also in Section 7.5.3.1, etc.)

I thought we had done this in the last update but clearly not, will fix.

> 
> Are the storage flags going to be allocated in sequence by updating
> standards-track documents, or some other mechanism?  (Is a registry
> necessary?)

As proposed in the DISCUSS response, this would be a sub-registry.

> 
> For the various address prefix fields, do we need to specify that the full
> addresses are stored when the corresponding prefix field is absent?

Would it be sufficient to update the text in 6.2.4?

OLD: "If IP address prefixes are given, only the prefix bits of
   addresses are stored."

NEW: "If IP address prefixes are given, only the prefix bits of
   addresses are stored. If the IP address prefixes are absent then
   full addresses are stored."
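
To make the proposed behaviour concrete, here is a minimal sketch of what an encoder might store for an address with and without a configured prefix length. The helper name `stored_address` is hypothetical, and rounding the prefix up to whole bytes (with the non-prefix bits zeroed) is an assumption made for this illustration rather than something stated above.

```python
import ipaddress
from typing import Optional


def stored_address(addr: str, prefix_len: Optional[int] = None) -> bytes:
    """Return the bytes a capture encoder might store for addr."""
    ip = ipaddress.ip_address(addr)
    if prefix_len is None:
        # No prefix configured: the full address is stored.
        return ip.packed
    if not 0 <= prefix_len <= ip.max_prefixlen:
        raise ValueError("prefix length out of range for this address family")
    # Zero the host bits, then keep only the bytes that cover the prefix
    # (assumption: the prefix is rounded up to a whole number of bytes).
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    n_bytes = (prefix_len + 7) // 8
    return network.network_address.packed[:n_bytes]


print(stored_address("2001:db8:1234:5678::1", 48).hex())
print(stored_address("192.0.2.77").hex())
```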


> 
> Section 7.4.1.1.1
> 
> Am I parsing the "query-response-hints" text correctly to say that a bit is
> set in the bitmap if the corresponding field is recorded (if present) by
> the collecting implementation?  The causality of "if the field is omitted
> the bit is unset" goes in a direction that is not what I expected.
> (Similarly for the other fields in this table.)

ekr picked up on the same point - as we responded to him:

"The issue is that if the bit is set, the field might still be missing because, although the configuration was set to collect it, the data wasn't available to the encoder for some other reason. However, when the bit is not set, it means that the data will definitely not be present because the collector is configured not to collect it.

We do discuss this problem in section 6.2.1 - perhaps a reference in the table back to that discussion is what is needed?"

Looking again I think a slight update to the text in 6.2.1 might help too:

OLD:
"The Storage Parameters therefore also contains a Storage Hints item
   which specifies which items the encoder of the file omits from the
   stored data."

NEW:
"The Storage Parameters therefore also contains a Storage Hints item
   which specifies which items the encoder of the file omits from the
   stored data and which will therefore never be present. (This approach
   is taken because a flag that indicated which items were included for
   collection would not guarantee that the item was present, only that
   it might be.)"
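
A small sketch of how a reader of a C-DNS file might interpret a hints bitmap under this wording: an unset bit means the field is never present, while a set bit only means it may be present. The bit positions and names in `QueryResponseHints` below are hypothetical placeholders for illustration, not the assignments in the draft.

```python
from enum import IntFlag


class QueryResponseHints(IntFlag):
    # Hypothetical bit assignments, for illustration only.
    TIME_OFFSET = 1 << 0
    CLIENT_ADDRESS = 1 << 1
    CLIENT_PORT = 1 << 2


def field_status(hints: QueryResponseHints, field: QueryResponseHints) -> str:
    """Interpret one hint bit from the reader's point of view."""
    if field in hints:
        # Configured for collection, but the data may still have been unavailable.
        return "may be present"
    # Not configured for collection, so the encoder always omits it.
    return "never present"


hints = QueryResponseHints.TIME_OFFSET | QueryResponseHints.CLIENT_PORT
print(field_status(hints, QueryResponseHints.CLIENT_ADDRESS))
print(field_status(hints, QueryResponseHints.CLIENT_PORT))
```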


> 
> Section 7.4.2
> 
> Do we need a reference for "promiscuous mode”?

Promiscuous mode is discussed on the main PCAP manpage. Hopefully a way
will be found to address the question of a suitable reference format for
PCAP material.

> 
> Just to check: in "server-addresses", I just infer the IP version from the
> length of the byte string?

As mentioned in the DISCUSS response, we probably need to make the transport flags mandatory.

> 
> Do we need to say more about where the vlan-ids identifiers are taken from?

Suggest:

OLD:
   | vlan-ids         | O | A | Array of identifiers (of type unsigned |
   |                  |   |   | integer) of VLANs selected for         |
   |                  |   |   | collection.                            |

NEW:
   | vlan-ids         | O | A | User-specified array of identifiers    |
   |                  |   |   | (of type unsigned integer) of VLANs    |
   |                  |   |   | [IEEE 802.1Q] selected for             |
   |                  |   |   | collection.                            |

> 
> Is the "generator-id" string intended to only be human readable?  Only
> within a specific (administrative) context?

The generator ID is intended only to identify the collecting
application. Specifying that it is human-readable (if present) seems a
good idea. Would this be sufficient?

OLD: "String identifying the collection method."
NEW: "User-specified human-readable string identifying the collection method."

> 
> Section 7.5.1
> 
> Does "earliest-time" include leap seconds?

Thanks for noticing this. After digging into it:

The description specifies the number of seconds to be the
number of seconds since the POSIX epoch ("time_t"). POSIX requires that
leap seconds be omitted from reported time, and all days are defined as
having 86,400 seconds. This means that a POSIX timestamp can be
ambiguous and refer to either of the last 2 seconds of a day containing
a leap second (who knew time could stand still in POSIX world - aargh?!) 

However, as far as we can see, libpcap (for example) can only provide
POSIX timestamps for packets.

Do you think we should just document this as a limitation or do you have 
another option in mind?
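
For reference, the ambiguity can be reproduced with the POSIX seconds-since-the-epoch arithmetic. In this sketch the leap second at the end of 2016 gets the same value as a neighbouring second (here the following midnight; which neighbour collides in practice depends on how a given system steps its clock). `calendar.timegm` is used because it applies the plain 86,400-seconds-per-day arithmetic without validating the seconds field.

```python
import calendar

# 2016-12-31T23:59:60Z was a real leap second.
leap_second = calendar.timegm((2016, 12, 31, 23, 59, 60))
next_midnight = calendar.timegm((2017, 1, 1, 0, 0, 0))

# Both instants map to the same POSIX timestamp, so a stored time_t value
# cannot distinguish between them.
print(leap_second, next_midnight, leap_second == next_midnight)
```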

> 
> Section 7.5.3
> 
> The "ip-address" description seems to imply that very short ipv6 prefix
> lengths could cause confusion as to the address type being indicated (e.g.,
> setting to 32 when no ipv4 prefix length is set, or setting to the same
> value as the ipv4 prefix length).  Do we need to restrict the ipv6 prefix
> lengths to being 33 or larger?
> 
> Are the "name-rdata" contents in wire format or presentation format?

Wire format. We suggest noting this:

OLD: "Array where each entry is the content of a single NAME or RDATA"
NEW: "Array where each entry is the content of a single NAME or RDATA in
wire format"
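
For readers less familiar with the distinction, this sketch converts a presentation-format name into the uncompressed RFC 1035 wire format (length-prefixed labels ending in a zero-length root label). The helper `name_to_wire` is illustrative only and deliberately ignores escape sequences and name compression.

```python
def name_to_wire(name: str) -> bytes:
    """Convert a presentation-format domain name to uncompressed wire format."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    wire = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError("label longer than 63 octets")
        wire.append(len(raw))  # one length octet per label
        wire += raw
    wire.append(0)             # zero-length root label terminates the name
    if len(wire) > 255:
        raise ValueError("name longer than 255 octets")
    return bytes(wire)


print(name_to_wire("example.com."))
```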

> 
> Section 7.5.3.2
> 
> What's the allocation policy/procedure for the remaining
> qr-transport-flags transport values?  For additional bits in any/all of the
> flags fields listed here?

As proposed in the DISCUSS response, this would be a sub-registry.

> 
> Something of a side note, what's the mnemonic for the "sig" in
> "qr-sig-flags"?  That is, what is it a signature of or over (it doesn't
> seem like it's a cryptographic signature, which may be what is confusing
> me)?

Ah, I see the confusion. No, it is meant to represent the idea that in a given set of DNS queries/responses there will be a finite number of combinations of the attributes in this table, each combination being a 'signature'.

In section 4, bullet 3:

"Examples of commonality between DNS messages are that in most
          cases the QUESTION RR is the same in the query and response,
          and that there is a finite set of query signatures (based on a
          subset of attributes)."

Perhaps updating this text would help:

"and that there is a finite set of query 'signatures' (defined as a specific combination of a subset of attributes)."
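
To illustrate the idea of a 'signature' in this sense, the sketch below stores each distinct attribute combination once and has every message refer to it by index. The attribute tuple (transport, query type, response code) and the function name are hypothetical simplifications chosen for this example; the actual set of attributes is the one listed in the table in the draft.

```python
from typing import Dict, List, Tuple

Signature = Tuple[str, int, int]  # hypothetical: (transport, qtype, rcode)


def build_signature_table(
    messages: List[Signature],
) -> Tuple[List[Signature], List[int]]:
    """Deduplicate attribute combinations and return (table, per-message indexes)."""
    table: List[Signature] = []
    index_of: Dict[Signature, int] = {}
    refs: List[int] = []
    for sig in messages:
        if sig not in index_of:
            # First time this combination is seen: add it to the table.
            index_of[sig] = len(table)
            table.append(sig)
        refs.append(index_of[sig])
    return table, refs


msgs = [("udp", 1, 0), ("udp", 28, 0), ("udp", 1, 0), ("tcp", 1, 3), ("udp", 1, 0)]
table, refs = build_signature_table(msgs)
print(table)
print(refs)
```

Because the number of distinct combinations is small compared with the number of messages, each message only needs to carry a small index.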

> 
> For "query-rcode"/"response-rcode", should there be a reference for "OPT",
> and/or for any of the EDNS stuff in here?  (The Terminology section only
> mentions using the naming from RFC 1035, that I can see.)

Yes, we can add a reference to RFC6891.

> 
> The "mm-transport-flags" here bear a striking resemblance to the
> "qr-transport-flags" from Section 7.5.3.2; should there be a shared
> registry for their contents?  (I guess the TransportFlags CDDL to some
> extent serves this function.)

Alexey also noticed this.

The qr-transport-flags and mm-transport-flags are different in that the qr-transport-flags include Bit 5, the trailing bytes indicator.

In the CDDL a base 'TransportFlags' type is defined and then:

  mm-transport-flags => TransportFlags,

  qr-transport-flags => QueryResponseTransportFlags,

  QueryResponseTransportFlagValues = &(
      query-trailingdata : 5,
  ) / TransportFlagValues
  QueryResponseTransportFlags = uint .bits QueryResponseTransportFlagValues

We can add some text to the table descriptions in sections 7.5.3.2 and 7.5.3.5 to clarify the relationship. 
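
As a rough sketch of that relationship, the code below treats the query/response flags as the base transport flags plus the bit 5 trailing-data indicator. Only the bit 5 position comes from the CDDL above; the assumption that the base flags occupy the bits below bit 5, and the helper names, are ours for illustration.

```python
QUERY_TRAILINGDATA_BIT = 5  # from the CDDL: query-trailingdata : 5

# Assumption for this sketch: the base TransportFlags use only bits 0-4.
BASE_TRANSPORT_MASK = (1 << QUERY_TRAILINGDATA_BIT) - 1


def has_trailing_data(qr_transport_flags: int) -> bool:
    """True if the query/response flags mark trailing bytes in the query."""
    return bool(qr_transport_flags & (1 << QUERY_TRAILINGDATA_BIT))


def base_transport_flags(qr_transport_flags: int) -> int:
    """Strip the query/response-only bit, leaving the shared base flags."""
    return qr_transport_flags & BASE_TRANSPORT_MASK


flags = 0b100001
print(has_trailing_data(flags), bin(base_transport_flags(flags)))
```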

> 
> Section 7.7
> 
> How is the value of the "ae-code" determined?

"ae-code" is intended to hold the ICMP or ICMPv6 code. We suggest making
this clearer:

OLD: "A code relating to the event."
NEW: "A code relating to the event. For ICMP or ICMPv6 events, this
should be the ICMP [RFC792] or ICMPv6 [RFC4443] code."

> 
> Appendix A
> 
> We could perhaps apply some constraints on (e.g.) the address-prefex length
> fields to be .le the relevant lengths.
> 
> Appendix C.6
> 
>                                           Using a strong compression,
>   block sizes over 10,000 query/response pairs would seem to offer
>   limited improvements.
> 
> nit: Using a strong compression scheme

Ack. 

Best regards

Sara.