Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Sat, 24 November 2018 03:58 UTC

Date: Fri, 23 Nov 2018 21:58:05 -0600
From: Benjamin Kaduk <kaduk@mit.edu>
To: Sara Dickinson <sara@sinodun.com>
Cc: Tim Wicinski <tjw.ietf@gmail.com>, dnsop@ietf.org, dnsop-chairs@ietf.org, The IESG <iesg@ietf.org>, draft-ietf-dnsop-dns-capture-format@ietf.org
Message-ID: <20181124035805.GG68416@kduck.kaduk.org>
References: <154258729961.2478.12875770828573692533.idtracker@ietfa.amsl.com> <8538EA17-143F-4855-A658-B78701D9B37C@sinodun.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <8538EA17-143F-4855-A658-B78701D9B37C@sinodun.com>
User-Agent: Mutt/1.9.1 (2017-09-22)
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/b4HwGcqDU9TfZ69W5zgLRwAuZwE>
Subject: Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

On Thu, Nov 22, 2018 at 12:01:00PM +0000, Sara Dickinson wrote:
> 
> > Begin forwarded message:
> > 
> > From: Benjamin Kaduk <kaduk@mit.edu>
> > Subject: Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
> > Date: 19 November 2018 at 00:28:19 GMT
> > To: "The IESG" <iesg@ietf.org>
> > Cc: draft-ietf-dnsop-dns-capture-format@ietf.org, Tim Wicinski <tjw.ietf@gmail.com>, dnsop-chairs@ietf.org, tjw.ietf@gmail.com, dnsop@ietf.org
> > Resent-From: <alias-bounces@ietf.org>
> > Resent-To: jad@sinodun.com, jim@sinodun.com, sara@sinodun.com, terry.manderson@icann.org, john.bond@icann.org
> > 
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-dnsop-dns-capture-format-08: Discuss
> 
> To follow up on items not addressed in our previous email.
> 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > There are also a couple of fields whose semantics don't seem to be
> > sufficiently well specified for a proposed-standard document, such as
> > vlan-ids, generator-id, name-rdata, and ae-code.  (I understand that some
> > of them are probably only going to have locally relevant semantics, but we
> > should be explicit about when that's the case.)
> 
> We have addressed the specific fields mentioned here in the comments below related to each of them.
> 
> > 
> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > Section 2
> > 
> > Please consider using the RFC 8174 version of the BCP 14 boilerplate.
> 
> Yes - will replace.
> 
> > 
> > Section 3
> > 
> >   Because of these considerations, a major factor in the design of the
> >   format is minimal storage size of the capture files.
> > 
> > maybe "storage and transmission"?
> 
> Sure.
> 
> > 
> > Section 6
> > 
> > In Figure 2, the Query name is marked as "(q)" (only present if there is a
> > query), but the running text in Section 4 (bullet 1) says that the Question
> > section from the response can be used as an identifying QNAME if there is a
> > response with no corresponding query.  Am I misexpanding QNAME here, or is
> > there a disagreement between these two parts of the text?  In particular, I
> > do not see a part of Figure 2 that would correspond to a Question section
> > in the response, given the various "(q)"/"(r)" markings.
> 
> Good spot - you are correct this is an error in the diagram and it should read 'Query name' with no qualifier. 

Oh good, I was worried that I was just confusing myself, so that's
reassuring to know.

> > 
> > Section 6.2.2
> > 
> >   Messages with OPCODES known to the recording application but not
> >   listed in the Storage Parameters are discarded (regardless of whether
> >   they are malformed or not).
> > 
> > (Do we need to say anything that the "discarded" is only w.r.t. the capture
> > process, and not meant to imply that DNS queries would not get a normal
> > response?)
> 
> Suggest: "Messages with OPCODES known to the recording application but not
>   listed in the Storage Parameters are discarded by the recording application
>   during C-DNS capture (regardless of whether they are malformed or not)."

That sounds good (and to be clear, when I asked the question I wasn't sure
if the answer would just be "no").

> > 
> > Section 6.2.4
> > 
> > Please consider using IPv6 examples, per
> > https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ .
> 
> Yes - will add an IPv6 example.
> 
> > 
> > Section 7.2
> > 
> >   o  The column T gives the CBOR data type of the item.
> > 
> >      *  U - Unsigned integer
> > 
> >      *  I - Signed integer
> > 
> > This is venturing a bit far from my normal area of expertise, but my
> > understanding is that CBOR native major types are only provided for
> > unsigned integer and negative integer, with "signed integer" being an
> > abstraction at a slightly higher layer that needs to be managed in the
> > application.  Do we need to add any clarifying text here or will the
> > meaning be clear to the reader?
> 
> CDDL happily talks about uint and int types, but we think this might
> indeed be a useful clarification to implementers. We suggest:
> 
> OLD: "* I - Signed integer"
> NEW: "* I - Signed integer (i.e. CBOR unsigned or negative integer)"

Sounds good.
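
To make the "unsigned or negative integer" wording concrete, here is a minimal sketch of how a signed value maps onto the two CBOR integer major types (0 and 1, per RFC 7049). The function name `encode_cbor_int` is made up for illustration and is not taken from the draft or from any CBOR library.

```python
def encode_cbor_int(value: int) -> bytes:
    """Encode a Python int as a CBOR integer using major type 0 or 1."""
    if value >= 0:
        major, arg = 0, value        # major type 0: unsigned integer
    else:
        major, arg = 1, -1 - value   # major type 1: negative integer, stores -1 - n
    if arg < 24:
        # Small arguments are packed directly into the initial byte.
        return bytes([(major << 5) | arg])
    # Otherwise the additional-information value selects a 1/2/4/8-byte argument.
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | additional]) + arg.to_bytes(size, "big")
    raise ValueError("integer out of range for CBOR major types 0 and 1")
```

A field of type "I" would therefore appear on the wire as either a major type 0 or a major type 1 item, depending on the sign of the value.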

> > 
> > Section 7.4
> > 
> > Should probably forward-reference section 8 for the format version numbers'
> > semantics.
> 
> Yes, will do. 
> 
> > 
> > Section 7.4.1.1
> > 
> > Should we reference the IANA registries by name for any of these fields
> > (e.g., opcodes, rr-types, etc.).  (Also in Section 7.5.3.1, etc.)
> 
> I thought we had done this in the last update but clearly not, will fix.
> 
> > 
> > Are the storage flags going to be allocated in sequence by updating
> > standards-track documents, or some other mechanism?  (Is a registry
> > necessary?)
> 
> As proposed for the DISCUSS this would be a sub registry.
> 
> > 
> > For the various address prefix fields, do we need to specify that the full
> > addresses are stored when the corresponding prefix field is absent?
> 
> Is it sufficient to update the text in 6.2.4:
> 
> OLD: "If IP address prefixes are given, only the prefix bits of
   addresses are stored."
> 
> NEW: "If IP address prefixes are given, only the prefix bits of
   addresses are stored. If the IP address prefixes are absent then
   full addresses are stored."

That works for me (and as above, I wouldn't have been surprised if the answer
to my question was "no").
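
As a rough illustration of the OLD/NEW text above, the sketch below keeps only the prefix bits when a prefix length is given and the full address otherwise. Storing the prefix in the minimum number of whole bytes, with unused trailing bits zeroed, is an assumption made here for illustration, and `stored_address_bytes` is a hypothetical helper name.

```python
import ipaddress


def stored_address_bytes(address: str, prefix_len=None) -> bytes:
    """Return the bytes that would be stored for an address (illustrative only)."""
    packed = ipaddress.ip_address(address).packed
    if prefix_len is None:
        # Prefix field absent: the full address is stored.
        return packed
    if not 0 <= prefix_len <= len(packed) * 8:
        raise ValueError("prefix length out of range for this address family")
    nbytes = (prefix_len + 7) // 8           # assumption: minimum whole bytes
    kept = bytearray(packed[:nbytes])
    spare = nbytes * 8 - prefix_len          # bits in the last byte beyond the prefix
    if spare:
        kept[-1] &= (0xFF << spare) & 0xFF   # zero the non-prefix bits
    return bytes(kept)
```

Under these assumptions a /32 IPv6 prefix occupies 4 bytes, the same length as a full IPv4 address, which is the kind of ambiguity raised in the Section 7.5.3 comment.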

> 
> > 
> > Section 7.4.1.1.1
> > 
> > Am I parsing the "query-response-hints" text correctly to say that a bit is
> > set in the bitmap if the corresponding field is recorded (if present) by
> > the collecting implementation?  The causality of "if the field is omitted
> > the bit is unset" goes in a direction that is not what I expected.
> > (Similarly for the other fields in this table.)
> 
> ekr picked up on the same point - as responded to him:
> 
> "The issue is that if the bit is set the field might still be missing because although the configuration was set to collect it the data wasn't available to the encoder for some other reason. However when the bit is not set it means that the data will definitely not be present because the collector is configured not to collect it.
> 
> We do discuss this problem in section 6.2.1 - perhaps a reference in the table back to that discussion is what is needed?"
> 
> Looking again I think a slight update to the text in 6.2.1 might help too:
> 
> OLD:
> "The Storage Parameters therefore also contains a Storage Hints item
   which specifies which items the encoder of the file omits from the
   stored data."
> 
> NEW: "The Storage Parameters therefore also contains a Storage Hints item
   which specifies which items the encoder of the file omits from the
   stored data and will therefore never be present. (This approach is taken
   because a flag that indicated which items were included for collection would
   not guarantee that the item was present, only that it might be.)"

This text helps, but I think it is not quite what I was going after -- that
is, when I think of a "hint" that feels like something active and that
would be indicated by setting a bit to one.  In this design, the hints
about what are *omitted* are the bits that are *zero*, which is
counter-intuitive, at least to me.  So maybe we could say (in 7.4.1.1.1, in
addition to your suggested change in 6.2.1):

  Hints indicating which "QueryResponse" fields are candidates for capture or
  omitted, see section 7.6.  If a bit is unset, that field is omitted from
  the capture.
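
A small sketch of how a reader of the file might interpret such a bitmap, following the semantics discussed above: a set bit means "candidate for capture, may still be absent", and an unset bit means "omitted, never present". The bit positions and member names below are placeholders chosen for illustration; the real assignments are whatever the draft specifies.

```python
from enum import IntFlag


class QueryResponseHints(IntFlag):
    # Placeholder bit positions, NOT the draft's actual assignments.
    TIME_OFFSET = 1 << 0
    CLIENT_ADDRESS = 1 << 1
    CLIENT_PORT = 1 << 2


def field_status(hints: int, field: QueryResponseHints) -> str:
    """Describe what a reader can conclude about one field from the hints."""
    if hints & field:
        # Configured for collection; the data may still be missing per record.
        return "candidate for capture (may still be absent)"
    # Bit unset: the encoder omits this field from every record.
    return "omitted (never present)"


example_hints = QueryResponseHints.TIME_OFFSET | QueryResponseHints.CLIENT_PORT
```

With `example_hints`, a reader learns that client addresses were never stored, but cannot assume that every record carries a time offset or client port.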

> 
> > 
> > Section 7.4.2
> > 
> > Do we need a reference for "promiscuous mode"?
> 
> Promiscuous mode is discussed on the main PCAP manpage…. Hopefully a way
> will be found to address the question of a suitable reference format for
> PCAP material.
> 
> > 
> > Just to check: in "server-addresses", I just infer the IP version from the
> > length of the byte string?
> 
> As mentioned in the DISCUSS response, we probably need to make the transport flags mandatory.
> 
> > 
> > Do we need to say more about where the vlan-ids identifiers are taken from?
> 
> Suggest: 
> 
> OLD: "| vlan-ids         | O | A | Array of identifiers (of type unsigned |
   |                  |   |   | integer) of VLANs selected for         |
   |                  |   |   | collection."
> 
> NEW: "| vlan-ids         | O | A | User specified array of identifiers    |
   |                  |   |   | (of type unsigned integer) of VLANs    |
   |                  |   |   | [IEEE 802.1Q] selected for collection. |"

It seems likely to me that we want to say that the actual VLAN ID values
are only unique within an administrative domain.

> > 
> > Is the "generator-id" string intended to only be human readable?  Only
> > within a specific (administrative) context?
> 
> The generator ID is intended only to identify the collecting
> application. Specifying that it is human-readable (if present) seems a
> good idea. Would this be sufficient?
> 
> OLD: "String identifying the collection method."
> NEW: "User specified human-readable string identifying the collection method."

Does "user-specified" mean that only the user is responsible for reading it
later (or would we want it to make sense even when the data is conveyed to
some other party)?
If so, this would be enough to address my comment, but then Ben's
comment about internationalization concerns would come into play.

> > 
> > Section 7.5.1
> > 
> > Does "earliest-time" include leap seconds?
> 
> Thanks for noticing this…after digging into it…
> 
> The description specifies the number of seconds to be the
> number of seconds since the POSIX epoch ("time_t"). POSIX requires that
> leap seconds be omitted from reported time, and all days are defined as
> having 86,400 seconds. This means that a POSIX timestamp can be
> ambiguous and refer to either of the last 2 seconds of a day containing
> a leap second (who knew time could stand still in POSIX world - aargh?!) 
> 
> However, libpcap (for example) can only provide POSIX timestamps for 
> packets as far as we can see… 
> 
> Do you think we should just document this as a limitation or do you have 
> another option in mind?

To be honest, I was only expecting "number of seconds since the POSIX epoch
("time_t", excluding leap seconds)" or "number of seconds since the POSIX
epoch ("time_t", including leap seconds)".  My concern is just that we
state how to interpret the number in this field; choosing whichever case
the common API provides is fine, and we don't need to document it as a
limitation at all.  If someone needs to convert between TAI and UTC, we
give them enough information so that they can do it, but otherwise it's not
our problem.
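
For readers unfamiliar with the "excluding leap seconds" case, the following sketch applies the POSIX-style arithmetic (every day counted as exactly 86,400 seconds) via Python's `calendar.timegm`. The helper name `posix_seconds` is invented here; the example date is the leap second inserted at the end of 2016.

```python
import calendar


def posix_seconds(year, month, day, hour, minute, second):
    """Seconds since the POSIX epoch, treating every day as 86,400 seconds."""
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


# 2016-12-31 23:59:60 UTC was a leap second.
leap = posix_seconds(2016, 12, 31, 23, 59, 60)
next_midnight = posix_seconds(2017, 1, 1, 0, 0, 0)

# Under this arithmetic the two instants receive the same value.
print(leap == next_midnight)
```

This shows only the arithmetic definition; how a particular kernel or capture library steps its clock across a leap second can differ, which is why stating the interpretation in the field description is enough.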

> > 
> > Section 7.5.3
> > 
> > The "ip-address" description seems to imply that very short ipv6 prefix
> > lengths could cause confusion as to the address type being indicated (e.g.,
> > setting to 32 when no ipv4 prefix length is set, or setting to the same
> > value as the ipv4 prefix length).  Do we need to restrict the ipv6 prefix
> > lengths to being 33 or larger?
> > 
> > Are the "name-rdata" contents in wire format or presentation format?
> 
> Wire format. We suggest noting this:
> 
> OLD: "Array where each entry is the content of a single NAME or RDATA"
> NEW: "Array where each entry is the content of a single NAME or RDATA in
> wire format"

Sounds good.

> > 
> > Section 7.5.3.2
> > 
> > What's the allocation policy/procedure for the remaining
> > qr-transport-flags transport values?  For additional bits in any/all of the
> > flags fields listed here?
> 
> As proposed for the DISCUSS this would be a sub registry.
> 
> > 
> > Something of a side note, what's the mnemonic for the "sig" in
> > "qr-sig-flags"?  That is, what is it a signature of or over (it doesn't
> > seem like it's a cryptographic signature, which may be what is confusing
> > me)?
> 
> Ah, I see the confusion. No it is meant to represent the idea that in a given set of DNS query/responses there will be a finite number of combinations of the attributes in this table, each one being a signature. 
> 
> In section 4, bullet 3: 
> 
> “Examples of commonality between DNS messages are that in most
>           cases the QUESTION RR is the same in the query and response,
>           and that there is a finite set of query signatures (based on a
>           subset of attributes). “
> 
> Perhaps updating this text would help:
> 
> "and that there is a finite set of query 'signatures' (defined as a specific combination of a subset of attributes)."

That would help me, yes, but I have no reason to think that there is anyone
else confused in the way that I managed to confuse myself.  That is, feel
free to leave the original text unchanged if you want.
(And thank you for the explanation here in the email; it does make sense to
me now, which I appreciate.)

> > 
> > For "query-rcode"/"response-rcode", should there be a reference for "OPT",
> > and/or for any of the EDNS stuff in here?  (The Terminology section only
> > mentions using the naming from RFC 1035, that I can see.)
> 
> Yes, we can add a reference to RFC6891.
> 
> > 
> > The "mm-transport-flags" here bear a striking resemblance to the
> > "qr-transport-flags" from Section 7.5.3.2; should there be a shared
> > registry for their contents?  (I guess the TransportFlags CDDL to some
> > extent serves this function.)
> 
> Also noticed by Alexey..
> 
> The qr-transport-flags and mm-transport-flags are different in that the qr-transport-flags include Bit 5, the trailing bytes indicator.
> 
> In the CDDL a base 'TransportFlags' type is defined and then
> 
> mm-transport-flags => TransportFlags,
> qr-transport-flags => QueryResponseTransportFlags,
> 
> QueryResponseTransportFlagValues = &(
>     query-trailingdata : 5,
> ) / TransportFlagValues
> QueryResponseTransportFlags = uint .bits QueryResponseTransportFlagValues
> 
> We can add some text to the table descriptions in sections 7.5.3.2 and 7.5.3.5 to clarify the relationship. 

That might help, since we read those sections before we get to the CDDL
that does have the shared data type.

> > 
> > Section 7.7
> > 
> > How is the value of the "ae-code" determined?
> 
> "ae-code" is intended to hold the ICMP or ICMPv6 code. We suggest making
> this clearer:
> 
> OLD: "A code relating to the event."
> NEW: "A code relating to the event. For ICMP or ICMPv6 events, this
> should be the ICMP [RFC792] or ICMPv6 [RFC4443] code."

I think we need to say that the contents are undefined (or only locally
defined) in other cases.  But this new text is a big step forward, thanks!

-Benjamin

> > 
> > Appendix A
> > 
> > We could perhaps apply some constraints on (e.g.) the address-prefix length
> > fields to be .le the relevant lengths.
> > 
> > Appendix C.6
> > 
> >                                           Using a strong compression,
> >   block sizes over 10,000 query/response pairs would seem to offer
> >   limited improvements.
> > 
> > nit: Using a strong compression scheme
> 
> Ack. 
> 
> Best regards
> 
> Sara.