Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Sat, 24 November 2018 03:58 UTC

Date: Fri, 23 Nov 2018 21:58:05 -0600
From: Benjamin Kaduk <kaduk@mit.edu>
To: Sara Dickinson <sara@sinodun.com>
Cc: Tim Wicinski <tjw.ietf@gmail.com>, dnsop@ietf.org, dnsop-chairs@ietf.org, The IESG <iesg@ietf.org>, draft-ietf-dnsop-dns-capture-format@ietf.org
Message-ID: <20181124035805.GG68416@kduck.kaduk.org>
References: <154258729961.2478.12875770828573692533.idtracker@ietfa.amsl.com> <8538EA17-143F-4855-A658-B78701D9B37C@sinodun.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <8538EA17-143F-4855-A658-B78701D9B37C@sinodun.com>
User-Agent: Mutt/1.9.1 (2017-09-22)
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/b4HwGcqDU9TfZ69W5zgLRwAuZwE>
Subject: Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
Precedence: list

On Thu, Nov 22, 2018 at 12:01:00PM +0000, Sara Dickinson wrote:
> 
> > Begin forwarded message:
> > 
> > From: Benjamin Kaduk <kaduk@mit.edu>
> > Subject: Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
> > Date: 19 November 2018 at 00:28:19 GMT
> > To: "The IESG" <iesg@ietf.org>
> > Cc: draft-ietf-dnsop-dns-capture-format@ietf.org, Tim Wicinski <tjw.ietf@gmail.com>, dnsop-chairs@ietf.org, tjw.ietf@gmail.com, dnsop@ietf.org
> > Resent-From: <alias-bounces@ietf.org>
> > Resent-To: jad@sinodun.com, jim@sinodun.com, sara@sinodun.com, terry.manderson@icann.org, john.bond@icann.org
> > 
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-dnsop-dns-capture-format-08: Discuss
> 
> To follow up on items not addressed in our previous email.
> 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > There are also a couple of fields whose semantics don't seem to be
> > sufficiently well specified for a proposed-standard document, such as
> > vlan-ids, generator-id, name-rdata, and ae-code.  (I understand that some
> > of them are probably only going to have locally relevant semantics, but we
> > should be explicit about when that's the case.)
> 
> We have addressed the specific fields mentioned here in the comments below related to each of them.
> 
> > 
> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > Section 2
> > 
> > Please consider using the RFC 8174 version of the BCP 14 boilerplate.
> 
> Yes - will replace.
> 
> > 
> > Section 3
> > 
> >   Because of these considerations, a major factor in the design of the
> >   format is minimal storage size of the capture files.
> > 
> > maybe "storage and transmission”?
> 
> Sure.
> 
> > 
> > Section 6
> > 
> > In Figure 2, the Query name is marked as "(q)" (only present if there is a
> > query), but the running text in Section 4 (bullet 1) says that the Question
> > section from the response can be used as an identifying QNAME if there is a
> > response with no corresponding query.  Am I misexpanding QNAME here, or is
> > there a disagreement between these two parts of the text?  In particular, I
> > do not see a part of Figure 2 that would correspond to a Question section
> > in the response, given the various "(q)"/"(r)" markings.
> 
> Good spot - you are correct this is an error in the diagram and it should read 'Query name' with no qualifier. 

Oh good, I was worried that I was just confusing myself, so that's
reassuring to know.

> > 
> > Section 6.2.2
> > 
> >   Messages with OPCODES known to the recording application but not
> >   listed in the Storage Parameters are discarded (regardless of whether
> >   they are malformed or not).
> > 
> > (Do we need to say that the "discarded" is only w.r.t. the capture
> > process, and not meant to imply that DNS queries would not get a normal
> > response?)
> 
> Suggest: “Messages with OPCODES known to the recording application but not
>   listed in the Storage Parameters are discarded by the recording application 
>   during C-DNS capture (regardless of whether they are malformed or not)."

That sounds good (and to be clear, when I asked the question I wasn't sure
if the answer would just be "no").

> > 
> > Section 6.2.4
> > 
> > Please consider using IPv6 examples, per
> > https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ .
> 
> Yes - will add an IPv6 example.
> 
> > 
> > Section 7.2
> > 
> >   o  The column T gives the CBOR data type of the item.
> > 
> >      *  U - Unsigned integer
> > 
> >      *  I - Signed integer
> > 
> > This is venturing a bit far from my normal area of expertise, but my
> > understanding is that CBOR native major types are only provided for
> > unsigned integer and negative integer, with "signed integer" being an
> > abstraction at a slightly higher layer that needs to be managed in the
> > application.  Do we need to add any clarifying text here or will the
> > meaning be clear to the reader?
> 
> CDDL happily talks about uint and int types, but we think this might
> indeed be a useful clarification to implementers. We suggest:
> 
> OLD: "* I - Signed integer"
> NEW: "* I - Signed integer (i.e. CBOR unsigned or negative integer)"

Sounds good.
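
A minimal sketch of what "signed integer (i.e. CBOR unsigned or negative
integer)" means on the wire, assuming the integer encoding rules of RFC 7049.
The function name cbor_encode_int is hypothetical and is not taken from the
draft; it only illustrates how an application-level signed value maps onto
major type 0 or major type 1.

```python
import struct


def cbor_encode_int(value: int) -> bytes:
    """Encode a Python int as a CBOR integer (major type 0 or 1)."""
    if value >= 0:
        major, arg = 0, value        # major type 0: unsigned integer
    else:
        major, arg = 1, -1 - value   # major type 1: encodes -1 - n
    if arg < 24:
        # Small arguments fit directly in the initial byte.
        return bytes([(major << 5) | arg])
    # Otherwise use the shortest 1-, 2-, 4- or 8-byte argument that fits.
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if arg < (1 << (8 * struct.calcsize(fmt))):
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("integer out of range for CBOR major types 0 and 1")
```

A decoder does the reverse: it inspects the major type and, for major type 1,
computes -1 - n to recover the signed value.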

> > 
> > Section 7.4
> > 
> > Should probably forward-reference section 8 for the format version numbers'
> > semantics.
> 
> Yes, will do. 
> 
> > 
> > Section 7.4.1.1
> > 
> > We should reference the IANA registries by name for any of these fields
> > (e.g., opcodes, rr-types, etc.).  (Also in Section 7.5.3.1, etc.)
> 
> I thought we had done this in the last update but clearly not, will fix.
> 
> > 
> > Are the storage flags going to be allocated in sequence by updating
> > standards-track documents, or some other mechanism?  (Is a registry
> > necessary?)
> 
> As proposed for the DISCUSS this would be a sub registry.
> 
> > 
> > For the various address prefix fields, do we need to specify that the full
> > addresses are stored when the corresponding prefix field is absent?
> 
> Is it sufficient to update the text in 6.2.4:
> 
> OLD: “If IP address prefixes are given, only the prefix bits of
>    addresses are stored.”
> 
> NEW: “If IP address prefixes are given, only the prefix bits of
>    addresses are stored. If the IP address prefixes are absent then 
>    full addresses are stored."

That works for me (and as above, I wouldn't have been surprised if the answer
to my question was "no").
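
A minimal sketch of the "prefix bits only" rule, under the assumption (not
stated in this thread) that a truncated address is stored in the smallest
number of whole bytes that holds the prefix. The name stored_address is
hypothetical.

```python
import ipaddress


def stored_address(addr: str, prefix_len=None) -> bytes:
    """Return the bytes that would be stored for addr.

    With no prefix length the full address is kept; otherwise the host bits
    are zeroed and only the bytes covering the prefix are kept (assumption).
    """
    raw = ipaddress.ip_address(addr).packed
    if prefix_len is None:
        return raw
    total_bits = len(raw) * 8
    if not 0 <= prefix_len <= total_bits:
        raise ValueError("prefix length out of range for this address family")
    # Build a mask with prefix_len leading one bits and apply it.
    mask = ((1 << prefix_len) - 1) << (total_bits - prefix_len)
    masked = int.from_bytes(raw, "big") & mask
    n_bytes = (prefix_len + 7) // 8
    return masked.to_bytes(len(raw), "big")[:n_bytes]
```

Under that assumption, stored_address("2001:db8::1", 32) is four bytes long,
the same length as a full IPv4 address, so the length of the byte string alone
would not identify the address family in that case.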

> 
> > 
> > Section 7.4.1.1.1
> > 
> > Am I parsing the "query-response-hints" text correctly to say that a bit is
> > set in the bitmap if the corresponding field is recorded (if present) by
> > the collecting implementation?  The causality of "if the field is omitted
> > the bit is unset" goes in a direction that is not what I expected.
> > (Similarly for the other fields in this table.)
> 
> ekr picked up on the same point - as responded to him:
> 
> "The issue is that if the bit is set the field might still be missing because although the configuration was set to collect it the data wasn’t available to the encoder for some other reason. However when the bit is not set it means that the data will definitely not be present because the collector is configured not to collect it. 
> 
> We do discuss this problem in section 6.2.1 - perhaps a reference in the table back to that discussion is what is needed?”
> 
> Looking again I think a slight update to the text in 6.2.1 might help too:
> 
> OLD:
> “The Storage Parameters therefore also contains a Storage Hints item
>    which specifies which items the encoder of the file omits from the
>    stored data."
> 
> NEW: “The Storage Parameters therefore also contains a Storage Hints item
>    which specifies which items the encoder of the file omits from the
>    stored data and will therefore never be present. (This approach is taken 
>   because a flag that indicated which items were included for collection would 
>   not guarantee that the item was present, only that it might be.) "

This text helps, but I think it is not quite what I was going after -- that
is, when I think of a "hint" that feels like something active and that
would be indicated by setting a bit to one.  In this design, the hints
about what are *omitted* are the bits that are *zero*, which is
counter-intuitive, at least to me.  So maybe we could say (in 7.4.1.1.1, in
addition to your suggested change in 6.2.1):

  Hints indicating which "QueryResponse" fields are candidates for capture or
  omitted, see section 7.6.  If a bit is unset, that field is omitted from
  the capture.
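
A minimal sketch of how a reader of a capture file might interpret one hint
bit, following the semantics discussed above: an unset bit means the field is
never present, a set bit means the field may or may not be present. The bit
positions and names in QueryResponseHints are hypothetical placeholders, not
the assignments from section 7.4.1.1.1.

```python
from enum import IntFlag


class QueryResponseHints(IntFlag):
    # Hypothetical bit assignments, for illustration only.
    TIME_OFFSET = 1 << 0
    CLIENT_ADDRESS = 1 << 1
    CLIENT_PORT = 1 << 2


def field_status(hints: int, field: QueryResponseHints, present: bool) -> str:
    """Classify a field given the hints bitmap and whether data was found."""
    if not hints & field:
        # Bit unset: the collector was configured to omit this field.
        if present:
            raise ValueError("field present although its hint bit is unset")
        return "omitted by configuration"
    # Bit set: the field was a candidate for capture, but may be missing.
    return "captured" if present else "configured but unavailable"
```

The three outcomes correspond to the three cases in the discussion: the bit
only guarantees absence, never presence.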

> 
> > 
> > Section 7.4.2
> > 
> > Do we need a reference for "promiscuous mode”?
> 
> Promiscuous mode is discussed on the main PCAP manpage…. Hopefully a way
> will be found to address the question of a suitable reference format for
> PCAP material.
> 
> > 
> > Just to check: in "server-addresses", I just infer the IP version from the
> > length of the byte string?
> 
> As mentioned in the DISCUSS response, we probably need to make the transport flags mandatory.
> 
> > 
> > Do we need to say more about where the vlan-ids identifiers are taken from?
> 
> Suggest: 
> 
> OLD: “ | vlan-ids         | O | A | Array of identifiers (of type unsigned |
>    |                  |   |   | integer) of VLANs selected for         |
>    |                  |   |   | collection. “
> 
> NEW: “ | vlan-ids         | O | A | User specified array of identifiers (of type unsigned |
>    |                  |   |   | integer) of VLANs  [IEEE 802.1Q] selected for         |
>    |                  |   |   | collection.  "

It seems likely to me that we want to say that the actual VLAN ID values
are only unique within an administrative domain.

> > 
> > Is the "generator-id" string intended to only be human readable?  Only
> > within a specific (administrative) context?
> 
> The generator ID is intended only to identify the collecting
> application. Specifying that it is human-readable (if present) seems a
> good idea. Would this be sufficient?
> 
> OLD: "String identifying the collection method.”
> NEW: “User specified human-readable string identifying the collection method."

Does "user-specified" mean that only the user is responsible for reading it
later (or would we want it to make sense even when the data is conveyed to
some other party)?
If so, this would be enough to address my comment, but then Ben's
comment about internationalization concerns would come into play.

> > 
> > Section 7.5.1
> > 
> > Does "earliest-time" include leap seconds?
> 
> Thanks for noticing this…after digging into it…
> 
> The description specifies the number of seconds to be the
> number of seconds since the POSIX epoch ("time_t"). POSIX requires that
> leap seconds be omitted from reported time, and all days are defined as
> having 86,400 seconds. This means that a POSIX timestamp can be
> ambiguous and refer to either of the last 2 seconds of a day containing
> a leap second (who knew time could stand still in POSIX world - aargh?!) 
> 
> However, libpcap (for example) can only provide POSIX timestamps for 
> packets as far as we can see… 
> 
> Do you think we should just document this as a limitation or do you have 
> another option in mind?

To be honest, I was only expecting "number of seconds since the POSIX epoch
("time_t", excluding leap seconds)" or "number of seconds since the POSIX
epoch ("time_t", including leap seconds)".  My concern is just that we
state how to interpret the number in this field; choosing whichever case
the common API provides is fine, and we don't need to document it as a
limitation at all.  If someone needs to convert between TAI and UTC, we
give them enough information so that they can do it, but otherwise it's not
our problem.
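
A minimal sketch of the "excluding leap seconds" interpretation, using only
the Python standard library. It shows that the POSIX arithmetic gives the leap
second at the end of 2016 the same value as the following midnight, and how a
consumer could still convert to TAI given a table of offsets. The helper names
are hypothetical, and the offset table covers only the single boundary at the
end of 2016.

```python
import calendar


def posix_seconds(year, month, day, hour, minute, second):
    """POSIX seconds since the epoch: every day counts as exactly 86,400 s."""
    return calendar.timegm((year, month, day, hour, minute, second))


# First POSIX second after the leap second inserted at the end of 2016.
LEAP_BOUNDARY = posix_seconds(2017, 1, 1, 0, 0, 0)


def tai_minus_utc(posix_ts):
    """TAI - UTC in seconds; this sketch is only valid from 2015-07-01 on."""
    if posix_ts < posix_seconds(2015, 7, 1, 0, 0, 0):
        raise ValueError("timestamp predates the range covered by this sketch")
    return 37 if posix_ts >= LEAP_BOUNDARY else 36


def posix_to_tai(posix_ts):
    """Convert a POSIX timestamp to a count of TAI seconds since the epoch."""
    return posix_ts + tai_minus_utc(posix_ts)
```

Here posix_seconds(2016, 12, 31, 23, 59, 60) and
posix_seconds(2017, 1, 1, 0, 0, 0) are equal, which is the ambiguity described
above; stating in the field description that leap seconds are excluded is
enough for a reader to apply a conversion like this if they need one.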

> > 
> > Section 7.5.3
> > 
> > The "ip-address" description seems to imply that very short ipv6 prefix
> > lengths could cause confusion as to the address type being indicated (e.g.,
> > setting to 32 when no ipv4 prefix length is set, or setting to the same
> > value as the ipv4 prefix length).  Do we need to restrict the ipv6 prefix
> > lengths to being 33 or larger?
> > 
> > Are the "name-rdata" contents in wire format or presentation format?
> 
> Wire format. We suggest noting this:
> 
> OLD: "Array where each entry is the content of a single NAME or RDATA"
> NEW: "Array where each entry is the content of a single NAME or RDATA in
> wire format"

Sounds good.
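
A minimal sketch of the difference between presentation format and wire format
for a NAME, assuming the uncompressed length-prefixed label encoding of
RFC 1035. The function name name_to_wire is hypothetical, and the sketch does
not handle presentation-format escapes such as "\." or "\DDD".

```python
def name_to_wire(name: str) -> bytes:
    """Convert a presentation-format domain name to uncompressed wire format."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError("label longer than 63 octets")
        out.append(len(raw))  # one length octet, then the label octets
        out += raw
    out.append(0)             # zero-length root label terminates the name
    if len(out) > 255:
        raise ValueError("name longer than 255 octets")
    return bytes(out)
```

For example, "example.com." becomes a 7, the octets of "example", a 3, the
octets of "com", and a terminating zero octet.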

> > 
> > Section 7.5.3.2
> > 
> > What's the allocation policy/procedure for the remaining
> > qr-transport-flags transport values?  For additional bits in any/all of the
> > flags fields listed here?
> 
> As proposed for the DISCUSS this would be a sub registry.
> 
> > 
> > Something of a side note, what's the mnemonic for the "sig" in
> > "qr-sig-flags"?  That is, what is it a signature of or over (it doesn't
> > seem like it's a cryptographic signature, which may be what is confusing
> > me)?
> 
> Ah, I see the confusion. No it is meant to represent the idea that in a given set of DNS query/responses there will be a finite number of combinations of the attributes in this table, each one being a signature. 
> 
> In section 4, bullet 3: 
> 
> “Examples of commonality between DNS messages are that in most
>           cases the QUESTION RR is the same in the query and response,
>           and that there is a finite set of query signatures (based on a
>           subset of attributes). “
> 
> Perhaps updating this text would help:
> 
> “ and that there is a finite set of query ‘signatures’ (defined as a specific combination of a subset of attributes). "

That would help me, yes, but I have no reason to think that there is anyone
else confused in the way that I managed to confuse myself.  That is, feel
free to leave the original text unchanged if you want.
(And thank you for the explanation here in the email; it does make sense to
me now, which I appreciate.)

> > 
> > For "query-rcode"/"response-rcode", should there be a reference for "OPT",
> > and/or for any of the EDNS stuff in here?  (The Terminology section only
> > mentions using the naming from RFC 1035, that I can see.)
> 
> Yes, we can add a reference to RFC6891.
> 
> > 
> > The "mm-transport-flags" here bear a striking resemblance to the
> > "qr-transport-flags" from Section 7.5.3.2; should there be a shared
> > registry for their contents?  (I guess the TransportFlags CDDL to some
> > extent serves this function.)
> 
> Also noticed by Alexey..
> 
> The qr-transport-flags and mm-transport-flags are different in that the qr-transport-flags include Bit 5, the trailing bytes indicator.
> 
> In the CDDL a base ’TransportFlags’ type is defined and then
> 
> mm-transport-flags     => TransportFlags,
> 
> qr-transport-flags    => QueryResponseTransportFlags,
> 
>  QueryResponseTransportFlagValues = &(
>       query-trailingdata : 5,
>   ) / TransportFlagValues
>   QueryResponseTransportFlags = uint .bits QueryResponseTransportFlagValues
> 
> We can add some text to the table descriptions in sections 7.5.3.2 and 7.5.3.5 to clarify the relationship. 

That might help, since we read those sections before we get to the CDDL
that does have the shared data type.
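
A minimal sketch of the relationship between the two flag fields: a shared
decoder for the base bits, with the trailing-data bit read only for
qr-transport-flags. Bit 5 is taken from the CDDL quoted above; the layout of
the base bits (bit 0 for IP version, bits 1-4 for the transport value) is an
assumption here and should be checked against the tables in the draft. The
function name is hypothetical.

```python
def decode_transport_flags(flags: int, query_response: bool = False) -> dict:
    """Decode mm-transport-flags or, if query_response, qr-transport-flags."""
    result = {
        "ipv6": bool(flags & 0x01),         # assumed: bit 0 is the IP version
        "transport": (flags >> 1) & 0x0F,   # assumed: bits 1-4 are transport
    }
    if query_response:
        # Bit 5 (query-trailingdata) exists only in qr-transport-flags.
        result["trailing_data"] = bool(flags & (1 << 5))
    return result
```

The same base decoding serves both fields, which mirrors how the CDDL builds
QueryResponseTransportFlagValues on top of TransportFlagValues.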

> > 
> > Section 7.7
> > 
> > How is the value of the "ae-code" determined?
> 
> "ae-code" is intended to hold the ICMP or ICMPv6 code. We suggest making
> this clearer:
> 
> OLD: "A code relating to the event."
> NEW: "A code relating to the event. For ICMP or ICMPv6 events, this
> should be the ICMP [RFC792] or ICMPv6 [RFC4443] code."

I think we need to say that the contents are undefined (or only locally
defined) in other cases.  But this new text is a big step forward, thanks!

-Benjamin

> > 
> > Appendix A
> > 
> > We could perhaps apply some constraints on (e.g.) the address-prefix length
> > fields to be .le the relevant lengths.
> > 
> > Appendix C.6
> > 
> >                                           Using a strong compression,
> >   block sizes over 10,000 query/response pairs would seem to offer
> >   limited improvements.
> > 
> > nit: Using a strong compression scheme
> 
> Ack. 
> 
> Best regards
> 
> Sara.
