[DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Mon, 19 November 2018 00:28 UTC

From: Benjamin Kaduk <kaduk@mit.edu>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-dnsop-dns-capture-format@ietf.org, Tim Wicinski <tjw.ietf@gmail.com>, dnsop-chairs@ietf.org, tjw.ietf@gmail.com, dnsop@ietf.org
Message-ID: <154258729961.2478.12875770828573692533.idtracker@ietfa.amsl.com>
Date: Sun, 18 Nov 2018 16:28:19 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/ingHrzoPcbMx6E5atAEJ5_bq9Qg>

Benjamin Kaduk has entered the following ballot position for
draft-ietf-dnsop-dns-capture-format-08: Discuss



Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-dnsop-dns-capture-format/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

It is pretty shocking to not see any discussion of the privacy
considerations of storing data including client addresses (and ports)
alongside DNS transactions, given how central DNS resolution is to user
behavior on the web.  (Note that there are mentions of potentially
anonymized data in Sections 6.2 and 6.2.3 which would presumably
forward-reference the privacy considerations.)  Data normalization should
probably also be discussed in such a section, since (e.g.) the case used in
a query/response could be used to fingerprint an implementation.

I'm also concerned about the policy/procedure for allocating/extending the
various bitfields and similar potential extension points in the data
structures.  Section 8 covers the major/minor versioning semantics with
respect to new map keys and new maps, but not addition of new bits within
existing (uint) bitmaps.  Given the usage of the CDDL .bits constraint,
it's not really clear that an IANA registry is the right tool to use, but I
think some indication of the expected way to allocate new bits is in order,
whether it's "a future standards-track document that updates this document"
or otherwise.  (I've noted many, but not all, instances of such bitmaps in
my COMMENT section.)

There are also a couple of fields whose semantics don't seem to be
sufficiently well specified for a proposed-standard document, such as
vlan-ids, generator-id, name-rdata, and ae-code.  (I understand that some
of them are probably only going to have locally relevant semantics, but we
should be explicit about when that's the case.)

If I'm reading things correctly and the IP address type is inferred from
the byte-string length, then I think we need to restrict the address
prefix length(s) so that this inference is unambiguous (noting that we
only have the *byte* length of the address fields at our disposal for
disambiguation, and not the more precise bit-length).
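To illustrate, a minimal Python sketch of inferring the address family from
the stored byte length alone. It assumes (this is not text from the draft)
that a truncated address keeps just enough whole bytes to cover the
configured prefix, rounding up; the function names and default prefix
lengths are hypothetical.

```python
def prefix_bytes(prefix_len: int) -> int:
    """Whole bytes needed to hold prefix_len bits (assumed: round up)."""
    return (prefix_len + 7) // 8


def infer_family(addr: bytes, ipv4_prefix: int = 32, ipv6_prefix: int = 128) -> int:
    """Return 4 or 6 based only on len(addr).

    Hypothetical helper; the defaults model untruncated addresses.
    """
    v4_len = prefix_bytes(ipv4_prefix)
    v6_len = prefix_bytes(ipv6_prefix)
    if v4_len == v6_len:
        # Both families truncate to the same byte length: no way to tell.
        raise ValueError(
            f"ambiguous: IPv4 /{ipv4_prefix} and IPv6 /{ipv6_prefix} "
            f"both store {v4_len} bytes"
        )
    if len(addr) == v4_len:
        return 4
    if len(addr) == v6_len:
        return 6
    raise ValueError(f"unexpected address length {len(addr)}")
```

With IPv4 /24 and IPv6 /48 the stored lengths (3 vs. 6 bytes) differ and
the inference works; with IPv4 /32 and IPv6 /32 both families store 4
bytes and the call can only raise.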


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 2

Please consider using the RFC 8174 version of the BCP 14 boilerplate.

Section 3

   Because of these considerations, a major factor in the design of the
   format is minimal storage size of the capture files.

maybe "storage and transmission"?

Section 6

In Figure 2, the Query name is marked as "(q)" (only present if there is a
query), but the running text in Section 4 (bullet 1) says that the Question
section from the response can be used as an identifying QNAME if there is a
response with no corresponding query.  Am I misexpanding QNAME here, or is
there a disagreement between these two parts of the text?  In particular, I
do not see a part of Figure 2 that would correspond to a Question section
in the response, given the various "(q)"/"(r)" markings.

Section 6.2.2

   Messages with OPCODES known to the recording application but not
   listed in the Storage Parameters are discarded (regardless of whether
   they are malformed or not).

(Do we need to say that "discarded" applies only to the capture
process, and is not meant to imply that such DNS queries would not get a
normal response?)

Section 6.2.4

Please consider using IPv6 examples, per
https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ .

Section 7.2

   o  The column T gives the CBOR data type of the item.

      *  U - Unsigned integer

      *  I - Signed integer

This is venturing a bit far from my normal area of expertise, but my
understanding is that CBOR native major types are only provided for
unsigned integer and negative integer, with "signed integer" being an
abstraction at a slightly higher layer that needs to be managed in the
application.  Do we need to add any clarifying text here or will the
meaning be clear to the reader?

Section 7.4

Should probably forward-reference section 8 for the format version numbers'
semantics.

Section 7.4.1.1

Should we reference the IANA registries by name for any of these fields
(e.g., opcodes, rr-types, etc.)?  (Also in Section 7.5.3.1, etc.)

Are the storage flags going to be allocated in sequence by updating
standards-track documents, or some other mechanism?  (Is a registry
necessary?)

For the various address prefix fields, do we need to specify that the full
addresses are stored when the corresponding prefix field is absent?
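One possible reading, sketched in Python: if the prefix field is absent,
the full address is stored; otherwise the address is masked and truncated
to the whole bytes covering the prefix. Both the convention and the helper
name are assumptions for illustration, which is why the draft should state
it explicitly. The example uses IPv6 documentation addresses.

```python
import ipaddress
from typing import Optional


def stored_address(addr: str, prefix_len: Optional[int] = None) -> bytes:
    """Bytes that would be stored for addr (hypothetical convention)."""
    ip = ipaddress.ip_address(addr)
    if prefix_len is None:
        # No prefix length configured: keep the full 4- or 16-byte address.
        return ip.packed
    # Zero the host bits, then keep only the bytes that cover the prefix.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return network.network_address.packed[: (prefix_len + 7) // 8]
```

Under this reading, stored_address("2001:db8:1234:5678::1", 48) yields the
six bytes 20 01 0d b8 12 34, while stored_address("2001:db8::1") yields all
16 bytes.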

Section 7.4.1.1.1

Am I parsing the "query-response-hints" text correctly to say that a bit is
set in the bitmap if the corresponding field is recorded (if present) by
the collecting implementation?  The causality of "if the field is omitted
the bit is unset" goes in a direction that is not what I expected.
(Similarly for the other fields in this table.)
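To make the question concrete, a Python sketch of the reading assumed
here: a set bit means the collector records that field whenever it is
present, so an absent field with its bit set means "not in the traffic",
while an unset bit means "not collected". The bit positions are
hypothetical, not the draft's; the last helper shows how a reader could
detect bits outside its known set, which ties into the bit-allocation
question raised above.

```python
from enum import IntFlag


class Hint(IntFlag):
    """Hypothetical bit positions, for illustration only."""
    TIME_OFFSET = 1 << 0
    CLIENT_ADDRESS = 1 << 1
    CLIENT_PORT = 1 << 2
    RESPONSE_SIZE = 1 << 3


KNOWN_HINTS = int(
    Hint.TIME_OFFSET | Hint.CLIENT_ADDRESS | Hint.CLIENT_PORT | Hint.RESPONSE_SIZE
)


def field_status(hints: int, bit: Hint, value_present: bool) -> str:
    """Interpret one field: a set bit means 'recorded whenever present'."""
    if not hints & bit:
        return "not collected"
    return "recorded" if value_present else "absent from the traffic"


def unallocated_bits(hints: int) -> int:
    """Bits set in hints that this reader does not know how to interpret."""
    return hints & ~KNOWN_HINTS
```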

Section 7.4.2

Do we need a reference for "promiscuous mode"?

Just to check: for "server-addresses", is the IP version inferred from the
length of the byte string?

Do we need to say more about where the vlan-ids identifiers are taken from?

Is the "generator-id" string intended to only be human readable?  Only
within a specific (administrative) context?

Section 7.5.1

Does "earliest-time" include leap seconds?
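The question matters because POSIX time does not count leap seconds. A
short Python check using only the standard library shows this around the
leap second inserted at the end of 2016; it says nothing about which time
scale the draft actually intends.

```python
import calendar

# calendar.timegm() converts a UTC time tuple to POSIX seconds, treating
# every day as exactly 86400 seconds, as POSIX time does.
before = calendar.timegm((2016, 12, 31, 23, 59, 59, 0, 0, 0))
leap = calendar.timegm((2016, 12, 31, 23, 59, 60, 0, 0, 0))
after = calendar.timegm((2017, 1, 1, 0, 0, 0, 0, 0, 0))

# Two SI seconds elapsed between 23:59:59 and 00:00:00 (the leap second
# 23:59:60 sits between them), yet the POSIX difference is 1, and the
# leap second itself maps onto the same value as the following midnight.
print(after - before)  # 1
print(leap == after)   # True
```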

Section 7.5.3

The "ip-address" description seems to imply that very short ipv6 prefix
lengths could cause confusion as to the address type being indicated (e.g.,
setting to 32 when no ipv4 prefix length is set, or setting to the same
value as the ipv4 prefix length).  Do we need to restrict the ipv6 prefix
lengths to being 33 or larger?
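A quick Python enumeration supports the 33-or-larger suggestion, again
assuming truncation to whole bytes with rounding up (the helper names are
hypothetical): any IPv6 prefix of 32 bits or fewer fits in at most 4 bytes
and so can collide with some IPv4 prefix length, whereas 33 bits needs 5.

```python
from typing import List


def stored_len(prefix_len: int) -> int:
    """Bytes kept for a prefix of prefix_len bits (assumed: round up)."""
    return (prefix_len + 7) // 8


def colliding_ipv6_prefixes(ipv4_prefix: int) -> List[int]:
    """IPv6 prefix lengths that store the same number of bytes as ipv4_prefix."""
    target = stored_len(ipv4_prefix)
    return [p for p in range(0, 129) if stored_len(p) == target]


def min_safe_ipv6_prefix() -> int:
    """Smallest IPv6 prefix length that cannot collide with any IPv4 one."""
    max_v4 = max(stored_len(p) for p in range(0, 33))
    return min(p for p in range(0, 129) if stored_len(p) > max_v4)


print(colliding_ipv6_prefixes(32))  # [25, 26, 27, 28, 29, 30, 31, 32]
print(min_safe_ipv6_prefix())       # 33
```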

Are the "name-rdata" contents in wire format or presentation format?
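The difference is material for both storage size and matching. For a name,
a minimal Python sketch of the uncompressed wire form from RFC 1035,
Section 3.1 (length-prefixed labels ending in a zero-length root label)
versus the dotted presentation form; escape sequences such as \DDD are not
handled, and the function name is hypothetical. Note that the conversion
preserves case, which connects to the fingerprinting concern above.

```python
def name_to_wire(name: str) -> bytes:
    """Convert a presentation-format name to uncompressed wire format.

    Simplified: ASCII labels only, no escape sequences.
    """
    if name.endswith("."):
        name = name[:-1]  # the trailing dot denotes the root label
    wire = bytearray()
    if name:
        for label in name.split("."):
            raw = label.encode("ascii")
            if not 1 <= len(raw) <= 63:
                raise ValueError(f"bad label length {len(raw)}")
            wire.append(len(raw))  # one-octet length prefix
            wire += raw            # label octets, case preserved
    wire.append(0)                 # zero-length root label ends the name
    if len(wire) > 255:
        raise ValueError("name longer than 255 octets")
    return bytes(wire)


print(name_to_wire("www.example.com"))  # b'\x03www\x07example\x03com\x00'
```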

Section 7.5.3.2

What's the allocation policy/procedure for the remaining
qr-transport-flags transport values?  For additional bits in any/all of the
flags fields listed here?

As something of a side note, what's the mnemonic for the "sig" in
"qr-sig-flags"?  That is, what is it a signature of or over?  (It doesn't
seem to be a cryptographic signature, which may be what is confusing
me.)

For "query-rcode"/"response-rcode", should there be a reference for "OPT",
and/or for any of the EDNS stuff in here?  (The Terminology section only
mentions using the naming from RFC 1035, that I can see.)

The "mm-transport-flags" here bear a striking resemblance to the
"qr-transport-flags" from Section 7.5.3.2; should there be a shared
registry for their contents?  (I guess the TransportFlags CDDL to some
extent serves this function.)

Section 7.7

How is the value of the "ae-code" determined?

Appendix A

We could perhaps apply some constraints on (e.g.) the address-prefix length
fields to be .le the relevant lengths.

Appendix C.6

                                           Using a strong compression,
   block sizes over 10,000 query/response pairs would seem to offer
   limited improvements.

nit: Using a strong compression scheme