Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)

Sara Dickinson <sara@sinodun.com> Thu, 29 November 2018 15:42 UTC

From: Sara Dickinson <sara@sinodun.com>
In-Reply-To: <20181124033529.GF68416@kduck.kaduk.org>
Date: Thu, 29 Nov 2018 15:42:02 +0000
Cc: Tim Wicinski <tjw.ietf@gmail.com>, dnsop <dnsop@ietf.org>, dnsop-chairs <dnsop-chairs@ietf.org>, The IESG <iesg@ietf.org>, draft-ietf-dnsop-dns-capture-format@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <AF2C2B2D-F8D4-4475-926D-EEBEB78F3995@sinodun.com>
References: <154258729961.2478.12875770828573692533.idtracker@ietfa.amsl.com> <CAD81299-8C6E-44EA-AFC0-D3A67E0057C3@sinodun.com> <20181124033529.GF68416@kduck.kaduk.org>
To: Benjamin Kaduk <kaduk@mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/24AeGph6_NG7akSS7QB6REXJZzc>
Subject: Re: [DNSOP] Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)


> On 24 Nov 2018, at 03:35, Benjamin Kaduk <kaduk@mit.edu> wrote:
> 
> On Wed, Nov 21, 2018 at 01:53:09PM +0000, Sara Dickinson wrote:
>> 
>> 
>>> Begin forwarded message:
>>> 
>>> From: Benjamin Kaduk <kaduk@mit.edu>
>>> Subject: Benjamin Kaduk's Discuss on draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
>>> Date: 19 November 2018 at 00:28:19 GMT
>>> To: "The IESG" <iesg@ietf.org>
>>> Cc: draft-ietf-dnsop-dns-capture-format@ietf.org, Tim Wicinski <tjw.ietf@gmail.com>, dnsop-chairs@ietf.org, tjw.ietf@gmail.com, dnsop@ietf.org
>>> Resent-From: <alias-bounces@ietf.org>
>>> Resent-To: jad@sinodun.com, jim@sinodun.com, sara@sinodun.com, terry.manderson@icann.org, john.bond@icann.org
>> 
>> Many thanks for the detailed review. 
>> 
>>> 
>>> ----------------------------------------------------------------------
>>> DISCUSS:
>>> ----------------------------------------------------------------------
>>> 
>>> It is pretty shocking to not see any discussion of the privacy
>>> considerations of storing data including client addresses (and ports)
>>> alongside DNS transactions, given how central DNS resolution is to user
>>> behavior on the web.  (Note that there are mentions of potentially
>>> anonymized data in Sections 6.2 and 6.2.3 which would presumably
>>> forward-reference the privacy considerations.)  Data normalization would
>>> probably also be mentioned in this section, since (e.g.) the case used for
>>> a query/response could be used in fingerprinting an implementation.
>> 
>> There has been extensive discussion of data storage risks and practices in two DPRIVE documents, so I’d suggest the following changes in the first instance to address this:
> 
> This is exactly the sort of thing I was hoping to see, thank you!  I have
> just a couple tweaks to suggest, inline.
> 
>> New Privacy Considerations section:
>> “Storage of DNS traffic by operators in PCAP and other formats is a long-standing and widespread practice. Section 2.5 of draft-bortzmeyer-dprive-rfc7626-bis is an analysis of the risks to Internet users of the storage of DNS traffic data in servers (recursive resolvers, authoritative and rogue servers).
>>
>> Section 5.2 of draft-dickinson-dprive-bcp-op describes mitigations for those risks for data stored on recursive resolvers (but which could by extension apply to authoritative servers). These include data handling practices and methods for data minimisation, IP address pseudonymization and anonymization. Appendix B of that document presents an analysis of 7 published anonymization processes. In addition, RSSAC have recently published RSSAC040: “Recommendations on Anonymization Processes for Source IP Addresses Submitted for Future Analysis” [1].
>>
>> The above analyses consider full data capture (e.g. using PCAP) as a
>> baseline for privacy considerations and therefore this format
>> specification introduces no new user privacy issues beyond those of full
>> data capture. It does provide mechanisms to selectively record only
> 
> I would say "beyond those of full data capture (which are quite severe)".
> That is, while the current state of affairs is a valid baseline for
> comparison, that does not absolve us of responsibility for analyzing the
> current state of affairs.  (To be clear,
> draft-bortzmeyer-dprive-rfc7626-bis is a fine place for the bulk of that
> analysis to live, but in this document we should not pretend that the
> current state of affairs is a good situation to be in.)
> 
>> certain fields at the time of data capture to improve user privacy and to
>> explicitly indicate that data is sampled and/or anonymised. It also
>> provides flags to indicate if data normalisation has been performed; data
>> normalisation increases user privacy by reducing the potential for
>> fingerprinting individuals however a trade-off is potentially reducing
> 
> I think "however" would be offset by commas on both sides.

Both these WFM - thanks.

And thanks for the responses below - will update the draft accordingly.

Sara. 

> 
>> the capacity to identify attack traffic via query name signatures.
>> Operators should carefully consider their operational requirements and
>> privacy policies and SHOULD capture at source the minimum user data
>> required to meet their needs.”
>> 
>> [1] https://www.icann.org/en/system/files/files/rssac-040-07aug18-en.pdf
>> 
>> 
>> As noted, there are a few other places we can also highlight the privacy aspects:
>> 
>> Introduction:
>> OLD: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in practice for packet captures, but these file formats can contain a great deal of additional  information that is not directly pertinent to DNS traffic analysis  and thus unnecessarily increases the capture file size.”
>> 
>> NEW: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in practice for packet captures, but these file formats can contain a great deal of additional  information that is not directly pertinent to DNS traffic analysis  and thus unnecessarily increases the capture file size. Additionally, these tools and formats typically have no filter mechanism to selectively record only certain fields at capture time, requiring post-processing for anonymisation or pseudonymisation of data to protect user privacy.”
>> 
>> Section 4, bullet point 2:
>> 
>> OLD: “Different users will have different requirements
>>          for data to be available for analysis.  Users with minimal
>>          requirements should not have to pay the cost of recording full
>>          data, though this will limit the ability to perform certain
>>          kinds of data analysis and also to reconstruct packet
>>          captures.  For example, omitting the resource records from a
>>          Response will reduce the C-DNS file size; in principle
>>          responses can be synthesized if there is enough context.”
>> 
>> NEW: “Different operators will have different requirements
>>          for data to be available for analysis.  Operators with minimal
>>          requirements should not have to pay the cost of recording full
>>          data, though this will limit the ability to perform certain
>>          kinds of data analysis and also to reconstruct packet
>>          captures.  For example, omitting the resource records from a
>>          Response will reduce the C-DNS file size; in principle
>>          responses can be synthesized if there is enough context.
>>          Operators may have different policies for collecting user data
>>          and can choose to omit or anonymise certain fields at
>>          capture time, e.g. client address.”
>> 
>> And yes, in both Sections 6.2 and 6.2.3 we will add forward references to the Privacy Considerations section.
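
To illustrate the "omit or anonymise certain fields at capture time" idea in the proposed Section 4 text, here is a minimal Python sketch. It is not taken from the draft: the record keys ("client-address", "client-port", "query-name"), the function names and the default prefix lengths (/24 and /48) are all assumptions chosen for the example, and prefix truncation is only one of several possible anonymisation approaches.

```python
import ipaddress

def truncate_address(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an address, keeping only the leading prefix.

    The default prefix lengths are illustrative assumptions, not values
    recommended by the draft.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the enclosing network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

def minimise(record: dict, keep: set) -> dict:
    """Keep only the fields the operator's policy allows, and truncate
    the client address if it is one of them.

    `record` uses hypothetical key names; real C-DNS uses CBOR maps
    with integer keys.
    """
    out = {key: value for key, value in record.items() if key in keep}
    if "client-address" in out:
        out["client-address"] = truncate_address(out["client-address"])
    return out

record = {
    "client-address": "192.0.2.57",
    "client-port": 53124,
    "query-name": "example.com.",
}
print(minimise(record, keep={"client-address", "query-name"}))
```

Because the filtering happens before anything is written, the dropped port and the low-order address bits never reach storage, which is the difference from post-processing a full PCAP capture.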
>> 
>> 
>>> 
>>> I'm also concerned about the policy/procedure for allocating/extending the
>>> various bitfields and similar potential extension points in the data
>>> structures.  Section 8 covers the major/minor versioning semantics with
>>> respect to new map keys and new maps, but not addition of new bits within
>>> existing (uint) bitmaps.  Given the usage of the CDDL .bits constraint,
>>> it's not really clear that an IANA registry is the right tool to use, but I
>>> think some indication of the expected way to allocate new bits is in order,
>>> whether it's "a future standards-track document that updates this document"
>>> or otherwise.  (I've noted many, but not all, instances of such bitmaps in
>>> my COMMENT section.)
>> 
>> We are inclined to follow the lead of existing RFCs making use of CBOR, namely
>> * RFC8152 'CBOR Object Signing and Encryption' (July 2017)
>> * RFC8392 'CBOR Web Token (CWT)' (May 2018) and
>> * RFC8428 'Sensor Measurement Lists (SenML)' (Aug 2018) 
>> and request IANA create a C-DNS registry with
>> subregistries with keys for each of the different maps used in C-DNS.
>> New entries in these subregistries would follow Expert Review as defined
>> in RFC8126. This appears to be the emerging usual way of dealing with
>> CBOR map key values, particularly integer ones.
> 
> That sounds like a fine path forward, thanks.
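
The concern about adding new bits to existing bitmaps can be made concrete with a small Python sketch. Only the meaning of bit 0 (the IPv4/IPv6 flag in qr-transport-flags) comes from this thread; the other bit names and the helper `decode_flags` are hypothetical and exist purely for illustration.

```python
# Bits a reader was built to understand. Bit 0 is the IP version flag
# mentioned in this thread; bits 1 and 2 are hypothetical placeholders.
KNOWN_BITS = {
    0: "ip-version-6",
    1: "hypothetical-flag-a",
    2: "hypothetical-flag-b",
}

def decode_flags(value: int, known: dict):
    """Split a flags integer into named known bits and leftover unknown bits."""
    names = [name for bit, name in sorted(known.items()) if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in known)
    unknown = value & ~known_mask
    return names, unknown

# A writer that has been extended with a new bit 3 sets bits 0 and 3.
names, unknown = decode_flags(0b1001, KNOWN_BITS)
print(names, bin(unknown))
```

An older reader sees a non-zero `unknown` value and has no way to tell whether it is safe to ignore, which is why some documented allocation procedure (a registry or an updating document) is needed for new bits.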
> 
>>> 
>>> There are also a couple of fields whose semantics don't seem to be
>>> sufficiently well specified for a proposed-standard document, such as
>>> vlan-ids, generator-id, name-rdata, and ae-code.  (I understand that some
>>> of them are probably only going to have locally relevant semantics, but we
>>> should be explicit about when that's the case.)
>> 
>> Acknowledged, we’ll add references or clarifications for these (will put details in a follow-up mail that will also address your comments below).
> 
> Sounds good.
> 
>>> 
>>> If I'm reading things correctly that the IP address type is inferred from
>>> the bytestring length, then I think we need to enforce a restriction on the
>>> address prefix length(s) to allow for that inference to be unambiguous
>>> (noting that we only have the *byte* length of the address fields at our
>>> disposal for disambiguation, and not the more precise bit-length).
>> 
>> Ah, the first bit of the qr-transport-flags contains an IPv4/IPv6 flag, so the address type can be explicitly determined from that if it is set. But of course there is a corner case we hadn’t considered, where that field isn’t present, so we’ll have to address that. Making that field mandatory if prefixes are used would be simplest.
> 
> I guess I had forgotten about that bit in the qr-transport-flags on my
> first read.  Making it mandatory if prefix lengths are present ought to
> work.
> 
> -Benjamin
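
The ambiguity discussed above can be sketched in a few lines of Python. This assumes a truncated address is stored in the smallest whole number of bytes that holds the prefix, which is an assumption made for the example rather than a quotation from the draft; the function names are likewise hypothetical.

```python
def stored_length(prefix_bits: int) -> int:
    """Bytes needed to hold an address truncated to `prefix_bits` bits
    (assumed storage rule: round up to a whole byte)."""
    return (prefix_bits + 7) // 8

def candidate_families(num_bytes: int, v4_prefix_bits: int, v6_prefix_bits: int):
    """Address families consistent with a stored byte-string length."""
    candidates = []
    if num_bytes == stored_length(v4_prefix_bits):
        candidates.append("IPv4")
    if num_bytes == stored_length(v6_prefix_bits):
        candidates.append("IPv6")
    return candidates

def address_family(num_bytes, v4_prefix_bits, v6_prefix_bits, ipv6_flag=None):
    """Prefer the explicit IP version flag; fall back to length inference."""
    if ipv6_flag is not None:
        return "IPv6" if ipv6_flag else "IPv4"
    candidates = candidate_families(num_bytes, v4_prefix_bits, v6_prefix_bits)
    if len(candidates) != 1:
        raise ValueError(f"cannot infer family from {num_bytes} bytes: {candidates}")
    return candidates[0]

# Full IPv4 addresses (/32) and IPv6 truncated to /32 both occupy 4 bytes.
print(candidate_families(4, v4_prefix_bits=32, v6_prefix_bits=32))
# With the flag present the question never arises.
print(address_family(4, 32, 32, ipv6_flag=True))
```

With the flag absent and both prefixes mapping to the same byte length, inference fails, which is the corner case that making the flag mandatory whenever prefix lengths are present would close.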