Benjamin Kaduk's Discuss on draft-ietf-6man-rfc4941bis-11: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Tue, 20 October 2020 02:45 UTC

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-6man-rfc4941bis@ietf.org, 6man-chairs@ietf.org, ipv6@ietf.org, Ole Trøan <otroan@employees.org>, otroan@employees.org
Subject: Benjamin Kaduk's Discuss on draft-ietf-6man-rfc4941bis-11: (with DISCUSS and COMMENT)
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <160316195336.2217.7354030088928179279@ietfa.amsl.com>
Date: Mon, 19 Oct 2020 19:45:53 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/9pjsdIwjMpH0hHiTLFxTcDtEgMo>

Benjamin Kaduk has entered the following ballot position for
draft-ietf-6man-rfc4941bis-11: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-6man-rfc4941bis/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

(1) I am not entirely sure what we mean by saying that temporary addresses
must have a lifetime that is "statistically different" across different
addresses, and accordingly I am not sure that the procedures in Section
3.4+3.5 for refreshing a temporary address achieve that property.  (The
text about "statistically different" does not appear in RFC 4941, and
the relevant parts of Section 3.4/3.5 are unchanged from RFC 4941, so
this may be the result of an incomplete update.)

Specifically, when Section 3.5 says to "[repeat] the actions described
in Section 3.4, starting at step 4" that seems to (for long-lived PIOs)
result in, e.g., the new temporary address having lifetime
TEMP_VALID_LIFETIME starting at exactly the time when the previous one
expired; wouldn't an observer be able to trivially correlate "new
address showed up with TEMP_VALID_LIFETIME" with "address that expired
at that time"?  Note that the attacker does not need to know the value
of TEMP_VALID_LIFETIME in order to perform a discrete Fourier transform
(DFT) on the distribution of "new address" events.  (Furthermore, we
apparently attach some caveats to "repeating the actions", at which
point it is no longer exactly "repeating the actions".  That said, the
caveats currently listed in Section 3.5 don't seem to be enough to
provide the "statistically different" property in what I believe to be
the intended interpretation.)
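
To make the DFT remark concrete, here is a minimal sketch in Python
(standard library only).  It assumes a hypothetical observer who records
only the times at which new addresses appear.  The 24-hour regeneration
period, the one-hour bins, and all function names are illustrative
assumptions, not taken from the draft.

```python
import cmath
import math


def bin_events(event_times, bin_width, n_bins):
    """Count observed "new address" events per fixed-width time bin."""
    counts = [0] * n_bins
    for t in event_times:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def dominant_period(counts, bin_width):
    """Return the period of the strongest non-DC DFT component.

    A perfectly periodic event train has harmonics of equal magnitude,
    so ties (within a small tolerance) go to the lowest frequency.
    """
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag * (1 + 1e-9):
            best_k, best_mag = k, abs(coeff)
    return n * bin_width / best_k


# Hypothetical observation: one new address every 24 hours for 30 days.
HOUR = 3600
events = [day * 24 * HOUR for day in range(30)]
counts = bin_events(events, HOUR, 30 * 24)
period_hours = dominant_period(counts, HOUR) / HOUR
```

The observer never supplies the lifetime as an input; the period falls
out of the event timestamps alone, which is the correlation concern
raised above.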

(2) Please fix the reference for DupAddrDetectTransmits in Section 3.8
-- it is defined in RFC 4862, while RetransTimer is in RFC 4861.

(3) RFC 4941 cannot be a *normative* reference of this document if we are
going to Obsolete it.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 1.2

Having followed many of the references from the Introduction, it seems
that there could be an additional aspect to the problem statement,
namely the question of whether an attacker can (statistically) determine
whether or not there is a host at a given address/IID.  When such an
ability is present, techniques (e.g., pen-testing) involving scanning
the entire address space become more feasible.  I do not think this
potential aspect needs to be mentioned, per se, but do not know if it
was considered for inclusion or not.

   The correlation can be performed by

   o  An attacker who is in the path between the node in question and
      the peer(s) to which it is communicating, and who can view the
      IPv6 addresses present in the datagrams.

   o  An attacker who can access the communication logs of the peers
      with which the node has communicated.

(side note) I suppose if some other node in the path kept logs and the
attacker got access to those logs, that would also allow the
correlation, but that's rather an edge case and we don't claim to have
an exhaustive list, so I don't see a need to add complications to this
text.

   Use of temporary addresses will not prevent such payload-based
   correlation, which can only be addressed by widespread deployment of
   encryption as discussed in [RFC7624].  Nor will it prevent an on-link
   observer (e.g. the node's default router) to track all the node's
   addresses.

nit: s/to track/from tracking/

Section 2.1

   Many nodes also have DNS names associated with their addresses, in
   which case the DNS name serves as a similar identifier.  Although the
   DNS name associated with an address is more work to obtain (it may
   require a DNS query), the information is often readily available.  In
   such cases, changing the address on a machine over time would do
   little to address the concerns raised in this document, unless the
   DNS name is changed as well (see Section 4).

nit: perhaps say "at the same time"?

   The use of a constant identifier within an address is of special
   concern because addresses are a fundamental requirement of
   communication and cannot easily be hidden from eavesdroppers and
   other parties.  Even when higher layers encrypt their payloads,

(editorial) the two paragraphs before this one seem to be examples (DNS
names, HTTP cookies) of "identifier[s] that [are] recognizable over time
within different contexts" as discussed in the paragraph prior to them.
This paragraph is getting back to why we care about constant identifiers
in IP addresses; I wonder if some kind of (list?) formatting for the
previous two paragraphs might help indicate the structure of the
discussion.

   Changing global scope addresses over time limits the time window over
   which eavesdroppers and other information collectors may trivially
   correlate network activity when the same address is employed for
   multiple transactions by the same node.  Additionally, it reduces the
   window of exposure of a node via an address that gets revealed as a
   result of active communication.

I'm not 100% sure that I understand what is being exposed by this
"window of exposure" -- is it just "there is a node at this address and
it is responsible for all activities using that address"?  Thus, perhaps
"window of exposure for such correlation"?  (Similar text also appears
in the Abstract.)

   The security and privacy implications of IPv6 addresses are discussed
   in detail in [RFC7721], [RFC7707], and [RFC7217].

A sentence essentially identical to this one already appeared in the
Introduction; I'm not sure if we should de-duplicate.

Section 3.1

   4.  Temporary addresses must have a limited lifetime (limited "valid
       lifetime" and "preferred lifetime" from [RFC4862]), that should
       be statistically different for different addresses.  The lifetime

We should probably be more specific about what "statistically different" is
supposed to mean.  For example, is it intended to relate to the initial
value associated with a freshly generated address (i.e., "should not be
exactly 24 hour lifetime at time of generation") or the offset between
different addresses ("should not be exactly 24 hours more than the
previous one")?

   5.  By default, one address is generated for each prefix advertised
       by stateless address autoconfiguration.  The resulting Interface
       Identifiers must be statistically different when addresses are
       configured for different prefixes.  That is, when temporary

[In contrast, this use of "statistically different" is both (1)
clarified further and (2) for a time-independent quantity, so the
interpretation is pretty clear as-is.]

Section 3.3.1

I think we need to say something about the random number being long
enough or getting more random bits in step 2 if there aren't enough
bits, or similar.  Just "obtain a random number" doesn't say what the
number is sampled from, and could cover, e.g., https://xkcd.com/221/ .
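
As a rough illustration of what "long enough" and "sampled from" could
mean, here is a minimal Python sketch.  It assumes a 64-bit IID and uses
the operating system's CSPRNG via the standard `secrets` module; the
constant and function names are hypothetical and not from the draft.

```python
import secrets

IID_BITS = 64  # assumption: IID length for a /64 prefix


def random_iid(bits=IID_BITS):
    """Draw exactly `bits` uniformly random bits from the OS CSPRNG."""
    return secrets.randbits(bits)


def iid_to_bytes(iid, bits=IID_BITS):
    """Encode the IID as a fixed-width, big-endian byte string."""
    return iid.to_bytes(bits // 8, "big")


iid = random_iid()
iid_bytes = iid_to_bytes(iid)
```

Requesting the full number of bits up front avoids the "not enough bits"
case entirely, rather than topping up later.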

Section 3.3.2

   1.  Compute a random identifier with the expression:

       RID = F(Prefix, Net_Iface, Network_ID, Time, DAD_Counter,
       secret_key)
   [...]
       F():
          A pseudorandom function (PRF) that MUST NOT be computable from
          the outside (without knowledge of the secret key).  F() MUST
          also be difficult to reverse, such that it resists attempts to
          obtain the secret_key, even when given samples of the output
          of F() and knowledge or control of the other input parameters.
          F() SHOULD produce an output of at least 64 bits.  F() could
          be implemented as a cryptographic hash of the concatenation of
          each of the function parameters.  SHA-256 [FIPS-SHS] is one
          possible option for F().  Note: MD5 [RFC1321] is considered
          unacceptable for F() [RFC6151].

I recognize that this is just the RFC 7217 construction with the 'Time'
parameter added, but it's not entirely clear that we want to be
recommending the plain "hash of concatenation" option without additional
caveats.  While having the secret key be the last element in the
bitstring seems to close off the length-extension class of attacks, we
don't say anything about performing the concatenation with fixed-width
types (or a length prefix), as is needed for non-malleability of the
hash inputs.  (This is particularly of note for the IPv6 prefix, which
one might naturally encode as just the prefix parts, not necessarily
fixed length, but it also applies to other parameters, including some
of the "Net_Iface" examples given in RFC 7217.)  There is also no
discussion about the potential for hash collisions (or, more generally,
attacks) across this construction and the RFC 7217 construction.
Guidance to not reuse a secret_key for both constructions would be in
order.  (I will note that it may be tempting to upgrade to an HMAC
construction, and while that will certainly work, modulo the need for
length prefixes/fixed-length input, it is overkill for this case.)
Finally, the guidance for "SHOULD produce an output of at least 64 bits"
could perhaps be revisited; any useful cryptographic hash these days is
going to have at least 128 bits of output, which is certainly enough for
generating an IID!
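
The malleability concern can be sketched in a few lines of Python.  This
is an illustration under assumptions, not the draft's construction: the
textual field encodings, the 4-byte length prefix, and the function names
are all hypothetical.

```python
import hashlib
import struct


def naive_f(*fields):
    """Plain hash of the concatenation: field boundaries are lost."""
    return hashlib.sha256(b"".join(fields)).digest()


def framed_f(*fields):
    """Hash with a 4-byte big-endian length prefix before each field."""
    h = hashlib.sha256()
    for field in fields:
        h.update(struct.pack(">I", len(field)))
        h.update(field)
    return h.digest()


# Two different (Prefix, Net_Iface) pairs, with the prefix encoded as
# variable-length text, that concatenate to the same byte string.
inputs_a = (b"2001:db8:", b"1:eth0")
inputs_b = (b"2001:db8:1:", b"eth0")

naive_collides = naive_f(*inputs_a) == naive_f(*inputs_b)
framed_collides = framed_f(*inputs_a) == framed_f(*inputs_b)
```

With plain concatenation the two distinct parameter sets hash
identically; with length prefixes (or fixed-width encodings) they do
not.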

       Prefix:
          The prefix to be used for SLAAC, as learned from an ICMPv6
          Router Advertisement message.

(side note) is the "as learned from an ICMPv6 RA" an important
prerequisite, or could a prefix learned in some other fashion still be
usable?

          which this interface is associated.  Additionally, Simple DNA
          [RFC6059] describes ideas that could be leveraged to generate
          a Network_ID parameter.  This parameter is SHOULD be employed
          if some form of "Network_ID" is available.

nit: s/is SHOULD/SHOULD/

Section 3.4

   7.  The node MUST perform duplicate address detection (DAD) on the
       generated temporary address.  If DAD indicates the address is
       already in use, the node MUST generate a new randomized interface
       identifier, and repeat the previous steps as appropriate up to
       TEMP_IDGEN_RETRIES times.  If after TEMP_IDGEN_RETRIES
       consecutive attempts no non-unique address was generated, the
       node MUST log a system error and SHOULD NOT attempt to generate a
       temporary address for the given prefix for the duration of the
       node's attachment to the network via this interface.  [...]

Just to confirm my understanding: "for the duration of the node's
attachment to the network" means that even if a new RA+PIO is received,
the node still ignores that prefix?

Section 3.6

   determine that the link change has occurred.  One such process is
   described by "Simple Procedures for Detecting Network Attachment in
   IPv6" [RFC6059].  Detecting link changes would prevent link down/up

nit: we have already referred to the abbreviated name "Simple DNA"
earlier in this document, so the expanded title does not seem necessary
here.

Section 3.8

   REGEN_ADVANCE -- 2 + (TEMP_IDGEN_RETRIES * DupAddrDetectTransmits *
   RetransTimer / 1000)

Please indicate the units of this value (the division by 1000 indicates
it is likely measured in seconds).
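
A worked example of the units, as a small Python sketch.  The input
values are assumptions for illustration: TEMP_IDGEN_RETRIES of 3,
DupAddrDetectTransmits of 1, and RetransTimer of 1000 milliseconds.  The
function name is hypothetical.

```python
def regen_advance_seconds(temp_idgen_retries,
                          dup_addr_detect_transmits,
                          retrans_timer_ms):
    """Evaluate the REGEN_ADVANCE expression.

    RetransTimer is taken in milliseconds, so dividing by 1000 yields
    seconds; the leading constant 2 is then also in seconds.
    """
    return 2 + (temp_idgen_retries
                * dup_addr_detect_transmits
                * retrans_timer_ms / 1000)


# Assumed inputs: 3 retries, 1 DAD transmission, 1000 ms timer.
example = regen_advance_seconds(3, 1, 1000)
```

Under those assumed inputs the expression evaluates to 5, which only
makes sense as a number of seconds.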

   DESYNC_FACTOR -- A random value within the range 0 -
   MAX_DESYNC_FACTOR.  It is computed once at system start (rather than
   each time it is used) and must never be greater than
   (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE).

Computing only at startup and not changing it could perhaps run into
issues with maintaining the invariant, when TEMP_PREFERRED_LIFETIME and
REGEN_ADVANCE are configurable after startup.  (Changing DESYNC_FACTOR
more often, and having the range be more like half of the overall
lifetime, would be one approach for achieving the "statistically
different" property mentioned in my Discuss point.)
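
One possible reading of that suggestion, sketched in Python: draw a
fresh DESYNC_FACTOR for each generated address and clamp its range
against the current configuration, so the invariant holds even if the
parameters change after startup.  This is an assumption-laden
illustration, not the draft's algorithm; the function names and the
example values are hypothetical.

```python
import random

_rng = random.SystemRandom()  # backed by the OS entropy source


def draw_desync_factor(temp_preferred_lifetime, regen_advance,
                       max_desync_factor):
    """Draw a per-address DESYNC_FACTOR (seconds).

    The upper bound is re-evaluated on every call, so the result never
    exceeds TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE.
    """
    upper = min(max_desync_factor,
                temp_preferred_lifetime - regen_advance)
    if upper < 0:
        raise ValueError("REGEN_ADVANCE exceeds TEMP_PREFERRED_LIFETIME")
    return _rng.uniform(0, upper)


def preferred_lifetime_for_new_address(temp_preferred_lifetime,
                                       regen_advance,
                                       max_desync_factor):
    """Preferred lifetime for one freshly generated temporary address."""
    desync = draw_desync_factor(temp_preferred_lifetime, regen_advance,
                                max_desync_factor)
    return temp_preferred_lifetime - desync


# Hypothetical configuration: one-day preferred lifetime, range of
# roughly half the lifetime, REGEN_ADVANCE of 5 seconds.
lifetimes = [
    preferred_lifetime_for_new_address(86400, 5, 43200)
    for _ in range(5)
]
```

Each address then gets its own offset, so successive regeneration
events are not separated by a fixed interval.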

Section 4.

   The desires of protecting individual privacy versus the desire to
   effectively maintain and debug a network can conflict with each
   other.  [...]

(editorial) this sentence lacks parallelism of structure.  Perhaps:

% The desire to protect individual privacy can conflict with the desire
% to effectively maintain and debug a network.

Section 5

   o  Addresses all errata submitted for [RFC4941].

There are errata reports against RFC 4941 that are still in the state
"reported"; the responsible AD should probably process those before this
document gets published.

Section 9

Overall these security considerations seem pretty comprehensive and
well-described -- thank you!

   If a very small number of nodes (say, only one) use a given prefix
   for extended periods of time, just changing the interface identifier
   part of the address may not be sufficient to mitigate address-based
   network activity correlation, since the prefix acts as a constant
   identifier.  [...]

It might be worth noting some scenarios where this commonly occurs,
e.g., residential households that only have a single computer.  (Is it
also the case for mobile phones?)

   fairly large number of nodes.  Additionally, if a temporary address
   is used in a session where the user authenticates, any notion of
   "privacy" for that address is compromised.

Compromised for the party or parties that receive the authentication
information, at least.  That does not necessarily include a passive
observer in the network.

   While this document discusses ways of obscuring a user's IP address,
   the method described is believed to be ineffective against

I don't think "obscuring" is the right word -- the IP address is still
visible; we're just trying to remove some of the information content
from it over long periods of time.  I understand the desire to remove
the word "permanent" from the RFC 4941 version, but this still doesn't
seem right.  Perhaps the goal could be rephrased as something about
making the IP address less useful as a persistent (numerical) identifier.

   Ingress filtering has been and is being deployed as a means of
   preventing the use of spoofed source addresses in Distributed Denial
   of Service (DDoS) attacks.  In a network with a large number of
   nodes, new temporary addresses are created at a fairly high rate.
   This might make it difficult for ingress filtering mechanisms to
   distinguish between legitimately changing temporary addresses and
   spoofed source addresses, which are "in-prefix" (using a
   topologically correct prefix and non-existent interface ID).  This
   can be addressed by using access control mechanisms on a per-address
   basis on the network egress point.

Should we say something about the corresponding resource consumption
increase at the egress point?

Section 11.1

One might argue that RFC 7217 is merely informative, since we duplicate
in full the IID-generation algorithm from it (with modifications).

RFC 8190 is only referenced to note that we specifically do *not* use
terminology from it; that seems like it does not really meet the
threshold for being a normative reference.
