Re: [secdir] Secdir last call review of draft-ietf-6man-rfc4941bis-12

Christopher Wood <caw@heapingbits.net> Fri, 29 January 2021 14:51 UTC

User-Agent: Cyrus-JMAP/3.5.0-alpha0-84-gfc141fe8b8-fm-20210125.001-gfc141fe8
Mime-Version: 1.0
Message-Id: <4005a168-d4b7-4324-bcd1-a721fe0d743f@www.fastmail.com>
In-Reply-To: <532736e2-f235-2bb1-9e31-7f707b153b15@si6networks.com>
References: <160998197921.18103.15481726186693031049@ietfa.amsl.com> <532736e2-f235-2bb1-9e31-7f707b153b15@si6networks.com>
Date: Fri, 29 Jan 2021 06:39:07 -0800
From: Christopher Wood <caw@heapingbits.net>
To: Fernando Gont <fgont@si6networks.com>, "secdir@ietf.org" <secdir@ietf.org>
Cc: draft-ietf-6man-rfc4941bis.all@ietf.org, Benjamin Kaduk <kaduk@mit.edu>
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/secdir/aYQanSWdmDp6Vaf8J5hFkh9F-wc>
Subject: Re: [secdir] Secdir last call review of draft-ietf-6man-rfc4941bis-12
Precedence: list

Hi Fernando,

Apologies for the delay. Please see inline below.

On Thu, Jan 28, 2021, at 6:03 AM, Fernando Gont wrote:
> Hi, Chris,
> 
> I had responded to this one, but will respond again about the main 
> points you raise. Please find my comments in-line....
> 
> 
> On 6/1/21 22:12, Christopher Wood via Datatracker wrote:
> [....]
> > 
> > Section 2.1.
> > 
> >     One of the requirements for correlating seemingly unrelated
> >     activities is the use (and reuse) of an identifier that is
> >     recognizable over time within different contexts.  IP addresses
> >     provide one obvious example, but there are more.  For example,
> > 
> > What about MAC addresses? As I understand it, most systems are moving towards
> > MAC address randomization, though it's still probably worth mentioning.
> > Likewise, similar to cookies, one could also mention TLS (or transport) layer
> > identifiers, such as TLS session tickets. This is touched on somewhat in the
> > Security Considerations.
> 
> This document notes that it tries to tackle only address-based 
> correlation. As you correctly note, there are potentially multiple IDs, 
> at lower and upper layers, that could be leveraged for activity 
> correlation. But this document tries to tackle only IPv6 address-based 
> correlation.

Sure, sure, I was only suggesting that these other correlation vectors be noted somewhere. :-)

> > 
> > Section 3.1.
> > 
> >     3.  New temporary addresses are generated over time to replace
> >        temporary addresses that expire.
> > 
> > I assume expiration here means that the address is deprecated, right? If so,
> > that might be worth clarifying.
> > 
> >         4. <snip>
> > 
> >         The lifetime of temporary addresses must be statistically different
> >         for different addresses, such that it is hard to predict or infer
> >         when a new temporary address is generated, or correlate a newly-
> >         generated address with an existing one.
> > 
> > This "must" is not normative, right? I assume not, since the previous guideline
> > in this item ("the lifetime of an address should be further reduced when
> > privacy-meaningful events ... takes place") does not require all temporary
> > addresses to cease working. It might be better to drop the "or correlate a
> > newly-generated address with an existing one" bit.
> 
> How about rephrase as: "hard to predict or infer when a temporary 
> address are regenerated"?

That's good! (Small nit: s/address are regenerated/address is regenerated)

> > Moreover, what does "statistically different" mean, precisely?
> 
> A and B having a high probability of being different when both of them 
> are selected from a PRNG

Would it make sense to include that definition?
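
As a rough illustration of that definition (a sketch, not text from the draft): one way to get lifetimes that are unlikely to match is to subtract a per-address random offset from a base lifetime. The function name, parameters, and example values below are assumptions chosen for illustration only.

```python
import secrets


def randomized_lifetime(base_seconds: int, max_offset_seconds: int) -> int:
    """Return base_seconds minus a random offset in [0, max_offset_seconds].

    Hypothetical helper: the offset is drawn from the OS CSPRNG, so two
    addresses created with the same base are unlikely to share a lifetime.
    """
    if not 0 <= max_offset_seconds < base_seconds:
        raise ValueError("offset bound must be in [0, base_seconds)")
    # randbelow(n) returns an int in [0, n), hence the +1 to include the bound.
    return base_seconds - secrets.randbelow(max_offset_seconds + 1)


# Example values (assumed): a one-day base with up to one hour subtracted.
lifetime_a = randomized_lifetime(86400, 3600)
lifetime_b = randomized_lifetime(86400, 3600)
```

With 3601 possible offsets, lifetime_a and lifetime_b collide only rarely, which is the "high probability of being different" property described above.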

> > It might be more
> > accurate to talk about this property from the perspective of the adversary. For
> > example, I think this is trying to say that given two different temporary
> > addresses, an adversary must have negligible probability in determining whether
> > or not they correspond to the same or different sources. (That would match
> > better with the Randomized Interface Identifier algorithms given in Section
> > 3.3.)
> 
> The property that you describe depends on the specific deployment. e.g., 
> If I'm an attacker, and you are the only host on a /64 there's "nothing" 
> you can do for me not to be able to tell that it's just you changing the 
> address of your own box.

Yep, I agree, and I think that's important. In such deployments, changing addresses offers no privacy, right?

> > Section 3.3.
> > 
> >     The algorithm specified in
> >     Section 3.3.1 benefits from a Pseudo-Random Number Generator (PRNG)
> >     available on the system.
> > 
> > What does "benefits" mean here? If we're specifying an algorithm to generate
> > random values, shouldn't a PRNG be *required*?
> 
> Implementation-wise, when you work on the code, most likely you have 
> some form of random() available (I personally don't know of a system 
> that doesn't). Whereas for the other algorithm, it will probably require 
> more work on your side. (e.g., consider if you want to use siphash for 
> the PRF).

Yep, I follow. My point was rather that the text might be better if it said:

   "The algorithm specified in Section 3.3.1 requires a PRNG."

Whether it comes from the system (random()) or not is irrelevant, I think.
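
To illustrate why only "a PRNG" matters here, below is a minimal Python sketch that draws a 64-bit identifier from whatever random source is available and places it in the low 64 bits of a /64 prefix. The function names and the documentation prefix are assumptions, and the sketch omits any checks a real implementation would perform (for example, against reserved identifiers or addresses already in use).

```python
import ipaddress
import secrets


def random_interface_id() -> int:
    """Return 64 random bits from the OS CSPRNG as an integer."""
    return int.from_bytes(secrets.token_bytes(8), "big")


def temporary_address(prefix: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a freshly drawn random interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    # The prefix occupies the high 64 bits; the identifier fills the low 64.
    return ipaddress.IPv6Address(int(network.network_address) | random_interface_id())


# 2001:db8::/32 is the IPv6 documentation range; the /64 below is an example.
addr = temporary_address("2001:db8:0:1::/64")
```

Swapping secrets.token_bytes() for another generator changes nothing else in the sketch, which is the sense in which the source of randomness is irrelevant to the algorithm's description.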

> > Section 3.3.2.
> > 
> > This section assumes a "hash-based" algorithm, but is specified using a PRF.
> 
> Actually, the title is misleading. The algorithm requires a PRF, but 
> notes that one possible implementation is with a hash function.
> 
> We could provide an actual PRF as an example (e.g. BLAKE3) if you think 
> that's important. This wouldn't/shouldn't be a big deal to do, since 
> specific PRFs are not required by the algorithm (i.e., there are no 
> normative references or specification of any specific PRF).

It might be useful to note possible PRFs. Your call. :-)

> > Later, in the text, it reads:
> > 
> >     F() could be the
> >     result of applying a cryptographic hash over an encoded
> >     version of the function parameters.
> > 
> > But a cryptographic hash is not a PRF. If the hash function is meant to be
> > keyed, even that probably isn't sufficient. (Some constructions, like H(k || m)
> > for secret k and input m, are vulnerable to length extensions.)
> > 
> > I think it's probably safest to recommend a particular construction, such as
> > HKDF with secret_key and output length equal to the number of bytes needed for the
> > interface identifier.
> 
> As noted in the document, we do talk about a "cryptographically robust 
> construction". However, since the use of a keyed-hash function is mostly 
> informational, I think that we could easily reference an HMAC instead.

Indeed! That would be better.

> So far, in the context of numeric-ids, we were employing HMAC-SHA-256 
> ... which also happens to be the HMAC flavor of the hash function that we 
> currently suggest in this document. Would HMAC-SHA-256 address this comment?

Yep, it would, provided the secret key was not known to the attacker. 
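
For concreteness, here is a minimal Python sketch of F() built on HMAC-SHA-256, truncated to the 8 bytes an interface identifier needs. The function name, the length-prefixed parameter encoding, and the hard rejection of keys shorter than 128 bits are assumptions of this sketch (the draft only says SHOULD for the key length), not something the document specifies.

```python
import hashlib
import hmac


def interface_id(secret_key: bytes, *params: bytes) -> bytes:
    """Return 8 bytes of HMAC-SHA-256(secret_key, encoded params).

    Hypothetical helper: each parameter is prefixed with its 2-byte length so
    that different parameter tuples cannot encode to the same message.
    """
    if len(secret_key) < 16:
        # Stricter than the draft's SHOULD; a choice made for this sketch.
        raise ValueError("secret_key should be at least 128 bits")
    message = b"".join(len(p).to_bytes(2, "big") + p for p in params)
    return hmac.new(secret_key, message, hashlib.sha256).digest()[:8]
```

Unlike a bare H(k || m), HMAC is not subject to length extension, and its output is unpredictable only for as long as secret_key stays unknown to the attacker.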

> > Moreover, requirements for secret_key are not really strict enough. There's
> > text about F(), e.g.,:
> > 
> >     F() MUST
> >     also be difficult to reverse, such that it resists attempts to
> >     obtain the secret_key
> > 
> > And it is said that secret_key "SHOULD be of at least 128 bits," but what if
> > it's less? What if it only has a single byte of entropy?
> 
> The rationale here essentially was that in such case, you're supposed to 
> know what you're doing if you're going against the "SHOULD". Besides, in 
> that case you'd be able to compute the output of F(), and hence would 
> not be complying with this earlier requirement:
> 
>        A pseudorandom function (PRF) that MUST NOT be computable from
>        the outside (without knowledge of the secret key)

True, true. Thanks for clarifying.

> > Section 3.4.
> > 
> > Constants here are used before defined. Moving Section 3.8 to somewhere before
> > Section 3.4 might help.
> 
> The ordering is borrowed from RFC4941. The only way that I envision 
> that the order could be altered (without the parameters section 
> interrupting the natural read of the document) would be for Section 4 
> and Section 3.8 to be subsections of the same parent Section. Not sure 
> I'd do it at this stage, though.
> 
> Thoughts?

Given how far along this document is, what you think is probably best!

> > What happens if the constants are chosen such that the rule (5) is not possible
> > to achieve?
> 
> The math in Section 3.8 should prevent you from setting such values.

I didn't check, so if that's the case, please disregard this comment. 

> > Section 3.6.
> > 
> >     The frequency at which temporary addresses change depends on how a
> >     device is being used (e.g., how frequently it initiates new
> >     communication) and the concerns of the end user.  The most egregious
> >     privacy concerns appear to involve addresses used for long periods of
> >     time (weeks to months to years).  The more frequently an address
> >     changes, the less feasible collecting or coordinating information
> >     keyed on interface identifiers becomes.  Moreover, the cost of
> >     collecting information and attempting to correlate it based on
> >     interface identifiers will only be justified if enough addresses
> >     contain non-changing identifiers to make it worthwhile.  Thus, having
> >     large numbers of clients change their address on a daily or weekly
> >     basis is likely to be sufficient to alleviate most privacy concerns.
> > 
> > I don't disagree with the text, but is there anything we can cite here? Why do
> > we think it's "sufficient," for example?
> 
> I don't have a reference for this -- at the end of the day, this is 
> quite subjective.
> 
> The bottom-line here is that you don't want to expose your stable 
> address (which has a potential lifetime of O(forever)). O(day) seems to 
> be more than a fair compromise.

Indeed. I don't have a citation either. I was just noting in case others did.

> >     Finally, when an interface connects to a new (different) link,
> >     existing temporary addresses for the corresponding interface MUST be
> >     eliminated, and new temporary addresses MUST be generated immediately
> >     for use on the new link.
> > 
> > If the addresses are eliminated, how does one run DAD and ensure that the same
> > (or similar) addresses are not used on the new link?
> 
> If you move to a different link, you don't need your old addresses (even 
> the prefix should have changed). So why would you mind?

Oh, hah, good point. Disregard. :-)

> > 
> > Section 3.7.
> > 
> >     Devices implementing this specification MUST provide a way for the
> >     end user to explicitly enable or disable the use of temporary
> >     addresses.
> > 
> > Why is this a MUST, rather than a SHOULD? Since this is effectively describing
> > an API, I think this ought to be relaxed.
> 
> This was borrowed from RFC4941. Now, there could be valid reasons for a 
> user that wants to disable it:
>    e.g., let's say I use ssh a lot, like long-lived sessions, and use an
>    ssh client that doesn't know how to tell the OS to use stable
>    addresses for the ssh sessions.
> 
> In such case, my options are:
>   * disable v6 on my host
>   * disable use of temporary addresses (*this knob*)

If there's precedent, I suppose it's fine to keep. It just read odd to me.

> > Section 6.
> > 
> >     An implementation might want to keep track of which addresses are
> >     being used by upper layers so as to be able to remove a deprecated
> >     temporary address from internal data structures once no upper layer
> >     protocols are using it (but not before).
> > 
> > It seems an application might also want to consider other information linkable
> > to select addresses in the future. For example, TLS resumption may link clients
> > across two different temporary addresses. (This goes back to my comment on
> > Section 2.1 above.)
> 
> Indeed. But this is out of the scope of RFC4941bis. rfc4941 simply 
> provides temporary addresses. It doesn't have features like "create a 
> new address on request" or the like -- that would be valuable, but 
> subject for a different document.
> 
> Thoughts?

Yep, that seems reasonable. 

Thanks for your thoughtful response, and work on this document!

Best,
Chris
