Re: [secdir] Secdir last call review of draft-ietf-6man-rfc4941bis-12

Fernando Gont <fgont@si6networks.com> Thu, 28 January 2021 14:04 UTC

To: Christopher Wood <caw@heapingbits.net>, secdir@ietf.org
Cc: draft-ietf-6man-rfc4941bis.all@ietf.org, Benjamin Kaduk <kaduk@mit.edu>
References: <160998197921.18103.15481726186693031049@ietfa.amsl.com>
From: Fernando Gont <fgont@si6networks.com>
Message-ID: <532736e2-f235-2bb1-9e31-7f707b153b15@si6networks.com>
Date: Thu, 28 Jan 2021 11:03:37 -0300
Archived-At: <https://mailarchive.ietf.org/arch/msg/secdir/sQJh-t4ajWA5ovLy4gJ-2CCWowU>

Hi, Chris,

I had already responded to this one, but I will respond again to the main 
points you raise. Please find my comments in-line.

On 6/1/21 22:12, Christopher Wood via Datatracker wrote:
[....]
> 
> Section 2.1.
> 
>     One of the requirements for correlating seemingly unrelated
>     activities is the use (and reuse) of an identifier that is
>     recognizable over time within different contexts.  IP addresses
>     provide one obvious example, but there are more.  For example,
> 
> What about MAC addresses? As I understand it, most systems are moving towards
> MAC address randomization, though it's still probably worth mentioning.
> Likewise, similar to cookies, one could also mention TLS (or transport) layer
> identifiers, such as TLS session tickets. This is touched on somewhat in the
> Security Considerations.

As the document notes, it tries to tackle only IPv6 address-based 
correlation. As you correctly point out, there are potentially multiple 
identifiers, at lower and upper layers, that could be leveraged for 
activity correlation, but this document does not attempt to address them.

> Section 2.2.
> 
>     To make it difficult to make educated guesses as to whether two
>     different interface identifiers belong to the same host, the
>     algorithm for generating alternate identifiers must include input
>     that has an unpredictable component from the perspective of the
>     outside entities that are collecting information.
> 
> It seems like this "must" be normative, and should probably reference the
> RFC4086 [https://tools.ietf.org/html/rfc4086].

This section, as well as the design goals in Section 3.1, is 
informative rather than normative. That's why they don't delve into much 
detail or use RFC2119 language.

> 
> Section 3.1.
> 
>     3.  New temporary addresses are generated over time to replace
>        temporary addresses that expire.
> 
> I assume expiration here means that the address is deprecated, right? If so,
> that might be worth clarifying.
> 
>         4. <snip>
> 
>         The lifetime of temporary addresses must be statistically different
>         for different addresses, such that it is hard to predict or infer
>         when a new temporary address is generated, or correlate a newly-
>         generated address with an existing one.
> 
> This "must" is not normative, right? I assume not, since the previous guideline
> in this item ("the lifetime of an address should be further reduced when
> privacy-meaningful events ... takes place") does not require all temporary
> addresses to cease working. It might be better to drop the "or correlate a
> newly-generated address with an existing one" bit.

How about rephrasing it as: "hard to predict or infer when a temporary 
address is regenerated"?

> Moreover, what does "statistically different" mean, precisely?

It means that two lifetimes, A and B, have a high probability of being 
different when both of them are drawn from a PRNG.
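
To make that notion concrete, here is a minimal sketch of two lifetimes 
drawn independently from a random source. It is illustrative only: the 
constants BASE_PREFERRED_LIFETIME and MAX_DESYNC and the helper 
pick_preferred_lifetime() are assumptions made for this example, not 
values or names taken from the draft.

```python
import secrets

# Hypothetical values chosen for illustration only; not taken from the draft.
BASE_PREFERRED_LIFETIME = 24 * 60 * 60            # seconds (one day)
MAX_DESYNC = BASE_PREFERRED_LIFETIME * 2 // 5     # upper bound on the random offset


def pick_preferred_lifetime() -> int:
    """Return the base lifetime shortened by a fresh random offset."""
    # secrets.randbelow(n) returns an integer in [0, n) from the OS CSPRNG.
    desync = secrets.randbelow(MAX_DESYNC + 1)
    return BASE_PREFERRED_LIFETIME - desync


a = pick_preferred_lifetime()
b = pick_preferred_lifetime()
print(a, b, "different" if a != b else "equal (possible, but rare)")
```

With tens of thousands of possible offsets, two addresses will almost 
never share the same lifetime, which is all that "statistically 
different" is meant to convey.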

> It might be more
> accurate to talk about this property from the perspective of the adversary. For
> example, I think this is trying to say that given two different temporary
> addresses, an adversary must have negligible probability in determining whether
> or not they correspond to the same or different sources. (That would match
> better with the Randomized Interface Identifier algorithms given in Section
> 3.3.)

The property that you describe depends on the specific deployment. E.g., 
if I'm an attacker, and you are the only host on a /64, there's "nothing" 
you can do to prevent me from telling that it's just you changing the 
address of your own box.

> Section 3.3.
> 
>     The algorithm specified in
>     Section 3.3.1 benefits from a Pseudo-Random Number Generator (PRNG)
>     available on the system.
> 
> What does "benefits" mean here? If we're specifying an algorithm to generate
> random values, shouldn't a PRNG be *required*?

Implementation-wise, when you work on the code, you most likely have 
some form of random() available (I personally don't know of a system 
that doesn't), whereas the other algorithm will probably require 
more work on your side (e.g., consider the case where you want to use 
SipHash for the PRF).

> Section 3.3.2.
> 
> This section assumes a "hash-based" algorithm, but is specified using a PRF.

Actually, the title is misleading. The algorithm requires a PRF, but 
notes that one possible implementation is with a hash function.

We could provide an actual PRF as an example (e.g. BLAKE3) if you think 
that's important. This wouldn't/shouldn't be a big deal to do, since 
specific PRFs are not required by the algorithm (i.e., there are no 
normative references or specification of any specific PRF).

> Later, in the text, it reads:
> 
>     F() could be the
>     result of applying a cryptographic hash over an encoded
>     version of the function parameters.
> 
> But a cryptographic hash is not a PRF. If the hash function is meant to be
> keyed, even that probably isn't sufficient. (Some constructions, like H(k || m)
> for secret k and input m, are vulnerable to length extensions.)
> 
> I think it's probably safest to recommend a particular construction, such as
> HKDF with secret_key and output length equal to the number bytes needed for the
> interface identifier.

As noted in the document, we do talk about a "cryptographically robust 
construction". However, since the use of a keyed-hash function is mostly 
informational, I think that we could easily reference an HMAC instead.

So far, in the context of numeric-ids, we have been employing HMAC-SHA-256, 
which also happens to be the HMAC flavor of the hash function that we 
currently suggest in this document. Would HMAC-SHA-256 address this comment?
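
For illustration, a minimal sketch of what F() could look like with 
HMAC-SHA-256 follows. This is not the algorithm text from the draft: 
the helper names (encode_fields, temp_iid), the choice of input fields, 
the length-prefixed encoding, and the truncation to the first 64 bits 
are all assumptions made for this example.

```python
import hashlib
import hmac
import ipaddress
import secrets


def encode_fields(*fields: bytes) -> bytes:
    """Length-prefix each field so the encoding of the inputs is unambiguous."""
    return b"".join(len(f).to_bytes(2, "big") + f for f in fields)


def temp_iid(secret_key: bytes, prefix: str, iface: str,
             epoch: int, dad_counter: int) -> bytes:
    """Return a 64-bit interface identifier computed with HMAC-SHA-256."""
    msg = encode_fields(
        ipaddress.IPv6Network(prefix).network_address.packed,
        iface.encode(),
        epoch.to_bytes(8, "big"),
        dad_counter.to_bytes(1, "big"),
    )
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    return digest[:8]  # illustrative truncation of the 256-bit output to 64 bits


# Usage: a 128-bit random key, matching the "at least 128 bits" guidance.
key = secrets.token_bytes(16)
print(temp_iid(key, "2001:db8::/64", "eth0", epoch=0, dad_counter=0).hex())
```

Since HMAC keys the hash in a way that is not subject to the length 
extension issue of H(k || m), this would seem to cover the concern 
without pulling in a heavier construction.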

> Moreover, requirements for secret_key are not really strict enough. There's
> text about F(), e.g.,:
> 
>     F() MUST
>     also be difficult to reverse, such that it resists attempts to
>     obtain the secret_key
> 
> And it is said that secret_key "SHOULD be of at least 128 bits," but what if
> it's less? What if it only has a single byte of entropy?

The rationale here essentially was that, in such a case, you're supposed to 
know what you're doing if you're going against the "SHOULD". Besides, in 
that case an outsider would be able to compute the output of F(), and hence 
the implementation would not be complying with this earlier requirement:

       A pseudorandom function (PRF) that MUST NOT be computable from
       the outside (without knowledge of the secret key)
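
As a sketch of why a one-byte key fails that requirement: an observer 
who knows (or guesses) the other inputs can simply try all 256 keys. 
The function f(), the helper recover_key(), and the message layout 
below are hypothetical and exist only for this example.

```python
import hashlib
import hmac
from typing import Optional


def f(secret_key: bytes, msg: bytes) -> bytes:
    """Stand-in PRF for this example: HMAC-SHA-256 truncated to 64 bits."""
    return hmac.new(secret_key, msg, hashlib.sha256).digest()[:8]


def recover_key(observed: bytes, msg: bytes) -> Optional[bytes]:
    """Exhaustively search the 256 possible one-byte keys."""
    for candidate in range(256):
        key = bytes([candidate])
        if hmac.compare_digest(f(key, msg), observed):
            return key
    return None


msg = b"prefix|iface|time"          # inputs assumed known to the observer
weak_key = bytes([0x5A])            # a key with only one byte of entropy
observed = f(weak_key, msg)         # what the observer sees on the wire
print(recover_key(observed, msg) == weak_key)
```

The same search against a 128-bit key is infeasible, which is the 
reason behind the "SHOULD be of at least 128 bits".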

> Section 3.4.
> 
> Constants here are used before defined. Moving Section 3.8 to somewhere before
> Section 3.4 might help.

The ordering is borrowed from RFC4941. The only way that I envision 
that the order could be altered (without the parameters section 
interrupting the natural flow of the document) would be for Section 4 
and Section 3.8 to be subsections of the same parent section. I'm not sure 
I'd do it at this stage, though.

Thoughts?

> What happens if the constants are chosen such that the rule (5) is not possible
> to achieve?

The math in Section 3.8 should prevent you from setting such values.

> Section 3.6.
> 
>     The frequency at which temporary addresses change depends on how a
>     device is being used (e.g., how frequently it initiates new
>     communication) and the concerns of the end user.  The most egregious
>     privacy concerns appear to involve addresses used for long periods of
>     time (weeks to months to years).  The more frequently an address
>     changes, the less feasible collecting or coordinating information
>     keyed on interface identifiers becomes.  Moreover, the cost of
>     collecting information and attempting to correlate it based on
>     interface identifiers will only be justified if enough addresses
>     contain non-changing identifiers to make it worthwhile.  Thus, having
>     large numbers of clients change their address on a daily or weekly
>     basis is likely to be sufficient to alleviate most privacy concerns.
> 
> I don't disagree with the text, but is there anything we can cite here? Why do
> we think it's "sufficient," for example?

I don't have a reference for this. At the end of the day, this is 
quite subjective.

The bottom line here is that you don't want to expose your stable 
address (which has a potential lifetime of O(forever)). O(day) seems to 
be more than a fair compromise.

>     Finally, when an interface connects to a new (different) link,
>     existing temporary addresses for the corresponding interface MUST be
>     eliminated, and new temporary addresses MUST be generated immediately
>     for use on the new link.
> 
> If the addresses are eliminated, how does one run DAD and ensure that the same
> (or similar) addresses are not used on the new link?

If you move to a different link, you don't need your old addresses (even 
the prefix should have changed). So why would you mind?

> 
> Section 3.7.
> 
>     Devices implementing this specification MUST provide a way for the
>     end user to explicitly enable or disable the use of temporary
>     addresses.
> 
> Why is this a MUST, rather than a SHOULD? Since this is effectively describing
> an API, I think this ought to be relaxed.

This was borrowed from RFC4941. That said, there could be valid reasons 
for a user to want to disable temporary addresses:
   e.g., let's say I use ssh a lot, like long-lived sessions, and use an
   ssh client that doesn't know how to tell the OS to use stable
   addresses for the ssh sessions.

In such a case, my options are:
  * disable v6 on my host
  * disable use of temporary addresses (*this knob*)

> Section 6.
> 
>     An implementation might want to keep track of which addresses are
>     being used by upper layers so as to be able to remove a deprecated
>     temporary address from internal data structures once no upper layer
>     protocols are using it (but not before).
> 
> It seems an application might also want to consider other information linkable
> to select addresses in the future. For example, TLS resumption may link clients
> across two different temporary addresses. (This goes back to my comment on
> Section 2.1 above.)

Indeed. But this is out of the scope of RFC4941bis. RFC4941 simply 
provides temporary addresses. It doesn't have features like "create a 
new address on request" or the like. That would be valuable, but it is 
a subject for a different document.

Thoughts?

Thanks again for your comments!

Regards,
-- 
Fernando Gont
SI6 Networks
e-mail: fgont@si6networks.com
PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492
