Re: [babel] Benjamin Kaduk's Discuss on draft-ietf-babel-hmac-08: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Thu, 22 August 2019 16:54 UTC

Date: Thu, 22 Aug 2019 11:54:42 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: Juliusz Chroboczek <jch@irif.fr>
Cc: The IESG <iesg@ietf.org>, draft-ietf-babel-hmac@ietf.org, Donald Eastlake <d3e3e3@gmail.com>, babel-chairs@ietf.org, babel@ietf.org
Message-ID: <20190822165441.GK60855@kduck.mit.edu>
References: <156521429138.8333.12124544758210076970.idtracker@ietfa.amsl.com> <87h86pcmzk.wl-jch@irif.fr> <20190814194802.GB88236@kduck.mit.edu> <87a7c8ir5d.wl-jch@irif.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <87a7c8ir5d.wl-jch@irif.fr>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/O2uiOSFe6DCvnNxC0OE70ZFjNl0>
Subject: Re: [babel] Benjamin Kaduk's Discuss on draft-ietf-babel-hmac-08: (with DISCUSS and COMMENT)
Precedence: list

On Sat, Aug 17, 2019 at 02:15:10AM +0200, Juliusz Chroboczek wrote:
> >>> Also in Section 3.1, if we are going to claim that a "random string of
> >>> sufficient length" suffices to initialize a fresh index, we need to
> >>> provide guidance on what constitutes "sufficient length" to achieve the
> >>> needed property.
> 
> >> Section 6 says [...]
> 
> > And you don't want to make a forward reference to Section 6?
> 
> I don't want to, since it makes this section more difficult to read, but
> I have done so.

Looking at the -10, I'm more sympathetic to this concern, and won't object
if you change it back.

The -10 also seems to address (or render irrelevant) my other Discuss
points, so I'll cleare my ballot position shortly.

> >>> Blake2s is a keyed MAC, but is not an HMAC construction.
> 
> [...]
> 
> >> I disagree.
> 
> > Well, I am pretty uncomfortable about saying things that are not true!
> 
> Ok.  I've changed HMAC to MAC throughout the document, after consulting
> the list.

Thank you, and my apologies for having to churn the document so much.

> >> I agree.  This is something that we considered in the design of the
> >> protocol (which is why Nonces are allowed to be so large), but never
> >> implemented ourselves; it would probably be some work to get it right.
> >> Hence the very careful formulation ("might").  Let me know if you want to
> >> suggest a different formulation.
> 
> > It's probably okay to leave this as speculative (since there's not an
> > existence proof), but I think it's irresponsible to not also include a
> > disclaimer of at least "Note that such schemes will need to consider what
> > degree of "freshness" is required of a challenge and the risk of replay for
> > the encoded state, among other things".
> 
> Added "note however that any such scheme will need to prevent cookie
> replay".  (I've once implemented just such a scheme for the BitTorrent
> DHT, and it's been running on hundred of thousands of nodes for almost ten
> years now with no massive meltdown of the DHT; so I'm pretty confident
> it's not that difficult.)
> 
> >>> This is a symmetric-keyed scenario, so most attacks that involve a
> >>> compromised node will be "uninteresting", in that once the key is
> >>> exposed all guarantees are lost.  However, it may still be worth noting
> >>> that a compromised node can cause disruption on multi-access links
> >>> without detection since there is no "end of generation" signal when a
> >>> node changes its index.
> 
> >> Are we misunderstanding each other?
> 
> > A and C both have state about B, which is retained when B reboots.
> > While B is still rebooting or otherwise initializing, C can forge a message
> > "from" B that A will accept, and C can continue to do so until A sees the
> > new (legitimate) index from B and switches over to the new generation.  So
> > A has accepted false input but neither A nor B are aware of it.
> 
> Ah, I see.  It's worse than that -- if C knows the key, then it can
> perform a higher seqno attack on B, it doesn't need to wait for B to reboot.

Also true.  I don't have a great sense for how likely B would be to detect
that higher seqno attack (or whether there's anything useful it could do,
having detected the attack); I brought this one up as potentially notable
since there is a much lower chance of discovery.

> >>> Is there any need for initial PC value randomization?
> 
> >> I don't see why.
> 
> > I don't, either, but just wanted to get a second opinion from an actual
> > expert.
> 
> I hope that's not me.
> 
> >>> Do we want to recommend starting at 0 or 1 (or prohibit recipients from
> >>> assuming that the initial value for an index will be that)?
> 
> >> The initial value as well as the rate of increase are arbitrary.  For
> >> example, a node may use a hardware clock (suitably offset to fit in 32
> >> bits) as the PC.
> 
> > I agree that they're arbitrary, but have a lot of painful experience with
> > protocol participants making assumptions about underspecified behavior that
> > ended up not being protocol invariants, with subsequent breakage.  If you
> > think the existing ecosystem is healthy enough that there's not a real risk
> > of problems down the line, I can accept that, but I wanted to raise the
> > topic.
> 
> The ecosystem is pretty young, but we haven't had any issues with
> interoperability yet.  I think that it's fair to say that if you encode
> your age in the PC, you deserve any trouble you get.

Okay.

> >>> Does this rekeying option require an external operation/management actor
> >>> to trigger it?  It might be worth mentioning with some operational
> >>> considerations.
> 
> >> Disagree.
> 
> > I think my main quesetion here is who determines, and how, "whenever a
> > collision becomes likely".  Some protocols have in-band mechanisms for
> > peers to make such a determination and initiate rekeying automatically;
> > from what I understand, babel-hmac does not.  If we actually expect
> > deployments to encounter scenarios where rekeying is necessary to preserve
> > security, we should be pretty clear about who has the responsibility to do
> > so.
> 
> I've changed this to "by rekeying often enough that collisions are unlikely".
> 
> Here's some background.  Babel's evolution until now has been driven by
> the needs of our users; this has led to two very interesting extensions
> (Source-specific and RTT), and two extensions that are not being deployed
> (diversity routing and ToS-specific).  The security extensions, on the
> other hand, have been designed due to IETF requirements, and they have no
> users known to us.  We are waiting for the community to react to the code
> being available to discuss key management with them.
> 
> Options include:
> 
>   - static keying, rekeying never happens;
>   - central node that performs periodic key rotation (this could be as
>     simple as a cron job that uses ssh to distribute keys);
>   - key negotiation using another protocol (e.g. HNCP, the Homenet thing).
> 
> I'm keeping an open mind; the only thing I'm planning to refuse is in-band
> key distribution or negotiation -- Babel is a routing protocol, not a key
> distribution protocol.

Fair enough, and thanks for the extra background.
(I'd also be interested in hearing from people who have consciously decided
to not deploy the security extensions, but that's way off-topic for this
thread and probably better unicast to me.)

> >> >    o  among different nodes, it is only vulnerable to immediate replay:
> >> >       if a node A has accepted a packet from C as valid, then a node B
> >> >       will only accept a copy of that packet as authentic if B has
> >> >       accepted an older packet from C and B has received no later packet
> >> >       from C.
> >> 
> >> > nit: I don't think "A has accepted a packet from C" is quite the right
> >> > precondition; it seems to be more like "A has received a valid packet
> >> > from C", since whether or not A (as an attacker) considers it valid is
> >> > irrelevant to whether (honest) B will.
> >> 
> >> C is the attacker here, A is honest.
> 
> > I was (apparently) reading this differently -- "replay" could be initiated
> > by an RFC 3552 attacker in the network even without knowledge of the key.
> 
> I see.  Reworded, thanks.
> 
> >>> Section 4.1
> >> 
> >>> If we had identifiers for symmetric keys or HMAC algorithms, we could
> >>> include those identifiers in the pseudo-header and thereby gain some
> >>> protection from downgrade/HMAC-stripping attacks in the presence of a
> >>> weak keyed MAC algorithm.
> 
> >> This is a symmetric algorithm, for downgrade attacks to work the victim
> >> would need to be configured with a weak key.  I therefore don't see how
> >> this added complexity helps.
> 
> > I've seen plenty of cases where systems are still configured with
> > single-DES keys as well as AES keys (whoops!).
> 
> Right.  I'm not too concerned.

I'm not too concerned, either, but appreciate having the discussion.

> We're only using MAC here, so the strength of the hash is not likely to be
> the limiting factor.  We could be using MD2 here, and we'd still be
> reasonably secure.  (And if someone implements this protocol with MD2
> hashing, I'd like to hear from them.)
> 
> >>> It might be worth reiterating that every time a packet goes on the wire,
> >>> it gets a fresh PC, regardless of whether it's a "retransmit" after a
> >>> timeout or a new message.
> 
> >> There are no retransmits in this protocol.
> 
> > Hence the scare quotes.  Can you guarantee that no one will look at this
> > and say "I'll just cache this challenge I send to a peer and re-send it
> > (rate limited) to that peer in response to traffic I can't authenticate,
> > until I get a valid reply"?
> 
> I cannot.  From an implementation point of view, however, this would be
> way more complicated than the correct implementation.

Okay.

> >>> Validating the HMACs is the sort of operation that we tend to recommend
> >>> be done in constnt-time to avoid side channel attacks.  I don't have a
> >>> concrete attack handy here at the moment, though.
> 
> >> I frankly have no idea if that's necessary or not.  FWIW, the protocol is
> >> asynchronous, and there is jitter applied to packets.
> 
> > My super-quick assessment is that the timing channel could be used to "pick
> > off byte by byte" an attacker's guess at the HMAC of some fixed plaintext
> > that the attacker would later replay (with valid HMAC) to some other
> > participant.
> 
> We never compare keys -- we compare MACs.  So perhaps you could gain some
> information about how many bytes of the (per-packet) MAC match, but that
> gives you no information about how many bytes of the key match.  If you
> really insist, I can mandate constant-time comparison, but I feel it would
> be confusing for the reader to mandate it without being able to explain
> how it improves the protocol's security.

I do not insist.  Most of my (trimmed from here) discussion was an attempt
to convince myself that learning just the valid MAC value (vs. the key
value, as you note) would not be of any real value to the attacker.

> >>> Any reason to not just drop the whole packet if there are multiple PCs
> >>> present?  I see this is not rfc7298bis but don't know what level of
> >>> breaking change is reasonable.
> 
> >> It doesn't matter much, it's a "cannot happen" case.  (Note that the HMAC
> >> has already been validted at this point, so it isn't a security issue.)
> 
> > Oh, it's definitely not a security issue; this is more of a philosophical
> > question of what to do in the face of a peer with a "totally broken"
> > implementation.
> 
> I really don't think it makes much importance.  Since this formulation has
> been accepted by the WG, I'd rather not go through getting a different
> approach accepted.
> 
> > The TLS ecosystem, for example, has been bitten by the need for
> > historical compatibility quirks and is now (over?)compensating by being
> > quite strict about rejecting malformed things.
> 
> Indeed, RFC 8446 is what nightmares are made off:
> 
>       Experience has shown that many servers
>       do not properly implement version negotiation, leading to "version
>       intolerance" in which the server rejects an otherwise acceptable
>       ClientHello with a version number higher than it supports.
> 
> >>> I'd suggest explicitly stating that if there is a challenge reply that
> >>> doesn't validate, the packet should be discarded.
> 
> >> This is already the case -- a packet with an incorrect challenge reply is
> >> treated just like a packet with no challenge reply.
> 
> > This was an attempt to "consider all the ways in which a peer could
> > misbehave": it could send two, one valid and one invalid; or an invalid
> > challenge reply even though it didn't need to supply one at all; or ...
> > (This is basically the same philosophical question as above.)
> 
> According to the text, the receiving peer will process all the challenge
> replies in order.  The first valid reply will cause it to accept the
> packet, the remaining ones will be ignored (Section 4.3.1.3).
> 
> Suppose that for some reason A sends to challenge requests in quick
> succession.  B will send two challenge replies, and if we're unlucky, they
> will end up in the same packet; so we end up with a packet with two
> challenge replies, only the second of which will be accepted by A.  This
> is perfectly normal (although somewhat extreme) behaviour in this protocol.

Thanks for helping me think it through.

> >>>    challenge reply is extremely cheap (it's just a bitwise comparison of
> >>>    two strings of octets), a similar optimisation for challenge replies
> >>>    is not worthwile.
> >> 
> >>> Er, challenge reply validation still requires the HMAC validation step,
> >>> right?
> 
> >> At this stage we've already verified the HMAC.  The only thing we could
> >> potentially save would be a bitwise comparison, and that's not worth it.
> 
> > Right, but just "validating a challenge reply" in vacuum could be taken to
> > include validating the HMAC on the containing packet.  An editorial comment,
> > to be sure, but "poses minimal incremental cost" would avoid it.
> 
> Done.
> 
> > Consider a multi-access link with honest A and B, and attacker C.
> > A sends a challenge request to B, that C observes and records.  B sends a
> > challenge reply, and A sends some more traffic.  What happens if C starts
> > spamming replayed copies of that challenge request towards B?  This note
> > about constructing Challenge Reply is during the preparse stage, before
> > index/PC validation.  Can C cause B to spend a lot of time generating
> > challenge replies and reduce the amount of time spent sending actual
> > data?
> 
> You're right.  I've added a requirement to rate-limit challenge replies.
> (Since we're already rate-limiting requests, this does not slow down the
> protocol any more.)

Thanks!

> >> > Section 6
> >> 
> >> >    This mechanism relies on two assumptions, as described in
> >> >    Section 1.2.  First, it assumes that the hash being used is
> >> 
> >> > s/hash/MAC/
> >> 
> >> I'm confused.  This section refers to pre-image attacks, which I was under
> >> the impression is a property of the hash.
> 
> > The thing we're sending over the wire is a MAC (sometimes HMAC-SHA-256;
> > sometimes Blake2s; potentially other things in the future).  MACs are
> > almost universally constructed using hash functions, so it's pretty natural
> > to think of them fairly interchangably, but there do exist other
> > constructions like CBC-MAC.
> 
> Fixed.
> 
> > (Also, I think technically we're concerned about *second* preimage
> > resistance, since we're effectively sending the first preimage as the
> > message.)  Getting this second-preimage resistance from CBC-MAC
> > constructs that take variable-length input requires some care, though!
> 
> Not sure.  We're not sending the key.
> 
> >>> It would require a bit more thought to convince me that 64-bit indices
> >>> are sufficient for *all* cases.
> 
> >> Assume the attacker is able to crash the node at will, and that the node
> >> needs 10s to reboot.  If we assume the attacker will start seeing
> >> collisions after 2^32 tries, then this will happen after 1360 years, which
> >> is slightly more than the duration of the Byzantine empire.
> 
> > "Attacks only get better; never worse."
> 
> > My point was to show that it's plausible for an attacker to be able to
> > trigger index generation, and that if we want to be robust in the face of
> > future attacks, we should consider a threat model that allows them to do so
> > at will.  Just because we can't see how that would happen now doesn't mean
> > very much unless we have some sort of proof to back it up.
> 
> Yes, that's why the protocol supports larger values.  However, for the
> time being, I stand by the statement that "64-bit values are believed to
> be large enough for all practical applications".

Okay.

> >>>    present at the receiver.  If the attacker is able to cause the
> >>>    (Index, PC) pair to persist for arbitrary amounts of time (e.g., by
> >>>    repeatedly causing failed challenges), then it is able to delay the
> >>>    packet by arbitrary amounts of time, even after the sender has left
> >>>    the network.
> 
> >>> I'd suggest adding another sentence describing the potential
> >>> consequences of selectively delayed input (i.e., messing up the
> >>> routing).
> 
> >> We don't know about any such consequences.  Still, we believe it is good
> >> to avoid this situation.
> 
> > Maybe I'm confused, but suppose a node advertises a really-low-metric path
> > to a prefix, but then drops off the network.
> 
> Sorry, my brain went away when I wrote that.  This is exactly the issue
> I had in mind when I suggested this requirement.  I've added "which could
> allow it to redirect or blackhole traffic to destinations previously
> advertised by the sender".

Thanks.

> >>> nit: that's a comma splice in the parenthetical; a semicolon would be
> >>> better.
> 
> >> I've added a conjunction, I didn't realise such usage is unacceptable.
> 
> > Introspection suggests that it's a pet peeve of mine,
> 
> I'm pretty sure this usage is standard in British English, which is the
> dialect I was taught.  Thank you for educating me, I shall avoid it in the
> future.

Interesting; I did not know that.

> (*My* pet peeve is the use of a comma after i.e. and e.g., but who am I to
> argue with the RFC Editor.)

My condolences!  I can't imagine that gets any easier to deal with over
time, either.

Thanks again,

Ben

[babel] Benjamin Kaduk's Discuss on draft-ietf-ba… Benjamin Kaduk via Datatracker
Re: [babel] Benjamin Kaduk's Discuss on draft-iet… Juliusz Chroboczek
Re: [babel] Benjamin Kaduk's Discuss on draft-iet… Benjamin Kaduk
Re: [babel] Benjamin Kaduk's Discuss on draft-iet… Juliusz Chroboczek
Re: [babel] Benjamin Kaduk's Discuss on draft-iet… Benjamin Kaduk