Re: [babel] Benjamin Kaduk's Discuss on draft-ietf-babel-hmac-08: (with DISCUSS and COMMENT)

Juliusz Chroboczek <jch@irif.fr> Sat, 17 August 2019 00:15 UTC

Return-Path: <jch@irif.fr>
X-Original-To: babel@ietfa.amsl.com
Delivered-To: babel@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 24356120113; Fri, 16 Aug 2019 17:15:18 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level:
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id J9qktfiasrOT; Fri, 16 Aug 2019 17:15:15 -0700 (PDT)
Received: from korolev.univ-paris7.fr (korolev.univ-paris7.fr [IPv6:2001:660:3301:8000::1:2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 30E8B1200DE; Fri, 16 Aug 2019 17:15:15 -0700 (PDT)
Received: from mailhub.math.univ-paris-diderot.fr (mailhub.math.univ-paris-diderot.fr [81.194.30.253]) by korolev.univ-paris7.fr (8.14.4/8.14.4/relay1/82085) with ESMTP id x7H0FAQM008902; Sat, 17 Aug 2019 02:15:10 +0200
Received: from mailhub.math.univ-paris-diderot.fr (localhost [127.0.0.1]) by mailhub.math.univ-paris-diderot.fr (Postfix) with ESMTP id D3C805750C; Sat, 17 Aug 2019 02:15:12 +0200 (CEST)
X-Virus-Scanned: amavisd-new at math.univ-paris-diderot.fr
Received: from mailhub.math.univ-paris-diderot.fr ([127.0.0.1]) by mailhub.math.univ-paris-diderot.fr (mailhub.math.univ-paris-diderot.fr [127.0.0.1]) (amavisd-new, port 10023) with ESMTP id EXcpaHQYPLS3; Sat, 17 Aug 2019 02:15:11 +0200 (CEST)
Received: from pirx.irif.fr (82-64-141-196.subs.proxad.net [82.64.141.196]) (Authenticated sender: jch) by mailhub.math.univ-paris-diderot.fr (Postfix) with ESMTPSA id E9EF75750A; Sat, 17 Aug 2019 02:15:09 +0200 (CEST)
Date: Sat, 17 Aug 2019 02:15:10 +0200
Message-ID: <87a7c8ir5d.wl-jch@irif.fr>
From: Juliusz Chroboczek <jch@irif.fr>
To: Benjamin Kaduk <kaduk@mit.edu>
Cc: The IESG <iesg@ietf.org>, draft-ietf-babel-hmac@ietf.org, Donald Eastlake <d3e3e3@gmail.com>, babel-chairs@ietf.org, babel@ietf.org
In-Reply-To: <20190814194802.GB88236@kduck.mit.edu>
References: <156521429138.8333.12124544758210076970.idtracker@ietfa.amsl.com> <87h86pcmzk.wl-jch@irif.fr> <20190814194802.GB88236@kduck.mit.edu>
User-Agent: Wanderlust/2.15.9
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Content-Type: text/plain; charset="US-ASCII"
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (korolev.univ-paris7.fr [194.254.61.138]); Sat, 17 Aug 2019 02:15:10 +0200 (CEST)
X-Miltered: at korolev with ID 5D57470E.000 by Joe's j-chkmail (http : // j-chkmail dot ensmp dot fr)!
X-j-chkmail-Enveloppe: 5D57470E.000 from mailhub.math.univ-paris-diderot.fr/mailhub.math.univ-paris-diderot.fr/null/mailhub.math.univ-paris-diderot.fr/<jch@irif.fr>
X-j-chkmail-Score: MSGID : 5D57470E.000 on korolev.univ-paris7.fr : j-chkmail score : . : R=. U=. O=. B=0.000 -> S=0.000
X-j-chkmail-Status: Ham
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/XfJaI8wuMHjvecZfmkAjU07feuc>
Subject: Re: [babel] Benjamin Kaduk's Discuss on draft-ietf-babel-hmac-08: (with DISCUSS and COMMENT)
X-BeenThere: babel@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "A list for discussion of the Babel Routing Protocol." <babel.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/babel>, <mailto:babel-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/babel/>
List-Post: <mailto:babel@ietf.org>
List-Help: <mailto:babel-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/babel>, <mailto:babel-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 17 Aug 2019 00:15:18 -0000

>>> Also in Section 3.1, if we are going to claim that a "random string of
>>> sufficient length" suffices to initialize a fresh index, we need to
>>> provide guidance on what constitutes "sufficient length" to achieve the
>>> needed property.

>> Section 6 says [...]

> And you don't want to make a forward reference to Section 6?

I don't want to, since it makes this section more difficult to read, but
I have done so.

>>> Blake2s is a keyed MAC, but is not an HMAC construction.

[...]

>> I disagree.

> Well, I am pretty uncomfortable about saying things that are not true!

Ok.  I've changed HMAC to MAC throughout the document, after consulting
the list.

>> I agree.  This is something that we considered in the design of the
>> protocol (which is why Nonces are allowed to be so large), but never
>> implemented ourselves; it would probably be some work to get it right.
>> Hence the very careful formulation ("might").  Let me know if you want to
>> suggest a different formulation.

> It's probably okay to leave this as speculative (since there's not an
> existence proof), but I think it's irresponsible to not also include a
> disclaimer of at least "Note that such schemes will need to consider what
> degree of "freshness" is required of a challenge and the risk of replay for
> the encoded state, among other things".

Added "note however that any such scheme will need to prevent cookie
replay".  (I've once implemented just such a scheme for the BitTorrent
DHT, and it's been running on hundred of thousands of nodes for almost ten
years now with no massive meltdown of the DHT; so I'm pretty confident
it's not that difficult.)

>>> This is a symmetric-keyed scenario, so most attacks that involve a
>>> compromised node will be "uninteresting", in that once the key is
>>> exposed all guarantees are lost.  However, it may still be worth noting
>>> that a compromised node can cause disruption on multi-access links
>>> without detection since there is no "end of generation" signal when a
>>> node changes its index.

>> Are we misunderstanding each other?

> A and C both have state about B, which is retained when B reboots.
> While B is still rebooting or otherwise initializing, C can forge a message
> "from" B that A will accept, and C can continue to do so until A sees the
> new (legitimate) index from B and switches over to the new generation.  So
> A has accepted false input but neither A nor B are aware of it.

Ah, I see.  It's worse than that -- if C knows the key, then it can
perform a higher seqno attack on B, it doesn't need to wait for B to reboot.

>>> Is there any need for initial PC value randomization?

>> I don't see why.

> I don't, either, but just wanted to get a second opinion from an actual
> expert.

I hope that's not me.

>>> Do we want to recommend starting at 0 or 1 (or prohibit recipients from
>>> assuming that the initial value for an index will be that)?

>> The initial value as well as the rate of increase are arbitrary.  For
>> example, a node may use a hardware clock (suitably offset to fit in 32
>> bits) as the PC.

> I agree that they're arbitrary, but have a lot of painful experience with
> protocol participants making assumptions about underspecified behavior that
> ended up not being protocol invariants, with subsequent breakage.  If you
> think the existing ecosystem is healthy enough that there's not a real risk
> of problems down the line, I can accept that, but I wanted to raise the
> topic.

The ecosystem is pretty young, but we haven't had any issues with
interoperability yet.  I think that it's fair to say that if you encode
your age in the PC, you deserve any trouble you get.

>>> Does this rekeying option require an external operation/management actor
>>> to trigger it?  It might be worth mentioning with some operational
>>> considerations.

>> Disagree.

> I think my main quesetion here is who determines, and how, "whenever a
> collision becomes likely".  Some protocols have in-band mechanisms for
> peers to make such a determination and initiate rekeying automatically;
> from what I understand, babel-hmac does not.  If we actually expect
> deployments to encounter scenarios where rekeying is necessary to preserve
> security, we should be pretty clear about who has the responsibility to do
> so.

I've changed this to "by rekeying often enough that collisions are unlikely".

Here's some background.  Babel's evolution until now has been driven by
the needs of our users; this has led to two very interesting extensions
(Source-specific and RTT), and two extensions that are not being deployed
(diversity routing and ToS-specific).  The security extensions, on the
other hand, have been designed due to IETF requirements, and they have no
users known to us.  We are waiting for the community to react to the code
being available to discuss key management with them.

Options include:

  - static keying, rekeying never happens;
  - central node that performs periodic key rotation (this could be as
    simple as a cron job that uses ssh to distribute keys);
  - key negotiation using another protocol (e.g. HNCP, the Homenet thing).

I'm keeping an open mind; the only thing I'm planning to refuse is in-band
key distribution or negotiation -- Babel is a routing protocol, not a key
distribution protocol.

>> >    o  among different nodes, it is only vulnerable to immediate replay:
>> >       if a node A has accepted a packet from C as valid, then a node B
>> >       will only accept a copy of that packet as authentic if B has
>> >       accepted an older packet from C and B has received no later packet
>> >       from C.
>> 
>> > nit: I don't think "A has accepted a packet from C" is quite the right
>> > precondition; it seems to be more like "A has received a valid packet
>> > from C", since whether or not A (as an attacker) considers it valid is
>> > irrelevant to whether (honest) B will.
>> 
>> C is the attacker here, A is honest.

> I was (apparently) reading this differently -- "replay" could be initiated
> by an RFC 3552 attacker in the network even without knowledge of the key.

I see.  Reworded, thanks.

>>> Section 4.1
>> 
>>> If we had identifiers for symmetric keys or HMAC algorithms, we could
>>> include those identifiers in the pseudo-header and thereby gain some
>>> protection from downgrade/HMAC-stripping attacks in the presence of a
>>> weak keyed MAC algorithm.

>> This is a symmetric algorithm, for downgrade attacks to work the victim
>> would need to be configured with a weak key.  I therefore don't see how
>> this added complexity helps.

> I've seen plenty of cases where systems are still configured with
> single-DES keys as well as AES keys (whoops!).

Right.  I'm not too concerned.

We're only using MAC here, so the strength of the hash is not likely to be
the limiting factor.  We could be using MD2 here, and we'd still be
reasonably secure.  (And if someone implements this protocol with MD2
hashing, I'd like to hear from them.)

>>> It might be worth reiterating that every time a packet goes on the wire,
>>> it gets a fresh PC, regardless of whether it's a "retransmit" after a
>>> timeout or a new message.

>> There are no retransmits in this protocol.

> Hence the scare quotes.  Can you guarantee that no one will look at this
> and say "I'll just cache this challenge I send to a peer and re-send it
> (rate limited) to that peer in response to traffic I can't authenticate,
> until I get a valid reply"?

I cannot.  From an implementation point of view, however, this would be
way more complicated than the correct implementation.

>>> Validating the HMACs is the sort of operation that we tend to recommend
>>> be done in constnt-time to avoid side channel attacks.  I don't have a
>>> concrete attack handy here at the moment, though.

>> I frankly have no idea if that's necessary or not.  FWIW, the protocol is
>> asynchronous, and there is jitter applied to packets.

> My super-quick assessment is that the timing channel could be used to "pick
> off byte by byte" an attacker's guess at the HMAC of some fixed plaintext
> that the attacker would later replay (with valid HMAC) to some other
> participant.

We never compare keys -- we compare MACs.  So perhaps you could gain some
information about how many bytes of the (per-packet) MAC match, but that
gives you no information about how many bytes of the key match.  If you
really insist, I can mandate constant-time comparison, but I feel it would
be confusing for the reader to mandate it without being able to explain
how it improves the protocol's security.

>>> Any reason to not just drop the whole packet if there are multiple PCs
>>> present?  I see this is not rfc7298bis but don't know what level of
>>> breaking change is reasonable.

>> It doesn't matter much, it's a "cannot happen" case.  (Note that the HMAC
>> has already been validted at this point, so it isn't a security issue.)

> Oh, it's definitely not a security issue; this is more of a philosophical
> question of what to do in the face of a peer with a "totally broken"
> implementation.

I really don't think it makes much importance.  Since this formulation has
been accepted by the WG, I'd rather not go through getting a different
approach accepted.

> The TLS ecosystem, for example, has been bitten by the need for
> historical compatibility quirks and is now (over?)compensating by being
> quite strict about rejecting malformed things.

Indeed, RFC 8446 is what nightmares are made off:

      Experience has shown that many servers
      do not properly implement version negotiation, leading to "version
      intolerance" in which the server rejects an otherwise acceptable
      ClientHello with a version number higher than it supports.

>>> I'd suggest explicitly stating that if there is a challenge reply that
>>> doesn't validate, the packet should be discarded.

>> This is already the case -- a packet with an incorrect challenge reply is
>> treated just like a packet with no challenge reply.

> This was an attempt to "consider all the ways in which a peer could
> misbehave": it could send two, one valid and one invalid; or an invalid
> challenge reply even though it didn't need to supply one at all; or ...
> (This is basically the same philosophical question as above.)

According to the text, the receiving peer will process all the challenge
replies in order.  The first valid reply will cause it to accept the
packet, the remaining ones will be ignored (Section 4.3.1.3).

Suppose that for some reason A sends to challenge requests in quick
succession.  B will send two challenge replies, and if we're unlucky, they
will end up in the same packet; so we end up with a packet with two
challenge replies, only the second of which will be accepted by A.  This
is perfectly normal (although somewhat extreme) behaviour in this protocol.

>>>    challenge reply is extremely cheap (it's just a bitwise comparison of
>>>    two strings of octets), a similar optimisation for challenge replies
>>>    is not worthwile.
>> 
>>> Er, challenge reply validation still requires the HMAC validation step,
>>> right?

>> At this stage we've already verified the HMAC.  The only thing we could
>> potentially save would be a bitwise comparison, and that's not worth it.

> Right, but just "validating a challenge reply" in vacuum could be taken to
> include validating the HMAC on the containing packet.  An editorial comment,
> to be sure, but "poses minimal incremental cost" would avoid it.

Done.

> Consider a multi-access link with honest A and B, and attacker C.
> A sends a challenge request to B, that C observes and records.  B sends a
> challenge reply, and A sends some more traffic.  What happens if C starts
> spamming replayed copies of that challenge request towards B?  This note
> about constructing Challenge Reply is during the preparse stage, before
> index/PC validation.  Can C cause B to spend a lot of time generating
> challenge replies and reduce the amount of time spent sending actual
> data?

You're right.  I've added a requirement to rate-limit challenge replies.
(Since we're already rate-limiting requests, this does not slow down the
protocol any more.)

>> > Section 6
>> 
>> >    This mechanism relies on two assumptions, as described in
>> >    Section 1.2.  First, it assumes that the hash being used is
>> 
>> > s/hash/MAC/
>> 
>> I'm confused.  This section refers to pre-image attacks, which I was under
>> the impression is a property of the hash.

> The thing we're sending over the wire is a MAC (sometimes HMAC-SHA-256;
> sometimes Blake2s; potentially other things in the future).  MACs are
> almost universally constructed using hash functions, so it's pretty natural
> to think of them fairly interchangably, but there do exist other
> constructions like CBC-MAC.

Fixed.

> (Also, I think technically we're concerned about *second* preimage
> resistance, since we're effectively sending the first preimage as the
> message.)  Getting this second-preimage resistance from CBC-MAC
> constructs that take variable-length input requires some care, though!

Not sure.  We're not sending the key.

>>> It would require a bit more thought to convince me that 64-bit indices
>>> are sufficient for *all* cases.

>> Assume the attacker is able to crash the node at will, and that the node
>> needs 10s to reboot.  If we assume the attacker will start seeing
>> collisions after 2^32 tries, then this will happen after 1360 years, which
>> is slightly more than the duration of the Byzantine empire.

> "Attacks only get better; never worse."

> My point was to show that it's plausible for an attacker to be able to
> trigger index generation, and that if we want to be robust in the face of
> future attacks, we should consider a threat model that allows them to do so
> at will.  Just because we can't see how that would happen now doesn't mean
> very much unless we have some sort of proof to back it up.

Yes, that's why the protocol supports larger values.  However, for the
time being, I stand by the statement that "64-bit values are believed to
be large enough for all practical applications".

>>>    present at the receiver.  If the attacker is able to cause the
>>>    (Index, PC) pair to persist for arbitrary amounts of time (e.g., by
>>>    repeatedly causing failed challenges), then it is able to delay the
>>>    packet by arbitrary amounts of time, even after the sender has left
>>>    the network.

>>> I'd suggest adding another sentence describing the potential
>>> consequences of selectively delayed input (i.e., messing up the
>>> routing).

>> We don't know about any such consequences.  Still, we believe it is good
>> to avoid this situation.

> Maybe I'm confused, but suppose a node advertises a really-low-metric path
> to a prefix, but then drops off the network.

Sorry, my brain went away when I wrote that.  This is exactly the issue
I had in mind when I suggested this requirement.  I've added "which could
allow it to redirect or blackhole traffic to destinations previously
advertised by the sender".

>>> nit: that's a comma splice in the parenthetical; a semicolon would be
>>> better.

>> I've added a conjunction, I didn't realise such usage is unacceptable.

> Introspection suggests that it's a pet peeve of mine,

I'm pretty sure this usage is standard in British English, which is the
dialect I was taught.  Thank you for educating me, I shall avoid it in the
future.

(*My* pet peeve is the use of a comma after i.e. and e.g., but who am I to
argue with the RFC Editor.)

-- Juliusz