Re: [babel] Benjamin Kaduk's Discuss on draft-ietf-babel-hmac-08: (with DISCUSS and COMMENT)

Juliusz Chroboczek <jch@irif.fr> Sat, 10 August 2019 10:51 UTC

Date: Sat, 10 Aug 2019 12:51:11 +0200
Message-ID: <87h86pcmzk.wl-jch@irif.fr>
From: Juliusz Chroboczek <jch@irif.fr>
To: Benjamin Kaduk <kaduk@mit.edu>
Cc: The IESG <iesg@ietf.org>, draft-ietf-babel-hmac@ietf.org, Donald Eastlake <d3e3e3@gmail.com>, babel-chairs@ietf.org, babel@ietf.org
In-Reply-To: <156521429138.8333.12124544758210076970.idtracker@ietfa.amsl.com>
References: <156521429138.8333.12124544758210076970.idtracker@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/Ya9Nt7Tucr6YyPubm7Sb80fEV-A>

Dear Benjamin,

Thank you for your review.

> Are the HMAC keys required to be the hash function's block size or its
> output size?  Section 3.1 says just "the length of each key is exactly
> the hash size of the associated HMAC algorithm", and "hash size"
> conventionally refers to the output length.  The referenced Section 2 of
> RFC 2104 concerns itself with the hash's compression function's block
> size B, which is generally different.

The block size, good catch.
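
To make the distinction concrete, here is a minimal Python sketch using the
standard hashlib module.  SHA-256 is chosen purely as an example, and the
helper name key_sizes is hypothetical, not something taken from the draft.

```python
import hashlib
import secrets


def key_sizes(name):
    """Return (block size B, output size) in octets for a hashlib algorithm."""
    h = hashlib.new(name)
    return h.block_size, h.digest_size


# For SHA-256 the two sizes differ: B is 64 octets, the output is 32 octets.
block, output = key_sizes("sha256")

# A random key whose length is exactly the block size.
key = secrets.token_bytes(block)
```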

> Also in Section 3.1, if we are going to claim that a "random string of
> sufficient length" suffices to initialize a fresh index, we need to
> provide guidance on what constitutes "sufficient length" to achieve the
> needed property.

Section 6 says

   This
   property can be satisfied either by using a cryptographically secure
   random number generator to generate indices and nonces that contain
   enough entropy (64-bit values are believed to be large enough for all
   practical applications), or by using a reliably monotonic hardware
   clock.
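
As an illustration of the first option, a 64-bit index can be drawn from the
operating system's CSPRNG.  This is only a sketch: the function name
fresh_index and the constant INDEX_BITS are assumptions made for the example.

```python
import secrets

INDEX_BITS = 64  # the size Section 6 describes as believed to be large enough


def fresh_index():
    """Draw a new index from the operating system's CSPRNG."""
    return secrets.token_bytes(INDEX_BITS // 8)
```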

> Blake2s is a keyed MAC, but is not an HMAC construction.  If we are to
> allow its usage for providing integrity protection of babel packets
> directly, we therefore cannot refer to the protection scheme as "HMAC"
> generically.  Fixing this will, unfortunately, be somewhat invasive to
> the document, since we mention HMAC all over the place.  I believe that
> "Keyed Message Authentication Code (Keyed MAC)" is an appropriate
> replacement description.

I disagree.  While you're technically correct, for the typical network
operator "HMAC" is a familiar term, "Keyed MAC" is not.  I think that
renaming this document to "Keyed MAC" would make it less informative.

> The suggestion that the large challenge nonce size admits storage of
> state in a secure "cookie" in the nonce is true, however, implementing
> this properly presents some subtleties, and it seems like something of
> an attractive nuisance to suggest that it is possible without giving
> adequate guidance at how to do it safely.  Unfortunately, the best
> reference I can think of, offhand, is the obsoleted RFC 5077.

I agree.  This is something that we considered in the design of the
protocol (which is why Nonces are allowed to be so large), but never
implemented ourselves; it would probably be some work to get it right.
Hence the very careful formulation ("might").  Let me know if you want to
suggest a different formulation.

> Let's also have a discussion about whether 64 bits of randomness is
> always sufficient; I left a longer note down in the Comment since I
> don't expect this to end up being a blocking point.

Let's.  See below.

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------

> This is a symmetric-keyed scenario, so most attacks that involve a
> compromised node will be "uninteresting", in that once the key is
> exposed all guarantees are lost.  However, it may still be worth noting
> that a compromised node can cause disruption on multi-access links
> without detection since there is no "end of generation" signal when a
> node changes its index.  That is, if node B reboots or otherwise resets
> its index/pc, then compromised node C can spoof packets from B with the
> previous index and honest node A will accept them, and B will be unable
> to detect that it has been spoofed.  On the flip side, we may want to
> discuss that B can watch for messages that spoof its source address to
> detect compromised nodes.

I may be misunderstanding what you mean, but I think that's not the case.
If B reboots, then:

  - if A has state about B, then any previous PC will be rejected;
  - if A has no state about B, then A will send a challenge which C won't
    be able to reply to.

Are we misunderstanding each other?

> Is there any need for initial PC value randomization?

I don't see why.

> Do we want to recommend starting at 0 or 1 (or prohibit recipients from
> assuming that the initial value for an index will be that)?

The initial value as well as the rate of increase are arbitrary.  For
example, a node may use a hardware clock (suitably offset to fit in 32
bits) as the PC.
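
A rough sketch of a clock-derived PC follows.  The class name ClockPC, the
offset parameter and the use of time.time() are all assumptions made for the
example; in particular, the sketch forces the value to increase strictly even
when two packets are sent within the same second, and it says nothing about
how reliable the underlying clock is across reboots.

```python
import time


class ClockPC:
    """Packet counter derived from a clock, offset so that it fits in 32 bits."""

    def __init__(self, offset):
        self.offset = offset  # hypothetical offset, in seconds
        self.last = -1

    def next(self, now=None):
        t = int(time.time() if now is None else now) - self.offset
        # The rate of increase is arbitrary, but the PC must never repeat.
        pc = max(t, self.last + 1)
        if not 0 <= pc < 2**32:
            raise OverflowError("PC does not fit in 32 bits")
        self.last = pc
        return pc
```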

> Section 1

> "This document obsoletes RFC 7298" should be in the Introduction as well
> as the abstract.

Done.

> Is the capability for an attacker to modify/spoof Babel packets in
> order to cause data to get dropped or cause a routing loop worth
> mentioning here?

No, that's a particular case of redirecting.

> Section 1.2

>    o  that the Hashed Message Authentication Code (HMAC) being used is
>       invulnerable to pre-image attacks, i.e., that an attacker is
>       unable to generate a packet with a correct HMAC;

> I think it's more conventional to include the caveat "without [access
> to/knowledge of] the secret key" for this sort of statement about HMAC.

Agreed, done.

>    The first assumption is a property of the HMAC being used.  The
>    second assumption can be met either by using a robust random number
>    generator [RFC4086] and sufficiently large indices and nonces, by
>    using a reliable hardware clock, or by rekeying whenever a collision
>    becomes likely.

> Does this rekeying option require an external operation/management actor
> to trigger it?  It might be worth mentioning with some operational
> considerations.

Disagree.

>    o  among different nodes, it is only vulnerable to immediate replay:
>       if a node A has accepted a packet from C as valid, then a node B
>       will only accept a copy of that packet as authentic if B has
>       accepted an older packet from C and B has received no later packet
>       from C.

> nit: I don't think "A has accepted a packet from C" is quite the right
> precondition; it seems to be more like "A has received a valid packet
> from C", since whether or not A (as an attacker) considers it valid is
> irrelevant to whether (honest) B will.

C is the attacker here, A is honest.

> Section 4.1

> If we had identifiers for symmetric keys or HMAC algorithms, we could
> include those identifiers in the pseudo-header and thereby gain some
> protection from downgrade/HMAC-stripping attacks in the presence of a
> weak keyed MAC algorithm.  (I think we have to include both what we are
> sending and what we think the peer can do in order to get substantial
> protection, though, which diminishes the appeal for multicast
> scenarios.)

This is a symmetric algorithm: for downgrade attacks to work, the victim
would need to be configured with a weak key.  I therefore don't see how
this added complexity helps.

> nit: I don't think the past tense is correct for "packet was carried
> over IPvN", since we're talking about a pseudo-header used in
> computations before the packet is sent.

Agreed, done.

> It might be worth reiterating that every time a packet goes on the wire,
> it gets a fresh PC, regardless of whether it's a "retransmit" after a
> timeout or a new message.

There are no retransmits in this protocol.

>    interface MTU (Section 4 of [RFC6126bis]).  For an interface on which
>    HMAC protection is configured, the TLV aggregation logic MUST take
>    into account the overhead due to PC TLVs (one in each packet) and
>    HMAC TLVs (one per configured key).

> (per configured key, and also per packet, right?)

Of course.  The MTU applies per packet.
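
For illustration, the per-packet overhead can be computed as below.  The
layout is an assumption made for the sketch: each TLV is taken to carry a
two-octet Type/Length header, the PC TLV a 4-octet PC followed by the Index,
and each HMAC TLV one digest.  The function name is hypothetical.

```python
TLV_HEADER = 2  # Type and Length octets (assumed layout)
PC_LEN = 4      # 32-bit packet counter


def hmac_overhead(index_len, digest_lens):
    """Octets added to every packet: one PC TLV plus one HMAC TLV per key."""
    pc_tlv = TLV_HEADER + PC_LEN + index_len
    hmac_tlvs = sum(TLV_HEADER + d for d in digest_lens)
    return pc_tlv + hmac_tlvs
```

With an 8-octet Index and two keys that each produce a 32-octet digest, the
aggregation logic would have to reserve hmac_overhead(8, [32, 32]) octets in
every packet.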

> Does it matter whether the sender increments the PC before or after
> inserting it in the PC TLV?  (I think the only potential impact would be
> as it relates to the value sent in response to a challenge nonce, but
> the "increment by a positive not-necessarily-one amount" property may
> provide all the flexibility we need.)

I don't think so.

> Section 4.3

> Validating the HMACs is the sort of operation that we tend to recommend
> be done in constant-time to avoid side channel attacks.  I don't have a
> concrete attack handy here at the moment, though.

I frankly have no idea if that's necessary or not.  FWIW, the protocol is
asynchronous, and there is jitter applied to packets.
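
Should constant-time comparison turn out to be desirable, it is cheap to do.
A minimal sketch in Python, assuming HMAC-SHA256 and a hypothetical
mac_is_valid helper whose data argument stands for whatever octets the MAC is
computed over:

```python
import hashlib
import hmac


def mac_is_valid(key, data, received_mac):
    """Recompute the MAC and compare it without an early exit on mismatch."""
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest is designed so that its running time does not reveal
    # the position of the first differing octet.
    return hmac.compare_digest(expected, received_mac)
```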

>       When a PC TLV is encountered, the enclosed PC and Index are saved
>       for later processing; if multiple PCs are found (which should not
>       happen, see Section 4.2 above), only the first one is processed,
>       the remaining ones MUST be silently ignored.  If a Challenge

> Any reason to not just drop the whole packet if there are multiple PCs
> present?  I see this is not rfc7298bis but don't know what level of
> breaking change is reasonable.

It doesn't matter much, it's a "cannot happen" case.  (Note that the HMAC
has already been validated at this point, so it isn't a security issue.)

>    o  The preparse phase above has yielded two pieces of data: the PC
>       and Index from the first PC TLV, and a bit indicating whether the
>       packet contains a successful Challenge Reply.  If the packet does
>       not contain a PC TLV, the packet MUST be dropped and processing
>       stops at this point.  If the packet contains a successful
>       Challenge Reply, then the PC and Index contained in the PC TLV
>       MUST be stored in the Neighbour Table entry corresponding to the
>       sender (which already exists in this case), and the packet is
>       accepted.

> I'd suggest explicitly stating that if there is a challenge reply that
> doesn't validate, the packet should be discarded.
> Or are there multicast scenarios where that is not the case? The key
> point being to emphasize that just the presence of a challenge reply
> doesn't mean anything, it has to be valid in order to have significance.

This is already the case -- a packet with an incorrect challenge reply is
treated just like a packet with no challenge reply.

I don't feel comfortable with discarding a packet just because it contains
an obsolete challenge reply -- what if the packet also contains a challenge
request?  It also complicates the code, by requiring three challenge
verdicts (positive/neutral/negative) rather than just two (positive/neutral).
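
The two-verdict approach can be sketched as follows.  The function and
parameter names are hypothetical; the point is only that a reply that does
not match is indistinguishable from no reply at all, so the rest of the
packet is processed as usual.

```python
import hmac


def challenge_reply_succeeds(stored_nonce, timer_expired, reply_nonce):
    """Return True (positive) or False (neutral); there is no negative verdict."""
    if stored_nonce is None or timer_expired:
        # No challenge in progress: the reply is simply ignored.
        return False
    # Validation is just a comparison of two strings of octets.
    return hmac.compare_digest(stored_nonce, reply_nonce)
```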

>    o  At this stage, the packet contains no successful challenge reply
>       and the Index contained in the PC TLV is equal to the Index in the
>       Neighbour Table entry corresponding to the sender.  The receiver
>       compares the received PC with the PC contained in the Neighbour
>       Table; if the received PC is smaller or equal than the PC
>       contained in the Neighbour Table, the packet MUST be dropped and
>       processing stops (no challenge is sent in this case, since the
>       mismatch might be caused by harmless packet reordering on the
>       link).  Otherwise, the PC contained in the Neighbour Table entry
>       is set to the received PC, and the packet is accepted.

> Does this mean that if packet reordering is encountered, we will just
> not process packets that get reordered later?  (AFAIK babel will still
> work fine in such conditions, so I'm just checking my understanding.)

Correct on both counts.  The WG did consider using a sliding window in the
style of DTLS, but we finally decided against it for the sake of simplicity.
Since this is a link-local protocol, packet reordering is unlikely.
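
The check without a sliding window is short.  The sketch below covers only a
packet that carries no successful challenge reply; the Neighbour class, the
check_pc function and the string verdicts are assumptions made for the
example, and on "challenge" the caller is expected to send a Challenge
Request.

```python
from dataclasses import dataclass


@dataclass
class Neighbour:
    index: bytes  # last Index accepted from this neighbour
    pc: int       # last PC accepted from this neighbour


def check_pc(neigh, index, pc):
    """Return "accept", "drop" or "challenge" for a received (Index, PC)."""
    if index != neigh.index:
        return "challenge"
    if pc <= neigh.pc:
        # Replayed or reordered packet: dropped, and no challenge is sent.
        return "drop"
    neigh.pc = pc
    return "accept"
```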

>    it MAY ignore a challenge request in the case where it it contained

> nit: s/it it/it is/

Done, thanks.

>    The same is true of challenge replies.  However, since validating a
>    challenge reply is extremely cheap (it's just a bitwise comparison of
>    two strings of octets), a similar optimisation for challenge replies
>    is not worthwhile.

> Er, challenge reply validation still requires the HMAC validation step,
> right?

At this stage we've already verified the HMAC.  The only thing we could
potentially save would be a bitwise comparison, and that's not worth it.

> Section 4.3.1.1

>    When it encounters a mismatched Index during the preparse phase, a
>    node picks a nonce that it has never used with any of the keys
>    currently configured on the relevant interface, for example by
>    drawing a sufficiently large random string of bytes or by consulting

> (same comment as above about "sufficiently large")

See Section 6.

> Section 4.3.1.2

>    buffered TLVs in the same packet as the Challenge Reply.  However, it
>    MUST arrange for the Challenge Reply to be sent in a timely manner
>    (within a few seconds), and SHOULD NOT send any other packets over
>    the same interface before sending the Challenge Reply, as those would
>    be dropped by the challenger.

> I think this "SHOULD NOT" (or rather, "would be dropped by the
> challenger") is predicated on the challenge request having not been a
> replay, but I do not see anything requiring the recipient to do nonce
> uniqueness validation.

This doesn't require delaying any packets, quite the opposite, it says
that you SHOULD send out the challenge reply before sending out any more
packets.

> Section 4.3.1.3

>    neighbour that sent the Challenge Reply.  If no challenge is in
>    progress, i.e., if there is no Nonce stored in the Neighbour
>    Table entry or the Challenge timer has expired, the Challenge Reply
>    MUST be silently ignored and the challenge has failed.

> I think "the challenge has failed" is predicated on the challenge reply
> being in response to a challenge sent by this node.  The previous
> section's "send the Challenge Reply to the unicast address" seems to
> imply that there are no multicast scenarios which would make that not
> the case, but I just wanted to check my understanding.

That's right.

> Section 5

> Do we need to say whether sub-TLVs are allowed in any of these TLVs?
> (Presumably they are not, since the length is needed in order to
> identify the length of the variable-length fields, but being explicit
> can be useful.)

6126bis only specifies which TLVs do allow sub-TLVs, I think it makes
sense to follow the same format.

> Section 5.1

>    This [HMAC] TLV is allowed in the packet trailer (see Section 4.2 of
>    [RFC6126bis]), and MUST be ignored if it is found in the packet body.

> side note: Using "MUST ignore" vs. "discard the packet" has some
> protocol evolution consequences -- it in practice then becomes an
> alternative padding technique for use in packet bodies, and if ever used
> as such then could lead to a way to fingerprint an implementation or be
> used as a hidden channel for sending other data.  But, I see that
> ignoring at the TLV level is something of a core babel design choice,

Right.

> and I don't see any serious consequences that would merit revisiting
> that decision.

Good.

> Section 6

>    This mechanism relies on two assumptions, as described in
>    Section 1.2.  First, it assumes that the hash being used is

> s/hash/MAC/

I'm confused.  This section refers to pre-image attacks, which I was under
the impression is a property of the hash.

> It would require a bit more thought to convince me that 64-bit indices
> are sufficient for *all* cases.  Specifically, if we want full 64-bit
> strength, then the 64-bit space cannot be controlled or affected by the
> attacker to cause collisions.  But I think there will be a reasonable
> risk that an attacker can cause a given node to need to regenerate its
> index on demand (e.g., by triggering a bug that crashes it, or power
> cycling it)

Assume the attacker is able to crash the node at will, and that the node
needs 10s to reboot.  If we assume the attacker will start seeing
collisions after 2^32 tries, then this will happen after 1360 years, which
is slightly more than the duration of the Byzantine empire.

The Babel working group recommends rekeying whenever Anatolia is invaded.
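
The 1360-year figure can be checked with a few lines of Python.  The constant
names are made up for the example, and a year is taken to be 365.25 days.

```python
TRIES = 2**32                      # forced reboots before collisions are expected
REBOOT_SECONDS = 10                # time the node needs to reboot
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = TRIES * REBOOT_SECONDS / SECONDS_PER_YEAR
print(int(years))  # prints 1360
```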

>    present at the receiver.  If the attacker is able to cause the
>    (Index, PC) pair to persist for arbitrary amounts of time (e.g., by
>    repeatedly causing failed challenges), then it is able to delay the
>    packet by arbitrary amounts of time, even after the sender has left
>    the network.

> I'd suggest adding another sentence describing the potential
> consequences of selectively delayed input (i.e., messing up the
> routing).

We don't know about any such consequences.  Still, we believe it is good
to avoid this situation.

>    protocol (the data structures described in Section 3.2 of
>    [RFC6126bis] are conceptual, any data structure that yields the same
>    result may be used).  Implementers might also consider using the fact

> nit: that's a comma splice in the parenthetical; a semicolon would be better.

I've added a conjunction, I didn't realise such usage is unacceptable.

-- Juliusz
