Re: [secdir] Secdir review of draft-ietf-ipsecme-failure-detection-05

Yoav Nir <> Thu, 10 March 2011 07:49 UTC

From: Yoav Nir <>
To: Magnus Nyström <>
Date: Thu, 10 Mar 2011 09:50:54 +0200

Hi Magnus, thanks for the review.  My answers are inline.

On Mar 7, 2011, at 9:14 AM, Magnus Nyström wrote:

> I have reviewed this document as part of the security directorate's
> ongoing effort to review all IETF documents being processed by the
> IESG.  These comments were written primarily for the benefit of the
> security area directors. Document editors and WG chairs should treat
> these comments just like any other last call comments.
> This document defines a new extension (the "QCD token") to the IKEv2
> protocol that allows for faster detection of SA de-synchronization.
> - General:
>  o "Quick Crash Detection": Is "crash" really the right term here? As
> the document indicates, the SA de-synchronization may have had other
> reasons than a crash...? The term "failure detection" seems more
> accurate.

There are indeed several reasons why an IKE implementation might lose state. Bugs in implementations or in the underlying operating system may cause state loss, and these are colloquially referred to as "crashes"; there are power interruptions; and every product that I know of also has a user interface command to clear state.  At least the last of these is not a failure at all.  So we might have termed this "Quick State Desynchronization Detection" (QSDD?), but crashes were the primary motivation for writing this, so we have used the name QCD from the start.  I think it can stay, as long as the introduction makes clear that the name covers these other causes too.
> - Section 1, Introduction:
>  o "However, in many cases the rebooted peer is a VPN gateway that
> protects only servers, or else the non-rebooted peer has a dynamic IP
> address" - it is not clear from this how or why the dynamic IP address
> of the non-rebooted peer impacts the tunnel re-establishment?

I agree it is a little confusing and should be split into two sentences. How about:

However, in many cases the rebooted peer is a VPN gateway that protects only servers, so all traffic is inbound. In other cases, the non-rebooted peer has a dynamic IP address, so the rebooted peer cannot initiate IKE because the other peer's current IP address is unknown.

>  o Editorial: "is a quick" -> "in a quick"?

Yes. Thanks.

> - Section 2, RFC 5996 Crash Recovery :
>  o "There are cases where the peer loses state and is able to recover
> immediately; in those cases it might take several minutes to recover."
> - so a peer is able to recover immediately, yet it might take several
> minutes to recover?? Unclear what is meant here.

How about "in those cases it might still take several minutes to recover the IPsec SAs."

> - Section 5, Token Generation and Verification:
>  o (Not sure why these methods are called stateless as the QCD_SECRET
> must be maintained?)

That's because they keep no per-tunnel persistent state. An obvious (though never described) alternative is to generate a random token for each IKE SA and store it in non-volatile storage. This was actually the design in one of the early drafts.

>  o By adding a nonce to the token generation the attack outlined in
> Section 9.3 would be impossible, as the attacker would also need to
> guess the nonce (adding a nonce to the TOKEN_SECRET_DATA generation
> would also have the effect that even for the same SPIs, the
> TOKEN_SECRET_DATA would be different). More generally, a standard key
> derivation scheme such as the Concatenation KDF in NIST SP 800-56 may
> be considered.

Yes, but then we would have to keep the nonce in state for each IKE SA, and this would vastly increase the non-volatile storage requirements. The goal is for the rebooted gateway to generate the correct token based only on an encrypted (with keys it doesn't have) IKE packet.
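
To make the point concrete, here is a minimal sketch of why no per-SA state is needed. It assumes the token is a hash over QCD_SECRET and the two IKE SPIs, with SHA-256 picked purely for illustration; the function names are hypothetical and not taken from the draft. The only fact it relies on is that the two 8-octet SPIs sit in the clear at the start of the IKEv2 header, so they can be read without any keys.

```python
import hashlib
import struct
from typing import Tuple


def extract_spis(ike_packet: bytes) -> Tuple[bytes, bytes]:
    """Read (SPI-I, SPI-R) from the cleartext start of an IKEv2 header.

    The IKEv2 header begins with the 8-octet initiator SPI followed by
    the 8-octet responder SPI; nothing has to be decrypted to get them.
    """
    if len(ike_packet) < 16:
        raise ValueError("packet too short to hold both IKE SPIs")
    spi_i, spi_r = struct.unpack_from("!8s8s", ike_packet, 0)
    return spi_i, spi_r


def qcd_token(qcd_secret: bytes, spi_i: bytes, spi_r: bytes) -> bytes:
    """Hypothetical stateless token: HASH(QCD_SECRET | SPI-I | SPI-R).

    SHA-256 is an assumption made for this sketch only.
    """
    return hashlib.sha256(qcd_secret + spi_i + spi_r).digest()


# Before the reboot: the gateway hands this token to the peer.
secret = b"\x5a" * 32              # the only value kept in non-volatile storage
spi_i, spi_r = b"\x01" * 8, b"\x02" * 8
token_given_to_peer = qcd_token(secret, spi_i, spi_r)

# After the reboot: an IKE packet arrives for an SA the gateway no longer
# knows. The body is opaque, but the SPIs in the header are enough.
incoming = spi_i + spi_r + b"(encrypted payload the gateway cannot read)"
regenerated = qcd_token(secret, *extract_spis(incoming))
print(regenerated == token_given_to_peer)
```

If a per-SA nonce were mixed into the hash, the last step would fail unless that nonce had been stored for every IKE SA, which is exactly the storage cost described above.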

> - Section 9.2, Security Considerations:
>  o Last paragraph: "This method should not be used..." - unclear what
> method is being referred to here?

Seems clear to me. The first sentence in that paragraph talks about the method in section 5.2. The following sentences begin with "This method...", referring to the same method. 

> - Appendix A.2:
>  o Would have been useful with a sentence or two that indicated why
> the working group preferred the QCD proposal over the SIR one.

   The working group preferred the QCD proposal to this one.

   The working group preferred the QCD proposal to this one, because 
   of the lack of cryptographic protection for the queries and 

> -- Magnus