Re: [Idr] Secdir last call review of draft-ietf-idr-long-lived-gr-05

Valery Smyslov <valery@smyslov.net> Thu, 13 July 2023 08:43 UTC

From: Valery Smyslov <valery@smyslov.net>
To: 'John Scudder' <jgs@juniper.net>
Cc: secdir@ietf.org, draft-ietf-idr-long-lived-gr.all@ietf.org, idr@ietf.org, last-call@ietf.org
References: <168845800740.483.1479588038121884290@ietfa.amsl.com> <71207892-ADFC-417E-B8C4-66B564C53934@juniper.net>
In-Reply-To: <71207892-ADFC-417E-B8C4-66B564C53934@juniper.net>
Date: Thu, 13 Jul 2023 11:43:00 +0300
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/qK7HWZ-rwy_lkDjYHbjrRU2h3nA>

Hi John,

Please see inline.

> Hi Valery,
> 
> Thanks for your review. Some responses inline below.
> 
> > On Jul 4, 2023, at 4:06 AM, Valery Smyslov via Datatracker <noreply@ietf.org> wrote:
> >
> > Reviewer: Valery Smyslov
> > Review result: Has Issues
> >
> > I have reviewed this document as part of the security directorate's
> > ongoing effort to review all IETF documents being processed by the
> > IESG.  These comments were written primarily for the benefit of the
> > security area directors.  Document editors and WG chairs should treat
> > these comments just like any other last call comments.
> >
> > The document defines a new BGP capability "Long-lived Graceful Restart Capability"
> > that allows stale routes to be retained for a longer time than is currently allowed
> > by RFC 4724. The document is well written and is easy to understand.
> 
> Thank you!

You're welcome :-)

> > My concern is that the upper limit for the "Long-lived Stale Time" period is 2^24 - 1 seconds
> > (about 194 days) and the document doesn't specify any restrictions for this value.
> 
> I’m not sure if this is different from what you meant by “any restrictions”, but Section 4.2 has “These
> timers MAY be modified by local configuration.” After discussing it with my co-authors, we agree that
> this is too easy to overlook, and propose to change it to “The timers received in the Long-lived Graceful
> Restart Capability SHOULD be modifiable by local configuration, which may impose either an upper or a
> lower bound, or both, on their respective values.” Then, we return to this in our updated Security
> Considerations section, read on.

OK.
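
To make the proposed Section 4.2 wording concrete, here is a minimal Python sketch of how a local configuration might bound a received Long-lived Stale Time. The function name effective_llst and its parameters are hypothetical and do not come from the draft; only the 2^24 - 1 second maximum (about 194 days) is taken from the discussion above.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400
# Upper limit discussed in the review: 2^24 - 1 seconds (about 194 days).
MAX_LLST_SECONDS = 2**24 - 1


def effective_llst(received: int,
                   lower_bound: Optional[int] = None,
                   upper_bound: Optional[int] = None) -> int:
    """Clamp a received LLST (in seconds) to optional local bounds.

    Hypothetical helper: the bounds model the "upper or lower bound,
    or both" that local configuration may impose.
    """
    if not 0 <= received <= MAX_LLST_SECONDS:
        raise ValueError("received LLST is outside 0..2^24-1 seconds")
    if (lower_bound is not None and upper_bound is not None
            and lower_bound > upper_bound):
        raise ValueError("lower_bound must not exceed upper_bound")
    value = received
    if lower_bound is not None:
        value = max(value, lower_bound)  # enforce the configured floor
    if upper_bound is not None:
        value = min(value, upper_bound)  # enforce the configured ceiling
    return value


if __name__ == "__main__":
    one_week = 7 * SECONDS_PER_DAY
    # A peer advertising the maximum is capped at the local one-week limit.
    print(effective_llst(MAX_LLST_SECONDS, upper_bound=one_week))
```

With no bounds configured the received value is used unchanged, which corresponds to the case the review was concerned about.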

> > It seems to me that having such long-lived stale routes may open new possibilities for attackers.
> > In particular, the possibility of resource exhaustion from storing a lot of stale routes
> > for a very long time, leading to a DoS attack, comes first to my mind.
> > This possibility is not mentioned in the Security Considerations.
> 
> We worked through several scenarios and as best we can determine, this is adequately covered under
> "The security implications of the LLGR mechanism defined in this document are akin to those incurred by
> the maintenance of stale routing information within a network." The outline looks like:
> 
> 1. To successfully mount a DoS attack against the network, the attacker has to be able to inject a large
> number of routes. If an attacker can do that, it’s a pre-existing vulnerability, not one created by LLGR.
> 2. The new vulnerability would be, if the DoS in (1) can be exacerbated by keeping the garbage routes
> stored in the network even after the attack against the proximate victim has been remediated.
>   2.a. But, if the attack is remediated, for instance by resetting the BGP session from the attacker to the
> victim (either manually, or as a result of the operation of an automatic defense feature such as max-
> prefixes), then the routes would promptly be flushed from the network as a consequence of the normal
> operation of the BGP protocol.
>   2.b. So, in order for the attack to succeed, the proximate victim would have to be prevented from
> withdrawing the routes. Ergo, the attacker would have to have the ability to not only inject routes in (1),
> but subsequently to silence the victim router (e.g. by crashing it into a non-recoverable state).
>   2.c. Even if that scenario were to be carried out (which implies underlying vulnerabilities probably more
> concerning than the LLGR resource-exhaustion vulnerability itself) the victim router’s next hop would
> disappear from the IGP, which would cause the LLGR routes to become non-resolvable, removing them
> from the FIB. Granted that RIB resources would still be consumed for the duration of the attack or the
> LLST, whichever is shorter, but in general FIB, not RIB, resources are the bottleneck.

I was mostly thinking of something like 2.b. You are in a better position to
analyze this scenario, so if you think that it is not a real threat, then I trust you.

> We’re not absolutely opposed to including an analysis like the above in the Security Considerations, but
> pending any further discussion, we’re comfortable with leaving it at the brief outline that’s already
> present. We did add one sentence to the introductory paragraph, so
> 
> OLD:
> The security implications of the LLGR mechanism defined in this document are akin to those incurred by
> the maintenance of stale routing information within a network.
> 
> NEW:
> The security implications of the LLGR mechanism defined in this document are akin to those incurred by
> the maintenance of stale routing information within a network. However, since the retention time may
> potentially be much longer, the window during which certain attacks are feasible may be substantially
> increased.

Fine with me, thank you.

> > Then, it seems to me that the countermeasures suggested in Section 6 to avoid VPN breach
> > may not work for large values of the "Long-lived Stale Time" period.
> >
> > And a final nit: the last para of Section 6 looks to me like some sort of excuse, which
> > in my opinion is not appropriate for a technical document. No matter how complex an attack is,
> > if it is ever feasible with the given threat model, then we should just describe it
> > with no additional sentiments that it is hard. Perhaps it is better to describe possible
> > attacks in terms of attacker's capabilities. E.g.: "If an attacker is able to inject packets
> > into the network then the following attacks are possible...".
> 
> Thanks for challenging us on these! Happily, the rewrite to fix the latter also led to improving the clarity
> of exposition regarding the countermeasure. Your point is still correct of course, that if it’s impossible to
> find a viable configuration that prevents overlap of label allocation reuse time and LLST, then the attack
> can’t be entirely ruled out; I hope the proposed text is sufficiently clear on this point. I’ve pasted the
> proposed update below.
> 
> OLD:
>    Therefore, to avoid VPN breach, before enabling BGP LLGR for a VPN
>    address family, Service Providers need to check how fast a given
>    label can be reused by a PE, taking into account:
> 
>    *  The load of the BGP route churn on a PE (in terms of the number of
>       VPN labels advertised and the churn rate).
> 
>    *  The label allocation policy on the PE (possibly depending upon the
>       size of the pool of the VPN labels (which can be restricted by
>       hardware considerations or other MPLS usages), the label
>       allocation scheme (for example per route or per VRF/CE), the re-
>       allocation policy (for example least recently used label).
> 
>    Note that [RFC4781] which defines Graceful Restart Mechanism for BGP
>    with MPLS is also applicable to BGP LLGR.
> 
>    These considerations notwithstanding, the LLGR mechanism described
>    within this document is considered to be complex to exploit
>    maliciously - in order to inject packets into a topology, there is a
>    requirement to engineer a specific LLGR state between two PE devices,
>    whilst engineering label reallocation to occur in a manner that
>    results in the two topologies overlapping.  Such allocation is
>    particularly difficult to engineer (since it is typically an internal
>    mechanism of a router).
> 
> NEW:
>    In order to exploit the vulnerability described above, there is a
>    requirement to engineer a specific LLGR state between two PE devices,
>    whilst engineering label reallocation to occur in a manner that
>    results in the two topologies overlapping.  Therefore, to avoid the
>    potential for a VPN breach, before enabling BGP LLGR for a VPN
>    address family, the operator should endeavor to ensure that the lower
>    bound on when a label might be reused is greater than the upper bound
>    on LLST.  Section 4.2 discusses the provision of an upper bound on LLST.
>    Details of features for setting a lower bound on label reuse time are
>    beyond the scope of this document; however, factors that might need
>    to be taken into account when setting this value include:
> 
>    *  The load of the BGP route churn on a PE (in terms of the number of
>       VPN labels advertised and the churn rate).
> 
>    *  The label allocation policy on the PE (possibly depending upon the
>       size of the pool of the VPN labels (which can be restricted by
>       hardware considerations or other MPLS usages), the label
>       allocation scheme (for example per route or per VRF/CE), the re-
>       allocation policy (for example least recently used label).
> 
>    Note that [RFC4781] which defines Graceful Restart Mechanism for BGP
>    with MPLS is also applicable to BGP LLGR.

Thank you, this text is much better.
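
The condition in the NEW text (the lower bound on label reuse time must exceed the upper bound on LLST) can be sketched as a simple pre-deployment check. This is only an illustration under stated assumptions: both function names are hypothetical, and the reuse-time estimate assumes a least-recently-used free list and a known peak allocation rate, neither of which the draft specifies.

```python
def min_label_reuse_seconds(free_labels_ahead: int,
                            peak_allocations_per_second: float) -> float:
    """Rough lower bound on how soon a freed VPN label can be reused.

    Assumption: labels are reallocated least-recently-used first, so a
    freed label waits until every free label ahead of it is allocated.
    """
    if free_labels_ahead < 0:
        raise ValueError("free_labels_ahead must be non-negative")
    if peak_allocations_per_second <= 0:
        raise ValueError("peak_allocations_per_second must be positive")
    return free_labels_ahead / peak_allocations_per_second


def llgr_safe_for_vpn(label_reuse_lower_bound: float,
                      llst_upper_bound: float) -> bool:
    """True if no label can be reused while an LLGR stale route may remain."""
    return label_reuse_lower_bound > llst_upper_bound


if __name__ == "__main__":
    # Hypothetical PE: 100,000 free labels, at most one allocation per 2 s.
    reuse = min_label_reuse_seconds(100_000, 0.5)
    print(llgr_safe_for_vpn(reuse, 86_400))     # LLST capped at one day
    print(llgr_safe_for_vpn(reuse, 2**24 - 1))  # LLST left at the maximum
```

In this hypothetical case the check passes with a one-day cap but fails at the 2^24 - 1 second maximum, which matches the earlier point that the countermeasure may not hold for large LLST values.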

> We’ll post a version 06 with the updates as soon as possible. Thanks again for your review.

No problem :-)

Regards,
Valery.

> —John