Re: BFD WG adoption for draft-haas-bfd-large-packets

"Acee Lindem (acee)" <acee@cisco.com> Tue, 23 October 2018 16:30 UTC

From: "Acee Lindem (acee)" <acee@cisco.com>
To: Albert Fu <afu14@bloomberg.net>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>, "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
Subject: Re: BFD WG adoption for draft-haas-bfd-large-packets
Date: Tue, 23 Oct 2018 16:30:52 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/hStkSagcbhV92DHbtd8SNJcuVgQ>

Hi Albert, Les,

I tend to agree with Les that BFD doesn’t seem like the right protocol for this. Note that if you use OSPF as your IGP and flap the interface when the MTU changes, you’ll detect MTU mismatches immediately due to the MTU check in OSPF’s Database Description exchange. Granted, control-plane detection won’t catch data-plane bugs that cause MTU fluctuations, but I don’t see this as a frequent event.
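The OSPF check referred to above comes from RFC 2328 (section 10.6): a router rejects a Database Description packet whose Interface MTU field is larger than the receiving interface can accept without fragmentation, so the adjacency does not complete. The sketch below models only that comparison; the `DatabaseDescription` type and `accept_dbd` function are hypothetical stand-ins, not any implementation's API.

```python
from dataclasses import dataclass


@dataclass
class DatabaseDescription:
    """Hypothetical stand-in for an OSPFv2 Database Description packet."""
    interface_mtu: int  # MTU advertised by the sending neighbor, in bytes


def accept_dbd(dbd: DatabaseDescription, local_interface_mtu: int) -> bool:
    """Sketch of the RFC 2328 section 10.6 check: reject a DD packet whose
    advertised Interface MTU exceeds what the receiving interface can take
    without fragmentation."""
    return dbd.interface_mtu <= local_interface_mtu


# A 1500-byte interface rejects DD packets from a 9000-byte neighbor,
# so the mismatch surfaces as a stalled adjacency.
print(accept_dbd(DatabaseDescription(interface_mtu=9000), 1500))
```

Note that the check is one-sided: only the router with the smaller MTU rejects, which is enough to keep the adjacency from forming.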

Thanks,
Acee

From: Rtg-bfd <rtg-bfd-bounces@ietf.org> on behalf of "Albert Fu (BLOOMBERG/ 120 PARK)" <afu14@bloomberg.net>
Reply-To: Albert Fu <afu14@bloomberg.net>
Date: Tuesday, October 23, 2018 at 11:44 AM
To: "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>, "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
Subject: RE: BFD WG adoption for draft-haas-bfd-large-packets

Hi Les,

Given that MTU issues take a relatively long time to troubleshoot, and given the associated impact on customer traffic, it is important to have a reliable and fast mechanism to detect them.

I believe BFD, especially in the single-hop, control-plane-independent case (which, by the way, covers the majority of our BFD use cases), is indeed an ideal and reliable solution for this purpose. It is also closely tied to the routing protocols and enables traffic to be diverted very quickly.

The choice of BFD timer is also a design tradeoff: a low BFD detection time will cause more network churn. We do not need an extremely aggressive BFD timer to achieve fast convergence. For example, with protection, we can achieve end-to-end sub-second convergence using a relatively high BFD interval of 150 ms.
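For reference, RFC 5880 (section 6.8.4) defines the Asynchronous-mode detection time as the remote system's Detect Mult multiplied by the agreed interval, i.e. the greater of the local Required Min RX Interval and the remote Desired Min TX Interval. A minimal sketch of that arithmetic follows; the function name is illustrative, and the multiplier of 3 in the example is an assumption, since the message gives only the 150 ms interval.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    """Asynchronous-mode detection time per RFC 5880 section 6.8.4."""
    # The agreed interval is the slower of what we can receive
    # and what the remote side wants to send.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# 150 ms intervals on both sides, assumed multiplier of 3.
print(bfd_detection_time_ms(3, 150, 150))
```

With those assumed values the session is declared down after 450 ms, comfortably inside a sub-second convergence budget.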

In the case where the path will be used for a variety of encapsulations (e.g. pure IP and L3VPN traffic), we would set the BFD padding to cater for the largest possible payload. In our case, the link needs to carry a mix of pure IP traffic (1500-byte maximum payload) and MPLS traffic (1500 bytes plus three 4-byte labels), so we would set the padding so that the total padded BFD packet size is 1512 bytes.
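The 1512-byte figure follows from sizing for the largest encapsulation: a 1500-byte IP packet plus three MPLS label stack entries of 4 bytes each (RFC 3032). The sketch below reproduces that and then, purely for illustration, derives a padding length under two assumptions not stated in the thread: that the 1512 bytes are counted from the IPv4 header onward, and that the padding follows an unauthenticated 24-byte BFD Control packet inside a UDP datagram. Function and constant names are hypothetical.

```python
IPV4_HEADER = 20   # bytes, IPv4 header without options (assumed)
UDP_HEADER = 8     # bytes
BFD_CONTROL = 24   # bytes, mandatory BFD Control section, no authentication
MPLS_LABEL = 4     # bytes per MPLS label stack entry (RFC 3032)


def largest_payload(ip_mtu: int, max_labels: int) -> int:
    """Largest packet the link must carry: an IP packet plus its label stack."""
    return ip_mtu + max_labels * MPLS_LABEL


def bfd_padding(target_size: int) -> int:
    """Padding so that IPv4 + UDP + BFD Control + padding == target_size.
    Assumes target_size is measured from the IPv4 header onward."""
    padding = target_size - (IPV4_HEADER + UDP_HEADER + BFD_CONTROL)
    if padding < 0:
        raise ValueError("target is smaller than an unpadded BFD packet")
    return padding


target = max(largest_payload(1500, 0), largest_payload(1500, 3))  # pure IP vs. MPLS
print(target, bfd_padding(target))
```

Taking the maximum over all encapsulations is the point: a single padded session sized this way also covers the smaller pure-IP case.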

As you rightly pointed out, the IS-IS routing protocol does support hello padding, but since hellos are processed by the control plane, we cannot use aggressive timers. The lowest hello interval that can be configured is 1 s, so with the default multiplier of 3, the best we can achieve is a 3 s detection time.
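To put the two mechanisms side by side, both detection times are simply an interval times a multiplier. The sketch below uses the figures from this message (1 s IS-IS hellos, multiplier 3; 150 ms BFD interval) and assumes a BFD multiplier of 3, which the message does not state. The helper name is illustrative.

```python
def hold_time_ms(interval_ms: int, multiplier: int) -> int:
    """Time before a neighbor is declared down: interval times multiplier."""
    return interval_ms * multiplier


isis_hold_ms = hold_time_ms(1000, 3)  # IS-IS: 1 s hellos, default multiplier 3
bfd_detect_ms = hold_time_ms(150, 3)  # BFD: 150 ms interval, assumed multiplier 3

print(isis_hold_ms, bfd_detect_ms, round(isis_hold_ms / bfd_detect_ms, 1))
# prints: 3000 450 6.7
```

Under these assumptions, padded IS-IS hellos detect a broken path more than six times more slowly than the padded BFD session.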

What we would like is a simple mechanism to validate that a link can indeed carry the expected maximum payload size before we put it into production. If an issue occurs where this is no longer the case (e.g. due to outages or rerouting within the telco circuit), we would like a reliable mechanism to detect it and to divert traffic around the link quickly. I feel BFD is a good method for this purpose.

Thanks
Albert

From: ginsberg@cisco.com At: 10/23/18 10:45:02
To: Albert Fu (BLOOMBERG/ 120 PARK) <afu14@bloomberg.net>, rtg-bfd@ietf.org
Subject: RE: BFD WG adoption for draft-haas-bfd-large-packets
Albert –

Please understand that I fully agree with the importance of being able to detect/report MTU issues. In my own experience this can be a difficult problem to diagnose. You do not have to convince me that some improvement in detection/reporting is needed. The question really is whether using BFD is the best option.

Could you respond to my original questions – particularly why sub-second detection of this issue is a requirement?

For your convenience:

<snip>
It has been stated that there is a need for sub-second detection of this condition – but I really question that requirement.
What I would expect is that MTU changes only occur as a result of some maintenance operation (configuration change, link addition/bringup, insertion of a new box in the physical path, etc.). The idea of using a mechanism which is specifically tailored for sub-second detection to monitor something that is only going to change occasionally seems inappropriate. It makes me think that other mechanisms (some form of OAM, enhancements to routing protocols to do what IS-IS already does) could be more appropriate and would still meet the operational requirements.

I have listened to the Montreal recording – and I know there was discussion related to these issues (not sending padded packets all the time, use of BFD echo, etc.) – but I would be interested in more discussion of the need for sub-second detection.

Also, given that a path might be used with a variety of encapsulations, how do you see such a mechanism being used when multiple BFD clients share the same BFD session and their MTU constraints are different?
<end snip>

Thanx.

   Les