RE: draft-palanivelan-bfd-v2-gr-08

"Palanivelan A (apvelan)" <apvelan@cisco.com> Thu, 28 October 2010 12:59 UTC

Subject: RE: draft-palanivelan-bfd-v2-gr-08
Date: Thu, 28 Oct 2010 18:31:30 +0530
Message-ID: <D4A66B38FC6C6E4F820A2470AEEA5CED02D587DF@XMB-BGL-411.cisco.com>
In-Reply-To: <C2E157D9-DB69-43D8-BB86-E148A93BA9EE@juniper.net>
References: <FB649DA20153634794BEBBAB504DA1AD4506130D74@EMBX02-BNG.jnpr.net> <C2E157D9-DB69-43D8-BB86-E148A93BA9EE@juniper.net>
From: "Palanivelan A (apvelan)" <apvelan@cisco.com>
To: Dave Katz <dkatz@juniper.net>
Cc: rtg-bfd WG <rtg-bfd@ietf.org>

 

We see BFD serving the purpose it was meant for, and its wide
implementation points to its acceptance.

If we have to pick out the areas that are a little grey, BFD for routing
protocols with GR is one of them.

 

Although the success of BFD with GR depends on the system design, and
there can be different ways of addressing the issue in line with that
design, this draft was intended to provide a common solution that can be
adopted. It was originally intended for the standards track.

 

But this working group, within its rights, felt that BFD does not need
such an extension and that the draft need not be considered for the
standards track, and I am certainly not debating that. The independent
reviewers, however, proposed that this draft be recorded for future
reference and hence were willing to consider it for "Historic" status.
The revisions to this draft were all based on these review comments.

 

I hope this clarifies the intent of the draft and also its status as of
this date.

 

I guess that for someone who is facing this issue and looking for a
solution, this draft gives an option (though not the only one) to
address it, and that is my bottom line.

 

Thanks and Regards,

A.Palanivelan

 

From: Dave Katz [mailto:dkatz@juniper.net] 
Sent: Wednesday, October 27, 2010 10:49 PM
To: Palanivelan A (apvelan)
Cc: rtg-bfd WG; Santosh P K
Subject: Re: draft-palanivelan-bfd-v2-gr-08

 

I somehow seem to have missed the ongoing revision of this draft.

 

To underscore what Santosh says, I believe that GR interactions can be
dealt with in an informational BCP, without any modification of the base
spec.  RFC 5882 has a bunch of verbiage to try to address exactly this
scenario.

 

 

But to the details:

 

I don't see how there is *any* mechanism that can deal with unplanned
restart if the BFD session can't stay up long enough for the restarting
system to start sending BFD packets again.  The non-restarting system
fundamentally cannot differentiate between this and a crashed neighbor;
it must assume that the path has failed and do whatever it needs to
(take down a routing protocol adjacency, for example.)  Unless I'm
missing something, this is an unavoidable side effect of fast failure
detection, and the only way to deal with it is to make the behavior of
the dependent control protocol implementations less onerous when the
topology changes (which is frankly what folks should have done instead
of inventing GR, which in my opinion is an awful kludge).

 

If the BFD session can stay up long enough for the restarting system to
send BFD packets again, and the control protocol has GR capabilities,
the second paragraph of section 3.3 of RFC 5882 describes in general
terms what to do.  In particular, the mythical adaptation layer
described in the RFC can detect the fact that the session is actively
being reestablished (by virtue of the receipt of BFD packets from the
restarting system) and apply hysteresis to the BFD session flap.  This
will give the restarted control protocol sufficient time to signal the
GR and avoid perturbation in that layer.
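
[Editorial illustration] The hysteresis idea in the paragraph above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not anything specified in RFC 5882: the class name FlapDamper, its methods, and the hold_time parameter are all hypothetical, and it assumes that a session which went down while BFD packets were still arriving from the neighbor is "actively being reestablished".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlapDamper:
    """Hypothetical adaptation layer between a BFD session and its client protocol."""

    hold_time: float                    # seconds a flap is hidden from the client (assumed knob)
    down_since: Optional[float] = None  # time the session left Up, or None while it is up
    neighbor_talking: bool = False      # True if BFD packets from the neighbor were seen

    def on_session_down(self, now: float, neighbor_packet_seen: bool) -> None:
        # Record when the session went down and whether the neighbor is still sending.
        self.down_since = now
        self.neighbor_talking = neighbor_packet_seen

    def on_session_up(self) -> None:
        # The session came back, so there is nothing left to report.
        self.down_since = None
        self.neighbor_talking = False

    def should_notify_client(self, now: float) -> bool:
        """Return True when the client protocol should be told the path has failed."""
        if self.down_since is None:
            return False
        if not self.neighbor_talking:
            # Silence from the neighbor looks like a real failure: report at once.
            return True
        # The neighbor is talking again: absorb the flap until hold_time expires.
        return now - self.down_since >= self.hold_time
```

With hold_time set to 2.0, a session that goes down at t=10.0 while the neighbor is still sending is not reported to the routing protocol at t=10.5, but is reported at t=12.0 if it has not come back up by then.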

 

The Diag field is intended to be *informational* only, that is, a
write-only field as far as the BFD state machine is concerned.  There is
currently *no* place in the spec where a receiving system uses this
field as part of the BFD state machine.  Using it to signal within the
protocol is outside of the architectural thinking (which, admittedly, is
not explicitly documented).  But beyond that, the Diag field is
"fragile" within the protocol;  it is easily overwritten under various
conditions based on the existing spec, and attempting to preserve its
value for signaling would require constraining the operation of the
sender.  This is hinted at in section 6.8.17 in RFC 5880 when describing
concatenated paths, for example.
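
[Editorial illustration] To make the "write-only" point concrete, here is a simplified sketch of the session state transitions on packet reception, based on a reading of RFC 5880 section 6.8.6. It leaves out timers, authentication, Demand mode, and all validation, and the function name next_state is an assumption. The received Diag value is passed in only to show that no transition consults it.

```python
from enum import Enum


class State(Enum):
    # State values as encoded in the BFD control packet (RFC 5880).
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


def next_state(local: State, received_state: State, received_diag: int) -> State:
    """Return the new local session state after receiving a control packet.

    received_diag is accepted but deliberately never read: in this
    simplified model, as in the base spec, it is not a state machine input.
    """
    if local is State.ADMIN_DOWN:
        return local  # packets are discarded while administratively down
    if received_state is State.ADMIN_DOWN:
        return State.DOWN
    if local is State.DOWN:
        if received_state is State.DOWN:
            return State.INIT
        if received_state is State.INIT:
            return State.UP
        return local
    if local is State.INIT:
        if received_state in (State.INIT, State.UP):
            return State.UP
        return local
    # local is UP
    if received_state is State.DOWN:
        return State.DOWN
    return local
```

In this model, a packet carrying state Down takes an Up session to Down whatever Diag value it carries, so a receiver that wanted to treat Diag 9 specially would need a new input to the state machine.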

 

I see no value in signaling the GR timers in the protocol;  if a
restarting system can trigger this mechanism by sending Diag 9, it could
also do so by simply cranking up the transmit and receive intervals to
whatever values it desires.  For planned restart, this is all that is
necessary (and this can be done cleanly at that time.)  For unplanned
restart, at some level it doesn't matter;  by the time you can signal
anything from the restarted system, you can run BFD at full speed, or
whatever rate you would like to signal at that time.
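
[Editorial illustration] The effect of "cranking up the intervals" can be worked through with the asynchronous-mode Detection Time rule from RFC 5880 section 6.8.4: the Detect Mult received from the remote system multiplied by the greater of the local Required Min RX Interval and the remote Desired Min TX Interval. The function name and the example values below are assumptions chosen for illustration, and the sketch ignores the Poll Sequence that RFC 5880 uses when parameters change.

```python
def detection_time_us(remote_detect_mult: int,
                      local_required_min_rx_us: int,
                      remote_desired_min_tx_us: int) -> int:
    """Asynchronous-mode Detection Time at the local system, in microseconds."""
    # The agreed transmit interval of the remote system is the slower of
    # what the remote wants to send and what the local system can receive.
    agreed_interval_us = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval_us


# Normal operation: 50 ms intervals with a multiplier of 3.
normal = detection_time_us(3, 50_000, 50_000)

# Before a planned restart the restarting system advertises a 10 s transmit interval.
relaxed = detection_time_us(3, 50_000, 10_000_000)
```

Here normal evaluates to 150,000 microseconds (150 ms) and relaxed to 30,000,000 microseconds (30 s), so the helper tolerates a much longer silence without any new protocol field.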

 

All the stuff about broadband and large fan-out seems to be beside the
point, an excuse for poor system design, and doesn't seem to add
anything to the argument.  A system that is fielding DHCP requests but
is ignoring fundamental connectivity is broken in any case (and as
Santosh points out, no mechanism in the protocol is going to help you if
the packets can't be sent and received anyhow.)  Doing successful
unplanned GR has certain requirements, among them ensuring that the
protocols involved can actually talk, and the particular example is not
a particularly good one, IMHO.

 

What am I missing?

 

--Dave

 

 

On Oct 27, 2010, at 9:10 AM, Santosh P K wrote:





Hello Palanivelan,
     I have a couple of questions on this draft. Section 6.2, "Remote
Neighbor Restart and Recovery", says:
 
   "When the set of systems had their BFD sessions established, with GR
   support as described in this document, when the remote neighbor
   restarts it will set the BFD diagnostics field to a value of 9
   (Neighbor Restarting) in the control packet to its neighbor (local
   system)."
 
  This draft is trying to address the unplanned restart of protocols
using BFD, since planned outages are already handled today.
 
1.    How would BFD determine that the protocol using BFD has gone into
graceful restart (in the case of an unplanned outage), so that it can
send BFD packets with the diagnostics field set to 9?
2.    If BFD can determine that the protocol using BFD has restarted with
GR enabled and that the outage was not planned, can't we increase the BFD
session timers instead of having MyRestartInterval and
yourRestartInterval fields?
3.    If we miss the first couple of BFD packets with the diagnostics
field set to 9 because BFD does not get enough CPU time on the restarting
router, won't the local router (helper) bring the session down?

 

....................................

Thanks and regards
Santosh P K