Re: I-D ACTION:draft-shen-bfd-intf-p2p-nbr-00.txt

Naiming Shen <naiming@cisco.com> Sat, 31 March 2007 01:30 UTC

From: Naiming Shen <naiming@cisco.com>
Date: Fri, 30 Mar 2007 18:29:56 -0700
To: Dave Katz <dkatz@juniper.net>
Cc: rtg-bfd@ietf.org
Subject: Re: I-D ACTION:draft-shen-bfd-intf-p2p-nbr-00.txt

On Mar 30, 2007, at 5:52 PM, Dave Katz wrote:

>
> On Mar 30, 2007, at 4:01 PM, Naiming Shen wrote:
>
>>>
>>> But secondly, there is a self-referential problem.  What you'd  
>>> *really* like to do on a BFD failure is to take down the failing  
>>> protocol and keep it down until the BFD session resumes.  But you  
>>> can't do this if BFD is running on top of that failing protocol  
>>> (which it will be) which thus requires the "special flag" hack  
>>> and only temporarily taking down the protocol.  The fate-sharing  
>>> implications of this are more serious, because it is possible to  
>>> get false positives in some cases when the data protocol has  
>>> failed.  Imagine, for example, running ISIS with this method.  If  
>>> IPv4 fails, the BFD session goes down, the interface flaps for  
>>> five seconds, and then is reenabled.  At that point ISIS will  
>>> happily come back up and report IP reachability even though it  
>>> doesn't exist.  You can bet that folks who have not yet  
>>> implemented BFD will be tempted to do this.
>>
>>
>> If you are saying that sharing fate between IPv4 BFD and ISIS is
>> a problem, then I agree. This scheme is only meant for sharing
>> fate among protocols, and it should not be used if that
>> assumption does not hold. But using ISIS to carry IP information
>> itself violates this principle ;-) I don't know who to blame,
>> although it's not a BFD issue.
>
> I guess my point is that you need to be very explicit about what  
> works and what doesn't.  It's certainly the case that, without the  
> presence of BFD, ISIS will do Bad Things if IP forwarding breaks,  
> since it will continue to advertise IP connectivity.  But the whole  
> point of BFD is to sense data protocol connectivity and provide  
> that (presumably useful) information to other parts of a system.   
> If BFD is providing information directly to ISIS, it can withdraw  
> IP connectivity (or tear down the adjacency if need be) and keep it  
> that way until connectivity is restored.  If ISIS relied on this  
> scheme, and IP connectivity failed (but datalink connectivity  
> remained), the ISIS adjacency would flap, and then ISIS would  
> proceed to black hole traffic.

Even with normal BFD interacting with ISIS (for IPv4), I would think
the same black holing can happen. If the IPv4 BFD session is flapping
while the datalink layer is fine, ISIS packets get through OK and the
hellos are happy. ISIS will bring up the adjacency and then register
with BFD; BFD later fails again, which brings down the ISIS adjacency.
I fail to see the difference between the two schemes.
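
A toy sketch of the sequence described above, in Python. All names
here (IsisAdjacency, on_hello_ok, on_bfd_down, run) are hypothetical
and come from neither the draft nor any implementation. It models only
the up/register/down loop, and it ignores timers and any logic a
client might use to hold the adjacency down until BFD recovers.

```python
from dataclasses import dataclass, field


@dataclass
class IsisAdjacency:
    """Toy model of an IS-IS adjacency that registers with BFD as a client."""
    up: bool = False
    log: list = field(default_factory=list)

    def on_hello_ok(self) -> None:
        # Hellos ride directly on the datalink, so they succeed even
        # when IPv4 forwarding is broken.
        if not self.up:
            self.up = True
            self.log.append("adjacency up, register with bfd")

    def on_bfd_down(self) -> None:
        # The IPv4 BFD session failed, so the adjacency is torn down.
        if self.up:
            self.up = False
            self.log.append("bfd down, adjacency down")


def run(cycles: int) -> list:
    """Replay the loop: hellos succeed, then BFD fails, repeatedly."""
    adj = IsisAdjacency()
    for _ in range(cycles):
        adj.on_hello_ok()   # datalink fine, hellos happy
        adj.on_bfd_down()   # IPv4 still broken, BFD fails again
    return adj.log
```

Calling run(2) records two up/down pairs, which is the flapping
referred to above.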

>
> I think you need to specifically disallow this mechanism for cases  
> like this, namely, applications that will continue to run even with  
> the failure of a data protocol, but whose correct operation  
> requires that data protocol.  (Note that this sentence describes  
> both ISIS and static routes.)
>
> OSPFv3 is another interesting example if you're running BFD in this  
> configuration only over IPv4.  If there is a v4 failure, OSPFv3  
> will flap unnecessarily.  This gets back to the IPv4 == everything  
> fate sharing that is at the heart of the way you've specified it,  
> and which I think is an unnecessary restriction.  A number of  
> systems (including the one I'm most familiar with these days) have  
> an interface hierarchy that includes the data protocol.  Such  
> systems are likely better served by having, say, separate v4 and v6  
> BFD sessions and flapping the appropriate data protocol up/down  
> status in the face of a BFD session failure.  This would allow the  
> OSPFv3 case to run unmolested when v4 died.  I would suggest to at  
> least offer this up as a MAY when you discuss the fate sharing  
> implications of this mechanism, since it should be essentially no  
> more work to implement if the system is already built this way.

Sure. There can be a configuration option to bring down either the
whole thing or just the data protocol part, if the platform supports
that.
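
As a rough sketch of such an option, in Python: the names below
(FateSharing, flap_target, per_family_supported) are hypothetical and
not taken from the draft. It assumes the platform can report whether
it tracks up/down status per data protocol, and it falls back to the
whole interface when it cannot.

```python
from enum import Enum


class FateSharing(Enum):
    WHOLE_INTERFACE = "interface"     # any BFD failure takes everything down
    DATA_PROTOCOL = "data-protocol"   # only the failed data protocol goes down


KNOWN_FAMILIES = ("ipv4", "ipv6")


def flap_target(mode: FateSharing, failed_family: str,
                per_family_supported: bool) -> str:
    """Return what to bring down when the BFD session for failed_family fails."""
    if failed_family not in KNOWN_FAMILIES:
        raise ValueError(f"unknown data protocol: {failed_family}")
    if mode is FateSharing.DATA_PROTOCOL and per_family_supported:
        return failed_family
    # Either the operator asked for the whole interface, or the
    # platform cannot separate data protocols.
    return "interface"
```

With DATA_PROTOCOL selected on a platform that supports it, an IPv4
BFD failure returns "ipv4" and leaves IPv6 clients such as OSPFv3
alone; otherwise the result is "interface".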

Even though, architecturally, the separation of BFD sessions is clean
(IPv4 controls the IPv4 protocols and IPv6 controls the IPv6
protocols), there is still much to be desired from an implementation
point of view. On many routers, BFD packets do not really go out
through the exact forwarding path that data packets take, or the
packets are sent from the same software process, be it v4 or v6. So
the argument for data separation is rather moot. And I have yet to see
a case where a BFD session going down was actually caused by a layer 3
lookup engine that is only responsible for IPv4 ;-) I would rather
re-route everything if we know that one of the data planes is already
in trouble.

>
>>>
>>> In light of this, my preference would be for all of the verbiage  
>>> about static routes and dynamic protocols and special flags to be  
>>> removed.  In place of this, add text that is very specific about  
>>> the fate-sharing implications of this mechanism as outlined  
>>> above, and point out that any application of BFD that does not  
>>> automatically share fate with the data protocol over which BFD is  
>>> running (such as ISIS or static routes) MUST have some form of  
>>> explicit interaction with BFD in order to avoid false positives,  
>>> and leave it at that.  The "special bit" hack is orthogonal to  
>>> this mechanism;  it could just as well have been specified in the  
>>> generic spec (and would have been just as inappropriate there.)
>>
>> I think the dynamic and static difference still needs to be  
>> mentioned, although should not
>> be directly linked with a 'special flag' for the static routing.
>
> But I think the "difference" here is fundamental--as soon as you  
> have any special case communication between BFD and a part of the  
> system, you've basically discarded the point of the draft (if I  
> understand it) which is to be able to leverage BFD without changing  
> your "clients" to specifically use it.  What you're specifying here  
> is functionally *exactly* the same as what the generic spec talks  
> about for static routes and other non-protocol applications, and  
> only muddies the spec, IMHO.

Only the Up->Down portion is different between the two schemes; the
rest is the same. But the bring-down itself differs depending on
whether the client is dynamic or static.

>
> Why not just say that this mechanism only provides a way of more  
> rapidly taking down applications that would otherwise go down  
> eventually and which will stay down on their own until the path is  
> healed (namely, control protocols), and leave statics out of it  
> altogether?

Maybe you have a point. I'll think about that.

thanks.
- Naiming

>
>
> --Dave