RE: TWAMP analysis for assisting BFD debuggin (was Re: BFD stability follow-up from IETF-91)

Gregory Mirsky <gregory.mirsky@ericsson.com> Fri, 19 December 2014 22:52 UTC

From: Gregory Mirsky <gregory.mirsky@ericsson.com>
To: Jeffrey Haas <jhaas@pfrc.org>
Subject: RE: TWAMP analysis for assisting BFD debuggin (was Re: BFD stability follow-up from IETF-91)
Date: Fri, 19 Dec 2014 22:52:10 +0000
Cc: "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>

Hi Jeff,
thank you for your interest in this discussion. If you agree, we can add IPPM WG to it.
I agree with your analysis of TWAMP, and I would not suggest that TWAMP-Test is well suited to debugging BFD-specific issues. But TWAMP-Test, like other active performance measurement mechanisms, e.g. Y.1731 and MPLS PM based on RFC 6374, may be used to measure the latency and jitter of a network or one of its segments. Even more than active methods, passive measurement methods may provide helpful information for troubleshooting the network or BFD. That may be done using IPFIX on certain nodes in the network, with the data analyzed by a data collector. I would like to note that the IPPM WG is discussing an IP flow measurement method based on packet marking, which I consider operationally more useful than plain IPFIX.
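As a rough illustration of the kind of latency and jitter figures an active two-way measurement yields, here is a minimal sketch. It assumes four timestamps per test packet (sender transmit, reflector receive, reflector transmit, sender receive), all in integer microseconds; the function names and the sample values are made up for the example and do not come from any particular implementation.

```python
from statistics import mean

def round_trip_delay(t1, t2, t3, t4):
    """Round-trip delay with the reflector's processing time removed.

    t1: sender transmit, t4: sender receive (sender clock)
    t2: reflector receive, t3: reflector transmit (reflector clock)
    Each difference uses a single clock, so the two clocks
    do not need to be synchronized.
    """
    return (t4 - t1) - (t3 - t2)

def delay_variation(delays):
    """Mean absolute difference between consecutive delay samples.

    This is one simple way to express jitter; other definitions exist.
    """
    if len(delays) < 2:
        return 0
    return mean([abs(b - a) for a, b in zip(delays, delays[1:])])

# Hypothetical samples, in microseconds: (t1, t2, t3, t4)
samples = [
    (0, 10_000, 11_000, 21_000),
    (1_000_000, 1_012_000, 1_013_000, 1_026_000),
    (2_000_000, 2_009_000, 2_010_000, 2_020_000),
]

delays = [round_trip_delay(*s) for s in samples]
print("round-trip delays (us):", delays)
print("mean delay (us):", mean(delays))
print("delay variation (us):", delay_variation(delays))
```

Numbers like these describe the path the test packets took, which is not necessarily the path or the processing stage that the BFD session exercises.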
I agree that debugging and troubleshooting is often an art, and that operators and vendors need to have and use all the tools that are available. We can probably do a better job of instrumenting BFD debug tools but, I believe, not by changing or overloading its main functions of monitoring continuity and quickly detecting the Loss of Continuity defect. Whether the Down state indicates a real defect or a false negative is a question to be answered by analyzing the available information and, if there are concerns that it was not a real defect, by following up with debugging and troubleshooting of BFD itself rather than of the network.
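One piece of "available information" that can help separate the two cases is a log of when BFD control packets were actually received. The sketch below assumes such a log exists as a list of receive times in microseconds (the log, the function names, and the values are hypothetical). It uses the asynchronous-mode Detection Time from RFC 5880: the Detect Mult received from the remote system multiplied by the greater of the local Required Min RX Interval and the remote Desired Min TX Interval.

```python
def detection_time_us(remote_detect_mult,
                      local_required_min_rx_us,
                      remote_desired_min_tx_us):
    """Asynchronous-mode Detection Time as described in RFC 5880."""
    agreed_interval = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval

def gaps_exceeding(rx_times_us, limit_us):
    """Return (previous, next) receive times whose spacing exceeds limit_us."""
    return [(a, b)
            for a, b in zip(rx_times_us, rx_times_us[1:])
            if b - a > limit_us]

# Hypothetical session: Detect Mult 3, 300 ms intervals on both sides.
limit = detection_time_us(3, 300_000, 300_000)

# Hypothetical receive log (microseconds) captured at the session endpoint.
rx_log = [0, 300_000, 610_000, 1_600_000, 1_900_000]

expired = gaps_exceeding(rx_log, limit)
print("detection time (us):", limit)
print("gaps longer than detection time:", expired)
```

If the session went Down and the log shows a matching gap, packets really did stop arriving at the point where the log was taken, which points toward the network or the receive path. If no gap exceeds the Detection Time, the BFD implementation itself (for example, its timers or scheduling) is the more likely place to look. Where the log is captured matters, since it only covers the path up to that point.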

	Regards,
		Greg

-----Original Message-----
From: Jeffrey Haas [mailto:jhaas@pfrc.org] 
Sent: Friday, December 19, 2014 1:02 PM
To: Gregory Mirsky
Cc: Manav Bhatia; rtg-bfd@ietf.org
Subject: TWAMP analysis for assisting BFD debuggin (was Re: BFD stability follow-up from IETF-91)

[I'm running behind in discussions as usual]

On Thu, Dec 04, 2014 at 04:33:17PM +0000, Gregory Mirsky wrote:
> Hi Manav,
> I hope you don't expect me to give a lecture on how to design and implement a debuggable implementation using logging and packet tracing.

For what it's worth, I always find such "drive-by" comments about "well, there already exists something that does <foo>, it's here - go do your homework!" to be a bit frustrating myself. :-)  

Not being personally familiar with TWAMP, I spent a few minutes digging through the specs for TWAMP and OWAMP for some details about the control and test traffic.  It has some properties that are problematic for the pieces of the ecosystem covered by various portions of BFD deployment:

TCP control channel.  The implication is not only bidirectional reachability, but a control IP stack that may not involve the nodes being tested.  (Nodes in this case potentially being anything from host systems all the way down to line card components.)

The data channel is UDP, which is good.  The problem though is there's no guarantee it'll cover the layers of the hardware stack covered by the BFD sessions.  As one example, BFD on LAG, we not only may not have a full IP stack running at the point it's coming up, there's a chance that the receiver isn't really much of an IP speaker.  It may be effectively using the IP+UDP framing as dumb framing rather than protocol in order to bootstrap the session.  IP may not come up until constituent LAG elements have been put into service.

There's also the matter that to get TWAMP running at the appropriate layer, it would be similarly necessary to embed it in the same general paths that BFD covers.

Thus, at a first, very coarse analysis, TWAMP doesn't seem to share enough of the forwarding fate, on a guaranteed basis, to supplement BFD debugging the way we'd want.  The control channel also seems a bit heavy-weight, but it's potentially no worse than the architecture of some LACP implementations I'm aware of.

Is there something missing or incorrect in the above thinking?

-- Jeff