Re: [Lsr] [GROW] FW: New Version Notification for draft-gu-network-mornitoring-protol-00.txt

"Guyunan (Yunan Gu, IP Technology Research Dept. NW)" <guyunan@huawei.com> Thu, 12 July 2018 06:36 UTC

From: "Guyunan (Yunan Gu, IP Technology Research Dept. NW)" <guyunan@huawei.com>
To: Robert Raszuk <robert@raszuk.net>, "Tim Evens (tievens)" <tievens=40cisco.com@dmarc.ietf.org>
CC: Greg Skinner <gregskinner0=40icloud.com@dmarc.ietf.org>, GMO Crops <grow@ietf.org>, "lsr@ietf.org" <lsr@ietf.org>, "opsawg@ietf.org" <opsawg@ietf.org>, "rtgwg@ietf.org" <rtgwg@ietf.org>
Thread-Topic: [GROW] [Lsr] FW: New Version Notification for draft-gu-network-mornitoring-protol-00.txt
Date: Thu, 12 Jul 2018 06:36:23 +0000
Message-ID: <C01B0098369B2D4391851938DA6700B7CA5CFD@DGGEMI524-MBS.china.huawei.com>
References: <624FB76E-1588-4D6E-8DD6-A666C77A9201@gmail.com> <5A5B4DE12C0DAC44AF501CD9A2B01A8D8F43FE44@dggemm512-mbx.china.huawei.com> <B8E2C2E6-BE62-4624-A2AD-E54647ED8EF1@cisco.com> <5A5B4DE12C0DAC44AF501CD9A2B01A8D8F4403E7@dggemm512-mbx.china.huawei.com> <m236wu8l40.wl-randy@psg.com> <14AE5E1F-CE52-4061-A1B5-F34FCA8E461F@icloud.com> <646634D0-AB75-4715-AB1C-8BB378F2ADF6@cisco.com> <CA+b+ERkC3F3B676Wmmv6QGE7_cR-xyOzLCbZEK3M3EVm8vwcSQ@mail.gmail.com>
In-Reply-To: <CA+b+ERkC3F3B676Wmmv6QGE7_cR-xyOzLCbZEK3M3EVm8vwcSQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/TaoqS7m0rq-_kDEqvsJgD_jEcfM>
Subject: Re: [Lsr] [GROW] FW: New Version Notification for draft-gu-network-mornitoring-protol-00.txt

Hi Tim, Robert,

Thanks a lot for your comments. Regarding the idea of using BGP-LS for troubleshooting, we have also considered the possible pros and cons.

BGP-LS was originally proposed for carrying link-state information in BGP, and it is currently used for applications such as topology visualization. However, I would not consider it a strict “monitoring” protocol. NMP is intended for network troubleshooting: it monitors the running status of protocols and exports more than just link-state information. For example, as Tim also pointed out, the NMP adjacency up/down event notification includes the possible reasons for the change. Such non-link-state information is not defined in BGP-LS. It might be technically workable for BGP-LS to carry the reason data by adding some “attribute” TLV, or to carry other non-link-state data used for troubleshooting (e.g., PDU statistics), but doing so deviates somewhat from the original intention of BGP-LS.
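
To make the “attribute” TLV option concrete: BGP-LS (RFC 7752) encodes its attributes as TLVs with a 2-octet type and a 2-octet length. The Python sketch below encodes and decodes a reason TLV in that layout. The type code `REASON_TLV_TYPE` and the reason values are hypothetical placeholders, not allocated code points.

```python
import struct

# Hypothetical type code and reason values, for illustration only;
# no such TLV is allocated in the BGP-LS registry.
REASON_TLV_TYPE = 0xFF00
REASONS = {1: "hold-timer-expired", 2: "mtu-mismatch", 3: "auth-failure"}


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode a TLV in the BGP-LS layout: 2-octet type, 2-octet length, value."""
    return struct.pack("!HH", tlv_type, len(value)) + value


def decode_tlvs(data: bytes) -> list[tuple[int, bytes]]:
    """Split a buffer of back-to-back TLVs into (type, value) pairs."""
    tlvs = []
    offset = 0
    while offset + 4 <= len(data):
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, data[offset:offset + length]))
        offset += length
    if offset != len(data):
        raise ValueError("trailing bytes shorter than a TLV header")
    return tlvs


attr = encode_tlv(REASON_TLV_TYPE, bytes([2]))
for tlv_type, value in decode_tlvs(attr):
    if tlv_type == REASON_TLV_TYPE:
        print(REASONS.get(value[0], "unknown"))  # prints mtu-mismatch
```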

In addition, any data conveyed in BGP-LS NLRI must first be supported/encapsulated in ISIS/OSPF PDUs. This brings scalability issues and requires extra IGP extension work whenever new information is needed for a new troubleshooting use case. Besides, the extra data would be flooded by ISIS/OSPF throughout the area/AS, which consumes extra bandwidth and is undesirable.

Another concern with BGP-LS is the lack of a per-device feed. As we stated in another email, the availability of real-time protocol PDUs collected from all monitored routers is necessary for troubleshooting analysis. As Tim pointed out:
“BGP-LS also can be used to monitor EPE and direct/static routes, which is a bit of a stretch on putting that in BGP-LS, but nonetheless…”
 “Regarding requiring BGP-LS feeds from many or all nodes...  We need this regardless of this draft because of segment-routing/egress peer engineering.   Due to EPE, we already recommend BGP-LS peering (feeds) from all EPE nodes (normally via a peering server) so that we can collect/monitor EPE (hopefully using https://tools.ietf.org/html/draft-ietf-grow-bmp-local-rib-01).”
Although BGP-LS might be extended to support multiple feeds for specific purposes in the future, to me this again deviates from its original intention. If we insist on extending BGP-LS for troubleshooting, it simply becomes NMP.

Regarding the comparison with other telemetry approaches, specifically gRPC/YANG, we have stated our points in the other email. To avoid repetition in this thread, please refer to that email.

We agree with Tim that “The initiation message could lead to overloading it with all kinds of device specific info. Some constraint is needed. The per peer (adjacency) header is missing multi-topology.  BGP-LS includes the protocol type (e.g. CT) and MT (missing from this draft).”
In fact, overload can occur not only during the initiation phase of an NMP session but also during network failures such as route flapping, when massive numbers of PDUs and other data are reported to the server. We think enabling different working modes of NMP on the device side (e.g., a PDU compress mode and a normal mode) could be a workable solution. We can refine this idea in the next version, and will also add MT to the per-adjacency header.
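
A minimal sketch of the device-side working modes suggested above, assuming a per-interval PDU count is used as the overload signal. The class name `NmpExporter`, the thresholds, and the batch format are hypothetical; NMP does not currently define any of them, and zlib simply stands in for whatever compression a future version might specify.

```python
import zlib
from dataclasses import dataclass


@dataclass
class NmpExporter:
    """Hypothetical device-side exporter with a normal and a compress mode."""
    high_water: int = 1000  # PDUs per interval that switch to compress mode
    low_water: int = 200    # PDUs per interval that switch back to normal mode
    mode: str = "normal"

    def update_mode(self, pdus_this_interval: int) -> str:
        # Two thresholds (hysteresis) avoid toggling on every interval
        if self.mode == "normal" and pdus_this_interval > self.high_water:
            self.mode = "compress"
        elif self.mode == "compress" and pdus_this_interval < self.low_water:
            self.mode = "normal"
        return self.mode

    def export(self, pdus: list[bytes]) -> list[bytes]:
        """Return the messages to send to the server for one interval."""
        self.update_mode(len(pdus))
        if self.mode == "normal":
            return list(pdus)  # one message per PDU
        # 2-octet length prefix per PDU (assumes each PDU is under 64 KiB),
        # then the whole batch is compressed into a single message
        batch = b"".join(len(p).to_bytes(2, "big") + p for p in pdus)
        return [zlib.compress(batch)]
```

Using separate high and low thresholds keeps the exporter from flipping modes back and forth when the PDU rate hovers near a single cutoff during a flap.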


BR,

Yunan

From: GROW [mailto:grow-bounces@ietf.org] On Behalf Of Robert Raszuk
Sent: July 11, 2018 17:25
To: Tim Evens (tievens) <tievens=40cisco.com@dmarc.ietf.org>
Cc: Greg Skinner <gregskinner0=40icloud.com@dmarc.ietf.org>; GMO Crops <grow@ietf.org>; lsr@ietf.org; opsawg@ietf.org; rtgwg@ietf.org
Subject: Re: [GROW] [Lsr] FW: New Version Notification for draft-gu-network-mornitoring-protol-00.txt

Tim,

I already suggested use of BGP-LS *as-is* in this thread on Jul 6th.

But I guess we all agree that it is not the best use of the BGP protocol to make it a vehicle for NMS only because BGP makes it easy to establish a TCP session and to distribute "stuff" in a relatively loop-free fashion.

Thx,
R.

On Wed, Jul 11, 2018 at 2:40 AM, Tim Evens (tievens) <tievens=40cisco.com@dmarc.ietf.org<mailto:tievens=40cisco.com@dmarc.ietf.org>> wrote:
Hi Robin, Yunan, Shunwan,

I'm a little late to this thread due to being preoccupied with a newborn.

Below are my comments, which take into consideration the other comments… sans the YANG/telemetry debate.  Considering we do use BGP-LS extensively, I don't think YANG is the only solution to these link-state monitoring use-cases.

BMP doesn't change or limit what's available in BGP. It encapsulates and multiplexes BGP over a single stream for remote monitoring.   BGP-LS (RFC7752) can be used today to monitor link state protocols (ISIS, OSPF).  BGP-LS also can be used to monitor EPE and direct/static routes, which is a bit of a stretch on putting that in BGP-LS, but nonetheless…  BGP-LS is available via BMP.
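
For reference, every BMP message (RFC 7854) starts with a 6-octet common header: a 1-octet version (3), a 4-octet total message length, and a 1-octet message type. A minimal Python sketch that splits a captured BMP byte stream into messages is shown below; it does not decode the per-peer header or the BGP UPDATE (which may carry BGP-LS NLRI) inside Route Monitoring messages.

```python
import struct

# BMP message types defined in RFC 7854
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}


def split_bmp_stream(stream: bytes):
    """Yield (type name, body) for each complete BMP message in the buffer."""
    offset = 0
    while offset + 6 <= len(stream):
        # Common header: version (1 octet), length (4 octets), type (1 octet)
        version, length, msg_type = struct.unpack_from("!BIB", stream, offset)
        if version != 3:
            raise ValueError(f"unsupported BMP version {version}")
        if length < 6:
            raise ValueError("BMP length shorter than the common header")
        if offset + length > len(stream):
            break  # partial message at the end of the buffer
        body = stream[offset + 6:offset + length]
        yield BMP_MSG_TYPES.get(msg_type, f"type {msg_type}"), body
        offset += length
```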

"Section 3.1, ISIS Adjacency Issues"

As written, this is covered by BGP-LS LINK NLRI.  We see a LINK change (advertised versus withdrawn) when the adjacency changes.  If the router dies or the control-plane fails in some way, we still see the NLRI change from the other side of the adjacency (perspective).
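
The point about seeing the change "from the other side" can be sketched as follows. This is a simplified model, assuming each Link NLRI is keyed only by its (local node, remote node) pair; real Link NLRI descriptors also carry link identifiers and interface/neighbor addresses. The adjacency is treated as up only while both directions are advertised, so a withdrawal from the surviving side is enough to report it down.

```python
from typing import Optional


class LinkTracker:
    """Track adjacencies from simplified BGP-LS Link NLRI advertise/withdraw events."""

    def __init__(self) -> None:
        # Directed (local, remote) links currently advertised
        self.advertised: set[tuple[str, str]] = set()

    def adjacency_up(self, a: str, b: str) -> bool:
        # Up only while both directions of the link are advertised
        return (a, b) in self.advertised and (b, a) in self.advertised

    def on_event(self, local: str, remote: str, withdrawn: bool) -> Optional[str]:
        """Apply one event; return a message if the adjacency state changed."""
        a, b = sorted((local, remote))
        before = self.adjacency_up(a, b)
        if withdrawn:
            self.advertised.discard((local, remote))
        else:
            self.advertised.add((local, remote))
        after = self.adjacency_up(a, b)
        if before != after:
            return f"{a}-{b} {'up' if after else 'down'}"
        return None
```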

What we are missing is a BGP-LS "attribute" TLV (on local entries) for links/nodes that indicates the REASON why the LINK (also NODE) is withdrawn, but... this is not available without changing the IGP protocol itself (e.g. a new ISIS TLV) or implementing a solution architecture that requires BGP-LS feeds from all ISIS/OSPF nodes.   As written, I see NMP (including Netconf/gNMI/telemetry) requiring sessions from all nodes since the targeted data is not available in ISIS TLVs today.   For example, the ISIS LSDB on node-A does not have any local (device-specific) information from the other nodes unless there are TLVs to convey that information.

Regarding requiring BGP-LS feeds from many or all nodes...  We need this regardless of this draft because of segment-routing/egress peer engineering.   Due to EPE, we already recommend BGP-LS peering (feeds) from all EPE nodes (normally via a peering server) so that we can collect/monitor EPE (hopefully using https://tools.ietf.org/html/draft-ietf-grow-bmp-local-rib-01). Adding LINK/NODE withdrawal/down reason should not overstep into YANG/Netconf/Telemetry.

"3.2.  Forwarding Path Disconnection"

This seems to be more of a fit for telemetry with interface/link monitoring.  Although, if the link was working at some point and it goes down due to MTU or otherwise, the BGP-LS REASON attribute should be able to convey that.  BGP-LS wouldn't convey anything if the link was never established.  Currently, it's assumed that the link advertisement means it's established.  This could be changed if we added a LINK NLRI state TLV.   The LINK could be updated (advertised) multiple times, changing based on state.   If the LINK doesn't establish, the withdrawal could indicate the reason.

"3.3.  ISIS LSP Synchronization Failure"

If a new BGP-LS LINK attribute is used as mentioned above to convey LINK adv state, it should then be feasible to add a state to indicate inconsistency. If the link/adj changes to down, then the withdrawal LINK reason attribute could indicate the cause.
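
The LINK state TLV and withdrawal REASON attribute proposed for 3.2 and 3.3 could be consumed by a collector roughly as below. Everything here is hypothetical: no such TLVs exist in BGP-LS today, and the state names (`INITIALIZING`, `UP`, `LSDB_INCONSISTENT`) and reason strings are illustrative. An `INITIALIZING` advertisement is what would let a collector see a link that never came up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LinkState(Enum):
    # Hypothetical values for the proposed LINK NLRI state TLV
    INITIALIZING = 1
    UP = 2
    LSDB_INCONSISTENT = 3


@dataclass
class LinkUpdate:
    link: str
    withdrawn: bool
    state: Optional[LinkState] = None  # carried on (re)advertisements
    reason: Optional[str] = None       # carried on withdrawals


def link_history(updates: list[LinkUpdate]) -> dict[str, list[str]]:
    """Build a per-link timeline from state-bearing advertisements and withdrawals."""
    history: dict[str, list[str]] = {}
    for u in updates:
        if u.withdrawn:
            entry = f"down ({u.reason or 'no reason given'})"
        else:
            entry = u.state.name if u.state else "advertised"
        history.setdefault(u.link, []).append(entry)
    return history
```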

The BGP-LS reason and state TLVs would only apply to the links/nodes that originate from the BGP-LS speaker.  Other node/link advertisements would not have the attribute/TLV.   This is why the solution would recommend enabling BGP-LS feeds from nodes that matter enough to get this level of local info.  By the way, this would also solve a problem we have with BGP-LS today, where optional TLVs are not present unless ISIS/OSPF have specific features enabled, such as traffic-engineering... even IPv4/IPv6 router IDs are not included unless enabled specifically (isis) per node.

"4.  Extensions of NMP for ISIS"

Most of the new messages are redundant to the existing BGP-LS advertisements and withdrawals.  Telemetry of course could also convey this…

The initiation message could lead to overloading it with all kinds of device specific info.   Some constraint is needed.

The per peer (adjacency) header is missing multi-topology.  BGP-LS includes the protocol type (e.g. CT) and MT (missing from this draft).
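
To illustrate the multi-topology gap, here is a sketch of a per-adjacency header that carries the BGP-LS Protocol-ID and a 12-bit MT-ID alongside the neighbor's IS-IS system ID. The Protocol-ID values come from RFC 7752 and the 12-bit MT-ID width matches IS-IS multi-topology (RFC 5120); the 9-octet field layout itself is a hypothetical example, not the format in the draft.

```python
import struct

# Protocol-ID values defined for BGP-LS in RFC 7752
PROTOCOL_IDS = {1: "IS-IS Level 1", 2: "IS-IS Level 2", 3: "OSPFv2",
                4: "Direct", 5: "Static configuration", 6: "OSPFv3"}

# Hypothetical layout: Protocol-ID (1 octet), MT-ID (2 octets), neighbor ID (6 octets)
HEADER = struct.Struct("!BH6s")


def pack_adj_header(protocol_id: int, mt_id: int, neighbor_sysid: bytes) -> bytes:
    if not 0 <= mt_id <= 0x0FFF:
        raise ValueError("MT-ID is a 12-bit value")
    if len(neighbor_sysid) != 6:
        raise ValueError("IS-IS system ID is 6 octets")
    # The upper 4 bits of the MT-ID field stay zero (reserved)
    return HEADER.pack(protocol_id, mt_id, neighbor_sysid)


def unpack_adj_header(data: bytes) -> tuple[str, int, bytes]:
    protocol_id, mt_field, neighbor = HEADER.unpack_from(data)
    name = PROTOCOL_IDS.get(protocol_id, f"unknown ({protocol_id})")
    return name, mt_field & 0x0FFF, neighbor  # mask off reserved bits
```

For example, `pack_adj_header(2, 2, bytes.fromhex("192168000001"))` describes a Level 2 adjacency in MT-ID 2, the IS-IS IPv6 unicast topology.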

All in all, I believe the use-cases described have merit and I think we can do this with BGP-LS, which doesn't require BMP but could be used.

Thanks,
Tim


On 7/8/18, 8:59 PM, "GROW on behalf of Greg Skinner" <grow-bounces@ietf.org<mailto:grow-bounces@ietf.org> on behalf of gregskinner0=40icloud.com@dmarc.ietf.org<mailto:gregskinner0=40icloud.com@dmarc.ietf.org>> wrote:

Randy,

Is the OPS-NM Configuration Management Requirements (ops-nm) BoF<https://www.ietf.org/proceedings/52/176.htm> from IETF 52 (10 December 2001) the meeting you were thinking of?  There are also references to an IAB meeting in 2002 about the lack of use of SNMP for network configuration in SNMP compared with CLI, Netconf, Netflow<https://www.snmpcenter.com/snmp-versus-other-protocols/> that culminated in RFC 3535<https://tools.ietf.org/html/rfc3535>.

Robin,

Regarding the draft in question, I generally agree with the concerns others have raised that it doesn’t appear to provide anything beyond what other technologies such as YANG already provide.  Also, IMO, the draft needs considerable work to be more easily understood.  For example, there are many acronyms such as CSNP and PSNP that are not defined, and may be misunderstood by readers not familiar with ISIS.  In the packet format sections, there are several uncapitalized uses of ‘should’.  Do the authors consider these to be non-normative requirements?  There are also statements such as "Network OAM statistics show that a relatively large part of the network issues are caused by the disfunction of various routing protocols and MPLS signalings” that are offered without citations.

Regards,
Greg

On Jul 7, 2018, at 8:25 PM, Randy Bush <randy@psg.com<mailto:randy@psg.com>> wrote:

robin,

i am not ignoring you.  i did not want to write unless i had something
possibly useful to say; and that requires pretending to think, always
difficult.

I would also like to propose the following draft for your reference,
which motivates us to move forward toward better network maintenance
with multiple tools, in which gRPC/NETCONF and NMP/BMP may play
different roles: https://datatracker.ietf.org/doc/draft-song-ntf/

[ warning: my memory is likely fuzzy, and the glass is dark ]

at an ietf in the late '90s[0], there was a hastily called meeting of
the snmp standards bearers and a bunch of operators.  the snmp folk were
shocked to learn that no operators used snmp for other than monitoring;
no one had snmp write enabled on devices; ops configured with the
cli[1].  from this was born netconf and the xml path.  credit where due:
phil eng was already well down this path at the time of that meeting.

but netconf/xml was a mechanism and lacked a model.  snmp had models,
whether we thought they were pretty or not.  thus yang was born, and,
of course, a new generation wants to use the latest modern toys such as
restconf, openconfig, json, ...

draft-song-ntf yearns for an "architectural framework for network
telemetry," a lofty and worthwhile goal, not a priori a bad one.  but a
few comments from a jaded old dog.

for a new paradigm to gain traction, it must be *significantly* better
than the old one, or the old paradigm must be clearly failing.  in the
story above, snmp was clearly failing, aside from using an unfashionable
encoding.  and yang clearly provided something needed and missing from
netconf.  note that this paradigm shift has taken over 20 years; and we
dis the itu et alia.

second, draft-song-ntf is an export-only model.  while telemetry is
extremely important, i will be very frustrated if i can only hear and
may not speak.  and the more it evolves to a really attractive paradigm
and model, the more annoyed i will be that i can not use it for control.

and lastly, to quote don knuth, "premature optimization is the root of
all evil."  do not get distracted by squeezing bits out of an encoding.
focus on things such as simple, clear, securable, extensible ...

randy

---

0 - i would love help pinning down which meeting

1 - i still have the "it's the cli, stupid" tee shirt.  an american
   political slogan of the era was "it's the economy, stupid."


_______________________________________________
GROW mailing list
GROW@ietf.org<mailto:GROW@ietf.org>
https://www.ietf.org/mailman/listinfo/grow