RE: [GROW] [Lsr] FW: New Version Notification for draft-gu-network-mornitoring-protol-00.txt

"Guyunan (Yunan Gu, IP Technology Research Dept. NW)" <guyunan@huawei.com> Thu, 12 July 2018 02:34 UTC

From: "Guyunan (Yunan Gu, IP Technology Research Dept. NW)" <guyunan@huawei.com>
To: Greg Skinner <gregskinner0=40icloud.com@dmarc.ietf.org>, Randy Bush <randy@psg.com>, "rwilton=40cisco.com@dmarc.ietf.org" <rwilton=40cisco.com@dmarc.ietf.org>, "acee@cisco.com" <acee@cisco.com>, "einarnn=40cisco.com@dmarc.ietf.org" <einarnn=40cisco.com@dmarc.ietf.org>, "jefftant.ietf@gmail.com" <jefftant.ietf@gmail.com>
CC: "lsr@ietf.org" <lsr@ietf.org>, GMO Crops <grow@ietf.org>, "opsawg@ietf.org" <opsawg@ietf.org>, "rtgwg@ietf.org" <rtgwg@ietf.org>
Subject: RE: [GROW] [Lsr] FW: New Version Notification for draft-gu-network-mornitoring-protol-00.txt
Date: Thu, 12 Jul 2018 02:34:32 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/SGJBFzjAdoiIE4Z-FBhalV3xwW4>

Hi Greg, Jeff, Acee, Robert Wilton, Einar,

and anyone else who has concerns about NMP vs. gRPC/YANG: thank you for your interest in our draft and for your valuable comments. On this question, we’d like to make the following two points.


1.      Control plane (CP) and management plane (MP) information retrieval/manipulation decoupling:

As stated in the draft, we propose NMP to monitor the running status of routing protocols, which can be considered a CP telemetry approach (as is BMP). MP telemetry approaches, such as SNMP and NETCONF, have longer histories and are better developed than CP telemetry. However, there are differences between CP and MP data, and it’s not necessarily appropriate to reuse the existing MP approaches for CP monitoring.

First of all, MP data reflects information from the device viewpoint, including device status information (e.g., CPU, interface, memory…) and configuration actions (e.g., set timer…), while CP data reflects information from the protocol viewpoint, e.g., protocol PDUs (such as ISIS LSPs/Hellos) and protocol statistics. Specifically, NMP troubleshooting analysis needs real-time protocol PDUs collected from all monitored routers. Such per-router PDU monitoring provides much more information than a single LSDB collected from one router in an area/AS using BGP-LS. For example, when an ISIS adjacency fails to come up because of mismatched authentication, analyzing the Hello statistics alone is not sufficient; comparing the Hello PDUs sent by both devices can reveal the authentication differences. As another example, in a route flapping case caused by a corrupted LSP remaining lifetime, the LSP sent locally and the LSP received at the remote device can be compared to localize the corruption.
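To make the Hello comparison concrete, here is a minimal Python sketch. It is not taken from the draft and is only an illustration under stated assumptions: the collector already holds the raw Hello PDUs from both routers, the second header byte (Length Indicator) gives the fixed header size, and authentication is carried in TLV code 10 with the authentication type in its first value byte. The helper names (parse_tlvs, auth_types, auth_mismatch, make_pdu) are made up for this example, and the PDUs are synthetic.

```python
from typing import List, Tuple

AUTH_TLV = 10  # IS-IS Authentication Information TLV code


def parse_tlvs(pdu: bytes) -> List[Tuple[int, bytes]]:
    """Return (code, value) pairs from the variable-length part of a PDU."""
    offset = pdu[1]  # Length Indicator: size of the fixed header in bytes
    tlvs = []
    while offset < len(pdu):
        if offset + 2 > len(pdu):
            raise ValueError("truncated TLV header")
        code, length = pdu[offset], pdu[offset + 1]
        value = pdu[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        tlvs.append((code, value))
        offset += 2 + length
    return tlvs


def auth_types(hello: bytes) -> List[int]:
    """Authentication types advertised in a Hello (first byte of each TLV 10)."""
    return [v[0] for code, v in parse_tlvs(hello) if code == AUTH_TLV and v]


def auth_mismatch(hello_a: bytes, hello_b: bytes) -> bool:
    """True when the two Hellos advertise different authentication types."""
    return auth_types(hello_a) != auth_types(hello_b)


def make_pdu(header_len: int, tlvs: List[Tuple[int, bytes]]) -> bytes:
    """Build a synthetic PDU for illustration; the header is mostly zero padding."""
    header = bytes([0x83, header_len]) + bytes(header_len - 2)
    body = b"".join(bytes([code, len(value)]) + value for code, value in tlvs)
    return header + body


# Router A sends cleartext authentication (type 1); router B sends HMAC-MD5 (type 54).
hello_a = make_pdu(27, [(1, b"\x03\x49\x00\x01"), (AUTH_TLV, b"\x01secret")])
hello_b = make_pdu(27, [(1, b"\x03\x49\x00\x01"), (AUTH_TLV, b"\x36" + bytes(16))])
print(auth_types(hello_a), auth_types(hello_b), auth_mismatch(hello_a, hello_b))
```

This only detects a difference in authentication type or a missing authentication TLV. Two routers using the same type with different keys would need the key to verify the digest, which this sketch does not attempt.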

Secondly, MP telemetry mainly focuses on things like VPN configuration and tunnel configuration, while NMP is proposed to facilitate protocol troubleshooting. Troubleshooting is more time-sensitive than tasks such as VPN configuration. In addition, CP information, e.g., protocol PDUs, is updated continuously and thus needs real-time reporting. MP data is less dynamic than CP data.

Thirdly, a key principle of CP telemetry is to keep the monitoring protocol independent of the monitored protocol. A unidirectional monitoring protocol, just like BMP, avoids any possible interference with the monitored protocol.

Thus, we believe it’s necessary to decouple the CP monitoring from the MP telemetry.


2.      Why not gRPC/YANG for CPT?

We agree that it’s technically workable to use gRPC/YANG to model and convey CP data such as Hellos/LSPs. However, we think it’s simply not the best choice for CP monitoring.

First of all, the protocol PDUs, which are the major component of CP monitoring data, are already in binary format and are ready for transmission with good performance. The protocol statistics can easily be encoded as TLVs and added to the transmission. Thus, modeling CP data in YANG first and then encoding it as XML/protobuf for transmission is just extra work.
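As a rough illustration of this point, the Python sketch below packs two counters and an untouched raw PDU into one report using TLVs. It is not the NMP wire format from the draft: the 2-byte type and 2-byte length layout, the type codes, and the function names are all assumptions made up for this example.

```python
import struct
from typing import List, Tuple

# Hypothetical TLV type codes, chosen only for this example.
TLV_HELLO_SENT = 1
TLV_HELLO_RECEIVED = 2
TLV_RAW_PDU = 3


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV as a 2-byte type, a 2-byte length, then the value."""
    return struct.pack("!HH", tlv_type, len(value)) + value


def build_report(hello_sent: int, hello_received: int, raw_pdu: bytes) -> bytes:
    """Concatenate statistics TLVs and the raw PDU, which is copied unchanged."""
    return (
        encode_tlv(TLV_HELLO_SENT, struct.pack("!Q", hello_sent))
        + encode_tlv(TLV_HELLO_RECEIVED, struct.pack("!Q", hello_received))
        + encode_tlv(TLV_RAW_PDU, raw_pdu)
    )


def decode_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a report back into (type, value) pairs."""
    tlvs = []
    offset = 0
    while offset < len(data):
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        value = data[offset : offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, value))
        offset += length
    return tlvs


# Synthetic stand-in for a captured PDU; the bytes carry no protocol meaning here.
captured_pdu = bytes([0x83, 0x1B, 0x01, 0x00, 0x0F]) + bytes(22)
report = build_report(120, 118, captured_pdu)
decoded = dict(decode_tlvs(report))
print(decoded[TLV_RAW_PDU] == captured_pdu)
```

The raw PDU passes through as an opaque byte string, so the collector receives exactly the bytes the router saw, and no intermediate data model is involved.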

Secondly, in the case of route flapping caused by a timer issue (e.g., 100 times faster), LSP updates, Hellos, and LSP purges could be generated in massive quantities. Modeling all these PDUs in YANG may slow down the data export and thus delay the troubleshooting process.

Thirdly, although it is not yet clear, the YANG modeling process could affect PDU data integrity in cases where a bit-by-bit comparison of two PDUs is needed.
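For reference, the kind of comparison meant here can be sketched in a few lines of Python. This is only an illustration on synthetic bytes, assuming the collector holds both copies of the PDU exactly as they appeared on the wire; the function name diff_bytes is made up for this example.

```python
from typing import List, Tuple


def diff_bytes(sent: bytes, received: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset, sent_byte, received_byte) for every position that differs."""
    if len(sent) != len(received):
        raise ValueError("PDUs differ in length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(sent, received)) if a != b]


# Synthetic example: flip one bit at offset 10 of the received copy.
sent_pdu = bytes(range(16))
corrupted = bytearray(sent_pdu)
corrupted[10] ^= 0x04
received_pdu = bytes(corrupted)

for offset, a, b in diff_bytes(sent_pdu, received_pdu):
    print(f"offset {offset}: sent 0x{a:02x}, received 0x{b:02x}")
```

Such a comparison is only meaningful if the exported bytes are exactly the bytes on the wire, which is why any re-encoding step in between is a concern.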

We also agree that information such as system ID/MTU is better suited to being reported using gRPC/YANG. All in all, NMP is not meant to replace any existing OAM approach, but to work side by side with existing approaches, and even with data plane telemetry (e.g., iOAM), for better network troubleshooting.


BR,

Yunan


From: GROW [mailto:grow-bounces@ietf.org] On Behalf Of Greg Skinner
Sent: July 9, 2018 11:59
To: Randy Bush <randy@psg.com>; Lizhenbin <lizhenbin@huawei.com>
Cc: lsr@ietf.org; GMO Crops <grow@ietf.org>; opsawg@ietf.org; rtgwg@ietf.org
Subject: Re: [GROW] [Lsr] FW: New Version Notification for draft-gu-network-mornitoring-protol-00.txt

Randy,

Is the OPS-NM Configuration Management Requirements (ops-nm) Bof<https://www.ietf.org/proceedings/52/176.htm> from IETF 52 (10 December 2001) the meeting you were thinking of?  There are also references to an IAB meeting in 2002 about the lack of use of SNMP for network configuration in SNMP compared with CLI, Netconf, Netflow<https://www.snmpcenter.com/snmp-versus-other-protocols/> that culminated in RFC 3535<https://tools.ietf.org/html/rfc3535>.

Robin,

Regarding the draft in question, I generally agree with the concerns others have raised that it doesn’t appear to provide anything that other technologies such as YANG do not already provide.  Also, IMO, the draft needs considerable work to be more easily understood.  For example, there are many acronyms such as CSNP and PSNP that are not defined, and may be misunderstood by readers not familiar with ISIS.  In the packet format sections, there are several uncapitalized uses of ‘should’.  Do the authors consider these to be non-normative requirements?  There are also statements such as “Network OAM statistics show that a relatively large part of the network issues are caused by the disfunction of various routing protocols and MPLS signalings” that are offered without citations.

Regards,
Greg

On Jul 7, 2018, at 8:25 PM, Randy Bush <randy@psg.com> wrote:

robin,

i am not ignoring you.  i did not want to write unless i had something
possibly useful to say; and that requires pretending to think, always
difficult.


I would also like to propose following draft for your reference which
trigger us to move forward for better network maintenance with
multiple tools in which gRPC/NETCONF and NMP/BMP may play different
roles: https://datatracker.ietf.org/doc/draft-song-ntf/

[ warning: my memory is likely fuzzy, and the glass is dark ]

at an ietf in the late '90s[0], there was a hastily called meeting of
the snmp standards bearers and a bunch of operators.  the snmp folk were
shocked to learn that no operators used snmp for other than monitoring;
no one had snmp write enabled on devices; ops configured with the
cli[1].  from this was born netconf and the xml path.  credit where due:
phil eng was already well down this path at the time of that meeting.

but netconf/xml was a mechanism and lacked a model.  snmp had models,
whether we thought they were pretty or not.  thus yang was born, and,
of course, a new generation wants to use the latest modern toys such as
restconf, openconfig, json, ...

draft-song-ntf yearns for an "architectural framework for network
telemetry," a lofty and worthwhile goal not, a priori, a bad one.  but a
few comments from a jaded old dog.

for a new paradigm to gain traction, it must be *significantly* better
than the old one, or the old paradigm must be clearly failing.  in the
story above, snmp was clearly failing, aside from using an unfashionable
encoding.  and yang clearly provided something needed and missing from
netconf.  note that this paradigm shift has taken over 20 years; and we
dis the itu et alia.

second, draft-song-ntf is an export-only model.  while telemetry is
extremely important, i will be very frustrated if i can only hear and
may not speak.  and the more it evolves to a really attractive paradigm
and model, the more annoyed i will be that i can not use it for control.

and lastly, to quote don knuth, "premature optimization is the root of
all evil."  do not get distracted by squeezing bits out of an encoding.
focus on things such as simple, clear, securable, extensible ...

randy

---

0 - i would love help pinning down which meeting

1 - i still have the "it's the cli, stupid" tee shirt.  an american
   political slogan of the era was "it's the economy, stupid."