Re: [GROW] [Lsr] FW: New Version Notification for draft-gu-network-mornitoring-protol-00.txt

"Tim Evens (tievens)" <tievens@cisco.com> Wed, 11 July 2018 00:41 UTC

From: "Tim Evens (tievens)" <tievens@cisco.com>
To: Greg Skinner <gregskinner0=40icloud.com@dmarc.ietf.org>, Randy Bush <randy@psg.com>, "lizhenbin@huawei.com" <lizhenbin@huawei.com>
CC: "lsr@ietf.org" <lsr@ietf.org>, GMO Crops <grow@ietf.org>, "opsawg@ietf.org" <opsawg@ietf.org>, "rtgwg@ietf.org" <rtgwg@ietf.org>
Subject: Re: [GROW] [Lsr] FW: New Version Notification for draft-gu-network-mornitoring-protol-00.txt
Date: Wed, 11 Jul 2018 00:40:58 +0000

Hi Robin, Yunan, Shunwan,

I'm a little late to this thread due to being preoccupied with a newborn.

Below are my comments, which take into consideration the other comments… sans the YANG/telemetry debate.  Considering we do use BGP-LS extensively, I don't think YANG is the only solution to these link-state monitoring use-cases.

BMP doesn't change or limit what's available in BGP. It encapsulates and multiplexes BGP over a single stream for remote monitoring. BGP-LS (RFC 7752) can be used today to monitor link-state protocols (ISIS, OSPF). BGP-LS can also be used to monitor EPE and direct/static routes; putting those in BGP-LS is a bit of a stretch, but nonetheless... BGP-LS is available via BMP.
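
As a rough illustration of the encapsulation point, the sketch below decodes the 6-byte BMP common header (version, message length, message type) as laid out in RFC 7854. The function name and the sample bytes are made up for this example; a Route Monitoring message would carry an ordinary BGP UPDATE after the per-peer header, which is not parsed here.

```python
import struct

# BMP message types from RFC 7854, section 4.1.
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}


def parse_bmp_common_header(data: bytes):
    """Return (version, message_length, type_name) from a BMP common header.

    Hypothetical helper name; only the 6-byte common header is decoded.
    """
    if len(data) < 6:
        raise ValueError("need at least 6 bytes for the BMP common header")
    # Network byte order: 1-byte version, 4-byte length, 1-byte type.
    version, length, msg_type = struct.unpack("!BIB", data[:6])
    name = BMP_MSG_TYPES.get(msg_type, f"Unknown ({msg_type})")
    return version, length, name


# Fabricated sample: version 3, total length 6, type 4 (Initiation).
sample = struct.pack("!BIB", 3, 6, 4)
print(parse_bmp_common_header(sample))
```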

"Section 3.1, ISIS Adjacency Issues"

As written, this is covered by the BGP-LS LINK NLRI.  We see a LINK change (advertised versus withdrawn) when the adjacency changes.  If the router dies or the control plane fails in some way, we still see the NLRI change from the perspective of the other side of the adjacency.
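
A minimal sketch of that "other side of the adjacency" observation, assuming a collector that simply records which directed LINK NLRIs are currently advertised. The class and method names are hypothetical and the node names are placeholders; no real BGP-LS library is used.

```python
class LinkTracker:
    """Hypothetical collector-side view of advertised BGP-LS LINK NLRIs."""

    def __init__(self):
        # Each entry is a directed link: (local_node, remote_node).
        self.links = set()

    def advertise(self, local, remote):
        self.links.add((local, remote))

    def withdraw(self, local, remote):
        self.links.discard((local, remote))

    def adjacency_state(self, a, b):
        """Classify the adjacency from the two directed advertisements."""
        forward = (a, b) in self.links
        reverse = (b, a) in self.links
        if forward and reverse:
            return "up"
        if forward or reverse:
            return "one-sided"  # only one perspective still advertises it
        return "down"


tracker = LinkTracker()
tracker.advertise("node-A", "node-B")
tracker.advertise("node-B", "node-A")
print(tracker.adjacency_state("node-A", "node-B"))

# node-B dies: node-A still withdraws its own side of the link,
# so the change is visible even though node-B reports nothing.
tracker.withdraw("node-A", "node-B")
print(tracker.adjacency_state("node-A", "node-B"))
```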

What we are missing is a BGP-LS attribute TLV (on local entries) for links/nodes that indicates the REASON why the LINK (or NODE) is withdrawn.  But this is not available without either changing the IGP protocol itself (e.g. a new ISIS TLV) or implementing a solution architecture that requires BGP-LS feeds from all ISIS/OSPF nodes.  As written, I see NMP (including Netconf/gNMI/telemetry) requiring sessions from all nodes, since the targeted data is not available in ISIS TLVs today.  For example, the ISIS LSDB on node-A does not have any local (device-specific) information from the other nodes unless there are TLVs to convey that information.

Regarding requiring BGP-LS feeds from many or all nodes...  We need this regardless of this draft because of segment routing/egress peer engineering (EPE).  Because of EPE, we already recommend BGP-LS peering (feeds) from all EPE nodes (normally via a peering server) so that we can collect/monitor EPE (hopefully using https://tools.ietf.org/html/draft-ietf-grow-bmp-local-rib-01).  Adding a LINK/NODE withdrawal/down reason should not overstep into YANG/Netconf/telemetry.

"3.2.  Forwarding Path Disconnection"

This seems to be more of a fit for telemetry with interface/link monitoring.  Although, if the link was working at some point and it goes down due to MTU or otherwise, the BGP-LS REASON attribute should be able to convey that.  BGP-LS wouldn't convey anything if the link was never established.  Currently, it's assumed that the link advertisement means it's established.  This could be changed if we added a LINK NLRI state TLV.   The LINK could be updated (advertised) multiple times, changing based on state.   If the LINK doesn't establish, the withdrawal could indicate the reason.

"3.3.  ISIS LSP Synchronization Failure"

If a new BGP-LS LINK attribute is used as mentioned above to convey LINK adv state, it should then be feasible to add a state to indicate inconsistency. If the link/adj changes to down, then the withdrawal LINK reason attribute could indicate the cause.

The BGP-LS reason and state TLVs would only apply to the links/nodes that originate from the BGP-LS speaker.  Other node/link advertisements would not have the attribute/TLV.  This is why the solution would recommend enabling BGP-LS feeds from nodes that matter enough to get this level of local info.  This, by the way, would solve a problem we have with BGP-LS today, where optional TLVs are not present unless ISIS/OSPF have specific features enabled, such as traffic engineering.  Even IPv4/IPv6 router IDs are not included unless specifically enabled per node (ISIS).
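
To make the scoping rule concrete, here is a sketch of how a speaker might attach the proposed state and reason only to links it originates. Nothing here exists in BGP-LS today: the `LinkState` values, the `LinkUpdate` fields, the `annotate` helper and the reason strings are all assumptions invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LinkState(Enum):
    """Hypothetical values for the proposed LINK state TLV."""
    ESTABLISHING = 1
    UP = 2
    INCONSISTENT = 3  # e.g. an LSP synchronization problem


@dataclass
class LinkUpdate:
    local_node: str
    remote_node: str
    withdrawn: bool = False
    state: Optional[LinkState] = None   # carried on advertisements
    reason: Optional[str] = None        # carried on withdrawals


def annotate(update, speaker, state=None, reason=None):
    """Attach state/reason only when the speaker originates the link."""
    if update.local_node != speaker:
        # Learned via the IGP from another node: no local detail available.
        return update
    if update.withdrawn:
        update.reason = reason
    else:
        update.state = state
    return update


local = annotate(LinkUpdate("node-A", "node-B", withdrawn=True),
                 speaker="node-A", reason="mtu-mismatch")
remote = annotate(LinkUpdate("node-C", "node-D", withdrawn=True),
                  speaker="node-A", reason="mtu-mismatch")
print(local.reason, remote.reason)
```

Under these assumptions, a collector only learns a withdrawal reason for node-C's links if node-C itself provides a BGP-LS feed, which is the motivation for feeds from the nodes that matter.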

"4.  Extensions of NMP for ISIS"

Most of the new messages are redundant to the existing BGP-LS advertisements and withdrawals.  Telemetry of course could also convey this…

The initiation message could end up overloaded with all kinds of device-specific info.  Some constraint is needed.

The per peer (adjacency) header is missing multi-topology.  BGP-LS includes the protocol type (e.g. CT) and MT (missing from this draft).
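
For comparison, the sketch below shows where BGP-LS carries this information, following RFC 7752: every NLRI starts with a 1-octet Protocol-ID and an 8-octet Identifier, and multi-topology is expressed with the Multi-Topology Identifier TLV (type 263), whose value is a list of 2-octet entries with a 12-bit MT-ID. The function names and sample bytes are assumptions for this example, and only these two fields are decoded.

```python
import struct

# Protocol-ID values from RFC 7752, section 3.2.
PROTOCOL_IDS = {
    1: "IS-IS Level 1",
    2: "IS-IS Level 2",
    3: "OSPFv2",
    4: "Direct",
    5: "Static configuration",
    6: "OSPFv3",
}
MT_ID_TLV_TYPE = 263  # Multi-Topology Identifier TLV


def parse_nlri_prefix(data: bytes):
    """Decode the Protocol-ID and Identifier that lead a BGP-LS NLRI."""
    if len(data) < 9:
        raise ValueError("need 9 bytes: Protocol-ID plus Identifier")
    protocol_id, identifier = struct.unpack("!BQ", data[:9])
    return PROTOCOL_IDS.get(protocol_id, f"Unknown ({protocol_id})"), identifier


def parse_mt_ids(tlv: bytes):
    """Decode a Multi-Topology Identifier TLV into a list of MT-IDs."""
    tlv_type, length = struct.unpack("!HH", tlv[:4])
    value = tlv[4:4 + length]
    if tlv_type != MT_ID_TLV_TYPE or len(value) != length or length % 2:
        raise ValueError("not a well-formed MT-ID TLV")
    # Each entry is 4 reserved bits followed by a 12-bit MT-ID.
    return [struct.unpack("!H", value[i:i + 2])[0] & 0x0FFF
            for i in range(0, length, 2)]


# Fabricated samples: an IS-IS Level 2 NLRI prefix and a TLV with MT-IDs 0 and 2.
print(parse_nlri_prefix(struct.pack("!BQ", 2, 0)))
print(parse_mt_ids(struct.pack("!HHHH", 263, 4, 0, 2)))
```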

All in all, I believe the use-cases described have merit, and I think we can do this with BGP-LS, which doesn't require BMP, although BMP could be used.

Thanks,
Tim


On 7/8/18, 8:59 PM, "GROW on behalf of Greg Skinner" <grow-bounces@ietf.org on behalf of gregskinner0=40icloud.com@dmarc.ietf.org> wrote:

Randy,

Is the OPS-NM Configuration Management Requirements (ops-nm) BoF <https://www.ietf.org/proceedings/52/176.htm> from IETF 52 (10 December 2001) the meeting you were thinking of?  There are also references in "SNMP compared with CLI, Netconf, Netflow" <https://www.snmpcenter.com/snmp-versus-other-protocols/> to an IAB meeting in 2002 about the lack of use of SNMP for network configuration, which culminated in RFC 3535 <https://tools.ietf.org/html/rfc3535>.

Robin,

Regarding the draft in question, I generally agree with the concerns others have raised that it doesn’t appear to provide anything that other technologies such as YANG don’t already provide.  Also, IMO, the draft needs considerable work to be more easily understood.  For example, there are many acronyms, such as CSNP and PSNP, that are not defined and may be misunderstood by readers not familiar with ISIS.  In the packet format sections, there are several uncapitalized uses of ‘should’.  Do the authors consider these to be non-normative requirements?  There are also statements such as “Network OAM statistics show that a relatively large part of the network issues are caused by the disfunction of various routing protocols and MPLS signalings” that are offered without citations.

Regards,
Greg

On Jul 7, 2018, at 8:25 PM, Randy Bush <randy@psg.com> wrote:

robin,

i am not ignoring you.  i did not want to write unless i had something
possibly useful to say; and that requires pretending to think, always
difficult.


> I would also like to propose following draft for your reference which
> trigger us to move forward for better network maintenance with
> multiple tools in which gRPC/NETCOF and NMP/BMP may play different
> roles: https://datatracker.ietf.org/doc/draft-song-ntf/

[ warning: my memory is likely fuzzy, and the glass is dark ]

at an ietf in the late '90s[0], there was a hastily called meeting of
the snmp standards bearers and a bunch of operators.  the snmp folk were
shocked to learn that no operators used snmp for other than monitoring;
no one had snmp write enabled on devices; ops configured with the
cli[1].  from this was born netconf and the xml path.  credit where due:
phil eng was already well down this path at the time of that meeting.

but netconf/xml was a mechanism and lacked a model.  snmp had models,
whether we thought they were pretty or not.  thus yang was born, and,
of course, a new generation wants to use the latest modern toys such as
restconf, openconfig, json, ...

draft-song-ntf yearns for an "architectural framework for network
telemetry," a lofty and worthwhile goal not, a priori, a bad one.  but a
few comments from a jaded old dog.

for a new paradigm to gain traction, it must be *significantly* better
than the old one, or the old paradigm must be clearly failing.  in the
story above, snmp was clearly failing, aside from using an unfashionable
encoding.  and yang clearly provided something needed and missing from
netconf.  note that this paradigm shift has taken over 20 years; and we
dis the itu et alia.

second, draft-song-ntf is an export-only model.  while telemetry is
extremely important, i will be very frustrated if i can only hear and
may not speak.  and the more it evolves to a really attractive paradigm
and model, the more annoyed i will be that i can not use it for control.

and lastly, to quote don knuth, "premature optimization is the root of
all evil."  do not get distracted by squeezing bits out of an encoding.
focus on things such as simple, clear, securable, extensible ...

randy

---

0 - i would love help pinning down which meeting

1 - i still have the "it's the cli, stupid" tee shirt.  an american
   political slogan of the era was "it's the economy, stupid."