Re: [Idr] I-D Action: draft-wang-idr-bgp-error-enhance-00.txt
Jeffrey Haas <jhaas@pfrc.org> Mon, 15 November 2021 19:43 UTC
Date: Mon, 15 Nov 2021 14:43:19 -0500
From: Jeffrey Haas <jhaas@pfrc.org>
To: "Wanghaibo (Rainsword)" <rainsword.wang@huawei.com>
Cc: "UTTARO, JAMES" <ju1738@att.com>, Robert Raszuk <robert@raszuk.net>, Bruno Decraene <bruno.decraene@orange.com>, "enchen@paloaltonetworks.com" <enchen@paloaltonetworks.com>, "jheitz@cisco.com" <jheitz@cisco.com>, Keyur Patel <keyur@arrcus.com>, "idr@ietf.org" <idr@ietf.org>, "draft-wang-idr-bgp-error-enhance@ietf.org" <draft-wang-idr-bgp-error-enhance@ietf.org>
Message-ID: <20211115194319.GC25878@pfrc.org>
References: <CAOj+MMG=Ve+_BuOGY6tAU-CjY5GUg2uEHPtXO_HeSVFsBdDerQ@mail.gmail.com> <CANJ8pZ-M1Si_BgTCxz3kpyTpDP3nvMz5qH=akkk2EpnppFB+7w@mail.gmail.com> <BYAPR11MB320751C57DBA2EF6180F17AEC0959@BYAPR11MB3207.namprd11.prod.outlook.com> <26162_1636711639_618E3CD7_26162_6_1_44d7103da3f84030a76484d4944d6f75@orange.com> <MW4PR02MB739458E2335B3D52447DFC8CC6959@MW4PR02MB7394.namprd02.prod.outlook.com> <BYAPR11MB3207C9012C7F13076944CBB6C0969@BYAPR11MB3207.namprd11.prod.outlook.com> <11079_1636966936_61922218_11079_233_1_8e88e66cbae347cc88de3fb146c935c9@orange.com> <CAOj+MMGH9VX8Pb_PUZ=ypNV_WnM7EBMQcWYsdZGb81KDMWavMg@mail.gmail.com> <MW4PR02MB73945375BF9A05DD86AEF256C6989@MW4PR02MB7394.namprd02.prod.outlook.com> <d0b4499f15184476a0956cd22e845411@huawei.com>
In-Reply-To: <d0b4499f15184476a0956cd22e845411@huawei.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/H1dnnNnP2-htDrKvGGBJ96Iy4qw>
Subject: Re: [Idr] I-D Action: draft-wang-idr-bgp-error-enhance-00.txt
Haibo,

On Mon, Nov 15, 2021 at 03:47:46PM +0000, Wanghaibo (Rainsword) wrote:
> Our main purpose is to reduce the impact of error messages on the
> business, also to provide some clear descriptions of what was not clearly
> defined in the original RFCs.

Please make sure to highlight specifically which things you believe may not
be clearly defined in the RFCs. If there is consensus about this, filing
errata is the appropriate next step.

> At present, more discussion is about the NLRI’s parsing. Such as the scope
> of application raised by Jeff, it is true that the Internet has a wide
> range of applications, and this processing may not be appropriate.
> However, for other AFI/SAFIs, such as EVPN and MVPN, this processing can
> be considered. This also answers Jim's question. We focus on the
> MP_REACH/UNREACH attribute, which mainly refers to various service
> AFI/SAFIs and non-Internet services.

Please remember that even "simple" IPv6 Internet service is carried in
MP_REACH/UNREACH. The characterization of the scope of the address family
is really the detail we will want to discuss here.

I tend to frame the problem in terms of the "blast radius" (the damage
caused by an explosion) that a protocol fault may have. While it is usually
the case that MVPN and EVPN are "local", EVPN Type 5 routes may effectively
pass through networks and eventually interact with the Internet. Similarly,
IPv4/IPv6 unicast may be used solely for private purposes.

> The integrity of the NLRI mentioned by Enke can be divided into two parts.
> First, the length of the NLRI in the MP_REACH is incorrect. This is
> described as a parsing error in RFC7606 section 5.3. However, the specific
> proceduring is not described.
The text in RFC 7606, section 5.3:

: The NLRI field or Withdrawn Routes field SHALL be considered
: "syntactically incorrect" if either of the following are true:

The procedure is understood from RFC 4271, section 6.3:

: The NLRI field in the UPDATE message is checked for syntactic
: validity. If the field is syntactically incorrect, then the Error
: Subcode MUST be set to Invalid Network Field.

> It can be understood that the definition of
> RFC4760 is retained, that is, disable the NLRI’s AFI/SAFI or reset the
> session. It is also worth discussing whether it is appropriate to disable
> AFI/SAFI or reset session.

I am personally curious which implementations disable a given AFI/SAFI.

The challenge an implementation faces if it decides the NLRI is malformed
is whether it can safely find the next BGP message boundary. This was part
of my microphone comments during the IDR presentation of your draft.

In the most optimistic case, a BGP Update contains a single NLRI. In the
event that the Update is considered malformed because of the NLRI, it might
be possible to look at adjacent memory patterns to find the BGP Marker
field. If so, an implementation may choose to "resync" the BGP messages and
continue. I do not know whether any implementation does this.

(The BGP Marker is a somewhat vestigial protocol feature. It was originally
there to cover a previous authentication feature that was not carried
forward in the protocol updates.)

> Currently, some AFI/SAFIs, such as EVPN,
> contain multiple routing types. Whether an error of one of them should
> affect the entire AFI/SAFI.

This has been a conversational point before with our Area Director, Alvaro.
The typed NLRI has created an unfortunate additional point of fragility in
error handling. This is just another example.
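A minimal Python sketch of the two kinds of NLRI walking discussed above, offered under stated assumptions. `parse_ipv4_nlri` applies the two "syntactically incorrect" conditions of RFC 7606, section 5.3 as we read them: a prefix length greater than 32, or a final prefix that overruns the unconsumed bytes of the field. `walk_evpn_nlri` assumes the RFC 7432 framing of a one-octet Route Type and a one-octet Length before each route-type-specific body. The function names and the `NlriSyntaxError` exception are hypothetical and not taken from any implementation.

```python
from typing import List, Tuple


class NlriSyntaxError(Exception):
    """Hypothetical: the whole NLRI field is unusable (Invalid Network Field)."""


def parse_ipv4_nlri(field: bytes) -> List[Tuple[int, bytes]]:
    """Return (prefix_length, prefix_bytes) pairs, or raise NlriSyntaxError."""
    prefixes = []
    offset = 0
    while offset < len(field):
        plen = field[offset]
        if plen > 32:
            # Condition 1: an IPv4 prefix length greater than 32 bits.
            raise NlriSyntaxError(f"prefix length {plen} > 32 at offset {offset}")
        nbytes = (plen + 7) // 8  # octets needed to carry plen bits
        if offset + 1 + nbytes > len(field):
            # Condition 2: the last NLRI overruns the unconsumed data.
            raise NlriSyntaxError(f"prefix at offset {offset} overruns the field")
        prefixes.append((plen, field[offset + 1 : offset + 1 + nbytes]))
        offset += 1 + nbytes
    return prefixes


def walk_evpn_nlri(field: bytes) -> List[Tuple[int, bytes]]:
    """Split an EVPN NLRI field into (route_type, body) entries.

    Bodies are returned unvalidated. A body that later fails route-type
    validation is still correctly framed; a Length octet that overruns the
    field leaves no trustworthy boundary for any entry after it.
    """
    entries = []
    offset = 0
    while offset < len(field):
        if offset + 2 > len(field):
            raise NlriSyntaxError(f"truncated type/length at offset {offset}")
        route_type, length = field[offset], field[offset + 1]
        end = offset + 2 + length
        if end > len(field):
            raise NlriSyntaxError(f"route type {route_type} overruns the field")
        entries.append((route_type, field[offset + 2 : end]))
        offset = end
    return entries
```

Because each typed NLRI carries its own Length octet, a bad route-type body could in principle be isolated, while a bad Length octet forces the whole field to be treated as syntactically incorrect; that asymmetry is the fragility of typed NLRI mentioned above.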
> Whether a packet is parsed out of bounds depends on the length of the
> MP_REACH attribute, the total length of the attribute, and the total
> length of the packet to ensure that the packet does not across the bounds.
> According to Keyur's comments, TCP is flow-based. If the current packet is
> discarded, whether the subsequent packet can be parsed depends on the
> arrival of the next packet. However, this should not affect the processing
> of the current packet.

This again indirectly applies toward using the Marker to find the "next
packet".

The scenario I believe you're trying to highlight is one where an
implementation believes that all damage to an Update is contained within a
single BGP message; in that case, it may be one step "safer" to leave the
session up. While this may be true, the real question is the blast radius
of the damage caused by the Update that contains malformed state.

RFC 7606 decided to consider malformed NLRI as toxic. As we discussed
during the IDR session at the recent IETF, it will take significant
motivation to convince the Working Group to change this consensus.

> Finally, some mechanism optimizations, such as dynamic capability
> negotiation and multi-session, may be used to reduce the impact. However,
> we also need consider how to do better under the current BGP mechanism.

I urge you to note that multi-session and BGP QUIC address similar
considerations, but by different mechanisms. Multi-session reduces the
"blast radius" to a separate session; I commonly refer to this property as
"structural separation". For BGP QUIC, the protocol supplies a missing OSI
Layer 5 that provides an opportunity to change the per-message fault model.

-- Jeff
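The nested bounds check in Haibo's message and the Marker "resync" idea can be sketched in Python as follows. The sketch assumes the RFC 4271 UPDATE layout (a 19-octet header, Withdrawn Routes Length, Total Path Attribute Length, and attributes whose Extended Length flag 0x10 selects a two-octet length) and the 4096-octet maximum message size that applies without the Extended Message capability of RFC 8654. `check_update_bounds` and `find_next_marker` are hypothetical names. The resync heuristic is purely illustrative: runs of 0xFF can also occur inside message data, so a match is only a guess.

```python
import struct
from typing import List, Optional, Tuple

HEADER_LEN = 19          # 16-octet Marker + 2-octet Length + 1-octet Type
MARKER = b"\xff" * 16
EXTENDED_LENGTH = 0x10   # attribute flag: two-octet attribute length
UPDATE = 2


def check_update_bounds(msg: bytes) -> List[Tuple[int, int, int]]:
    """Return (type_code, value_offset, value_len) per attribute, or raise
    ValueError if any nested length crosses its enclosing boundary."""
    if len(msg) < HEADER_LEN + 4:
        raise ValueError("too short for an UPDATE")
    (msg_len,) = struct.unpack_from("!H", msg, 16)
    if msg_len != len(msg) or msg[18] != UPDATE:
        raise ValueError("header length or type does not match")
    (wd_len,) = struct.unpack_from("!H", msg, HEADER_LEN)
    tpa_off = HEADER_LEN + 2 + wd_len
    if tpa_off + 2 > msg_len:
        raise ValueError("withdrawn routes overrun the message")
    (tpa_len,) = struct.unpack_from("!H", msg, tpa_off)
    attr_off = tpa_off + 2
    attr_end = attr_off + tpa_len
    if attr_end > msg_len:
        raise ValueError("path attributes overrun the message")
    attrs = []
    off = attr_off
    while off < attr_end:
        if off + 2 > attr_end:
            raise ValueError("truncated attribute header")
        flags, type_code = msg[off], msg[off + 1]
        if flags & EXTENDED_LENGTH:
            if off + 4 > attr_end:
                raise ValueError("truncated extended length")
            (alen,) = struct.unpack_from("!H", msg, off + 2)
            voff = off + 4
        else:
            if off + 3 > attr_end:
                raise ValueError("truncated attribute length")
            alen = msg[off + 2]
            voff = off + 3
        if voff + alen > attr_end:
            # e.g. an MP_REACH_NLRI (type 14) whose length crosses the
            # Total Path Attribute Length boundary.
            raise ValueError(f"attribute {type_code} overruns path attributes")
        attrs.append((type_code, voff, alen))
        off = voff + alen
    return attrs


def find_next_marker(stream: bytes, start: int) -> Optional[int]:
    """Return the offset of the next plausible BGP header at or after start:
    16 octets of 0xFF, a Length of 19..4096, and a Type of 1..5."""
    pos = stream.find(MARKER, start)
    while pos != -1:
        if pos + HEADER_LEN <= len(stream):
            (length,) = struct.unpack_from("!H", stream, pos + 16)
            if 19 <= length <= 4096 and 1 <= stream[pos + 18] <= 5:
                return pos
        pos = stream.find(MARKER, pos + 1)
    return None
```

Note that `find_next_marker` only proposes a candidate boundary; as discussed above, whether it is safe to keep the session up after a resync depends on the blast radius of the malformed Update, not on finding the next header.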