Re: [bess] John Scudder's Discuss on draft-ietf-bess-evpn-optimized-ir-09: (with DISCUSS and COMMENT)
"Rabadan, Jorge (Nokia - US/Mountain View)" <jorge.rabadan@nokia.com> Mon, 08 November 2021 15:21 UTC
From: "Rabadan, Jorge (Nokia - US/Mountain View)" <jorge.rabadan@nokia.com>
To: John Scudder <jgs@juniper.net>, The IESG <iesg@ietf.org>, "draft-ietf-bess-evpn-optimized-ir@ietf.org" <draft-ietf-bess-evpn-optimized-ir@ietf.org>, "bess-chairs@ietf.org" <bess-chairs@ietf.org>, "bess@ietf.org" <bess@ietf.org>, "Bocci, Matthew (Nokia - GB)" <matthew.bocci@nokia.com>
Date: Mon, 08 Nov 2021 15:20:31 +0000
Message-ID: <BY3PR08MB706048E96EC99525C286418CF78C9@BY3PR08MB7060.namprd08.prod.outlook.com>
References: <163477834717.27602.11452549676478352862@ietfa.amsl.com>
In-Reply-To: <163477834717.27602.11452549676478352862@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/7gf-1ItQC37ciol_RMy5P4e63mw>
Hi John,

First of all, thank you very much for your time and thorough review. You have great points, and the document is now in much better shape. We really appreciate it.

Please see my replies in-line in your email, marked with [jorge].

Thank you.
Jorge

From: John Scudder via Datatracker <noreply@ietf.org>
Date: Wednesday, October 20, 2021 at 6:05 PM
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-bess-evpn-optimized-ir@ietf.org <draft-ietf-bess-evpn-optimized-ir@ietf.org>, bess-chairs@ietf.org <bess-chairs@ietf.org>, bess@ietf.org <bess@ietf.org>, Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>
Subject: John Scudder's Discuss on draft-ietf-bess-evpn-optimized-ir-09: (with DISCUSS and COMMENT)

John Scudder has entered the following ballot position for draft-ietf-bess-evpn-optimized-ir-09: Discuss

When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-optimized-ir/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

DISCUSS:

In my review there are a number of comments/questions that I would like to be sure of having discussed. In particular, the questions about use of SHOULD without associated discussion of exception cases. I would also like to make sure the question about whether non-BM receivers are, or are not, excluded from flood lists (§6.1) has been addressed.

Of course I would also appreciate replies to my other comments! :-)

[jorge] sure. Please see below.
----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Overall comments:

1. This document suffers from what I think is an overuse of abbreviations. See https://www.psychologicalscience.org/observer/alienating-the-audience-how-abbreviations-hamper-scientific-communication for one perspective on why this is problematic. Any individual one of these doesn't rise to the level of being objectionable, but in aggregate at some point it makes the document a lot less accessible to anyone who isn't part of the in-group who has memorized the abbreviations.

4r xm, <- ... is !@? to rd, 4r no gd rn, [see terminology section below]

even though anyone who goes to the effort of looking up the terminology can decode it.

I would really prefer it if this were improved; I think it's not that much work for the authors and will make the resulting spec more usable. I had intended to offer an example edit that expands many of the abbreviations, but have run out of time; I'd still be willing to do it later if requested, let me know.

(Consider also the contrast with RFC 6514; for instance instead of referring to "the L-flag", when mentioning that flag it says "the Leaf Information Required flag". Since we don't pay by the byte for publishing our documents, it seems to be worth spending a few more keystrokes to make it easier to read them.)

[jorge] that’s a fair comment and a fresh view. We, the authors, are familiar with the acronyms and don’t notice them, but looking at the document again from a different perspective, you are 100% right. I definitely think the document is more readable now:
* we reduced the number of abbreviations
* we ordered the terminology in section 2

2. The document starts in the middle. It jumps right from the requirements to the tunnel attribute diagram, with no overview or outline of the solution. This is related to Pascal's review comment, mentioned by Éric Vyncke.
[jorge] the introduction has been extended/re-written including an outline of the solution. Thanks!

Terminology:

4r: for
xm: example
<-: this
...: sentence
!@?: difficult
rd: read
gd: good
rn: reason

[jorge] got you 😊

Detailed review:

I’ve done my comments in the form of an edited copy of the draft. I don't think the datatracker tooling allows me to use attachments, so I'll follow up to this with an email with attached edited copy, as well as a PDF of the rfcdiff output for your convenience if you’d like to use it. I’ve also pasted a traditional diff below to capture the comments for the record and in case you want to use it for in-line reply.

I’d appreciate feedback regarding whether you found this a useful way to receive my comments as compared to a more traditional numbered list of comments with selective quotation from the draft.

[jorge] the pdf with the side-by-side diff and the copy below helped greatly. We really appreciate that you took the time to provide the diff and the comments in the edited text, thinking about how it could help best. Thank you very much. Please see my comments in-line below. If I don’t have comments it means we changed the text as you asked.

*** draft-ietf-bess-evpn-optimized-ir-09.txt 2021-10-20 13:48:15.000000000 -0400
--- draft-ietf-bess-evpn-optimized-ir-09-jgs-markup.txt 2021-10-20 20:39:39.000000000 -0400
***************
*** 19,25 ****
Abstract
! Network Virtualization Overlay (NVO) networks using EVPN as control plane may use Ingress Replication (IR) or PIM (Protocol Independent Multicast) based trees to convey the overlay Broadcast, Unknown unicast and Multicast (BUM) traffic. PIM provides an efficient
--- 19,25 ----
Abstract
! Network Virtualization Overlay (NVO) networks using EVPN as their control plane may use Ingress Replication (IR) or PIM (Protocol Independent Multicast) based trees to convey the overlay Broadcast, Unknown unicast and Multicast (BUM) traffic.
PIM provides an efficient
***************
*** 105,111 ****
Ethernet Virtual Private Networks (EVPN) may be used as the control plane for a Network Virtualization Overlay (NVO) network. Network
! Virtualization Edge (NVE) devices and Provider Edges (PEs) that are
--- 105,111 ----
Ethernet Virtual Private Networks (EVPN) may be used as the control plane for a Network Virtualization Overlay (NVO) network. Network
! Virtualization Edge (NVE) and Provider Edge (PEs) devices that are
***************
*** 182,187 ****
--- 182,191 ----
"OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
+
+ Is there any logic to the order in which the terms are presented? If so,
+ it escaped me. It would have been much better for my reading of the document,
+ if the terms had been given in alphabetical order, for obvious reasons.

[jorge] alphabetic order is now in place, thanks

The following terminology is used throughout the document:
***************
*** 236,241 ****
--- 240,247 ----
Replicator-AR route. It is used to identify the ingress packets that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR case.
+
+ A reference to section 8 would be helpful in the above.
- IR-VNI: VNI advertised along with the RT-3 for IR.
***************
*** 288,296 ****
--- 294,313 ----
hereafter) meets the following requirements:
a. It provides an IR optimization for BM (Broadcast and Multicast)
+
+ Thank you for expanding "BM", but... you've already defined it in your
+ Terminology section, so maybe you don't need to define it again. (But see
+ also my general comment on the subject of abbreviations in general;
+ depending on how we resolve that this comment may be overtaken by events.)
+
traffic without the need for PIM, while preserving the packet order for unicast applications, i.e., known and unknown unicast traffic should follow the same path. This optimization is
+
+ ...
the same path as what? If you mean unknown should follow the same path
+ as known, then use "... i.e., unknown unicast traffic should follow the same
+ path as known unicast traffic". If you mean something different, what is it?
+
required in low-performance NVEs.
b. It reduces the flooded traffic in NVO networks where some NVEs do
***************
*** 361,369 ****
--- 378,403 ----
The Flags field is 8 bits long. This document defines the use of 4 bits of this Flags field:
+
+ It would be quite helpful to include a diagram of the Flags field as in
+ RFC 6514 §5:
+
+ The Flags field has the following format:
+
+  0 1 2 3 4 5 6 7
+ +-+-+-+-+-+-+-+-+
+ |  reserved   |L|
+ +-+-+-+-+-+-+-+-+
+
+ except of course with all the new and previously-defined flags filled
+ in too.
- bits 3 and 4, forming together the Assisted-Replication Type (T) field
+
+ Up here you call it the Assisted-Replication Type field. Just a few lines
+ later you call it the AR Type field. Can you make up your mind and use
+ one or the other, please?
- bit 5, called the Broadcast and Multicast (BM) flag
***************
*** 406,411 ****
--- 440,448 ----
- Flag L is an existing flag defined in [RFC6514] (L=Leaf Information Required) and it will be used only in the Selective AR Solution.
+
+ I think it would be nice to provide the bit position for this flag, as in
+ "(L=Leaf Information Required, bit 7)"
Please refer to Section 11 for the IANA considerations related to the PTA flags.
***************
*** 420,436 ****
--- 457,497 ----
address that we denominate IR-IP in this document. When advertised by an AR-LEAF node, the Regular-IR route SHOULD be advertised with type T= AR-LEAF.
+
+ Your use of SHOULD implies there is at least one case where a reasonable
+ implementation could choose to advertise a Regular-IR route from an
+ AR-LEAF node with a different type. I am left to guess what the case is,
+ and what value it should choose then. Maybe it would use RNVE instead?
+ Please say something about this.
On the other hand if there isn't any
+ such case, this should be a MUST.

[jorge] the logic for using a SHOULD was that, even if the AR-LEAF does not set T=AR-LEAF, the procedures in the document would still work. However, I changed it to MUST. If the AR-LEAF is an RNVE, then it is not an AR-LEAF… so I think it is better to use a MUST.

- Replicator-AR route: this route is used by the AR-REPLICATOR to advertise its AR capabilities, with the fields set as follows:
o Originating Router's IP Address MUST be set to an IP address of the PE that should be common to all the EVIs on the PE (usually
+
+ What's "the PE" in this context? I'm assuming it means "the advertising
+ router". If that's right, please say that instead of "the PE".
+
this is the PE's loopback address). The Tunnel Identifier and Next-Hop SHOULD be set to the same IP address as the Originating Router's IP address when the NVE/PE originates the route. The Next-Hop address is referred to as the AR-IP and SHOULD be different than the IR-IP for a given PE/NVE.
+
+ Similar question to my earlier one about the two SHOULDs above. I guess
+ in the case of the second SHOULD, it MAY be the same in the case of a
+ router unable to support two different IP addresses for this purpose, in
+ which case the procedures of Section 8 MUST be applied? If that's right,
+ please add language to that effect.
+
+ As for the first SHOULD, does this imply that the Tunnel Identifier and
+ Next-Hop MAY be set to the IP address of some other router?
+
+ Also, "when the NVE/PE originates the route" -- in this section aren't
+ we always talking about the NVE/PE originating the route? This clause
+ makes me think there is another case, but I can't figure out what it is.

[jorge] all good points, thanks.
The new text reads as follows:

- Replicator-AR route: this route is used by the AR-REPLICATOR to advertise its AR capabilities, with the fields set as follows:

o Originating Router's IP Address MUST be set to an IP address of the advertising router that is common to all the EVIs on the PE (usually this is a loopback address of the PE).

+ The Tunnel Identifier and Next-Hop SHOULD be set to the same IP address as the Originating Router's IP address when the NVE/PE originates the route, that is, when the NVE/PE is not an ASBR as in section 10.2 of [RFC8365]. Irrespective of the values in the Tunnel Identifier and Originating Router's IP Address fields, the ingress NVE/PE will process the received Replicator-AR route and will use the IP Address in the Next-Hop field to create IP tunnels to the AR-REPLICATOR.

+ The Next-Hop address is referred to as the AR-IP and MUST be different from the IR-IP for a given PE/NVE, unless the procedures in Section 8 are followed.

o Tunnel Type = Assisted-Replication Tunnel. Section 11 provides the allocated type value.
***************
*** 440,446 ****
o L (Leaf Information Required) = 0 (for non-selective AR) or 1 (for selective AR).
! In addition, this document also uses the Leaf A-D route (RT-11) defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the
--- 501,507 ----
o L (Leaf Information Required) = 0 (for non-selective AR) or 1 (for selective AR).
! In addition, this document also uses the Leaf Auto-Discovery (A-D) route (RT-11) defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the
***************
*** 452,457 ****
--- 513,522 ----
selective AR mode is used. The Leaf A-D route MAY be used by the AR-LEAF in response to a Replicator-AR route (with the L flag set) to
+
+ The above is ambiguous. Maybe "An AR-LEAF MAY send a Leaf A-D route in
+ response to reception of a Replicator-AR route whose L flag is set."?
+
advertise its desire to receive the BM traffic from a specific AR-REPLICATOR.
It is only used for selective AR and its fields are set as follows:
***************
*** 459,466 ****
--- 524,538 ----
o Originating Router's IP Address is set to the advertising PE's
+
+ What's "the PE" in this context? I'm assuming it means "the advertising
+ router". If that's right, please say that instead of "the PE".
+
IP address (same IP used by the AR-LEAF in regular-IR routes). The Next-Hop address is set to the IR-IP.
+
+ ... and the IR-IP is different from the "advertising PE's IP address" I
+ guess?
o Route Key is the "Route Type Specific" NLRI of the Replicator-AR route for which this Leaf A-D route is generated.
***************
*** 477,483 ****
o The Leaf A-D route MUST include the PMSI Tunnel attribute with the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
! Identifier set to the IP of the advertising AR-LEAF. The PMSI Tunnel attribute MUST carry a downstream-assigned MPLS label or VNI that is used by the AR-REPLICATOR to send traffic to the AR-LEAF.
--- 549,555 ----
o The Leaf A-D route MUST include the PMSI Tunnel attribute with the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
! Identifier set to the IP address of the advertising AR-LEAF. The PMSI Tunnel attribute MUST carry a downstream-assigned MPLS label or VNI that is used by the AR-REPLICATOR to send traffic to the AR-LEAF.
***************
*** 488,494 ****
Each node attached to the BD may understand and process the BM/U flags. Note that these BM/U flags may be used to optimize the
! delivery of multi-destination traffic and its use SHOULD be an administrative choice, and independent of the AR role.
Non-optimized-IR nodes will be unaware of the new PMSI attribute flag
--- 560,566 ----
Each node attached to the BD may understand and process the BM/U flags. Note that these BM/U flags may be used to optimize the
! delivery of multi-destination traffic and their use SHOULD be an administrative choice, and independent of the AR role.
Non-optimized-IR nodes will be unaware of the new PMSI attribute flag
***************
*** 512,518 ****
AR function is enabled. Three different roles are defined for a given BD: AR-REPLICATOR, AR-LEAF and RNVE (Regular NVE). The solution is called "non-selective" because the chosen AR-REPLICATOR
! for a given flow MUST replicate the BM traffic to 'all' the NVE/PEs in the BD except for the source NVE/PE.
( )
--- 584,590 ----
AR function is enabled. Three different roles are defined for a given BD: AR-REPLICATOR, AR-LEAF and RNVE (Regular NVE). The solution is called "non-selective" because the chosen AR-REPLICATOR
! for a given flow MUST replicate the BM traffic to all the NVE/PEs in the BD except for the source NVE/PE.
( )
***************
*** 567,572 ****
--- 639,658 ----
An AR-REPLICATOR is defined as an NVE/PE capable of replicating ingress BM (Broadcast and Multicast) traffic received on an overlay tunnel to other overlay tunnels and local Attachment Circuits (ACs).
+
+ This is different from the definition you have in the terminology section,
+ which is:
+
+ - AR-REPLICATOR: Assisted Replication - REPLICATOR, refers to an
+ NVE/PE that can replicate Broadcast or Multicast traffic received
+ on overlay tunnels to other overlay tunnels.
+
+ In the definition here, you mention local attachment circuits, in §2 you
+ don't. Probably you should harmonize these definitions. Having done so,
+ it's not clear to me that you need to repeat the definition here (though
+ if you think you need to remind the reader of what you already told them,
+ it's OK).
+
The AR-REPLICATOR signals its role in the control plane and understands where the other roles (AR-LEAF nodes, RNVEs and other AR-REPLICATORs) are located. A given AR-enabled BD service may have
***************
*** 584,608 ****
generate a Regular-IR route if it does not have local attachment circuits (AC). If the Regular-IR route is advertised, the AR Type field is set to zero.
c.
The Replicator-AR and Regular-IR routes are generated according to section 3. The AR-IP and IR-IP used by the AR-REPLICATOR are different routable IP addresses.
d. When a node defined as AR-REPLICATOR receives a BM packet on an
! overlay tunnel, it will do a tunnel destination IP lookup and apply the following procedures:
! o If the destination IP is the AR-REPLICATOR IR-IP Address the node will process the packet normally as in [RFC7432].
! o If the destination IP is the AR-REPLICATOR AR-IP Address the node MUST replicate the packet to local ACs and overlay tunnels (excluding the overlay tunnel to the source of the packet). When replicating to remote AR-REPLICATORs the tunnel
! destination IP will be an IR-IP. That will be an indication for the remote AR-REPLICATOR that it MUST NOT replicate to
! overlay tunnels. The tunnel source IP used by the AR-REPLICATOR MUST be its IR-IP when replicating to either AR-REPLICATOR or AR-LEAF nodes.
--- 670,705 ----
generate a Regular-IR route if it does not have local attachment circuits (AC). If the Regular-IR route is advertised, the AR Type field is set to zero.
+
+ Do you mean "... the AR Type field of the Replicator-AR route MUST be
+ set to zero"? If so, please say that.

[jorge] actually we mean the Regular-IR route. Changed it to: If the Regular-IR route is advertised, the Assisted-Replication Type field of the Regular-IR route MUST be set to zero.

c. The Replicator-AR and Regular-IR routes are generated according to section 3. The AR-IP and IR-IP used by the AR-REPLICATOR are
+
+ I think you mean Section 4?
+
different routable IP addresses.
+
+ I think you'll find that "routable IP address" isn't a well-defined
+ term (for example I'm sure you're NOT talking specifically about non-
+ RFC 1918 addresses). Can you choose different language here to say
+ what you mean?

[jorge] changed to:

c. The Replicator-AR and Regular-IR routes are generated according to Section 4.
The AR-IP and IR-IP are different IP addresses owned by the AR-REPLICATOR.

d. When a node defined as AR-REPLICATOR receives a BM packet on an
! overlay tunnel, it will do a tunnel destination IP address lookup and apply the following procedures:
! o If the destination IP address is the AR-REPLICATOR IR-IP Address the node will process the packet normally as in [RFC7432].
! o If the destination IP address is the AR-REPLICATOR AR-IP Address the node MUST replicate the packet to local ACs and overlay tunnels (excluding the overlay tunnel to the source of the packet). When replicating to remote AR-REPLICATORs the tunnel
! destination IP address will be an IR-IP. That will be an indication for the remote AR-REPLICATOR that it MUST NOT replicate to
! overlay tunnels. The tunnel source IP address used by the AR-REPLICATOR MUST be its IR-IP when replicating to either AR-REPLICATOR or AR-LEAF nodes.
***************
*** 628,642 ****
and remote NVE/PEs), skipping the non-BM overlay tunnels.
- When an AR-REPLICATOR receives a BM packet on an overlay tunnel,
! it will check the destination IP of the underlay IP header and:
! o If the destination IP matches its AR-IP, the AR-REPLICATOR will forward the BM packet to its flooding list (ACs and overlay tunnels) excluding the non-BM overlay tunnels. The AR-
! REPLICATOR will do source squelching to ensure the traffic is not sent back to the originating AR-LEAF.
! o If the destination IP matches its IR-IP, the AR-REPLICATOR will skip all the overlay tunnels from the flooding list, i.e. it will only replicate to local ACs. This is the regular IR behavior described in [RFC7432].
--- 725,742 ----
and remote NVE/PEs), skipping the non-BM overlay tunnels.
- When an AR-REPLICATOR receives a BM packet on an overlay tunnel,
! it will check the destination IP address of the underlay IP header and:
!
o If the destination IP address matches its AR-IP, the AR-REPLICATOR will forward the BM packet to its flooding list (ACs and overlay tunnels) excluding the non-BM overlay tunnels. The AR-
! REPLICATOR will ensure the traffic is not sent back to the originating AR-LEAF.
+
+ Above, I suggested the removal of "do source squelching" since AFAICT
+ it removes jargon while leaving the intention clear.
! o If the destination IP address matches its IR-IP, the AR-REPLICATOR will skip all the overlay tunnels from the flooding list, i.e. it will only replicate to local ACs. This is the regular IR behavior described in [RFC7432].
***************
*** 645,650 ****
--- 745,754 ----
is different for BM traffic, as far as Unknown unicast traffic forwarding is concerned, AR-LEAF nodes behave exactly in the same way as AR-REPLICATORs do.
+
+ I'm unclear why you're defining the behavior of AR-LEAF nodes here, when
+ you started by saying "An AR-REPLICATOR will follow..." Surely, defining
+ AR-LEAF behavior here is misplaced?
- The AR-REPLICATOR/LEAF nodes will build an Unknown unicast flood-list composed of ACs and overlay tunnels to the IR-IP Addresses of
***************
*** 655,660 ****
--- 759,767 ----
o When an AR-REPLICATOR/LEAF receives an unknown packet on an AC, it will forward the unknown packet to its flood-list, skipping the non-U overlay tunnels.
+
+ Possibly the term "unknown packet" is well-understood by the target
+ audience, but I think it needs either an explanation or a reference here.
o When an AR-REPLICATOR/LEAF receives an unknown packet on an overlay tunnel will forward the unknown packet to its local ACs
***************
*** 688,696 ****
b. In this non-selective AR solution, the AR-LEAF MUST advertise a single Regular-IR inclusive multicast route as in [RFC7432]. The AR-LEAF SHOULD set the AR Type field to AR-LEAF. Note that
! although this flag does not make any difference for the egress nodes when creating an EVPN destination to the AR-LEAF, it is
!
RECOMMENDED to use this flag for an easy operation and
     troubleshooting of the BD.

  c. In a service where there are no AR-REPLICATORs, the AR-LEAF MUST
--- 795,803 ----
  b. In this non-selective AR solution, the AR-LEAF MUST advertise a
     single Regular-IR inclusive multicast route as in [RFC7432]. The
     AR-LEAF SHOULD set the AR Type field to AR-LEAF. Note that
!    although this field does not make any difference for the egress
     nodes when creating an EVPN destination to the AR-LEAF, it is
!    RECOMMENDED to use this field for an easy operation and
     troubleshooting of the BD.

  c. In a service where there are no AR-REPLICATORs, the AR-LEAF MUST
***************
*** 701,706 ****
--- 808,816 ----
     IGP or any other detection mechanism). Ingress replication MUST
     use the forwarding information given by the remote Regular-IR
     Inclusive Multicast Routes as described in [RFC7432].
+
+ I found the above paragraph to be confusing. Does it boil down to,
+ if there are no AR-REPLICATORS, use regular IR?

[jorge] pretty much. I tried to clarify better:

  c. In a BD where there are no AR-REPLICATORs due to the AR-
     REPLICATORs being down or reconfigured, the AR-LEAF MUST use
     regular Ingress Replication, based on the remote Regular-IR
     Inclusive Multicast Routes as described in [RFC7432]. This may
     happen in the following cases:

     o The AR-LEAF has a list of AR-REPLICATORs for the BD, but it
       detects that all the AR-REPLICATORs for the BD are down (via
       next-hop tracking in the IGP or any other detection mechanism).

     o The AR-LEAF receives updates from all the former AR-
       REPLICATORs containing a non-REPLICATOR AR type in the
       Inclusive Multicast Ethernet Tag routes.

     o The AR-LEAF never discovered an AR-REPLICATOR for the BD.

  d.
In a service where there is one or more AR-REPLICATORs (based on the received Replicator-AR routes for the BD), the AR-LEAF can *************** *** 709,720 **** o A single AR-REPLICATOR MAY be selected for all the BM packets received on the AR-LEAF attachment circuits (ACs) for a given BD. This selection is a local decision and it does not have ! to match other AR-LEAF's selection within the same BD. o An AR-LEAF MAY select more than one AR-REPLICATOR and do either per-flow or per-BD load balancing. ! o In case of a failure on the selected AR-REPLICATOR, another AR-REPLICATOR will be selected. o When an AR-REPLICATOR is selected, the AR-LEAF MUST send all --- 819,830 ---- o A single AR-REPLICATOR MAY be selected for all the BM packets received on the AR-LEAF attachment circuits (ACs) for a given BD. This selection is a local decision and it does not have ! to match other AR-LEAFs' selections within the same BD. o An AR-LEAF MAY select more than one AR-REPLICATOR and do either per-flow or per-BD load balancing. ! o In case of a failure of the selected AR-REPLICATOR, another AR-REPLICATOR will be selected. o When an AR-REPLICATOR is selected, the AR-LEAF MUST send all *************** *** 752,757 **** --- 862,874 ---- to the AR-REPLICATOR and be programmed. While the AR-REPLICATOR- activation-time is running, the AR-LEAF node will use regular ingress replication. + + Probably you should say something about the case where a router has + selected its preferred AR-REPLICATOR from the set that are available, + and then a new AR-REPLICATOR shows up that is more preferable. Should + the router shift to the new, preferred replicator? Should it stick + with the one it was already using even though less-preferred? Is it a + matter of local policy? [jorge] sure, I added: f. If the AR-LEAF has selected an AR-REPLICATOR, it is a matter of local policy to change to a new preferred AR-REPLICATOR for the existing BM traffic flows. 
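For readers following the AR-LEAF discussion above (items c through f), here is a minimal Python sketch of the selection and fallback behavior as I read it. It is an illustration under stated assumptions and not text from the draft: the class and method names are hypothetical, "lowest address string" is only a stand-in for the local selection decision, the switch_to_preferred knob is my own way of modeling the "matter of local policy" in item f, and I assume the activation timer is started on every new selection.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ArLeaf:
    """Hypothetical model of non-selective AR-LEAF replicator selection."""

    # AR-IPs of the AR-REPLICATORs currently known for the BD.
    replicators: Set[str] = field(default_factory=set)
    selected: Optional[str] = None
    activation_timer_running: bool = False
    # Assumption: a local-policy knob standing in for item f.
    switch_to_preferred: bool = False

    def _pick(self) -> Optional[str]:
        # Stand-in for the local selection decision: lowest address string.
        return min(self.replicators) if self.replicators else None

    def on_replicator_up(self, ar_ip: str) -> None:
        self.replicators.add(ar_ip)
        best = self._pick()
        if self.selected is None or (self.switch_to_preferred and best != self.selected):
            self.selected = best
            # Assumption: the activation timer restarts on each new selection.
            self.activation_timer_running = True

    def on_replicator_down(self, ar_ip: str) -> None:
        self.replicators.discard(ar_ip)
        if self.selected == ar_ip:
            # Failure of the selected replicator: select another, if any.
            self.selected = self._pick()
            self.activation_timer_running = self.selected is not None

    def on_activation_timer_expired(self) -> None:
        self.activation_timer_running = False

    def bm_mode(self) -> str:
        # Regular IR when no replicator is usable or the timer is running.
        if self.selected is None or self.activation_timer_running:
            return "IR"
        return "AR"
```

With switch_to_preferred left at False, a newly learned replicator does not displace the current selection; setting it to True models the other policy choice.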
An AR-LEAF will follow a data path implementation compatible with the following rules: *************** *** 849,874 **** REPLICATORs will fall back to non-selective AR mode. c. The Selective AR-REPLICATOR MUST follow the procedures described ! in section Section 5.1, except for the following differences: o The Replicator-AR route MUST include L=1 (Leaf Information Required) in the Replicator-AR route. This flag is used by the AR-REPLICATORs to advertise their 'selective' AR- REPLICATOR capabilities. In addition, the AR-REPLICATOR auto- configures its IP-address-specific import route-target as ! described in section Section 4. o The AR-REPLICATOR will build a 'selective' AR-LEAF-set with the list of nodes that requested replication to its own AR-IP. For instance, assuming NVE1 and NVE2 advertise a Leaf A-D route with PE1's IP-address-specific route-target and NVE3 advertises a Leaf A-D route with PE2's IP-address-specific ! route-target, PE1 MUST only add NVE1/NVE2 to its selective AR- ! LEAF-set for BD-1, and exclude NVE3. ! o When a node defined and operating as Selective AR-REPLICATOR receives a packet on an overlay tunnel, it will do a tunnel ! destination IP lookup and if the destination IP is the AR- REPLICATOR AR-IP Address, the node MUST replicate the packet to: --- 966,997 ---- REPLICATORs will fall back to non-selective AR mode. c. The Selective AR-REPLICATOR MUST follow the procedures described ! in Section 5.1, except for the following differences: o The Replicator-AR route MUST include L=1 (Leaf Information Required) in the Replicator-AR route. This flag is used by the AR-REPLICATORs to advertise their 'selective' AR- REPLICATOR capabilities. In addition, the AR-REPLICATOR auto- configures its IP-address-specific import route-target as ! described in the third bullet of the procedures for Leaf A-D ! route in Section 4. o The AR-REPLICATOR will build a 'selective' AR-LEAF-set with the list of nodes that requested replication to its own AR-IP. 
For instance, assuming NVE1 and NVE2 advertise a Leaf A-D route with PE1's IP-address-specific route-target and NVE3 advertises a Leaf A-D route with PE2's IP-address-specific ! route-target, PE1 will only add NVE1/NVE2 to its selective AR- ! LEAF-set for BD-1, and exclude NVE3. Likewise, PE2 will only ! add NVE3 to its selective AR-LEAF-set for BD-1, and exclude ! NVE1/NVE2. ! ! I changed the MUST to "will" above -- it's an example, it's inappropriate ! to use RFC 2119 type keywords in it. ! o When a node defined and operating as a Selective AR-REPLICATOR receives a packet on an overlay tunnel, it will do a tunnel ! destination IP lookup and if the destination IP address is the AR- REPLICATOR AR-IP Address, the node MUST replicate the packet to: *************** *** 878,893 **** overlay tunnel to the source AR-LEAF). + overlay tunnels to the RNVEs if the tunnel source IP is the ! IR-IP of an AR-LEAF (in any other case, the AR-REPLICATOR ! MUST NOT replicate the BM traffic to remote RNVEs). In other words, only the first-hop selective AR-REPLICATOR will replicate to all the RNVEs. + overlay tunnels to the remote Selective AR-REPLICATORs if ! the tunnel source IP is an IR-IP of its own AR-LEAF-set (in any other case, the AR-REPLICATOR MUST NOT replicate the BM ! traffic to remote AR-REPLICATORs), where the tunnel ! destination IP is the AR-IP of the remote Selective AR- REPLICATOR. The tunnel destination IP AR-IP will be an --- 1001,1016 ---- overlay tunnel to the source AR-LEAF). + overlay tunnels to the RNVEs if the tunnel source IP is the ! IR-IP of an AR-LEAF. In any other case, the AR-REPLICATOR ! MUST NOT replicate the BM traffic to remote RNVEs. In other words, only the first-hop selective AR-REPLICATOR will replicate to all the RNVEs. + overlay tunnels to the remote Selective AR-REPLICATORs if ! the tunnel source IP address is an IR-IP of its own AR-LEAF-set. In any other case, the AR-REPLICATOR MUST NOT replicate the BM ! traffic to remote AR-REPLICATORs. 
When doing this replication, the tunnel ! destination IP address is the AR-IP of the remote Selective AR- REPLICATOR. The tunnel destination IP AR-IP will be an *************** *** 911,916 **** --- 1034,1042 ---- destination IP addresses. Some of those overlay tunnels MAY be flagged as non-BM receivers based on the BM flag received from the remote nodes in the BD. + + It's not clear to me why you'd include "overlay tunnels ... flagged as + non-BM receivers" in a flood-list that's used for flooding BM traffic? [jorge] this refers to the pruned-flood-lists capability that can be signaled by the remote nodes. I added: “Some of those overlay tunnels MAY be flagged as non-BM receivers based on the BM flag received from the remote nodes **in the Inclusive Multicast Ethernet Tag routes.**” 2. Flood-list #2 - composed of ACs, a Selective AR-LEAF-set and a Selective AR-REPLICATOR-set, where: *************** *** 928,945 **** - When a Selective AR-REPLICATOR receives a BM packet on an AC, it will forward the BM packet to its flood-list #1, skipping the non- BM overlay tunnels. - When a Selective AR-REPLICATOR receives a BM packet on an overlay tunnel, it will check the destination and source IPs of the underlay IP header and: ! o If the destination IP matches its AR-IP and the source IP matches an IP of its own Selective AR-LEAF-set, the AR- REPLICATOR will forward the BM packet to its flood-list #2, as long as the list of AR-REPLICATORs for the BD matches the Selective AR-REPLICATOR-set. If the Selective AR-REPLICATOR- set does not match the list of AR-REPLICATORs, the node reverts back to non-selective mode and flood-list #1 is used. o If the destination IP matches its AR-IP and the source IP does not match any IP of its Selective AR-LEAF-set, the AR- --- 1054,1104 ---- - When a Selective AR-REPLICATOR receives a BM packet on an AC, it will forward the BM packet to its flood-list #1, skipping the non- BM overlay tunnels. 
+
+ It sure seems like it would have been cleaner to have expressed this by
+ naming a list (Flood-list #3, whatever) that doesn't include the non-BM
+ overlay tunnels to begin with, and then saying that's the list used in
+ this case. I guess this also relates to my previous comment/question --
+ basically, why are the non-BM overlay tunnels even included?

[jorge] it comes down to the specific implementation, but the concept is that the overlay tunnels are added to the flood list irrespective of the BM flags. Since the AR capability can be implemented without supporting the pruned-flood-lists capability (the spec makes both things independent) we decided to describe the text in this way, i.e., we add the overlay tunnels to the flood list, and if the implementation supports the BM flags, it will skip the ones with the flag if needed. I’d prefer to keep the text as it is…

  - When a Selective AR-REPLICATOR receives a BM packet on an overlay
    tunnel, it will check the destination and source IPs of the
    underlay IP header and:

!   o If the destination IP address matches its AR-IP and the source IP address
      matches an IP of its own Selective AR-LEAF-set, the AR-
      REPLICATOR will forward the BM packet to its flood-list #2, as
      long as the list of AR-REPLICATORs for the BD matches the
      Selective AR-REPLICATOR-set. If the Selective AR-REPLICATOR-
      set does not match the list of AR-REPLICATORs, the node reverts
      back to non-selective mode and flood-list #1 is used.
+
+ Presumably this time the non-BM overlay tunnels are NOT excluded?

[jorge] they can be potentially excluded. I added a sentence.

+
+ Also, I guess the language above is where the answer to the "fall back to
+ non-selective AR mode" puzzle from point b, above, is hidden. It requires
+ that I make some assumptions:
+
+ - The "list of AR-REPLICATORS for the BD" is derived from the set of
+   AR-REPLICATOR advertisements for the BD.
(This is not intuitively + obvious; "list" is very generic and could be, for example, configured + or something.) + - The Selective AR-REPLICATOR-set is all the members of the above list + that have advertised L=1. + - Ergo, if the sets aren't identical, some of them must have advertised + L=0. + + It seems to me as though it would be more understandable to say something + like: + + -- + o If the destination IP address matches its AR-IP and the source IP address + matches an IP of its own Selective AR-LEAF-set, the AR- + REPLICATOR will forward the BM packet to its flood-list #2, + unless some AR-REPLICATOR within the BD has advertised L=0. + In the latter case, the node reverts + back to non-selective mode and flood-list #1 is used. + -- [jorge] good point. Changed it. o If the destination IP matches its AR-IP and the source IP does not match any IP of its Selective AR-LEAF-set, the AR- *************** *** 960,970 **** This is the regular-IR behavior described in [RFC7432]. - In any case, non-BM overlay tunnels are excluded from flood-lists ! and, also, source squelching is always done in order to ensure the traffic is not sent back to the originating source. If the ! encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not the bottom of the stack, the AR-REPLICATOR MUST copy the rest of the labels when forwarding them to the egress overlay tunnels. 6.2. Selective AR-LEAF procedures --- 1119,1142 ---- This is the regular-IR behavior described in [RFC7432]. - In any case, non-BM overlay tunnels are excluded from flood-lists ! ! That seems inconsistent with what point 1, above, says -- the place where ! I asked why you'd include non-BM receivers. In any case, there you say ! they can be part of flood-list #1. Here you say they "are excluded". ! Which is it? [jorge] fixed it in the new version. Now it should be consistent. ! ! and, also, traffic is not sent back to the originating source. If the ! 
encapsulation is MPLSoGRE or MPLSoUDP and the BD label is not the bottom of the stack, the AR-REPLICATOR MUST copy the rest of the labels when forwarding them to the egress overlay tunnels. + + Above, I removed "source squelching" again since it seemed not to add + anything, as previously. + + Reference needed for "BD label". I also wonder, is the requirement that + the replicator copy the rest of the labels a new one introduced here, or + are you just repeating an existing requirement from an underlying spec? [jorge] it is a requirement in this spec for the AR-REPLICATOR. I clarified the sentence and added it to the non-selective AR-REPLICATOR procedures, since it was missing. New text in the selective ar-replicator rules: A Selective AR-REPLICATOR data path implementation MUST be compatible with the following rules: - The Selective AR-REPLICATORs will build two flood-lists: 1. Flood-list #1 - composed of Attachment Circuits and overlay tunnels to the remote nodes in the BD, always using the IR-IPs in the tunnel destination IP addresses. 2. Flood-list #2 - composed of Attachment Circuits, a Selective AR-LEAF-set and a Selective AR-REPLICATOR-set, where: + The Selective AR-LEAF-set is composed of the overlay tunnels to the AR-LEAFs that advertise a Leaf Auto- Discovery route for the local AR-REPLICATOR. This set is updated with every Leaf Auto-Discovery route received/ withdrawn from a new AR-LEAF. + The Selective AR-REPLICATOR-set is composed of the overlay tunnels to all the AR-REPLICATORs that send a Replicator-AR route with L=1. The AR-IP addresses are used as tunnel destination IP. - Some of the overlay tunnels in the flood-lists MAY be flagged as non-BM receivers based on the BM flag received from the remote nodes in the routes. - When a Selective AR-REPLICATOR receives a BM packet on an Attachment Circuit, it MUST forward the BM packet to its flood- list #1, skipping the non-BM overlay tunnels. 
- When a Selective AR-REPLICATOR receives a BM packet on an overlay tunnel, it will check the destination and source IPs of the underlay IP header and: o If the destination IP address matches its AR-IP and the source IP address matches an IP of its own Selective AR-LEAF-set, the AR-REPLICATOR MUST forward the BM packet to its flood-list #2, unless some AR-REPLICATOR within the BD has advertised L=0. In the latter case, the node reverts back to non-selective mode and flood-list #1 MUST be used. Non-BM overlay tunnels are skipped when sending BM packets. o If the destination IP address matches its AR-IP and the source IP address does not match any IP address of its Selective AR- LEAF-set, the AR-REPLICATOR MUST forward the BM packet to flood-list #2 but skipping the AR-REPLICATOR-set. Non-BM overlay tunnels are skipped when sending BM packets. o If the destination IP address matches its IR-IP, the AR- REPLICATOR MUST use flood-list #1 but MUST skip all the overlay tunnels from the flooding list, i.e. it will only replicate to local Attachment Circuits. This is the regular-IR behavior described in [RFC7432]. Non-BM overlay tunnels are skipped when sending BM packets. - In any case, the AR-REPLICATOR ensures the traffic is not sent back to the originating source. If the encapsulation is MPLSoGRE or MPLSoUDP and the received BD label (or label that the AR- REPLICATOR advertised in the Replicator-AR route) is not the bottom of the stack, the AR-REPLICATOR MUST copy the rest of the labels when forwarding them to the egress overlay tunnels. 6.2. Selective AR-LEAF procedures *************** *** 991,996 **** --- 1163,1174 ---- b. The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs in the BD. The Selective AR-LEAF MUST advertise a Leaf A-D route after receiving a Replicator-AR route with L=1. It is + + "after receiving" -- so, does this mean it MUST NOT advertise a Leaf A-D + route prior to receiving any Replicator-AR route with L=1? 
That would also + imply that if all Replicator-AR routes with L=1 are withdrawn, the Leaf A-D + route MUST be withdrawn? [jorge] yes. I added a sentence to clarify that. + RECOMMENDED that the Selective AR-LEAF waits for a AR-LEAF-join- wait-timer (in seconds, default value is 3) before sending the Leaf A-D route, so that the AR-LEAF can collect all the *************** *** 998,1004 **** route. c. In a service where there is more than one Selective AR- ! REPLICATORs the Selective AR-LEAF MUST locally select a single Selective AR-REPLICATOR for the BD. Once selected: --- 1176,1182 ---- route. c. In a service where there is more than one Selective AR- ! REPLICATOR the Selective AR-LEAF MUST locally select a single Selective AR-REPLICATOR for the BD. Once selected: *************** *** 1021,1026 **** --- 1199,1211 ---- o In case of a failure on the selected AR-REPLICATOR, another AR-REPLICATOR will be selected and a new Leaf A-D update will be issued for the new AR-REPLICATOR. This new route will + + What does "in case of a failure on the selected AR-REPLICATOR" mean, + practically speaking? How is this detected? I presume the failure + is detected when the relevant route becomes infeasible as the result + of any of the relevant underlying BGP mechanisms (nexthop unresolvability, + holdtime expired, route withdrawal, etc). [jorge] precisely. I added a sentence taking your words to clarify: “In case of a failure on the selected AR-REPLICATOR (detected when the Replicator-AR route becomes infeasible as the result of any of the underlying BGP mechanisms), …” + update the selective list in the new Selective AR-REPLICATOR. In case of failure on the active Selective AR-REPLICATOR, it is RECOMMENDED for the Selective AR-LEAF to revert to IR *************** *** 1030,1035 **** --- 1215,1223 ---- AR mode with the new Selective AR-REPLICATOR. The AR- REPLICATOR-activation-timer MAY be the same configurable parameter as in Section 5.2. 
+ + What happens if a new AR-REPLICATOR is learned by the AR-LEAF, and the + new replicator is preferred over the currently-selected one? [jorge] added: “A Selective AR-LEAF MAY change the AR-REPLICATOR(s) selection dynamically, due to an administrative or policy configuration change.” All the AR-LEAFs in a BD are expected to be configured as either selective or non-selective. A mix of selective and non-selective AR- *************** *** 1045,1051 **** 1. Flood-list #1 - composed of ACs and the overlay tunnel to the selected AR-REPLICATOR (using the AR-IP as the tunnel ! destination IP). 2. Flood-list #2 - composed of ACs and overlay tunnels to the remote IR-IP Addresses. --- 1233,1239 ---- 1. Flood-list #1 - composed of ACs and the overlay tunnel to the selected AR-REPLICATOR (using the AR-IP as the tunnel ! destination IP address). 2. Flood-list #2 - composed of ACs and overlay tunnels to the remote IR-IP Addresses. *************** *** 1054,1061 **** there is any selected AR-REPLICATOR. If there is, flood-list #1 will be used. Otherwise, flood-list #2 will. ! - When an AR-LEAF receives a BM packet on an overlay tunnel, will ! forward the BM packet to its local ACs and never to an overlay tunnel. This is the regular IR behavior described in [RFC7432]. --- 1242,1249 ---- there is any selected AR-REPLICATOR. If there is, flood-list #1 will be used. Otherwise, flood-list #2 will. ! - When an AR-LEAF receives a BM packet on an overlay tunnel, it will ! forward the packet to its local ACs and never to an overlay tunnel. This is the regular IR behavior described in [RFC7432]. *************** *** 1071,1076 **** --- 1259,1267 ---- In addition to AR, the second optimization supported by this solution is the ability for the all the BD nodes to signal Pruned-Flood-Lists (PFL). As described in section 3, an EVPN node can signal a given + + I guess you meant Section 4? 
+ value for the BM and U PFL flags in the IR Inclusive Multicast Routes, where: *************** *** 1085,1090 **** --- 1276,1286 ---- PFL flag and remove the sender from the corresponding flood-list. A given BD node receiving BUM traffic on an overlay tunnel MUST replicate the traffic normally, regardless of the signaled PFL flags. + + What exactly does "replicate the traffic normally" mean, in the context + of this specification? I guess you should say something like "replicate + the traffic according to [reference]". Also, I don't get it: what are the + flags FOR, if they're ignored when receiving on an overlay tunnel? [jorge] I clarified the entire section with things that were missing/unclear. The sentence you referred to was indeed confusing. I replaced it with: “An AR-LEAF or RNVE receiving BUM traffic on an overlay tunnel MUST replicate the traffic to its local Attachment Circuits, regardless of the BM/U flags on the overlay tunnels.” This optimization MAY be used along with the AR solution. *************** *** 1123,1128 **** --- 1319,1328 ---- NVE2, but not to NVE3. PE2 and NVE2 will replicate the BM + + "but not to NVE3". What happened to "MUST replicate the traffic normally"? + To me, these two pieces of text seem to contradict one another. [jorge] hope the above change clarifies. + packets to their local ACs but we will avoid NVE3 having to replicate unnecessarily those BM packets to VM31 and VM32. *************** *** 1135,1147 **** --- 1335,1357 ---- NVE3 to NVE2, PE1 and PE2 but not NVE1. The solution avoids the unnecessary replication to NVE1, since the destination of the unknown traffic cannot be at NVE1. + + It's not clear to me why the destination can't be at NVE1. 4. Any Unknown unicast packet sent from TS1 will be forwarded by PE1 to the WAN link, PE2 and NVE2 but not to NVE1 and NVE3, since the target of the unknown traffic cannot be at those NVEs. + + Similarly, I don't get why this is the case. 

[jorge] I added this, does it help?:

   That is, neither NVE1 nor NVE3 is interested in receiving BM or
   Unknown Unicast traffic since:

   o Their attached VMs (VM11, VM12, VM31, VM32) do not support
     multicast applications.

   o Their attached VMs will not receive ARP Requests. Proxy-ARP
     [I-D.ietf-bess-evpn-proxy-arp-nd] on the remote NVE/PEs will
     reply to ARP Requests locally, and no other Broadcast is expected.

   o Their attached VMs will not receive unknown unicast traffic,
     since the VMs' MAC and IP addresses are always advertised by EVPN
     as long as the VMs are active.

  8. AR Procedures for single-IP AR-REPLICATORS

+ I'm curious why the design choice was made to specify two different ways to
+ do the same thing. You motivate why not all routers can use distinguished
+ IP addresses for the two different functional modes; however, presumably all
+ routers could make use of distinguished VNIs as you do here. I'd appreciate
+ a few words about why you didn't choose to just always use the VNI approach.
+

[jorge] actually we found the two cases:

* typically VNIs are global per VNI, so the dual IP solution works better with some merchant silicon.
* However we found some cases where supporting more than one loopback IP on the NVE was not trivial (actually the request came from one of the vendors with authors/contributors in the document – not mine 😊)

That’s why we decided to include both. The implementations I know of (and that were tested in public interop events) use different IP addresses and the same VNI.

  The procedures explained in sections Section 5 and Section 6 assume
  that the AR-REPLICATOR can use two local routable IP addresses to
  terminate and originate NVO tunnels, i.e. IR-IP and AR-IP addresses.
***************
*** 1184,1201 ****
  9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon

  This section extends the procedures for the cases where AR-LEAF nodes
! or AR-REPLICATOR nodes are attached to the the same Ethernet Segment
  in the BD.
The case where one (or more) AR-LEAF node(s) and one (or more) AR-REPLICATOR node(s) are attached to the same Ethernet Segment is out of scope. 9.1. Ethernet Segments on AR-LEAF nodes If VXLAN or NVGRE are used, and if the Split-horizon is based on the tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split- horizon check will not work if there is an Ethernet-Segment shared ! between two AR-LEAF nodes, and the AR-REPLICATOR changes the tunnel IP SA of the packets with its own AR-IP. In order to be compatible with the IP SA split-horizon check, the AR- REPLICATOR MAY keep the original received tunnel IP SA when --- 1394,1418 ---- 9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon This section extends the procedures for the cases where AR-LEAF nodes ! or AR-REPLICATOR nodes are attached to the same Ethernet Segment in the BD. The case where one (or more) AR-LEAF node(s) and one (or more) AR-REPLICATOR node(s) are attached to the same Ethernet Segment is out of scope. + + I just can't understand what this paragraph is telling me. :-( Apart from + anything else, to the casual reader the second sentence seems to contradict + the first. [jorge] the sentence is indeed unfortunate, changed it to: “This section extends the procedures for the cases where two or more AR-LEAF nodes are attached to the same Ethernet Segment, and two or more AR-REPLICATOR nodes are attached to the same Ethernet Segment in the BD. The mixed case, that is, an AR-LEAF node and an AR-REPLICATOR node are attached to the same Ethernet Segment, is out of scope.” 9.1. Ethernet Segments on AR-LEAF nodes If VXLAN or NVGRE are used, and if the Split-horizon is based on the tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split- horizon check will not work if there is an Ethernet-Segment shared ! between two AR-LEAF nodes, and the AR-REPLICATOR replaces the tunnel IP SA of the packets with its own AR-IP. 
+ + I changed "changes" to "replaces"; it's my best guess as to what you meant. + If that's wrong, please help me understand what you did mean. In order to be compatible with the IP SA split-horizon check, the AR- REPLICATOR MAY keep the original received tunnel IP SA when *************** *** 1203,1209 **** LEAF nodes to apply Split-horizon check procedures for BM packets, before sending them to the local Ethernet-Segment. Even if the AR- LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the ! AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating to other AR-REPLICATORs. When EVPN is used for MPLS over GRE (or UDP), the ESI-label based --- 1420,1426 ---- LEAF nodes to apply Split-horizon check procedures for BM packets, before sending them to the local Ethernet-Segment. Even if the AR- LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the ! AR-REPLICATOR MUST always use its IR-IP as the IP SA when replicating to other AR-REPLICATORs. When EVPN is used for MPLS over GRE (or UDP), the ESI-label based *************** *** 1220,1226 **** 9.2. Ethernet Segments on AR-REPLICATOR nodes ! Ethernet Segments associated to one or more AR-REPLICATOR nodes SHOULD follow "Local-Bias" procedures for EVPN all-active multi- homing, as follows: --- 1437,1443 ---- 9.2. Ethernet Segments on AR-REPLICATOR nodes ! Ethernet Segments associated with one or more AR-REPLICATOR nodes SHOULD follow "Local-Bias" procedures for EVPN all-active multi- homing, as follows: *************** *** 1240,1245 **** --- 1457,1464 ---- it had been received on a local AC that is part of the ES and will be forwarded to all local ES, irrespective of their DF or NDF state. + + Please define/expand "ES". 
- BUM traffic received on an AR-REPLICATOR overlay tunnel with IR-IP as the IP DA, will follow regular [RFC8365] "Local-Bias" rules and *************** *** 1254,1259 **** --- 1473,1483 ---- In addition, the procedures introduced by this document may bring some new risks for the successful delivery of BM traffic. Unicast traffic is not affected by this document. The forwarding of + + If unicast traffic isn't affected, what's the U flag even for? It sure + seems as though it's intended to affect the forwarding of (unknown) + unicast traffic. [jorge] good point, changed it to: “In addition, the Assisted-Replication method introduced by this document may bring some new risks for the successful delivery of BM traffic. Unicast traffic is not affected by Assisted-Replication (although Unknown unicast traffic is affected by the Pruned-Flood-Lists procedures). The forwarding of Broadcast and Multicast (BM) traffic is modified; and BM traffic from the AR-LEAF nodes will be attracted by the existence of AR-REPLICATORs in the BD. An AR-LEAF will forward BM traffic to its selected AR-REPLICATOR, therefore an attack on the AR-REPLICATOR could impact the delivery of the BM traffic using that node.” + Broadcast and Multicast (BM) traffic is modified though, and BM traffic from the AR-LEAF nodes will be attracted by the existance of AR-REPLICATORs in the BD. An AR-LEAF will forward BM traffic to its *************** *** 1262,1270 **** An implementation following the procedures in this document should not create BM loops, since the AR-REPLICATOR will always forward the BM traffic using the correct tunnel IP Destination Address that indicates the remote nodes how to forward the traffic. This is true ! in both, the Non-Selective and Selective modes defined in this document. 
The Selective mode provides a multi-staged replication solution, --- 1486,1503 ---- An implementation following the procedures in this document should not create BM loops, since the AR-REPLICATOR will always forward the + + Instead of "should not create BM loops" I suggest "will not create" or + if you can't actually promise that, "is not expected to create". I assume + you're using "should" in the sense of weak expectation, and not like a + RFC 2119 SHOULD. + [jorge] changed the paragraph based on previous feedback: “This document introduces the ability for the AR-REPLICATOR to forward traffic received on an overlay tunnel to another overlay tunnel. The reader may interpret that this introduces the risk of BM loops. That is, an AR-LEAF receiving a BM encapsulated packet that the AR-LEAF originated in the first place, due to one or two AR-REPLICATORs "looping" the BM traffic back to the AR-LEAF. The procedures in this document prevent these BM loops, since the AR-REPLICATOR will always forward the BM traffic using the correct tunnel IP Destination Address that instructs the remote nodes how to forward the traffic. This is true in both the Non-Selective and Selective modes defined in this document. However, a wrong implementation of the procedures in this document may lead to those unexpected BM loops.” BM traffic using the correct tunnel IP Destination Address that indicates the remote nodes how to forward the traffic. This is true ! ! Instead of "indicates", try instructs, cues, or directs? ! ! in both the Non-Selective and Selective modes defined in this document. The Selective mode provides a multi-staged replication solution,