Re: [Rift] AD Review of draft-ietf-rift-rift-12 (Part 2a)
"Pascal Thubert (pthubert)" <pthubert@cisco.com> Wed, 07 April 2021 13:22 UTC
X-Files: draft-ietf-rift-13.txt : 355767
From: "Pascal Thubert (pthubert)" <pthubert@cisco.com>
To: Alvaro Retana <aretana.ietf@gmail.com>, "draft-ietf-rift-rift@ietf.org" <draft-ietf-rift-rift@ietf.org>
CC: "zhang.zheng@zte.com.cn" <zhang.zheng@zte.com.cn>, "rift-chairs@ietf.org" <rift-chairs@ietf.org>, "rift@ietf.org" <rift@ietf.org>
Date: Wed, 07 Apr 2021 13:20:51 +0000
Message-ID: <CO1PR11MB48817A1D56902CAB54013A95D8759@CO1PR11MB4881.namprd11.prod.outlook.com>
References: <CAMMESsxBr0+UriSaTDVZMrFzU6DSiuC3-wO4+7HgX4nX7SLHmg@mail.gmail.com>
In-Reply-To: <CAMMESsxBr0+UriSaTDVZMrFzU6DSiuC3-wO4+7HgX4nX7SLHmg@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/OkHRTuIxhNLT5ah43I0S67LGE4I>
Subject: Re: [Rift] AD Review of draft-ietf-rift-rift-12 (Part 2a)
List-Id: Discussion of Routing in Fat Trees <rift.ietf.org>

Dear all;

Continuing changes starting at line 1014 of review part 1. I attached the result for Alvaro’s convenience.

The main commits are
https://github.com/przygienda/rift-draft/commit/8d1d382c9ae2321fa7d488a4ba1881361c87e542
and
https://github.com/przygienda/rift-draft/commit/237655f8d04cd33f1e5b151e3c5f9ae190b34594

Please see below:

1013   with this, positive disaggregation can heal all failures and still
1014   allow all the ToF nodes to see each other via south reflection.
1015   Disaggregation will be explained in further detail in Section 4.2.5.

[nit] s/deployment is it introduces/deployment is, it introduces

Done

1017   In order to scale beyond the "single plane limit", the Top-of-Fabric
1018   can be partitioned by a N number of identically wired planes where N
1019   is an integer divider of K_LEAF. The 1:1 ratio and the desired
1020   symmetry are still served, this time with (K_TOP * N) ToF nodes, each
1021   of (P * K_LEAF / N) ports. N=1 represents a non-partitioned Spine
1022   and N=K_LEAF is a maximally partitioned Spine. Further, if R is any
1023   integer divisor of K_LEAF, then N=K_LEAF/R is a feasible number of
1024   planes and R a redundancy factor. If proves convenient for
1025   deployments to use a radix for the leaf nodes that is a power of 2 so
1026   they can pick a number of planes that is a lower power of 2. The
1027   example in Figure 11 splits the Spine in 2 planes with a redundancy
1028   factor R=3, meaning that there are 3 non-intersecting paths between
1029   any leaf node and any ToF node. A ToF node must have, in this case,
1030   at least 3*P ports, and be directly connected to 3 of the 6 PoD-ToP
1031   nodes (spines) in each PoD.

[nit] s/by a N number/by an N number

Done

[minor] "(K_TOP * N) ToF nodes, each of (P * K_LEAF / N) ports" Again, the use of the terminology without a reference assumes a specific interpretation by the reader.

Yes, the definition was screwed up somehow.
The best I can do is probably to revise/fix the definition in the terminology:

“
K: Denotes half of the radix of a symmetrical switch, meaning that the switch has K ports pointing north and K ports pointing south. K_LEAF (K of a leaf) thus represents both the number of access ports in a leaf Node and the maximum number of planes in the fabric, whereas K_TOP (K of a ToP) represents the number of leaves in the PoD and the number of ports pointing north in a ToP Node towards a higher spine level, thus the number of ToF nodes in a plane. To simplify the visual aids, notations and further considerations, we assume that the switches are symmetrical, so K is set to Radix/2.
”

[minor] "if R is any integer divisor of K_LEAF, then N=K_LEAF/R is a feasible number of planes and R a redundancy factor." Please expand on the meaning of the redundancy factor.

Added:

“
R a redundancy factor that denotes the number of independent paths between 2 leaves within a plane.
”

[minor] "6 PoD-ToP nodes" I count 8.

The leaves are vertical, with half of their ports in each plane. They have 6 ports north, so there are 6 ToP nodes. The ToP nodes are horizontal; there are 6 of them. What you count to 8 is K_TOP, the number of ports in a ToP Node.

My biggest regret is that the actual drawing is not built into a real switch. I patented the general structure of that 3-D box:
https://pdfpiw.uspto.gov/.piw?docid=10973148&SectionNum=1&IDKey=CAD3F825B319&HomeUrl=http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2%2526Sect2=HITOFF%2526p=1%2526u=%25252Fnetahtml%25252FPTO%25252Fsearch-bool.html%2526r=1%2526f=G%2526l=50%2526co1=AND%2526d=PTXT%2526s1=%252522thubert%252Bpascal%252522.INNM.%2526OS=IN/%252522thubert%252Bpascal%252522%2526RS=IN/%252522thubert%252Bpascal%252522

Back to the text, I added:

“
The ToP nodes are represented horizontally with K_TOP=8 ports northwards each.
”

1033-1065   [ASCII diagram: two planes, labeled Plane 1 and Plane 2, each drawn as 3 horizontal rows crossing 8 vertical columns with an "o" at every crossing; an arrow marks one column as a Top-of-Fabric node "across" depth]

1067   Figure 11: Northern View of a Multi-Plane ToF Level, K_LEAF=6, N=2

1069   At the extreme end of the spectrum it is even possible to fully
1070   partition the spine with N = K_LEAF and R=1, while maintaining
1071   connectivity between each leaf node and each Top-of-Fabric node. In
1072   that case the ToF node connects to a single Port per PoD, so it
1073   appears as a single port in the projected view represented in
1074   Figure 12. The number of ports required on the Spine Node is more or
1075   equal to P, the number of PoDs.

[minor] "more or equal to P" ??

Changed to:

“
more than or equal to P
”

...

1121   4.1.3. Fallen Leaf Problem

...
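
The plane arithmetic quoted from lines 1017-1031 can be checked with a small Python sketch. It is not taken from the draft. The function names and the value P=4 are hypothetical, chosen only for illustration. The formulas themselves (K_TOP * N ToF nodes, P * K_LEAF / N ports per ToF node, N = K_LEAF / R) are the ones stated in the quoted text.

```python
def feasible_planes(k_leaf):
    """List (N, R) pairs where R is an integer divisor of K_LEAF and N = K_LEAF / R."""
    return [(k_leaf // r, r) for r in range(1, k_leaf + 1) if k_leaf % r == 0]


def tof_dimensions(k_leaf, k_top, n_planes, pods):
    """Size the ToF level for N planes, using the formulas quoted from the draft.

    `pods` stands for P, the number of PoDs (the value used below is an assumption).
    """
    if k_leaf % n_planes != 0:
        raise ValueError("N must be an integer divisor of K_LEAF")
    return {
        "tof_nodes": k_top * n_planes,               # (K_TOP * N) ToF nodes
        "ports_per_tof": pods * k_leaf // n_planes,  # (P * K_LEAF / N) ports each
        "redundancy": k_leaf // n_planes,            # R = K_LEAF / N
    }


if __name__ == "__main__":
    # Figure 11 uses K_LEAF=6 and N=2; K_TOP=8 comes from the added sentence above.
    print(feasible_planes(6))
    print(tof_dimensions(k_leaf=6, k_top=8, n_planes=2, pods=4))
```

With K_LEAF=6 and N=2 the sketch yields R=3 and 3*P ports per ToF node, which agrees with the Figure 11 example in the quoted text.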
1140   In a maximally partitioned fabric, the redundancy factor is R= 1, so
1141   any breakage in the fabric may cause one or more fallen leaves.
1142   However, not all cases require disaggregation. The following cases
1143   do not require particular action in such scenario:

[major] A quick look at §4.2.5.1 doesn't explicitly mention how a node considers the redundancy factor...but that may be included in the "DAG computation" mentioned in the first step. I'm putting this comment here so I don't forget later...

There is no operation in RIFT that is based on the value of R. It’s just that after R breakages in a plane, it becomes possible that a leaf falls in that plane. If R=1, it’s guaranteed that a breakage will cause fallen leaves. R=2 guarantees that a single breakage will not cause fallen leaves.

Changed to:

“
In a maximally partitioned fabric, the redundancy factor is R= 1, so any breakage in the fabric will cause one or more fallen leaves in the affected plane. R=2 guarantees that a single breakage will not cause a fallen leaf.
”

1145   If a southern link on a node goes down, then connectivity through
1146   that node is lost for all nodes south of it. There is no need to
1147   disaggregate since the connectivity to this node is lost for all
1148   spine nodes in a same fashion.

1150   If a ToF Node goes down, then northern traffic towards it is
1151   routed via alternate ToF nodes in the same plane and there is no
1152   need to disaggregate routes.

...

1159   If the breakage is the last northern link from a ToP node to a ToF
1160   node going down, then the fallen leaf problem affects only The ToF
1161   node, and the connectivity to all the nodes in the PoD is lost
1162   from that ToF node. This can be observed by other ToF nodes
1163   within the plane where the ToP node is located and positively
1164   disaggregated within that plane.

[nit] s/only The ToF/only the ToF

done

1166   On the other hand, there is a need to disaggregate the routes to
1167   Fallen Leaves in a transitive fashion, all the way to the other
1168   leaves in the following cases:

[] Without having seen the specific mechanism, this overview is hard to digest.

Yes, I’d expect the reader to come back here later; this is for advanced rifters ;)

Changed to:

“
On the other hand, there is a need to disaggregate the routes to Fallen Leaves within the plane in a transitive fashion, that is, all the way to the other leaves, in the following cases
”

1170   o If the breakage is the last northern link from a leaf node within
1171   a plane (there is only one such link in a maximally partitioned
1172   fabric) that goes down, then connectivity to all unicast prefixes
1173   attached to the leaf node is lost within the plane where the link
1174   is located. Southern Reflection by a leaf node, e.g., between ToP
1175   nodes, if the PoD has only 2 levels, happens in between planes,
1176   allowing the ToP nodes to detect the problem within the PoD where
1177   it occurs and positively disaggregate. The breakage can be
1178   observed by the ToF nodes in the same plane through the North
1179   flooding of TIEs from the ToP nodes. The ToF nodes however need
1180   to be aware of all the affected prefixes for the negative,
1181   possibly transitive disaggregation to be fully effective (i.e. a
1182   node advertising in control plane that it cannot reach a certain
1183   more specific prefix than default whereas such disaggregation must
1184   in extreme condition propagate further down southbound). The
1185   problem can also be observed by the ToF nodes in the other planes
1186   through the flooding of North TIEs from the affected leaf nodes,
1187   together with non-node North TIEs which indicate the affected
1188   prefixes. To be effective in that case, the positive
1189   disaggregation must reach down to the nodes that make the plane
1190   selection, which are typically the ingress leaf nodes. The
1191   information is not useful for routing in the intermediate levels.

[nit] s/in control plane/in the control plane

done

[nit] s/in extreme condition/in the extreme condition

done

1193   o If the breakage is a ToP node in a maximally partitioned fabric -
1194   in which case it is the only ToP node serving the plane in that
1195   PoD - goes down, then the connectivity to all the nodes in the PoD
1196   is lost within the plane where the ToP node is located.
1197   Consequently, all leaves of the PoD fall in this plane. Since the
1198   Southern Reflection between the ToF nodes happens only within a
1199   plane, ToF nodes in other planes cannot discover fallen leaves in
1200   a different plane. They also cannot determine beyond their local
1201   plane whether a leaf node that was initially reachable has become
1202   unreachable. As the breakage can be observed by the ToF nodes in
1203   the plane where the breakage happened, the ToF nodes in the plane
1204   need to be aware of all the affected prefixes for the negative
1205   disaggregation to be fully effective. The problem can also be
1206   observed by the ToF nodes in the other planes through the flooding
1207   of North TIEs from the affected leaf nodes, if there are only 3
1208   levels and the ToP nodes are directly connected to the leaf nodes,
1209   and then again it can only be effective it is propagated
1210   transitively to the leaf, and useless above that level.

[nit] s/fabric -...- goes down,/fabric -...-,

Actually I meant:

“
If the breakage is a ToP node in a maximally partitioned fabric - in which case it is the only ToP node serving the plane in that PoD that goes down -
”

1212   For the sake of easy comprehension let us roll the abstractions back
1213   into a simple example and observe that in Figure 3 the loss of link
1214   Spine 122 to Leaf 122 will make Leaf 122 a fallen leaf for Top-of-
1215   Fabric plane B. Worse, if the cabling was never present in first
1216   place, plane B will not even be able to know that such a fallen leaf
1217   exists. Hence partitioning without further treatment results in two
1218   grave problems:

[] "For the sake of easy comprehension...Figure 3..." Finally! Hmmm...sorry...I mean, it is a little ironic that after all the new terminology, detailed descriptions and figures, the clearer explanation uses the simplest drawing.

I leave that to Tony

[nit] s/in first place/in the first place

done

1220   o Leaf 111 trying to route to Leaf 122 MUST choose Spine 111 in
1221   plane A as its next hop since plane B will inevitably blackhole
1222   the packet when forwarding using default routes or do excessive
1223   bow tying. This information must be in its routing table.

[major] s/MUST/must This is not a Normative statement, just a statement of fact (inside an example).

done

1225   o Any kind of "flooding" or distance vector trying to deal with the
1226   problem by distributing host routes will be able to converge only
1227   using paths through leaves. The flooding of information on Leaf
1228   122 would have to go up to Top-of-Fabric A and then "loopback"
1229   over other leaves to ToF B leading in extreme cases to traffic for
1230   Leaf 122 when presented to plane B taking an "inverted fabric"
1231   path where leaves start to serve as TOFs, at least for the
1232   duration of a protocol's convergence.

[] "Any kind of "flooding" or distance vector..." I can guess the meaning, but it would be better that I don't have to.
Maybe something like: "Any advertisement..."

Changed to:

“
o A path computation trying to deal with the problem by distributing host routes may only form paths through leaves.
”

[minor] "information on Leaf 122" s/on/ about (?), or maybe from. ??

Used “about”

1234   4.1.4. Discovering Fallen Leaves

1236   As illustrated later, and without further proof, the way to deal with
1237   fallen leaves in multi-plane designs, when aggregation is used, is
1238   that RIFT requires all the ToF nodes to share the same north topology
1239   database. This happens naturally in single plane design by the means
1240   of northbound flooding and south reflection but needs additional
1241   considerations in multi-plane fabrics. To satisfy this RIFT, in
1242   multi-plane designs, relies at the ToF level on ring interconnection
1243   of switches in multiple planes. Other solutions are possible but
1244   they either need more cabling or end up having much longer flooding
1245   paths and/or single points of failure.

[minor] "As illustrated later..." Where?

See next

[] "and without further proof" I hope this is at least specified at that later point.

No need to prove anything. We made a design decision. We’re not routing through leaves, which are the only place where the planes meet in classical designs, so we need to create a wormhole between planes. The natural place for that is the superspine. See next.

[nit] s/To satisfy this RIFT, in multi-plane designs, relies/To satisfy this need in multi-plane designs, RIFT relies

I changed the text a little bit more:

“
When aggregation is used, RIFT deals with fallen leaves by ensuring that all the ToF nodes share the same north topology database. This happens naturally in single plane design by the means of northbound flooding and south reflection but needs additional considerations in multi-plane fabrics. To enable routing to fallen leaves in multi-plane designs, RIFT requires additional interconnection across planes between the ToF nodes, e.g., using rings as illustrated in Figure 13. Other solutions are possible but they either need more cabling or end up having much longer flooding paths and/or single points of failure.
”

1247   In detail, by reserving two ports on each Top-of-Fabric node it is
1248   possible to connect them together by interplane bi-directional rings
1249   as illustrated in Figure 13. The rings will be used to exchange full
1250   north topology information between planes. All ToFs having same
1251   north topology allows by the means of transitive, negative
1252   disaggregation described in Section 4.2.5.2 to efficiently fix any
1253   possible fallen leaf scenario. Somewhat as a side-effect, the
1254   exchange of information fulfills the ask to present full view of the
1255   fabric topology at the Top-of-Fabric level, without the need to
1256   collate it from multiple points by additional complexity of
1257   technologies like [RFC7752].

[nit] s/fulfills the ask to present full view/fulfills the requirement to have a full view

done

[] "..., without the need to collate it from multiple points by additional complexity of technologies like [RFC7752]." This last phrase is unnecessary: because carrying RIFT information in BGP-LS is not defined, and more importantly, there's no need to criticize other technology to make RIFT look better.

done

1259-1284   [ASCII diagram: Planes A, B, ... X stacked top to bottom, each drawn as a row of 7 nodes; vertical rings numbered 1 to 7 connect the nodes column by column across the planes]

1286   Figure 13: Connecting Top-of-Fabric Nodes Across Planes by Rings

[minor] Is that one ring per plane, multiple rings per plane or a big ring for all the planes? The drawing is not clear to me. :-(

Maybe you missed that the Rings are numbered at the bottom? I tweaked the image as follows, I hope this helps:

“
[ASCII diagram: same layout as Figure 13, with Rings 1 to 7 labeled at both the top and the bottom, each ring drawn as a closed loop passing through one node per plane, and dashed separators between Plane A, Plane B, ... Plane X]
”

1288   4.1.5. Addressing the Fallen Leaves Problem

1290   One consequence of the "Fallen Leaf" problem is that some prefixes
1291   attached to the fallen leaf become unreachable from some of the ToF
1292   nodes. RIFT proposes two methods to address this issue, the positive
1293   and the negative disaggregation. Both methods flood South TIEs to
1294   advertise the impacted prefix(es).

[nit] s/RIFT proposes two methods/RIFT defines two methods

Done

Many thanks again for the careful reading. Sorry that the 3-D representation takes time and effort to get used to. Please let me know how you feel about the changes above.

Keep safe

Pascal

[End of Review - Part 1]

From: RIFT <rift-bounces@ietf.org> On Behalf Of Alvaro Retana
Sent: Friday, 5 March 2021 16:36
To: draft-ietf-rift-rift@ietf.org
Cc: zhang.zheng@zte.com.cn; rift-chairs@ietf.org; rift@ietf.org
Subject: [Rift] AD Review of draft-ietf-rift-rift-12 (Part 2a)

Dear authors:

This is Part 2a of my review of this document. I wanted this second installment to be longer, but given the IETF meeting next week, I am sending this fragment out now.
Part 2a starts with §4.2 (Specification) and goes just through §4.2.2 -- I just started reading the LIE FSM/§4.2.2.1. I included a couple of comments on the schema itself.

While reading this part of the document I've been learning thrift and applying my programming and modeling experience (*). In general, the model seems straightforward (so far).

It is important to highlight somewhere at the start of §4.2 that Appendix B is Normative. The existing references don't explicitly say so.

In this first part of the specification there are a number of assumptions made that should be explained explicitly and not by reference. For example, it seems to be assumed that the reader knows what a three-way handshake is -- while most readers of this specification will be familiar with other routing protocols, this document is *the* RIFT spec and should not rely on other protocol specifications, much less when the specification is not (can't be!) the same -- this is what §4.2.2 says:

   It uses a three-way handshake mechanism which is a cleaned up version of [RFC5303]. Observe that for easier comprehension the terminology of one/two and three-way states does NOT align with OSPF or ISIS FSMs albeit they use roughly same mechanisms.

IOW, "similar to IS-IS, but not the same." There are more specific comments inline.

"Traditional" specifications usually present packet formats and describe the fields before explaining their use. §4.2.2 jumped right into talking about a specific flag (v4_forwarding_capable) without introducing it or the place where it is carried (LIEPacket). It is important to describe the different structures, and provide a way to make them simpler to find in Appendix B.

In my comments I'm asking for a lot of pointers to things that have not been discussed. I don't know (at this point) whether the overall organization of the document makes sense as is or if it might be worth rethinking (i.e. shuffle things around). Food for thought.
These comments/questions are for the Chairs/Shepherd:

- It looks like an early TSV review was requested, but I didn't see the review come in. Did I miss it?

- §8.1 includes a set of suggested UDP ports, but they are not reflected in the registry. Early allocation has not been requested -- you should consider doing so. See rfc7120.

Thanks!

Alvaro.

(*) I don't have any. ;-)

[Line numbers from idnits.]

...
1341   4.2. Specification
...
1350   "On Entry" actions on FSM state are performed every time and right
1351   before the according state is entered, i.e. after any transitions
1352   from previous state.

[nit] Suggestion> "On Entry" actions at an FSM state are performed right before the corresponding state is entered, i.e. after any transitions from a previous state.

1354   "On Exit" actions are performed every time and immediately when a
1355   state is exited, i.e. before any transitions towards target state are
1356   performed.

[nit] Suggestion> "On Exit" actions are performed immediately when a state is exited, i.e. before a transition towards a target state is performed.

1358   Any attempt to transition from a state towards another on reception
1359   of an event where no action is specified MUST be considered an
1360   unrecoverable error.

[major] Any type of action, or are you referring to an "on exit" one?

[major] "MUST be considered an unrecoverable error" What is the result of that? Should the adjacencies be reset, the rift process restarted, the interfaces shut down, etc.?? There's no other place in the document that mentions "unrecoverable".

1362   The FSMs and procedures are normative in the sense that an
1363   implementation MUST implement them either literally or an
1364   implementation MUST exhibit externally observable behavior that is
1365   identical to the execution of the specified FSMs.

[major] How can you tell the difference? rfc2119 keywords "MUST only be used where it is actually required for interoperation".
In this case there are options, and there is no way to verify one or the other as long as the implementations "exhibit externally observable behavior that is identical". IOW, I am not comfortable with using MUST if it is not required. Also, I believe the statement to generally be true for any specification and not needed.

Suggestion (borrowing from rfc4271):

   The data structures and FSMs described in this document are conceptual and do not have to be implemented precisely as described here, as long as the implementations support the described functionality and exhibit the same externally visible behavior.

1367   Where a FSM representation is inconvenient, i.e. the amount of
1368   procedures and kept state exceeds the amount of transitions, we defer
1369   to a more procedural description on data structures.

[] This paragraph can also be eliminated.

1371   4.2.1. Transport

1373   All packet formats are defined in Thrift [thrift] models in
1374   Appendix B.

[] Can we get an index/TOC or some type of guide in the appendix to make it easier to find specific parts? For example, finding where a LIE is defined is not straightforward -- I happened to stumble on it in B.2. Another option would be to point in §4.2.2 to look for LIEPacket in B.2.

[major] It must be stated somewhere that the contents of Appendix B are Normative.

1376   The serialized model is carried in an envelope within a UDP frame
1377   that provides security and allows validation/modification of several
1378   important fields without de-serialization for performance and
1379   security reasons.

[minor] This is a very short section -- please put references to other places where these topics are explained more, if any.

1381   4.2.2. Link (Neighbor) Discovery (LIE Exchange)

1383   RIFT LIE exchange auto-discovers neighbors, negotiates ZTP parameters
1384   and discovers miscablings. It uses a three-way handshake mechanism
1385   which is a cleaned up version of [RFC5303].
       Observe that for easier
1386   comprehension the terminology of one/two and three-way states does
1387   NOT align with OSPF or ISIS FSMs albeit they use roughly same
1388   mechanisms. The formation progresses under normal conditions from
1389   one-way to two-way and then three-way state at which point it is
1390   ready to exchange TIEs per Section 4.2.3.

[minor] "...which is a cleaned up version of [RFC5303]." There's no need to hint at the fact that other implementations/protocols may not be as good/clean as rift. Please take this part of the sentence out.

[minor] "Observe that for easier comprehension the terminology of one/two and three-way states does NOT align with OSPF or ISIS FSMs albeit they use roughly same mechanisms." I don't know how changing everything I know makes comprehension easier. ;-) Seriously: we don't need to tell the reader what is not used.

1392   LIE exchange happens over well-known administratively locally scoped
1393   and configured or otherwise well-known IPv4 multicast address
1394   [RFC2365] and/or link-local multicast scope [RFC4291] for IPv6
1395   [RFC8200] using a configured or otherwise a well-known destination
1396   UDP port defined in Appendix C.1. LIEs SHOULD be sent with an IPv4
1397   Time to Live (TTL) / IPv6 Hop Limit (HL) of 1 to prevent RIFT
1398   information reaching beyond a single L3 next-hop in the topology.
1399   LIEs SHOULD be sent with network control precedence.

[] The first sentence is long and convoluted. I think I understood that the assigned destination IP address and UDP port are used, but that they also can be configured. Is that right?

Suggestion> LIEs are exchanged over the well-known multicast addresses and UDP port, as summarized in Appendix C.1. An implementation MAY allow for the local configuration of these parameters.

[] If configured, do both sides need to be configured beforehand or is there a discovery mechanism?

[major] "LIEs SHOULD be sent with..." When is it ok to not use these values?
IOW, why is this action recommended and not required? (The next comment is related.)

[minor] "IPv4 Time to Live (TTL) / IPv6 Hop Limit (HL) of 1" Was GTSM (rfc5082) considered? Several documents (for example, rfc8085 and draft-ietf-opsec-v6) suggest its use. You may be asked about it later. It would be good to preempt those questions by adding some text in the Security Considerations about any risks...or using it.

[major] "LIEs SHOULD be sent with network control precedence." When is it ok to not do so? IOW, when is this behavior recommended and not required? BTW, please add a reference.

1401   Originating port of the LIE has no further significance other than
1402   identifying the origination point. LIEs are exchanged over all links
1403   running RIFT.

[nit] s/Originating port/The originating port

1405   An implementation MAY listen and send LIEs on IPv4 and/or IPv6
1406   multicast addresses. A node MUST NOT originate LIEs on an address
1407   family if it does not process received LIEs on that family. LIEs on
1408   same link are considered part of the same negotiation independent of
1409   the address family they arrive on. Observe further that the LIE
1410   source address may not identify the peer uniquely in unnumbered or
1411   link-local address cases so the response transmission MUST occur over
1412   the same interface the LIEs have been received on. A node MAY use
1413   any of the adjacency's source addresses it saw in LIEs on the
1414   specific interface during adjacency formation to send TIEs. That
1415   implies that an implementation MUST be ready to accept TIEs on all
1416   addresses it used as source of LIE frames.

[major] "An implementation MAY listen and send LIEs on IPv4 and/or IPv6 multicast addresses. A node MUST NOT originate LIEs on an address family if it does not process received LIEs on that family."

The first sentence says that it is optional to listen and send LIEs over either IPv4/IPv6 (IOW, a router doesn't have to do either).
The second one gives the impression that originating a LIE on a specific AF means that the router is willing to only receive/process LIEs on it (e.g. if an IPv6 LIE is not generated then there's no positive indication that the router can receive/process IPv6 LIEs).

The combination results in a potential lock at startup: For example, if a router chooses to only listen on IPv4 (MAY = optional) and its neighbor on IPv6, then they will be sending LIEs and never talk to each other.

I'm assuming that the intent is to allow for maximum flexibility: allow to send/receive on either (or both). But, what if only one AF is supported? Should an implementation be able to control which AF is used? If I have an IPv6-only deployment, should the router be expected to receive LIEs over IPv4?

[nit] s/LIEs on an address family/LIEs using an address family

[nit] s/LIEs on same link/All LIEs on a link

[minor] "part of the same negotiation" Negotiation of what? The text at the start of this section mentions that LIEs negotiate ZTP parameters -- is that it? Other actions don't seem to be negotiations: discovery, for example. I'm not sure if something like "session" would capture the intended meaning.

[major] "MAY use any of the adjacency's source addresses it saw in LIEs...to send TIEs." This use of "MAY" makes using the source address from a LIE optional. §4.2.3.3 says that "TIEs...using the destination address on which the LIE adjacency has been formed", which doesn't make the use optional.

OLD> A node MAY use any of the adjacency's source addresses it saw in LIEs on the specific interface during adjacency formation to send TIEs.

NEW> A node may use any of the adjacency's source addresses from the LIEs received on the specific interface during adjacency formation to send TIEs (Section 4.2.3.3).

[See related question in 4.2.3.3.]
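
The startup lock raised in the [major] comment on draft lines 1405-1407 can be illustrated with a small sketch. This is not taken from the draft: it assumes, purely for illustration, that a node is described only by the set of address families on which it originates LIEs and the set on which it processes them. The names `Node`, `violates_origination_rule` and `can_form_adjacency` are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical model: a node is reduced to the address families (AFs)
# it uses for LIEs. Nothing here comes from the RIFT schema.
@dataclass(frozen=True)
class Node:
    name: str
    sends: frozenset      # AFs on which the node originates LIEs
    processes: frozenset  # AFs on which the node processes received LIEs


def violates_origination_rule(node: Node) -> bool:
    """True if the node originates LIEs on an AF it does not process
    (the MUST NOT quoted at draft lines 1406-1407)."""
    return not node.sends <= node.processes


def can_form_adjacency(a: Node, b: Node) -> bool:
    """Each direction needs at least one AF that one side sends on
    and the other side processes."""
    return bool(a.sends & b.processes) and bool(b.sends & a.processes)


v4_only = Node("r1", frozenset({"ipv4"}), frozenset({"ipv4"}))
v6_only = Node("r2", frozenset({"ipv6"}), frozenset({"ipv6"}))
dual = Node("r3", frozenset({"ipv4", "ipv6"}), frozenset({"ipv4", "ipv6"}))

# Both single-AF nodes obey the MUST NOT, yet they never hear each other.
print(can_form_adjacency(v4_only, v6_only))  # False
print(can_form_adjacency(v4_only, dual))     # True
```

Under these assumptions both single-AF nodes are individually compliant, which is why the lock cannot be detected by either node on its own.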
1418   A three-way adjacency over any address family implies support for
1419   IPv4 forwarding if the `v4_forwarding_capable` flag is set to true
1420   and a node can use [RFC5549] type of forwarding in such a situation.
1421   It is expected that the whole fabric supports the same type of
1422   forwarding of address families on all the links. Operation of a
1423   fabric where only some of the links are supporting forwarding on an
1424   address family and others do not is outside the scope of this
1425   specification.

[minor] "three-way adjacency" I know there's a definition in the terminology section, but it is not a specification of what a three-way adjacency is. Please add a forward reference to where it is specified.

[major] "`v4_forwarding_capable` flag" (This comment is not specific to this flag.) The specific LIE packet format has not been presented, introduced or even referenced up to now. Pointing to a flag, which happens to be optional in an optional part of the LIEPacket, comes out of the blue and feels completely out of place without prior explanation. I can see that some of the fields are described in Appendix B, but there should at least be a pointer to that. I also expect a description somewhere of expectations and error conditions. For example, see my comments in the appendix related to LinkCapabilities.

[major] "a node can use [RFC5549] type of forwarding" First of all, rfc5549 is a BGP-specific RFC. How do the BGP mechanisms defined there (and in rfc8950) map to RIFT? IOW, please specify how RIFT works. Because you listed rfc5549 as a Normative reference, I assume you want this behavior to be normative. s/can use/MUST/SHOULD/MAY ??

[major] "It is expected that the whole fabric supports the same type of forwarding of address families on all the links. Operation of a fabric where only some of the links are supporting forwarding on an address family and others do not is outside the scope of this specification."

Please remind me, is IPv4 optional or required?
I'm assuming that IPv6 is required and IPv4 optional, but I can't find anything definitive in the document. Even if IPv4 is optional, I'm assuming that most deployments will support both. But as time goes on that may (slowly) change to IPv6-only. Is that a fair assumption?

Because the ability to support IPv4 is signaled per link (v4_forwarding_capable), then the expectation that "the whole fabric supports the same type of forwarding of address families" may be "easily" broken if one link doesn't support IPv4 (including the case where a rogue node may not advertise v4_forwarding_capable to mess things up).

What should a node that detects an inconsistency (e.g. some links use dual-stack, but others only one AF) do? I'm assuming that it SHOULD/MUST (?) go as far as disabling the adjacencies as described below, right? Please be explicit and specific on the actions to be taken.

1427   The protocol does NOT support selective disabling of address
1428   families, disabling v4 forwarding capability or any local address
1429   changes in three-way state, i.e. if a link has entered three-way IPv4
1430   and/or IPv6 with a neighbor on an adjacency and it wants to stop
1431   supporting one of the families or change any of its local addresses
1432   or stop v4 forwarding, it has to tear down and rebuild the adjacency.
1433   It also has to remove any information it stored about the adjacency
1434   such as LIE source addresses seen.

1436   Unless ZTP as described in Section 4.2.7 is used, each node is
1437   provisioned with the level at which it is operating. It MAY be also
1438   provisioned with its PoD. If any of those values is undefined, then
1439   accordingly a default level and/or an "undefined" PoD are assumed.
1440   This means that leaves do not need to be configured at all if initial
1441   configuration values are all left at "undefined" value. Nodes above
1442   ToP MUST remain at "any" PoD value which has the same value as
1443   "undefined" PoD.
       This information is propagated in the LIEs
1444   exchanged.

[major] There are references made in this paragraph to things that are defined elsewhere (""any" PoD"..."propagated in the LIEs"). Please put pointers to where these things are defined. As mentioned above, the LIE format hasn't been presented yet.

The text above gives the impression that the type of the PoD is somehow related to the level, so I went in search of that and found this in the LIEPacket:

   struct LIEPacket {
   ...
       /** Node's PoD. */
       7: optional common.PodType pod = common.default_pod;

...but then I can't find anything that corresponds to PodType matching the values above: undefined, any...

...but I did find this:

   /** Common RIFT packet header. */
   struct PacketHeader {
   ...
       /** Level of the node sending the packet, required on everything
           except LIEs. Lack of presence on LIEs indicates UNDEFINED_LEVEL
           and is used in ZTP procedures. */
       4: optional common.LevelType level;
   }

...and a couple of constants:

   const LevelType top_of_fabric_level = 24
   const LevelType leaf_level = 0
   const LevelType default_level = leaf_level

If ZTP is not used, are there operational/deployment considerations for an operator to provision the levels? Should they simply start counting from 0, is incrementing by one good enough or should it be by 2? Or maybe there are considerations about the use of ZTP? Please take a look at rfc5706.

[minor] "It MAY be also provisioned with its PoD." What does this mean? Does it mean that all the nodes in the PoD have the same level?

...

[major] BTW, this reminds me, what are the manageability considerations related to RIFT? Among other things, it would be nice to have a summary of parameters that are expected to be configured and their defaults. Please take a look at rfc5706.

1446   Further definitions of leaf flags are found in Section 4.2.7 given
1447   they have implications in terms of level and adjacency forming here.

[] "leaf flags" What are those? I think this is the first mention.
[] Suggestion> Further definitions of leaf flags are found in Section 4.2.7.

1449   A node tries to form a three-way adjacency if and only if

[nit] s/tries to form/starts to form

[major] "if and only if" Can this be translated into a Normative statement? Are all these required (MUST) before the formation of a 3-way adjacency started?

1451   1. the node is in the same PoD or either the node or the neighbor
1452      advertises "undefined/any" PoD membership (PoD# = 0) AND

[minor] "the node is in the same PoD" As what? I guess you mean in the same PoD as what a neighbor indicated in its LIE, right?

1454   2. the neighboring node is running the same MAJOR schema version AND

[] The formats have not been introduced yet. I peeked into Appendix B and found the PacketHeader. As with #1, you're assuming a LIE packet has been received from the neighbor, but that is not mentioned.

[] The versioning system has not been introduced. Please put a reference to it.

[nit] This is the only place where "MAJOR" (all caps) is used. Please be consistent.

[minor] Are there any requirements related to the minor version?

1456   3. the neighbor is not member of some PoD while the node has a
1457      northbound adjacency already joining another PoD AND

[nit] s/not member of some PoD/not a member of the same PoD

[major] "not [a] member of [the] some PoD" #1 says that the "node is in the same PoD" -- it looks like there's some context missing -- I guess about the ability to join other PoDs. Is this only a northbound characteristic?

1459   4. the neighboring node uses a valid System ID AND

[minor] "System ID" and "SystemID" are both used. Please pick one.

[major] What is a "valid System ID"? The only clue that I found is in §4.2.7.2, which makes me wonder whether it is specific to ZTP or not (??):

   RIFT nodes require a 64 bit SystemID which SHOULD be derived as EUI-64 MA-L derive according to [EUI64].
   The organizationally governed portion of this ID (24 bits) can be used to generate multiple IDs if required to indicate more than one RIFT instance."

If this is a recommendation (SHOULD), and assuming that it applies beyond ZTP, what is a "valid System ID"?

I also only found one place that talks about an IllegalSystemID, but that is probably not the same as an invalid one:

   /** 0 is illegal for SystemID */
   const SystemIDType IllegalSystemID = 0

1461   5. the neighboring node uses a different System ID than the node
1462      itself

[minor] Is there a way to resolve/address duplicate System IDs? It seems to me that a rogue node may reflect the sender's System ID and prevent an adjacency from forming (along with a number of other values).

[minor] Also, I hope that the selection/assignment of the System ID is discussed elsewhere. A pointer might be nice.

1464   6. the advertised MTUs match on both sides AND

[] Ahhh...this is in the LIEPacket too.

[major]

   struct LIEPacket {
   ...
       /** Layer 3 MTU, used to discover to mismatch. */
       4: optional common.MTUSizeType link_mtu_size = common.default_mtu_size;

If MTUSizeType is not included, where is it specified that the default_mtu_size must be considered? Without that specification it is not possible to satisfy this condition.

1466   7. both nodes advertise defined level values AND

[major] The advertisement of LevelType in PacketHeader is optional. That means that even if there is a default value a node may not "*advertise* defined level values". Also, there are only 2 level values defined:

   const LevelType top_of_fabric_level = 24
   const LevelType leaf_level = 0

What am I missing?

1468   8. [

1470      i) the node is at level 0 and has no three way adjacencies
1471         already to nodes at Highest Adjacency Three-Way level (HAT as
1472         defined later in Section 4.2.7.1) with level different than
1473         the adjacent node OR

[nit] s/three way/three-way/g

...
1478      iii) both nodes are at level 0 AND both indicate support for
1479           Section 4.3.8 OR

[minor] "support for Section 4.3.8" Is there a name for that? How is it indicated? §4.3.8 says that a leaf can "advertise the LEAF_2_LEAF flag in its node capabilities"; there is nothing called "LEAF_2_LEAF" in the schema, but "leaf_only_and_leaf_2_leaf_procedures" does show up. Please be consistent in the terminology. [Also, there are places where "leaf-2-leaf" is used.]

...
1486   The rules checking PoD numbering MAY be optionally disregarded by a
1487   node if PoD detection is undesirable or has to be ignored. This will
1488   not affect the correctness of the protocol except preventing
1489   detection of certain miscabling cases.

[nit] s/rules checking/rules for checking

[nit] s/MAY be optionally/MAY be (redundant)

[minor] What are the "rules [for] checking PoD numbering"? Where are they specified?

[major] "MAY be optionally disregarded by a node if PoD detection is undesirable or has to be ignored." This is the only place where "PoD detection" is used. What is it? Where is it specified? Why would it be undesirable? When would it have to be ignored? Are there operational considerations for making that decision?

[major] "...except preventing detection of certain miscabling cases." Where is miscabling detection described? Which cases may not be covered? §4.2.7.6 says that "internal ruleset flags a possible miscabling", but that is the closest that I came to finding a description (and rulesets are not mentioned anywhere else).

1491   A node configured with "undefined" PoD membership MUST, after
1492   building first northbound three way adjacencies to a node being in a
1493   defined PoD, advertise that PoD as part of its LIEs. In case that
1494   adjacency is lost, from all available northbound three way
1495   adjacencies the node with the highest System ID and defined PoD is
1496   chosen.
       That way the northmost defined PoD value (normally the ToP
1497   nodes) can diffuse southbound towards the leaves "forcing" the PoD
1498   value on any node with "undefined" PoD.

[nit] s/building first...adjacencies/building its first...adjacency

[nit] s/to a node being in a defined PoD/to a node in a defined PoD

[minor] "...defined PoD is chosen." ...chosen as the PoD value to advertise. (or something along those lines)

[nit] s/northmost/northernmost

[] I'm missing how using the System ID to select will always result in the north direction...

[minor] "...diffuse southbound towards the leaves "forcing" the PoD value on any node with "undefined" PoD." An example of this would be nice.

1500   LIEs arriving with IPv4 Time to Live (TTL) / IPv6 Hop Limit (HL)
1501   larger than 1 MUST be ignored.

[major] "MUST be ignored" The specification of the sender (§4.2.2) says that "LIEs SHOULD be sent with an IPv4 Time to Live (TTL) / IPv6 Hop Limit (HL) of 1". This is a Normative conflict because it is only recommended to the sender, but required by the receiver.

1503   A node SHOULD NOT send out LIEs without defined level in the header
1504   but in certain scenarios it may be beneficial for trouble-shooting
1505   purposes.

[] See the related comments in #7 above.

[nit] s/trouble-shooting/troubleshooting

[major] #7 above makes advertising defined values a requirement to forming a three-way adjacency, but it is only recommended here. Unless there are specific troubleshooting cases described, we should retain the requirement. I believe that a packet can be crafted in any way for troubleshooting/testing, but that doesn't have to be indicated in a specification which should deal with the correct operation of the protocol. IOW, I think we can do without this paragraph.

1507   4.2.2.1. LIE FSM

1509   This section specifies the precise, normative LIE FSM and can be
1510   omitted unless the reader is pursuing an implementation of the
1511   protocol.

[nit] s/and can be omitted..../and can be omitted.
[] It would be very nice if there was an upfront description (at least a list) of the different elements: states, events, etc.. to make reading the diagrams easier. Also, the terminology (PUSH, etc.) should also be upfront.

1513   Initial state is `OneWay`.

1515   Event `MultipleNeighbors` occurs normally when more than two nodes
1516   see each other on the same link or a remote node is quickly
1517   reconfigured or rebooted without regressing to `OneWay` first. Each
1518   occurrence of the event SHOULD generate a clear, according
1519   notification to help operational deployments.

[] This event is the only one described. Is there anything special about it? Is this the only one for which generating a notification is recommended? Is that why it is described here?

[minor] "occurs normally when..." Are there other conditions (which are not considered "normal")?

[major] "more than two nodes see each other on the same link" The description below is different (and I think more accurate): "more than one neighbor seen on interface". Also, it doesn't include the second part about a "reconfigured or rebooted" node. What is the characteristic of that behavior?

[major] "SHOULD generate a clear, according notification to help operational deployments." Normatively, what is "clear"? Is there information that you have in mind that should be included to make it "clear"? I assume that a notification is a message logged on the node, is that true?

Suggestion> ...SHOULD log a message including the System IDs of each node. (Anything else?)

[] This seems to be the only occurrence of a log resulting from a message or other action. I think I'm missing the significance of this event.

[Ok. This is where I am in my reading. There are more comments below.]

...
2069   4.2.3.3. Flooding
...
2082   As described before, TIEs themselves are transported over UDP with
2083   the ports indicated in the LIE exchanges and using the destination
2084   address on which the LIE adjacency has been formed.
       For unnumbered
2085   IPv4 interfaces same considerations apply as in equivalent OSPF case.

[major] (Related to a comment in 4.2.2.) "using the destination address on which the LIE adjacency has been formed" If multiple addresses are used (from different AFs), which one is selected? Is there a "primary" address that is considered when the adjacency is formed?

...
6165   10.2. Informative References
...
6175   [DOT]      Ellson, J. and L. Koutsofios, "Graphviz: open source graph
6176              drawing tools", Springer-Verlag , 2001.

[major] Is there a URI that can be used as a stable pointer?

...
6362   B.1. common.thrift
...
6730   /** Link capabilities. */
6731   struct LinkCapabilities {
6732       /** Indicates that the link is supporting BFD. */
6733       1: optional bool bfd =
6734           common.bfd_default;
6735       /** Indicates whether the interface will support v4 forwarding.

6737           @note: This MUST be set to true when LIEs from a v4 address are
6738                  sent and MAY be set to true in LIEs on v6 address. If v4
6739                  and v6 LIEs indicate contradicting information the
6740                  behavior is unspecified. */

6742       2: optional bool v4_forwarding_capable =
6743           true;
6744   }

[major] Not knowing thrift, and assuming that this structure is some sort of enum/list, what should a receiver do if the undefined/unknown value 3 is received?

[major] "contradicting information the behavior is unspecified." How can the behavior be unspecified if this *is* the specification? More to the point: if the indication is required in IPv4 LIEs, but only optional in IPv6 LIEs, why wouldn't the information from the IPv4 LIEs be considered as the truth?

[nit] s/interface will support v4 forwarding/interface supports IPv4 forwarding/g

[nit] s/v4/IPv4/g s/v6/IPv6/g

[major] "Indicates that the link is supporting BFD."
I was going to ask if setting bfd to TRUE means that the link can support BFD (is capable) or if BFD is actively enabled (supporting), but I found this somewhere else:

   /** default link being BFD capable */
   const bool bfd_default = true

...so I'm assuming that it just means that the link is capable, which makes sense given that it is in LinkCapabilities.

s/Indicates that the link is supporting BFD./Indicates that the link is BFD-capable./g

Also, note that §4.3.5 says that "RIFT MAY incorporate BFD", which is not completely aligned with the default indication of the link being BFD-capable. Using BFD is different than being BFD-capable, so it is ok for its use to be optional (§4.3.5). However, there is no stated requirement for a link to be BFD-capable, as indicated in the default setting.

[End of Review 2a]
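
The question about contradicting `v4_forwarding_capable` values (draft lines 6737-6740) can be made concrete with a minimal Python sketch. It is not the draft's behavior, which is explicitly left unspecified. It rests on two labeled assumptions: an optional field absent from a received LIE is read as its schema default, and the tie-break follows the suggestion in the review to treat the IPv4 LIE as authoritative. The Python names, including `resolve_v4_forwarding`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults taken from the schema fragments quoted in the review.
BFD_DEFAULT = True            # common.bfd_default
V4_FORWARDING_DEFAULT = True  # default of v4_forwarding_capable


@dataclass
class LinkCapabilities:
    # None models an optional field that was absent from the received LIE.
    bfd: Optional[bool] = None
    v4_forwarding_capable: Optional[bool] = None

    def effective_bfd(self) -> bool:
        # Assumption: an absent field is read as the schema default.
        return BFD_DEFAULT if self.bfd is None else self.bfd

    def effective_v4_forwarding(self) -> bool:
        if self.v4_forwarding_capable is None:
            return V4_FORWARDING_DEFAULT
        return self.v4_forwarding_capable


def resolve_v4_forwarding(v4_lie: Optional[LinkCapabilities],
                          v6_lie: Optional[LinkCapabilities]) -> Optional[bool]:
    """Hypothetical tie-break: prefer the IPv4 LIE when both were received."""
    if v4_lie is not None:
        return v4_lie.effective_v4_forwarding()
    if v6_lie is not None:
        return v6_lie.effective_v4_forwarding()
    return None  # no LIE seen yet, nothing to decide


contradicting_v6 = LinkCapabilities(v4_forwarding_capable=False)
print(resolve_v4_forwarding(LinkCapabilities(), contradicting_v6))  # True
print(resolve_v4_forwarding(None, contradicting_v6))                # False
```

Any other tie-break could be substituted in `resolve_v4_forwarding`; the sketch only shows that some rule has to be picked once both address families carry the flag.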