Re: [Lsr] Flooding across a network
"Les Ginsberg (ginsberg)" <ginsberg@cisco.com> Wed, 06 May 2020 17:53 UTC
From: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
To: "bruno.decraene@orange.com" <bruno.decraene@orange.com>, Christian Hopps <chopps@chopps.org>
CC: "lsr@ietf.org" <lsr@ietf.org>
Date: Wed, 06 May 2020 17:53:15 +0000
Message-ID: <MW3PR11MB4619015E4B356DFC225CD001C1A40@MW3PR11MB4619.namprd11.prod.outlook.com>
References: <24209_1588692477_5EB185FD_24209_35_1_53C29892C857584299CBF5D05346208A48E3D455@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <MW3PR11MB46198A668B9F2532BCCC38FEC1A70@MW3PR11MB4619.namprd11.prod.outlook.com> <6287_1588771252_5EB2B9B4_6287_332_1_53C29892C857584299CBF5D05346208A48E3F698@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <MW3PR11MB46199CC33B10BC9D3D622D2AC1A40@MW3PR11MB4619.namprd11.prod.outlook.com> <10562_1588775602_5EB2CAB2_10562_251_11_53C29892C857584299CBF5D05346208A48E3FB63@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <87CDE7F3-E08D-4C45-9AF1-9DAD635F8908@chopps.org> <9992_1588784982_5EB2EF56_9992_201_1_53C29892C857584299CBF5D05346208A48E40256@OPEXCAUBM43.corporate.adroot.infra.ftgroup>
In-Reply-To: <9992_1588784982_5EB2EF56_9992_201_1_53C29892C857584299CBF5D05346208A48E40256@OPEXCAUBM43.corporate.adroot.infra.ftgroup>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/-aAX450y9dL-AA1wYqwKjYrDk5I>
Subject: Re: [Lsr] Flooding across a network
List-Id: Link State Routing Working Group <lsr.ietf.org>

Bruno -

I am sorry it has been so difficult for us to understand each other. I am trying my best.

Look at it this way: You are the customer. 😊 I am the vendor.
The failure scenario I describe below happens and you notice that all Northbound destinations loop for 35 seconds whenever fast flooding is enabled. I think you are going to complain about this - to me. 😊

And I am going to tell you that this is a consequence of enabling fast flooding in the presence of a node which does not support it. Your options to reduce the period of looping will be:

1) Upgrade the slow node to support faster flooding
2) Disable fast flooding
3) Redesign your network

   Les

> -----Original Message-----
> From: bruno.decraene@orange.com <bruno.decraene@orange.com>
> Sent: Wednesday, May 06, 2020 10:10 AM
> To: Christian Hopps <chopps@chopps.org>
> Cc: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
> Subject: RE: [Lsr] Flooding across a network
>
> > From: Christian Hopps [mailto:chopps@chopps.org]
> >
> > Bruno's persistence has made me realize something fundamental here.
> >
> > The minute the LSP originator changes the LSP and floods it you have LSDB inconsistency.
>
> Exactly my point. Thank you Chris.
> I would even say: "The minute the LSP originator changes the LSP then you have LSDB inconsistency." But no big deal if there is disagreement on this detail.
>
> > That is going to last until the last node in the network has updated its LSDB.
>
> Absolutely.
> So the faster we flood, the shorter the LSDB inconsistency.
>
> Now IMO, even if a single/few nodes flood faster, there is a chance of shortening the LSDB inconsistency. But in all cases, I don't see how this could make the LSDB inconsistency longer.
>
> > Les is pointing out that LSDB inconsistency can be bad in certain circumstances e.g., if a critical node is slow and thus inconsistent.
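
The point that the inconsistency "is going to last until the last node in the network has updated its LSDB" can be checked with simple arithmetic. The sketch below is only an illustration, using the figures from the example quoted further down in this thread (1000 LSPs to flood, 500 LSPs/second for fast nodes, 20 LSPs/second for Node D). The function names are hypothetical, and inter-node delay, pacing details, and retransmissions are ignored.

```python
# Illustrative arithmetic only; function names are hypothetical and the
# model ignores inter-node delay, retransmission and SPF/FIB timing.

NUM_LSPS = 1000          # LSPs to flood after Node C fails (from the thread)
FAST_RATE = 500          # LSPs/second, nodes that support fast flooding
SLOW_RATE = 20           # LSPs/second, Node D


def sync_time(num_lsps, rate):
    """Seconds until a node receiving at `rate` LSPs/second has them all."""
    return num_lsps / rate


def received_by(rate, t, num_lsps=NUM_LSPS):
    """Number of LSPs a node has received t seconds after the trigger."""
    return min(num_lsps, rate * t)


def inconsistency_window(rates, num_lsps=NUM_LSPS):
    """The LSDBs agree only once the slowest receiver has caught up."""
    return max(sync_time(num_lsps, r) for r in rates.values())


mixed = {"A": FAST_RATE, "B": FAST_RATE, "D": SLOW_RATE}
all_slow = {"A": SLOW_RATE, "B": SLOW_RATE, "D": SLOW_RATE}

print(sync_time(NUM_LSPS, FAST_RATE))        # fast nodes
print(sync_time(NUM_LSPS, SLOW_RATE))        # Node D
print([received_by(SLOW_RATE, t) for t in (2, 7, 30, 50)])
print(inconsistency_window(mixed), inconsistency_window(all_slow))
```

Under these assumptions the window is set by Node D in both the mixed and the all-slow case. What differs between the two cases is how long the other nodes act on a newer LSDB than D has, which is the forwarding-plane concern rather than the LSDB-sync one.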
> >
> > I believe the right way to fix this is a simple one, help the operator flag the broken router software/hardware for replacement, but otherwise IS-IS should just try to do the best job it can, which is to flood around the problem (i.e., flood as optimally as possible).
>
> +1
> On a side note, I would not call a router flooding slowly "broken". I find it understandable that in a given network there are different types of routers (core vs aggregation), different roles (P having 50 IGP adjacencies with 50 PEs vs PE having only 2 IGP adjacencies with 2 P), different hardware generations, different software, different vendors with different perspectives/markets.
>
> Thank you Chris.
>
> --Bruno
>
> > Thanks,
> > Chris.
> > [as WG member]
> >
> >
> > > On May 6, 2020, at 10:33 AM, bruno.decraene@orange.com wrote:
> > >
> > > Les,
> > >
> > > From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
> > > Sent: Wednesday, May 6, 2020 4:14 PM
> > > To: DECRAENE Bruno TGI/OLN
> > > Cc: lsr@ietf.org
> > > Subject: RE: Flooding across a network
> > >
> > > Bruno -
> > >
> > > I am somewhat at a loss to understand your comments.
> > > The example is straightforward and does not need to consider FIB update time nor the ordering of prefix updates on different nodes.
> > > [Bruno] The example is straightforward but you are referring to FIB and IP packets forwarding as per those FIBs.
> > > I’d like us to focus on LSP flooding and LSDB consistency.
> > >
> > > Consider the state of Node B and Node D at various time points from the trigger event.
> > >
> > > T + 2 seconds
> > > -----------------
> > > B has received all LSP Updates. It triggers an SPF and for all Northbound destinations previously reachable via C it installs paths via D.
> > > Let’s assume it takes 5 seconds to update the forwarding plane.
> > >
> > > D has received 40 of the 1000 LSP updates. It triggers an SPF and finds that all Northbound destinations are reachable via B-C.
> > > It makes no changes to the forwarding plane.
> > >
> > > T + 7 seconds
> > > -----------------
> > > B has completed FIB updates. Traffic to all Northbound destinations is being forwarded via D.
> > >
> > > D has now received 140 of the 1000 LSP updates. Entries in its forwarding plane for Northbound destinations still point to B.
> > >
> > > We have a loop.
> > >
> > > T + 30 seconds
> > > --------------------
> > > D has now received 600 of the 1000 LSP updates. Still no changes to its forwarding plane.
> > > Traffic to Northbound destinations is still looping.
> > >
> > > T + 50 seconds
> > > -------------------
> > > D has finally received all 1000 LSP updates.
> > > It triggers (another) SPF and calculates paths to Northbound destinations via E. It begins to update its forwarding plane.
> > > Let’s assume this will take 5 seconds.
> > >
> > > T + 55 seconds
> > > --------------------
> > > D has completed forwarding plane updates - no more looping.
> > >
> > > That is all I am trying to illustrate.
> > >
> > > If you want to start arguing that node protecting LFAs + microloop avoidance could help (NOTE I explicitly took those out of the example for simplicity) - it is easy enough to change the example to include multiple node failures or a node failure plus some northbound link failures on other nodes.
> > > [Bruno] I’m not talking about LFA/FRR. And with regards to microloop avoidance, some algorithms can handle any graph transition, so including multiple node failures.
> > >
> > > But again, let’s stick to LSP flooding and LSDB consistency. (You are the one speaking about microloops in the forwarding plane.)
> > >
> > > The point here is to look at the impact of long-lived LSDB inconsistency which results when some nodes support flooding an order of magnitude faster than other nodes - which is what you asked me to clarify.
> > > [Bruno] No.
> > > I asked you to clarify why having a node with faster flooding could prolong the period of LSDB inconsistency.
> > >
> > > Again, with your own words: “when only some nodes in the network support faster flooding the behavior of the whole network may not be "better" when faster flooding is enabled because it prolongs the period of LSDB inconsistency.”
> > > And with fewer words: “when only some nodes in the network support faster flooding […] it prolongs the period of LSDB inconsistency.”
> > >
> > > --Bruno
> > >
> > > Les
> > >
> > >
> > > From: bruno.decraene@orange.com <bruno.decraene@orange.com>
> > > Sent: Wednesday, May 06, 2020 6:21 AM
> > > To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
> > > Cc: lsr@ietf.org
> > > Subject: RE: Flooding across a network
> > >
> > > Les,
> > >
> > > From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
> > > Sent: Wednesday, May 6, 2020 1:35 AM
> > > To: DECRAENE Bruno TGI/OLN; lsr@ietf.org
> > > Subject: RE: Flooding across a network
> > >
> > > Bruno -
> > >
> > > Seems like it was not too long ago that we were discussing this in person. Ahhh...the good old days...
> > > [Bruno] Indeed, maybe not to the point of concluding. Indeed.
> > >
> > > First, let's agree that the interesting case does not involve 1 or even a small number of LSPs. For those cases flooding speed does not matter.
> > > The interesting cases involve a large number of LSPs (hundreds or thousands). And in such cases LFA/microloop avoidance techniques are not applicable.
> > >
> > > Take the following simple topology:
> > >
> > > | | ... |             |
> > >   +---+             +---+
> > >   | C |             | E |
> > >   +---+             +---+
> > >     |                 | 1000
> > >   +---+             +---+
> > >   | B |-------------| D |
> > >   +---+    1000     +---+
> > >     |                 |
> > >     |                 |
> > >       \             /
> > >        \           /
> > >         \         /
> > >          \       /
> > >            +---+
> > >            | A |
> > >            +---+
> > >
> > > There is a topology northbound of C and E (not shown) and a topology southbound of A (not shown).
> > > Cost on all links is 10 except B-D and D-E where cost is high.
> > >
> > > C is a node with 1000 neighbors.
> > > When all links are up, shortest path for all northbound destinations is via C.
> > > All nodes in the network support fast flooding except for Node D.
> > > Let’s say fast flooding is 500 LSPs/second and slow flooding (Node D) is 20 LSPs/second.
> > > If Node C fails we have 1000 LSPs to flood.
> > > All nodes except for D can receive these in 2 seconds (plus internode delay time).
> > > D can receive LSPs in 50 seconds.
> > >
> > > [Bruno] Thanks for your example. Agreed so far.
> > >
> > > When A and B and all southbound nodes receive/process the LSP updates they will start sending traffic to Northbound destinations via D.
> > > But for the better part of 50 seconds, Node D has yet to receive all LSP updates and still believes that shortest path is via B-C. It will loop traffic.
> > >
> > > [Bruno] May I remind you that we are discussing IS-IS flooding in order to sync LSDB (LSP database). That is already a big enough subject. It does not include FIB (updates), nor IP forwarding.
> > >
> > > Quoting you “when only some nodes in the network support faster flooding the behavior of the whole network may not be "better" when faster flooding is enabled because it prolongs the period of LSDB inconsistency.”
> > >
> > > Taking your own examples, in both cases (all nodes support fast flooding; all nodes but D support fast flooding) the period of LSDB inconsistency is 50 seconds. Hence this example does not illustrate your statement.
> > >
> > > Hence I’m restating my questions:
> > >
> > > > > when only some nodes in the network support faster flooding the behavior of the whole network may not be "better" when faster flooding is enabled because it prolongs the period of LSDB inconsistency.
> > > >
> > > > 1) Do you have data on this?
> > > >
> > > > 2) If not, can you provide an example where increasing the flooding rate on one adjacency prolongs the period of LSDB inconsistency across the network?
> > >
> > >
> > > Had all nodes used slow flooding, it still would have taken 50 seconds to converge, but there would be significantly less looping. There could be a good amount of blackholing, but this is preferable to looping.
> > > [Bruno] You are using an example where ordering FIB updates across the network, e.g. as per [1], allows reducing _FIB_ inconsistency across the path/network. And you seem to conclude from this that this translates to LSDB update ordering. Those are two different things. In this thread, I’d suggest that we focus on IGP flooding and LSDB sync only. (*)
> > > [1] https://tools.ietf.org/html/rfc6976
> > > (*) We can discuss loop free IGP convergence in a different thread if you want. IMO, the use of segment routing/source routing is better than oFIB. But at some point, it still relies on fast flooding when multiple LSPs are involved. (and I mean _fast_ not _ordered_)
> > >
> > > --Bruno
> > >
> > > One can always come up with examples - based on a specific topology and a specific failure - where things might be better/worse/unchanged in the face of inconsistent flooding speed support.
> > > But I hope this simple example illustrates the pitfalls.
> > >
> > > Les
> > >
> > > > -----Original Message-----
> > > > From: bruno.decraene@orange.com <bruno.decraene@orange.com>
> > > > Sent: Tuesday, May 05, 2020 8:28 AM
> > > > To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
> > > > Subject: Flooding across a network
> > > >
> > > > Les,
> > > >
> > > > > From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
> > > > > Sent: Monday, May 4, 2020 4:39 PM
> > > > [...]
> > > > > when only some nodes in the network support faster flooding the behavior of the whole network may not be "better" when faster flooding is enabled because it prolongs the period of LSDB inconsistency.
> > > >
> > > > 1) Do you have data on this?
> > > >
> > > > 2) If not, can you provide an example where increasing the flooding rate on one adjacency prolongs the period of LSDB inconsistency across the network?
> > > >
> > > > 3) In the meantime, let's try the theoretical analysis on a simple scenario where a single LSP needs to be flooded across the network.
> > > >
> > > > - Let's call Dij the time needed to flood the LSP from node i to the adjacent node j. Clearly Dij>0.
> > > > - Let's call k the node originating this LSP at t0=0s
> > > >
> > > > From t0, the LSDB is inconsistent across the network as all nodes but k are missing the LSP and hence only know about the 'old' topology.
> > > >
> > > > Let's call SPT(k) the SPT rooted on k, using Dij as the metric between adjacent nodes i and j. Let's call SP(k,i) the shortest path from k to i; and D(k,i) the shortest distance between k and i.
> > > >
> > > > It seems that the time needed:
> > > > - for node j to learn about the LSP, and get in sync with k, is D(k,j)
> > > > - for all nodes across the network to learn about the LSP, and get in sync with k, is Max[for all j] D(k,j)
> > > >
> > > > Then how could reducing the flooding delay on one adjacency prolong the period of LSDB inconsistency?
> > > > It seems to me that it can only improve/decrease it. Otherwise, this would mean that decreasing the cost on a link can increase the cost of the shortest path.
> > > >
> > > > Note: I agree that there are other cases, such as multiple LSPs originated by the same node, and multiple LSPs originated by multiple nodes, but let's start with the simple case.
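
The single-LSP analysis above maps directly onto a shortest-path computation with Dij as the link metric. The following is a minimal sketch of that model, not code from either draft: the four-node chain, the millisecond delays, and the function names are assumptions chosen for illustration, and it covers only the one-LSP case, not bursts of LSPs or forwarding-plane effects.

```python
import heapq


def flood_times(delays, origin):
    """Dijkstra over per-adjacency flooding delays.

    delays: dict mapping a directed adjacency (i, j) to Dij (here in ms).
    Returns a dict node -> D(origin, node), the time at which each node
    learns the LSP originated by `origin` at t0 = 0.
    """
    adjacency = {}
    for (i, j), d in delays.items():
        adjacency.setdefault(i, []).append((j, d))
        adjacency.setdefault(j, [])
    dist = {origin: 0}
    heap = [(0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def inconsistency_window(delays, origin):
    """Max over all j of D(origin, j): when the last node is in sync."""
    return max(flood_times(delays, origin).values())


# Hypothetical chain k -> a -> b -> c, 50 ms per adjacency.
slow = {("k", "a"): 50, ("a", "b"): 50, ("b", "c"): 50}
# Same chain with faster flooding on the k-a adjacency only.
faster = dict(slow)
faster[("k", "a")] = 2

print(inconsistency_window(slow, "k"))
print(inconsistency_window(faster, "k"))
```

In this model, lowering any single Dij can only lower or leave unchanged every D(k,j), and therefore the maximum, which is the monotonicity argument made in the message.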
> > > >
> > > > Thanks,
> > > > --Bruno
> > > >
> > > > > -----Original Message-----
> > > > > From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
> > > > > Sent: Monday, May 4, 2020 4:39 PM
> > > > >
> > > > > Henk -
> > > > >
> > > > > Thanx for your thoughtful posts.
> > > > > I have read your later posts on this thread as well - but decided to reply to this one.
> > > > > Top posting for better readability.
> > > > >
> > > > > There is broad agreement that faster flooding is desirable.
> > > > > There are now two proposals as to how to address the issue - neither of which is proposing to use TCP (or equivalent).
> > > > >
> > > > > I have commented on why IS-IS flooding requirements are significantly different than that for which TCP is used.
> > > > > I think it is also useful to note that even the simple test case which Bruno reported on in last week's interim meeting demonstrated that without any changes to the protocol at all IS-IS was able to flood an order of magnitude faster than it commonly does today.
> > > > > This gives me hope that we are looking at the problem correctly and will not need "TCP".
> > > > >
> > > > > Introducing a TCP based solution requires:
> > > > >
> > > > > a) A major change to the adjacency formation logic
> > > > >
> > > > > b) Removal of the independence of the IS-IS protocol from the address families whose reachability advertisements it supports - something which I think is a great strength of the protocol - particularly in environments where multiple address family support is needed
> > > > >
> > > > > I really don't want to do either of the above.
> > > > >
> > > > > Your comments regarding PSNP response times are quite correct - and both of the draft proposals discuss this - though I agree more detail will be required.
> > > > > It is intuitive that if you want to flood faster you also need to ACK faster - and probably even retransmit faster when that is needed.
> > > > > The basic relationship between retransmit interval and PSNP interval is expressed in ISO 10589:
> > > > >
> > > > > " partialSNPInterval - This is the amount of time between periodic
> > > > > action for transmission of Partial Sequence Number PDUs.
> > > > > It shall be less than minimumLSPTransmission-Interval."
> > > > >
> > > > > Of course ISO 10589 recommended values (2 seconds and 5 seconds respectively) are associated with a much slower flooding rate and implementations I am aware of use values in this order of magnitude. These numbers need to be reduced if we are to flood faster, but the relationship between the two needs to remain the same.
> > > > >
> > > > > It is also true - as you state - that sending ACKs more quickly will result in additional PDUs which need to be received/processed by IS-IS - and this has some impact. But I think it is reasonable to expect that an implementation which can support sending and receiving LSPs at a faster rate should also be able to send/receive PSNPs at a faster rate. But we still need to be smarter than sending one PSNP/one LSP in cases where we have a burst.
> > > > >
> > > > > LANs are a more difficult problem than P2P - and thus far draft-ginsberg-lsr-isis-flooding-scale has been silent on this - but not because we aren't aware of this - just have focused on the P2P behavior first.
> > > > > What the best behavior on a LAN may be is something I am still considering.
> > > > > Slowing flooding down to the speed which the slowest IS on the LAN can support may not be the best strategy - as it also slows down the propagation rate for systems downstream from the nodes on the LAN which can handle faster flooding - thereby having an impact on flooding speed throughout the network in a way which may be out of proportion. This is a smaller example of the larger issue that when only some nodes in the network support faster flooding the behavior of the whole network may not be "better" when faster flooding is enabled because it prolongs the period of LSDB inconsistency. More work needs to be done here...
> > > > >
> > > > > In summary, I don't expect to have to "reinvent TCP" - but I do think you have provided a useful perspective for us to consider as we progress on this topic.
> > > > >
> > > > > Thanx.
> > > > >
> > > > > Les
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Lsr <lsr-bounces@ietf.org> On Behalf Of Henk Smit
> > > > > > Sent: Thursday, April 30, 2020 6:58 AM
> > > > > > To: lsr@ietf.org
> > > > > > Subject: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
> > > > > >
> > > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > Two years ago, Gunter Van de Velde and myself published this draft:
> > > > > > https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
> > > > > > That started this discussion about flow/congestion control and ISIS flooding.
> > > > > >
> > > > > > My thoughts were that once we start implementing new algorithms to optimize ISIS flooding speed, we'll end up with our own version of TCP.
> > > > > > I think most people here have a good general understanding of TCP.
> > > > > > But if not, this is a good overview how TCP does it:
> > > > > > https://en.wikipedia.org/wiki/TCP_congestion_control
> > > > > >
> > > > > >
> > > > > > What does TCP do:
> > > > > > ====
> > > > > > TCP does 2 things: flow control and congestion control.
> > > > > >
> > > > > > 1) Flow control is: the receiver trying to prevent itself from being overloaded. The receiver indicates, through the receiver-window-size in the TCP acks, how much data it can or wants to receive.
> > > > > > 2) Congestion control is: the sender trying to prevent the links between sender and receiver from being overloaded. The sender makes an educated guess at what speed it can send.
> > > > > >
> > > > > >
> > > > > > The part we seem to be missing:
> > > > > > ====
> > > > > > For the sender to make a guess at what speed it can send, it looks at how the transmission is behaving. Are there drops ? What is the RTT ? Do drop-percentage and RTT change ? Do acks come in at the same rate as the sender sends segments ? Are there duplicate acks ? To be able to do this, the sender must know what to expect. How acks behave.
> > > > > >
> > > > > > If you want an ISIS sender to make a guess at what speed it can send, without changing the protocol, the only thing the sender can do is look at the PSNPs that come back from the receiver. But the RTT of PSNPs can not be predicted. Because a good ISIS implementation does not immediately send a PSNP when it receives a LSP. 1) the receiver should jitter the PSNP, like it should jitter all packets. And 2) the receiver should wait a little to see if it can combine multiple acks into a single PSNP packet.
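
To make the two receiver behaviours just described concrete (jittering the PSNP, and waiting briefly to combine several acks into one PSNP), here is a small sketch. It is not taken from ISO 10589 or from either draft: the hold time, the jitter fraction, the per-PSNP entry limit, and the function name are all hypothetical values picked for illustration.

```python
import random


def batch_acks(arrivals, hold=0.2, jitter=0.25, max_entries=90, seed=0):
    """Group LSP arrival times (seconds) into PSNPs.

    A PSNP is sent when a jittered hold timer, started by the first
    unacknowledged LSP, expires, or earlier once `max_entries` acks are
    pending. All parameter values are illustrative assumptions.
    Returns a list of (send_time, [arrival times acknowledged]).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    psnps, pending, deadline = [], [], None
    for t in sorted(arrivals):
        if pending and t >= deadline:
            # Hold timer expired before this LSP arrived: send the batch.
            psnps.append((deadline, pending))
            pending = []
        if not pending:
            # First pending ack starts a new timer, shortened by jitter.
            deadline = t + hold * (1 - jitter * rng.random())
        pending.append(t)
        if len(pending) == max_entries:
            # PSNP is full: send immediately.
            psnps.append((t, pending))
            pending = []
    if pending:
        psnps.append((deadline, pending))
    return psnps


# 1000 LSPs arriving at 500 LSPs/second (one every 2 ms).
burst = [i * 0.002 for i in range(1000)]
psnps = batch_acks(burst)
delays = [send - t for send, acked in psnps for t in acked]
print(len(psnps), min(delays), max(delays))
```

Even in this toy model the delay between an LSP arriving and its ack leaving ranges from zero up to the hold time, which illustrates why a sender cannot read a stable RTT out of incoming PSNPs.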
> > > > > >
> > > > > > In TCP, if a single segment gets lost, each new segment will cause
> > > > > > the receiver to send an ack with the seqnr of the last byte received
> > > > > > in order. This is called "duplicate acks". This triggers the sender
> > > > > > to do fast-retransmission. In ISIS, this can't be done. The
> > > > > > information a sender can get from looking at incoming PSNPs is a lot
> > > > > > less than what TCP can learn from incoming acks.
> > > > > >
> > > > > >
> > > > > > The problem with sender-side congestion control:
> > > > > > ====
> > > > > > In ISIS, all we know is that the default retransmit-interval is 5
> > > > > > seconds. And I think most implementations use that as the default.
> > > > > > This means that the receiver of an LSP has one requirement: send a
> > > > > > PSNP within 5 seconds. For the rest, implementations are free to send
> > > > > > PSNPs however and whenever they want. This means a sender cannot
> > > > > > really draw conclusions about flooding speed, dropped LSPs, capacity
> > > > > > of the receiver, etc.
> > > > > > There is no ordering when flooding LSPs, or sending PSNPs. This makes
> > > > > > a sender-side algorithm for ISIS a lot harder.
> > > > > >
> > > > > > When you think about it, you realize that a sender should wait the
> > > > > > full 5 seconds before it can draw any real conclusions about dropped
> > > > > > LSPs. If a sender looks at PSNPs to determine its flooding speed, it
> > > > > > will probably not be able to react without a delay of a few seconds.
> > > > > > A sender might send hundreds or thousands of LSPs in those 5 seconds,
> > > > > > which might all or partially be dropped, complicating matters even
> > > > > > further.
> > > > > >
> > > > > >
> > > > > > A sender-side algorithm should specify how to do PSNPs.
> > > > > > ====
> > > > > > So imho a sender-side only algorithm can't work just like that in a
> > > > > > multi-vendor environment. We must not only specify a
> > > > > > congestion-control algorithm for the sender. We must also specify
> > > > > > for the receiver a more specific algorithm for how and when to send
> > > > > > PSNPs. At least how to do PSNPs under load.
> > > > > >
> > > > > > Note that this might result in the receiver sending more (and
> > > > > > smaller) PSNPs.
> > > > > > More packets might mean more congestion (inside routers).
> > > > > >
> > > > > >
> > > > > > Will receiver-side flow-control work?
> > > > > > ====
> > > > > > I don't know if that's enough. It will certainly help.
> > > > > >
> > > > > > I think to tackle this problem, we need 3 parts:
> > > > > > 1) sender-side congestion-control algorithm
> > > > > > 2) more detailed algorithm on receiver when and how to send PSNPs
> > > > > > 3) receiver-side flow-control mechanism
> > > > > >
> > > > > > As discussed at length, I don't know if the ISIS process on the
> > > > > > receiving router can actually know if it's running out of resources
> > > > > > (buffers on interfaces, linecards, etc). That's implementation
> > > > > > dependent. A receiver can definitely advertise a fixed value, so the
> > > > > > sender has an upper bound to use when doing congestion-control. Just
> > > > > > like TCP has both a flow-control window and a congestion-control
> > > > > > window, and a sender uses both. Maybe the receiver can even advertise
> > > > > > a dynamic value. Maybe now, maybe only in the future. An advertised
> > > > > > upper limit seems useful to me today.
> > > > > >
> > > > > >
> > > > > > What I didn't like about our own proposal (flooding over TCP):
> > > > > > ====
> > > > > > The problem I saw with flooding over TCP concerns multi-point
> > > > > > networks (LANs).
> > > > > >
> > > > > > When flooding over a multi-point network, setting up TCP connections
> > > > > > introduces serious challenges. Who are the endpoints of the TCP
> > > > > > connections? Full mesh? Or do all ISes on a LAN create a
> > > > > > TCP-connection to the DIS? There is no backup DIS in ISIS (unlike
> > > > > > OSPF). Things get messy quickly.
> > > > > >
> > > > > > However, the other two proposals do not solve this problem either.
> > > > > > How will a sender-side congestion-avoidance algorithm determine
> > > > > > whether there were drops? There are no acks (PSNPs) on a LAN. We
> > > > > > assume most LSPs that are broadcast are received by all other ISes
> > > > > > on the LAN. There are no acks. Only after the DIS has sent its
> > > > > > periodic CSNPs can ISes send PSNPs to request retransmissions. It
> > > > > > seems impossible (or very hard) to me for all ISes on a LAN to keep
> > > > > > track of dropped LSPs and adjust their sending speed accordingly.
> > > > > >
> > > > > > When flooding on a LAN, the receiver-side algorithm seems best,
> > > > > > because all ISes can see what the lowest advertised sending-speed is
> > > > > > and make sure they send slowly enough to not overload the slowest IS.
> > > > > > I'm not sure this is a good solution, but it seems easier and more
> > > > > > realistic than ISIS-flooding-over-TCP or sender-side
> > > > > > congestion-avoidance.
> > > > > >
> > > > > >
> > > > > > My conclusion:
> > > > > > ====
> > > > > > Sender-side congestion-control won't work without specifying in more
> > > > > > detail how and when to send PSNPs.
> > > > > > Receiver-side flow-control will certainly help. I don't know if it's
> > > > > > good enough. I don't know if advertising a static value is good
> > > > > > enough. But it's a start.
> > > > > >
> > > > > > I still think we'll end up re-implementing a new (and weaker) TCP.
> > > > > >
> > > > > >
> > > > > > henk.
> > > > > >
> > > > > > _______________________________________________
> > > > > > Lsr mailing list
> > > > > > Lsr@ietf.org
> > > > > > https://www.ietf.org/mailman/listinfo/lsr
> > > >
> > > > ____________________________________________________________
> > > >
> > > > This message and its attachments may contain confidential or
> > > > privileged information that may be protected by law;
> > > > they should not be distributed, used or copied without authorisation.
> > > > If you have received this email in error, please notify the sender
> > > > and delete this message and its attachments.
> > > > As emails may be altered, Orange is not liable for messages that have
> > > > been modified, changed or falsified.
> > > > Thank you.
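
As an illustration of two ideas from Henk's message quoted above, here is a minimal Python sketch. It is not taken from any draft or implementation: the class FloodingSender, the function lan_send_rate, the window unit (LSPs in flight), and the grow-by-one / halve-on-timeout rule are all assumptions made for the example. The first part shows a sender bounded by both a receiver-advertised limit and its own congestion estimate, much as a TCP sender uses both of its windows. The second shows every IS on a LAN pacing itself to the lowest advertised rate.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class FloodingSender:
    """Hypothetical per-neighbor sender state (illustration only)."""
    advertised_limit: int  # upper bound advertised by the receiver (assumed unit: LSPs in flight)
    cwnd: int = 1          # sender's own congestion estimate, same unit
    in_flight: int = 0     # LSPs sent but not yet acknowledged by a PSNP

    def window(self) -> int:
        # The sender honors both bounds, so the effective window is the smaller one.
        return min(self.advertised_limit, self.cwnd)

    def can_send(self) -> bool:
        return self.in_flight < self.window()

    def on_send(self) -> None:
        self.in_flight += 1

    def on_psnp(self, acked_lsps: int) -> None:
        # Assumed policy: each PSNP frees the LSPs it acknowledges and
        # lets the congestion estimate grow by one.
        self.in_flight = max(0, self.in_flight - acked_lsps)
        self.cwnd += 1

    def on_retransmit_timeout(self) -> None:
        # Assumed policy: a retransmit timeout is treated as a sign of
        # loss and halves the congestion estimate (never below 1).
        self.cwnd = max(1, self.cwnd // 2)


def lan_send_rate(advertised_rates: Mapping[str, int]) -> int:
    """On a LAN, send no faster than the slowest advertised rate."""
    return min(advertised_rates.values())


sender = FloodingSender(advertised_limit=4)
for _ in range(6):
    sender.on_psnp(acked_lsps=0)
print(sender.window())  # cwnd has grown to 7, but the advertised limit caps it at 4
print(lan_send_rate({"A": 500, "B": 50, "C": 200}))  # the slowest IS sets the pace: 50
```

Note that the sketch reacts only to PSNPs and retransmit timeouts, so it inherits the limitation described in the quoted message: with a 5 second retransmit interval and no rules on when PSNPs are sent, these signals arrive late and say little about where LSPs were dropped.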