Re: [Lsr] Flooding across a network

"Les Ginsberg (ginsberg)" <ginsberg@cisco.com> Wed, 06 May 2020 17:53 UTC

From: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
To: "bruno.decraene@orange.com" <bruno.decraene@orange.com>, Christian Hopps <chopps@chopps.org>
CC: "lsr@ietf.org" <lsr@ietf.org>
Thread-Topic: [Lsr] Flooding across a network
Date: Wed, 06 May 2020 17:53:15 +0000
Message-ID: <MW3PR11MB4619015E4B356DFC225CD001C1A40@MW3PR11MB4619.namprd11.prod.outlook.com>
References: <24209_1588692477_5EB185FD_24209_35_1_53C29892C857584299CBF5D05346208A48E3D455@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <MW3PR11MB46198A668B9F2532BCCC38FEC1A70@MW3PR11MB4619.namprd11.prod.outlook.com> <6287_1588771252_5EB2B9B4_6287_332_1_53C29892C857584299CBF5D05346208A48E3F698@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <MW3PR11MB46199CC33B10BC9D3D622D2AC1A40@MW3PR11MB4619.namprd11.prod.outlook.com> <10562_1588775602_5EB2CAB2_10562_251_11_53C29892C857584299CBF5D05346208A48E3FB63@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <87CDE7F3-E08D-4C45-9AF1-9DAD635F8908@chopps.org> <9992_1588784982_5EB2EF56_9992_201_1_53C29892C857584299CBF5D05346208A48E40256@OPEXCAUBM43.corporate.adroot.infra.ftgroup>
In-Reply-To: <9992_1588784982_5EB2EF56_9992_201_1_53C29892C857584299CBF5D05346208A48E40256@OPEXCAUBM43.corporate.adroot.infra.ftgroup>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/-aAX450y9dL-AA1wYqwKjYrDk5I>
Subject: Re: [Lsr] Flooding across a network

Bruno -

I am sorry it has been so difficult for us to understand each other. I am trying my best.

Look at it this way:

You are the customer. 😊
I am the vendor. 

The failure scenario I describe below happens and you notice that all Northbound destinations loop for 35 seconds whenever fast flooding is enabled.
I think you are going to complain about this - to me. 😊

And I am going to tell you that this is a consequence of enabling fast flooding in the presence of a node which does not support it. Your options to reduce the period of looping will be:

1) Upgrade the slow node to support faster flooding
2) Disable fast flooding
3) Redesign your network

    Les

> -----Original Message-----
> From: bruno.decraene@orange.com <bruno.decraene@orange.com>
> Sent: Wednesday, May 06, 2020 10:10 AM
> To: Christian Hopps <chopps@chopps.org>
> Cc: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
> Subject: RE: [Lsr] Flooding across a network
> 
> > From: Christian Hopps [mailto:chopps@chopps.org]
> >
> > Bruno's persistence has made me realize something fundamental here.
> >
> > The minute the LSP originator changes the LSP and floods it you have LSDB
> > inconsistency.
> 
> Exactly my point. Thank you Chris.
> I would even say: "The minute the LSP originator changes the LSP then you
> have LSDB inconsistency." But no big deal if there is disagreement on this
> detail.
> 
> > That is going to last until the last node in the network has updated its LSDB.
> 
> Absolutely.
> So the faster we flood, the shorter the LSDB inconsistency.
> 
> Now IMO, even if a single/few nodes flood faster, there is a chance of
> shortening the LSDB inconsistency. But in all cases, I don't see how this could
> make the LSDB inconsistency longer.
> 
> 
> > Les is pointing out that LSDB inconsistency can be bad in certain
> > circumstances, e.g., if a critical node is slow and thus inconsistent.
> >
> > I believe the right way to fix this is a simple one: help the operator flag the
> > broken router software/hardware for replacement, but otherwise IS-IS
> > should just try to do the best job it can, which is to flood around the
> > problem (i.e., flood as optimally as possible).
> 
> +1
> On a side note, I would not call a router that floods slowly "broken". I find it
> understandable that in a given network there are different types of routers
> (core vs aggregation), different roles (P having 50 IGP adjacencies with 50 PEs
> vs PE having only 2 IGP adjacencies with 2 P), different hardware
> generations, different software, different vendors with different
> perspectives/markets.
> 
> Thank you Chris.
> 
> --Bruno
> >
> > Thanks,
> > Chris.
> > [as WG member]
> >
> >
> > > On May 6, 2020, at 10:33 AM, bruno.decraene@orange.com wrote:
> > >
> > > Les,
> > >
> > > From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
> > > Sent: Wednesday, May 6, 2020 4:14 PM
> > > To: DECRAENE Bruno TGI/OLN
> > > Cc: lsr@ietf.org
> > > Subject: RE: Flooding across a network
> > >
> > > Bruno –
> > >
> > > I am somewhat at a loss to understand your comments.
> > > The example is straightforward and does not need to consider FIB update
> > > time nor the ordering of prefix updates on different nodes.
> > > [Bruno] The example is straightforward but you are referring to FIB and IP
> > > packet forwarding as per those FIBs.
> > > I’d like us to focus on LSP flooding and LSDB consistency.
> > >
> > > Consider the state of Node B and Node D at various time points from the
> > > trigger event.
> > >
> > > T + 2 seconds
> > > -----------------
> > > B has received all LSP updates. It triggers an SPF and, for all Northbound
> > > destinations previously reachable via C, it installs paths via D.
> > > Let’s assume it takes 5 seconds to update the forwarding plane.
> > >
> > > D has received 40 of the 1000 LSP updates. It triggers an SPF and finds
> > > that all Northbound destinations are reachable via B-C. It makes no changes
> > > to the forwarding plane.
> > >
> > > T + 7 seconds
> > > -----------------
> > > B has completed FIB updates. Traffic to all Northbound destinations is
> > > being forwarded via D.
> > >
> > > D has now received 140 of the 1000 LSP updates. Entries in its forwarding
> > > plane for Northbound destinations still point to B.
> > >
> > > We have a loop.
> > >
> > > T + 30 seconds
> > > --------------------
> > > D has now received 600 of the 1000 LSP updates. Still no changes to its
> > > forwarding plane.
> > > Traffic to Northbound destinations is still looping.
> > >
> > > T + 50 seconds
> > > -------------------
> > > D has finally received all 1000 LSP updates.
> > > It triggers (another) SPF and calculates paths to Northbound destinations
> > > via E. It begins to update its forwarding plane.
> > > Let’s assume this will take 5 seconds.
> > >
> > > T + 55 seconds
> > > --------------------
> > > D has completed forwarding plane updates – no more looping.
> > >
> > > That is all I am trying to illustrate.
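
The timeline above follows from the two flooding rates in the example. As a minimal sketch, assuming that a node only changes its forwarding plane once it has received all 1000 LSP updates, that internode delay is zero, and that the 5 second FIB update time applies to both B and D (the function names are illustrative and not taken from any implementation):

```python
def receive_time(lsp_count: int, rate_per_sec: float) -> float:
    """Seconds needed to receive lsp_count LSPs at a fixed rate."""
    return lsp_count / rate_per_sec


def received_by(t: float, rate_per_sec: float, lsp_count: int) -> int:
    """Number of LSPs a node has received t seconds after the trigger."""
    return min(lsp_count, int(rate_per_sec * t))


def loop_window(lsp_count: int, rate_b: float, rate_d: float,
                fib_update: float = 5.0) -> tuple[float, float]:
    """Interval during which B forwards via D while D still forwards via B.

    The loop starts when B has finished its FIB update and ends when D has
    finished its own. If D is not slower than B the interval is empty.
    """
    b_switched = receive_time(lsp_count, rate_b) + fib_update
    d_switched = receive_time(lsp_count, rate_d) + fib_update
    return (b_switched, max(b_switched, d_switched))


# Mixed network from the example: B floods at 500 LSPs/s, D at 20 LSPs/s.
print(received_by(2, 20, 1000), received_by(7, 20, 1000), received_by(30, 20, 1000))
print(loop_window(1000, rate_b=500, rate_d=20))   # (7.0, 55.0)

# Every node slow: B and D switch at the same time, so the window is empty.
print(loop_window(1000, rate_b=20, rate_d=20))    # (55.0, 55.0)
```

Under these assumptions the model reproduces the 40/140/600 counts at D and a loop lasting from T+7 to T+55. It says nothing about blackholing, which the all-slow case may still suffer.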
> > >
> > > If you want to start arguing that node-protecting LFAs + microloop
> > > avoidance could help (NOTE: I explicitly took those out of the example for
> > > simplicity) – it is easy enough to change the example to include multiple
> > > node failures or a node failure plus some northbound link failures on other
> > > nodes.
> > > [Bruno] I’m not talking about LFA/FRR. And with regard to microloop
> > > avoidance, some algorithms can handle any graph transition, including
> > > multiple node failures.
> > >
> > > But again, let’s stick to LSP flooding and LSDB consistency. (You are the
> > > one speaking about microloops in the forwarding plane.)
> > >
> > > The point here is to look at the impact of long-lived LSDB inconsistency
> > > which results when some nodes support flooding an order of magnitude
> > > faster than other nodes – which is what you asked me to clarify.
> > > [Bruno] No. I asked you to clarify why having a node with faster flooding
> > > could prolong the period of LSDB inconsistency.
> > >
> > > Again, in your own words: “when only some nodes in the network
> > > support faster flooding the behavior of the whole network may not be
> > > "better" when faster flooding is enabled because it prolongs the period of
> > > LSDB inconsistency.”
> > > And in fewer words: “when only some nodes in the network support
> > > faster flooding […] it prolongs the period of LSDB inconsistency.”
> > >
> > > --Bruno
> > >
> > >    Les
> > >
> > >
> > >
> > > From: bruno.decraene@orange.com <bruno.decraene@orange.com>
> > > Sent: Wednesday, May 06, 2020 6:21 AM
> > > To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
> > > Cc: lsr@ietf.org
> > > Subject: RE: Flooding across a network
> > >
> > > Les,
> > >
> > > From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
> > > Sent: Wednesday, May 6, 2020 1:35 AM
> > > To: DECRAENE Bruno TGI/OLN; lsr@ietf.org
> > > Subject: RE: Flooding across a network
> > >
> > > Bruno -
> > >
> > > Seems like it was not too long ago that we were discussing this in person.
> > > Ahhh...the good old days...
> > > [Bruno] Indeed, though maybe not to the point of concluding.
> > >
> > > First, let's agree that the interesting case does not involve 1 or even a
> > > small number of LSPs. For those cases flooding speed does not matter.
> > > The interesting cases involve a large number of LSPs (hundreds or
> > > thousands). And in such cases LFA/microloop avoidance techniques are not
> > > applicable.
> > >
> > > Take the following simple topology:
> > >
> > >    |  | ... |            |
> > >      +---+             +---+
> > >      | C |             | E |
> > >      +---+             +---+
> > >        |                 | 1000
> > >      +---+             +---+
> > >      | B |-------------| D |
> > >      +---+   1000      +---+
> > >        |                 |
> > >        |                 |
> > >         \               /
> > >          \            /
> > >           \         /
> > >            \      /
> > >              +---+
> > >              | A |
> > >              +---+
> > >
> > > There is a topology northbound of C and E (not shown) and a topology
> > > southbound of A (not shown).
> > > Cost on all links is 10 except B-D and D-E where cost is high.
> > >
> > > C is a node with 1000 neighbors.
> > > When all links are up, shortest path for all northbound destinations is via
> > > C.
> > > All nodes in the network support fast flooding except for Node D.
> > > Let’s say fast flooding is 500 LSPs/second and slow flooding (Node D) is 20
> > > LSPs/second.
> > > If Node C fails we have 1000 LSPs to flood.
> > > All nodes except for D can receive these in 2 seconds (plus internode
> > > delay time).
> > > D can receive LSPs in 50 seconds.
> > >
> > > [Bruno] Thanks for your example. Agreed so far.
> > >
> > > When A and B and all southbound nodes receive/process the LSP
> > > updates they will start sending traffic to Northbound destinations via D.
> > > But for the better part of 50 seconds, Node D has yet to receive all LSP
> > > updates and still believes that shortest path is via B-C. It will loop
> > > traffic.
> > >
> > > [Bruno] May I remind you that we are discussing IS-IS flooding in order to
> > > sync LSDB (LSP database). That is already a big enough subject. It does not
> > > include FIB (updates), nor IP forwarding.
> > >
> > > Quoting you: “when only some nodes in the network support faster
> > > flooding the behavior of the whole network may not be "better" when faster
> > > flooding is enabled because it prolongs the period of LSDB inconsistency.”
> > >
> > > Taking your own examples, in both cases (all nodes support fast flooding;
> > > all nodes but D support fast flooding) the period of LSDB inconsistency is
> > > 50 seconds. Hence this example does not illustrate your statement.
> > >
> > > Hence I’m restating my questions:
> > >
> > > > > when only some nodes in the network support faster flooding the
> behavior
> > > > of the whole network may not be "better" when faster flooding is
> enabled
> > > > because it prolongs the period of LSDB inconsistency.
> > > >
> > > > 1) Do you have data on this?
> > > >
> > > > 2) If not, can you provide an example where increasing the flooding
> rate on
> > > > one adjacency prolongs the period of LSDB inconsistency across the
> > > > network?
> > >
> > >
> > > Had all nodes used slow flooding, it still would have taken 50 seconds to
> > > converge, but there would be significantly less looping. There could be a
> > > good amount of blackholing, but this is preferable to looping.
> > > [Bruno] You are using an example where ordering FIB updates across the
> > > network, e.g. as per [1], reduces _FIB_ inconsistency across the
> > > path/network. And you seem to conclude from this that this translates to
> > > LSDB update ordering. Those are two different things. In this thread, I’d
> > > suggest that we focus on IGP flooding and LSDB sync only. (*)
> > > [1] https://tools.ietf.org/html/rfc6976
> > > (*) We can discuss loop-free IGP convergence in a different thread if you
> > > want. IMO, the use of segment routing/source routing is better than oFIB.
> > > But at some point, it still relies on fast flooding when multiple LSPs are
> > > involved. (And I mean _fast_, not _ordered_.)
> > >
> > > --Bruno
> > >
> > > One can always come up with examples – based on a specific topology
> > > and a specific failure - where things might be better/worse/unchanged in
> > > the face of inconsistent flooding speed support.
> > > But I hope this simple example illustrates the pitfalls.
> > >
> > >     Les
> > >
> > > > -----Original Message-----
> > > > From: bruno.decraene@orange.com <bruno.decraene@orange.com>
> > > > Sent: Tuesday, May 05, 2020 8:28 AM
> > > > To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
> > > > Subject: Flooding across a network
> > > >
> > > > Les,
> > > >
> > > > > From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg
> > > > (ginsberg)
> > > > > Sent: Monday, May 4, 2020 4:39 PM
> > > > [...]
> > > > > when only some nodes in the network support faster flooding the
> > > > > behavior of the whole network may not be "better" when faster flooding
> > > > > is enabled because it prolongs the period of LSDB inconsistency.
> > > >
> > > > 1) Do you have data on this?
> > > >
> > > > 2) If not, can you provide an example where increasing the flooding
> > > > rate on one adjacency prolongs the period of LSDB inconsistency across
> > > > the network?
> > > >
> > > > 3) In the meantime, let's try the theoretical analysis on a simple
> > > > scenario where a single LSP needs to be flooded across the network.
> > > >
> > > > - Let's call Dij the time needed to flood the LSP from node i to the
> > > > adjacent node j. Clearly Dij>0.
> > > > - Let's call k the node originating this LSP at t0=0s
> > > >
> > > > From t0, the LSDB is inconsistent across the network as all nodes but k
> > > > are missing the LSP and hence only know about the 'old' topology.
> > > >
> > > > Let's call SPT(k) the SPT rooted on k, using Dij as the metric between
> > > > adjacent nodes i and j. Let's call SP(k,i) the shortest path from k to i;
> > > > and D(k,i) the shortest distance between k and i.
> > > >
> > > > It seems that the time needed:
> > > > - for node j to learn about the LSP, and get in sync with k, is D(k,j)
> > > > - for all nodes across the network to learn about the LSP, and get in sync
> > > > with k, is Max[for all j] D(k,j)
> > > >
> > > > Then how could reducing the flooding delay on one adjacency prolong
> > > > the period of LSDB inconsistency?
> > > > It seems to me that it can only improve/decrease it. Otherwise, this
> > > > would mean that decreasing the cost on a link can increase the cost of
> > > > the shortest path.
> > > >
> > > > Note: I agree that there are other cases, such as multiple LSPs
> > > > originated by the same node, and multiple LSPs originated by multiple
> > > > nodes, but let's start with the simple case.
> > > >
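
The single-LSP argument above can be checked numerically. The sketch below is only an illustration of that argument: it runs Dijkstra over per-adjacency flooding delays Dij and takes the largest D(k,j) as the inconsistency period. The four-node topology and its delay values are made up for the example.

```python
import heapq


def flood_times(delays: dict, origin: str) -> dict:
    """D(origin, j) for every reachable j, using Dij as the link metric."""
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, dij in delays.get(node, {}).items():
            nd = d + dij
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def inconsistency_period(delays: dict, origin: str) -> float:
    """Max over all j of D(origin, j): time until the last node has the LSP."""
    return max(flood_times(delays, origin).values())


# Hypothetical delays in seconds; k originates the LSP at t0 = 0.
delays = {
    "k": {"a": 1.0, "b": 4.0},
    "a": {"k": 1.0, "b": 1.0, "c": 5.0},
    "b": {"k": 4.0, "a": 1.0, "c": 1.0},
    "c": {"a": 5.0, "b": 1.0},
}
before = inconsistency_period(delays, "k")

# Speed up flooding on the single adjacency k -> b.
faster = {node: dict(nbrs) for node, nbrs in delays.items()}
faster["k"]["b"] = 0.5
after = inconsistency_period(faster, "k")

print(before, after)  # 3.0 1.5
```

Lowering any one Dij can only keep or shorten every shortest distance, so `after` never exceeds `before` in this model. The multi-LSP cases mentioned in the note are outside what this sketch covers.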
> > > > Thanks,
> > > > --Bruno
> > > >
> > > > > -----Original Message-----
> > > > > From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg
> > > > (ginsberg)
> > > > > Sent: Monday, May 4, 2020 4:39 PM
> > > > >
> > > > > Henk -
> > > > >
> > > > > Thanx for your thoughtful posts.
> > > > > I have read your later posts on this thread as well - but decided to
> > > > > reply to this one.
> > > > > Top posting for better readability.
> > > > >
> > > > > There is broad agreement that faster flooding is desirable.
> > > > > There are now two proposals as to how to address the issue - neither
> > > > > of which is proposing to use TCP (or equivalent).
> > > > >
> > > > > I have commented on why IS-IS flooding requirements are significantly
> > > > > different from those for which TCP is used.
> > > > > I think it is also useful to note that even the simple test case which
> > > > > Bruno reported on in last week's interim meeting demonstrated that
> > > > > without any changes to the protocol at all IS-IS was able to flood an
> > > > > order of magnitude faster than it commonly does today.
> > > > > This gives me hope that we are looking at the problem correctly and
> > > > > will not need "TCP".
> > > > >
> > > > > Introducing a TCP based solution requires:
> > > > >
> > > > > a) A major change to the adjacency formation logic
> > > > >
> > > > > b) Removal of the independence of the IS-IS protocol from the
> > > > > address families whose reachability advertisements it supports -
> > > > > something which I think is a great strength of the protocol -
> > > > > particularly in environments where multiple address family support is
> > > > > needed
> > > > >
> > > > > I really don't want to do either of the above.
> > > > >
> > > > > Your comments regarding PSNP response times are quite correct - and
> > > > > both of the draft proposals discuss this - though I agree more detail
> > > > > will be required.
> > > > > It is intuitive that if you want to flood faster you also need to ACK
> > > > > faster - and probably even retransmit faster when that is needed.
> > > > > The basic relationship between retransmit interval and PSNP interval
> > > > > is expressed in ISO 10589:
> > > > >
> > > > > " partialSNPInterval - This is the amount of time between periodic
> > > > >   action for transmission of Partial Sequence Number PDUs.
> > > > >   It shall be less than minimumLSPTransmission-Interval."
> > > > >
> > > > > Of course the ISO 10589 recommended values (2 seconds and 5 seconds
> > > > > respectively) are associated with a much slower flooding rate, and
> > > > > implementations I am aware of use values in this order of magnitude.
> > > > > These numbers need to be reduced if we are to flood faster, but the
> > > > > relationship between the two needs to remain the same.
> > > > >
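
The constraint quoted from ISO 10589 can be expressed as a simple check. This is a minimal sketch, assuming both timers are scaled by the same factor when flooding speeds up; the class, its field names and the 0.1 scale factor are illustrative and do not come from either draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloodingTimers:
    partial_snp_interval: float            # seconds between PSNP transmissions
    min_lsp_transmission_interval: float   # seconds before an LSP is retransmitted

    def is_valid(self) -> bool:
        """partialSNPInterval shall be less than the retransmit interval."""
        return 0 < self.partial_snp_interval < self.min_lsp_transmission_interval

    def scaled(self, factor: float) -> "FloodingTimers":
        """Shrink (or grow) both timers together so their relationship holds."""
        return FloodingTimers(
            self.partial_snp_interval * factor,
            self.min_lsp_transmission_interval * factor,
        )


iso_defaults = FloodingTimers(partial_snp_interval=2.0,
                              min_lsp_transmission_interval=5.0)
fast = iso_defaults.scaled(0.1)  # hypothetical 10x faster acknowledgement

# Reducing only the retransmit interval breaks the relationship.
broken = FloodingTimers(partial_snp_interval=2.0,
                        min_lsp_transmission_interval=0.5)

print(iso_defaults.is_valid(), fast.is_valid(), broken.is_valid())
```

Scaling both values together keeps the PSNP interval below the retransmit interval, whereas lowering the retransmit interval alone would cause retransmissions before the receiver has had a chance to acknowledge.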
> > > > > It is also true - as you state - that sending ACKs more quickly will
> > > > > result in additional PDUs which need to be received/processed by IS-IS
> > > > > - and this has some impact. But I think it is reasonable to expect that
> > > > > an implementation which can support sending and receiving LSPs at a
> > > > > faster rate should also be able to send/receive PSNPs at a faster rate.
> > > > > But we still need to be smarter than sending one PSNP/one LSP in cases
> > > > > where we have a burst.
> > > > >
> > > > > LANs are a more difficult problem than P2P - and thus far
> > > > > draft-ginsberg-lsr-isis-flooding-scale has been silent on this - but not
> > > > > because we aren't aware of this - just have focused on the P2P behavior
> > > > > first.
> > > > > What the best behavior on a LAN may be is something I am still
> > > > > considering. Slowing flooding down to the speed that the slowest IS
> > > > > on the LAN can support may not be the best strategy - as it also slows
> > > > > down the propagation rate for systems downstream from the nodes on the
> > > > > LAN which can handle faster flooding - thereby having an impact on
> > > > > flooding speed throughout the network in a way which may be out of
> > > > > proportion. This is a smaller example of the larger issue that when only
> > > > > some nodes in the network support faster flooding the behavior of the
> > > > > whole network may not be "better" when faster flooding is enabled
> > > > > because it prolongs the period of LSDB inconsistency.
> > > > > More work needs to be done here...
> > > > >
> > > > > In summary, I don't expect to have to "reinvent TCP" - but I do think
> > > > > you have provided a useful perspective for us to consider as we progress
> > > > > on this topic.
> > > > >
> > > > > Thanx.
> > > > >
> > > > >     Les
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Lsr <lsr-bounces@ietf.org> On Behalf Of Henk Smit
> > > > > > Sent: Thursday, April 30, 2020 6:58 AM
> > > > > > To: lsr@ietf.org
> > > > > > > Subject: [Lsr] Why only a congestion-avoidance algorithm on the
> > > > > > > sender isn't enough
> > > > > >
> > > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > > Two years ago, Gunter Van de Velde and I published this draft:
> > > > > > > https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
> > > > > > > That started this discussion about flow/congestion control and ISIS
> > > > > > > flooding.
> > > > > > >
> > > > > > > My thoughts were that once we start implementing new algorithms to
> > > > > > > optimize ISIS flooding speed, we'll end up with our own version of TCP.
> > > > > > > I think most people here have a good general understanding of TCP.
> > > > > > > But if not, this is a good overview of how TCP does it:
> > > > > > > https://en.wikipedia.org/wiki/TCP_congestion_control
> > > > > >
> > > > > >
> > > > > > What does TCP do:
> > > > > > ====
> > > > > > TCP does 2 things: flow control and congestion control.
> > > > > >
> > > > > > > 1) Flow control is: the receiver trying to prevent itself from being
> > > > > > > overloaded. The receiver indicates, through the receiver-window-size
> > > > > > > in the TCP acks, how much data it can or wants to receive.
> > > > > > > 2) Congestion control is: the sender trying to prevent the links
> > > > > > > between sender and receiver from being overloaded. The sender makes
> > > > > > > an educated guess at what speed it can send.
> > > > > >
> > > > > >
> > > > > > The part we seem to be missing:
> > > > > > ====
> > > > > > For the sender to make a guess at what speed it can send, it looks at
> > > > > > how the transmission is behaving. Are there drops ? What is the RTT
> ?
> > > > > > Do drop-percentage and RTT change ? Do acks come in at the same
> rate
> > > > > > as the sender sends segments ? Are there duplicate acks ? To be
> able
> > > > > > to do this, the sender must know what to expect. How acks behave.
> > > > > >
> > > > > > If you want an ISIS sender to make a guess at what speed it can send,
> > > > > > without changing the protocol, the only thing the sender can do is look
> > > > > > at the PSNPs that come back from the receiver. But the RTT of PSNPs
> > > > > > cannot be predicted, because a good ISIS implementation does not
> > > > > > immediately send a PSNP when it receives an LSP. 1) The receiver should
> > > > > > jitter the PSNP, like it should jitter all packets. And 2) the receiver
> > > > > > should wait a little to see if it can combine multiple acks into a
> > > > > > single PSNP packet.
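
To make the batching point concrete, here is a minimal sketch of how much delay the receiver alone adds to each acknowledgement. It assumes, purely for illustration, that the receiver sends one PSNP at every multiple of a fixed interval; the function name, the millisecond units, and the fixed-interval behavior are assumptions, not anything a real implementation is required to do.

```python
def psnp_ack_delays(arrivals_ms, psnp_interval_ms):
    """Delay (ms) between each LSP arrival and the PSNP that acknowledges it.

    Assumption: the receiver batches acks and sends one PSNP at every
    multiple of psnp_interval_ms. Link RTT is deliberately ignored so that
    only the receiver-side batching delay is visible.
    """
    delays = []
    for t in arrivals_ms:
        # Round t up to the next PSNP send time (integer ceiling division).
        next_psnp = -(-t // psnp_interval_ms) * psnp_interval_ms
        delays.append(next_psnp - t)
    return delays


# LSPs arriving just after a PSNP wait almost the whole interval;
# LSPs arriving just before one are acknowledged almost immediately.
print(psnp_ack_delays([0, 10, 990, 1000, 1001], 1000))
```

Under this assumption the measured "RTT" of an LSP varies between zero and a full interval depending only on when it happened to arrive, which is the unpredictability described above. Jitter would add further variation on top.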
> > > > > >
> > > > > > In TCP, if a single segment gets lost, each new segment will cause the
> > > > > > receiver to send an ack with the seqnr of the last received byte. This
> > > > > > is called "duplicate acks". This triggers the sender to do
> > > > > > fast-retransmission. In ISIS, this can't be done. The information
> > > > > > a sender can get from looking at incoming PSNPs is a lot less than what
> > > > > > TCP can learn from incoming acks.
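
As a rough illustration of the duplicate-ack signal, the sketch below scans a stream of cumulative ack numbers and reports where a TCP-like sender would fast-retransmit. The function name is hypothetical; the threshold of three duplicates is the commonly used TCP value and is taken here as an assumption.

```python
def fast_retransmit_points(acks, dup_threshold=3):
    """Return the ack numbers at which a TCP-like sender would fast-retransmit.

    acks: cumulative ack numbers in arrival order.
    dup_threshold: number of duplicates (after the first copy of an ack)
    that triggers a retransmission; 3 is assumed here.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            # Fire exactly once, when the threshold is first reached.
            if dup_count == dup_threshold:
                triggers.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return triggers


# The segment after 300 was lost: later segments each produce another ack 300.
print(fast_retransmit_points([100, 200, 300, 300, 300, 300, 400]))
```

This works in TCP because acks are cumulative and ordered. A PSNP carries an unordered set of LSP entries, so an ISIS sender has no equivalent repeated signal to count.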
> > > > > >
> > > > > >
> > > > > > The problem with sender-side congestion control:
> > > > > > ====
> > > > > > In ISIS, all we know is that the default retransmit-interval is 5
> > > > > > seconds. And I think most implementations use that as the default.
> > > > > > This means that the receiver of an LSP has one requirement: send a
> > > > > > PSNP within 5 seconds. For the rest, implementations are free to send
> > > > > > PSNPs however and whenever they want. This means a sender cannot
> > > > > > really draw conclusions about flooding speed, dropped LSPs, capacity
> > > > > > of the receiver, etc. There is no ordering when flooding LSPs, or
> > > > > > sending PSNPs. This makes a sender-side algorithm for ISIS a lot
> > > > > > harder.
> > > > > >
> > > > > > When you think about it, you realize that a sender should wait the
> > > > > > full 5 seconds before it can draw any real conclusions about dropped
> > > > > > LSPs. If a sender looks at PSNPs to determine its flooding speed, it
> > > > > > will probably not be able to react without a delay of a few seconds.
> > > > > > A sender might send hundreds or thousands of LSPs in those 5 seconds,
> > > > > > which might all or partially be dropped, complicating matters even
> > > > > > further.
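
The "hundreds or thousands" figure is just rate multiplied by feedback delay. A minimal sketch, where the flooding rates are made-up example values and only the 5-second interval comes from the text:

```python
def lsps_sent_before_feedback(flood_rate_lsps_per_s, feedback_delay_s=5.0):
    """LSPs a sender transmits before it can conclude anything about drops.

    feedback_delay_s defaults to the 5-second retransmit interval mentioned
    above; the flooding rate is a hypothetical input.
    """
    return int(flood_rate_lsps_per_s * feedback_delay_s)


for rate in (100, 400, 1000):  # hypothetical LSPs per second
    print(rate, "LSPs/s ->", lsps_sent_before_feedback(rate), "LSPs unconfirmed")
```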
> > > > > >
> > > > > >
> > > > > > A sender-side algorithm should specify how to do PSNPs.
> > > > > > ====
> > > > > > So imho a sender-side only algorithm can't work just like that in a
> > > > > > multi-vendor environment. We must not only specify a congestion-control
> > > > > > algorithm for the sender. We must also specify for the receiver a more
> > > > > > specific algorithm for how and when to send PSNPs. At least how to do
> > > > > > PSNPs under load.
> > > > > >
> > > > > > Note that this might result in the receiver sending more (and smaller)
> > > > > > PSNPs. More packets might mean more congestion (inside routers).
> > > > > >
> > > > > >
> > > > > > Will receiver-side flow-control work ?
> > > > > > ====
> > > > > > I don't know if that's enough. It will certainly help.
> > > > > >
> > > > > > I think to tackle this problem, we need 3 parts:
> > > > > > 1) sender-side congestion-control algorithm
> > > > > > 2) more detailed algorithm on receiver when and how to send PSNPs
> > > > > > 3) receiver-side flow-control mechanism
> > > > > >
> > > > > > As discussed at length, I don't know if the ISIS process on the
> > > > > > receiving router can actually know if it's running out of resources
> > > > > > (buffers on interfaces, linecards, etc). That's implementation
> > > > > > dependent. A receiver can definitely advertise a fixed value. So the
> > > > > > sender has an upper bound to use when doing congestion-control. Just
> > > > > > like TCP has both a flow-control window and a congestion-control
> > > > > > window, and a sender uses both. Maybe the receiver can even advertise
> > > > > > a dynamic value. Maybe now, maybe only in the future. An advertised
> > > > > > upper limit seems useful to me today.
> > > > > >
> > > > > >
> > > > > > What I didn't like about our own proposal (flooding over TCP):
> > > > > > ====
> > > > > > The problem I saw with flooding over TCP concerns multi-point networks
> > > > > > (LANs).
> > > > > >
> > > > > > When flooding over a multi-point network, setting up TCP connections
> > > > > > introduces serious challenges. Who are the endpoints of the TCP
> > > > > > connections ? Full mesh ? Or do all ISes on a LAN create a
> > > > > > TCP-connection to the DIS ? There is no backup DIS in ISIS (unlike
> > > > > > OSPF). Things get messy quickly.
> > > > > >
> > > > > > However, the other two proposals do not solve this problem either.
> > > > > > How will a sender-side congestion-avoidance algorithm determine whether
> > > > > > there were drops ? There are no acks (PSNPs) on a LAN. We assume most
> > > > > > LSPs that are broadcast are received by all other ISes on the LAN.
> > > > > > Only after the DIS has sent its periodic CSNPs can ISes send PSNPs to
> > > > > > request retransmissions. It seems impossible (or very hard) to me for
> > > > > > all ISes on a LAN to keep track of dropped LSPs and adjust their
> > > > > > sending speed accordingly.
> > > > > >
> > > > > > When flooding on a LAN, the receiver-side algorithm seems best, because
> > > > > > all ISes can see what the lowest advertised sending-speed is and make
> > > > > > sure they send slowly enough to not overload the slowest IS. I'm not
> > > > > > sure this is a good solution, but it seems easier and more realistic
> > > > > > than ISIS-flooding-over-TCP or sender-side congestion-avoidance.
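
On a LAN, the receiver-side idea reduces to every sender pacing itself to the slowest neighbor. A minimal sketch, assuming each IS advertises a maximum receive rate in LSPs per second; the system IDs, the rates, and the way the values are learned are all hypothetical.

```python
def lan_send_rate(advertised_rates):
    """Rate (LSPs/s) a sender should not exceed on a LAN.

    advertised_rates: mapping of neighbor system ID -> advertised maximum
    receive rate. The slowest neighbor sets the pace for everyone.
    """
    if not advertised_rates:
        raise ValueError("no neighbors have advertised a rate")
    return min(advertised_rates.values())


neighbors = {
    "0000.0000.0001": 500,  # hypothetical values
    "0000.0000.0002": 120,
    "0000.0000.0003": 300,
}
print(lan_send_rate(neighbors))
```

The obvious cost is that one slow IS holds back flooding for the whole LAN, which may be part of why this is described above as easier rather than clearly good.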
> > > > > >
> > > > > >
> > > > > > My conclusion:
> > > > > > ====
> > > > > > Sender-side congestion-control won't work without specifying in more
> > > > > > detail how and when to send PSNPs.
> > > > > > Receiver-side flow-control will certainly help. I don't know if it's
> > > > > > good enough. I don't know if advertising a static value is good enough.
> > > > > > But it's a start.
> > > > > >
> > > > > > I still think we'll end up re-implementing a new (and weaker) TCP.
> > > > > >
> > > > > >
> > > > > > henk.
> > > > > >
> > > > > > _______________________________________________
> > > > > > Lsr mailing list
> > > > > > Lsr@ietf.org
> > > > > > https://www.ietf.org/mailman/listinfo/lsr
> > > > >
> > > >
> > > >
> __________________________________________________________
> > > >
> __________________________________________________________
> > > > _____
> > > >
> > > > Ce message et ses pieces jointes peuvent contenir des informations
> > > > confidentielles ou privilegiees et ne doivent donc
> > > > pas etre diffuses, exploites ou copies sans autorisation. Si vous avez
> recu ce
> > > > message par erreur, veuillez le signaler
> > > > a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
> > > > electroniques etant susceptibles d'alteration,
> > > > Orange decline toute responsabilite si ce message a ete altere,
> deforme ou
> > > > falsifie. Merci.
> > > >
> > > > This message and its attachments may contain confidential or privileged
> > > > information that may be protected by law;
> > > > they should not be distributed, used or copied without authorisation.
> > > > If you have received this email in error, please notify the sender and
> delete
> > > > this message and its attachments.
> > > > As emails may be altered, Orange is not liable for messages that have
> been
> > > > modified, changed or falsified.
> > > > Thank you.
> > >
> > >
> >
> 