Re: [Lsr] Dynamic flow control for flooding

"Les Ginsberg (ginsberg)" <ginsberg@cisco.com> Wed, 24 July 2019 05:17 UTC

From: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
To: "stephane.litkowski@orange.com" <stephane.litkowski@orange.com>, Tony Li <tony.li@tony.li>, "lsr@ietf.org" <lsr@ietf.org>
Date: Wed, 24 Jul 2019 05:17:22 +0000
Message-ID: <BYAPR11MB363856BB026992DFBB3BB224C1C60@BYAPR11MB3638.namprd11.prod.outlook.com>
References: <CAMj-N0LdaNBapVNisWs6cbH6RsHiXd-EMg6vRvO_U+UQsYVvXw@mail.gmail.com> <BYAPR11MB36382C89363202D1B5659614C1C70@BYAPR11MB3638.namprd11.prod.outlook.com> <5841_1563943794_5D37E372_5841_105_1_9E32478DFA9976438E7A22F69B08FF924D9C373E@OPEXCAUBMA3.corporate.adroot.infra.ftgroup>
In-Reply-To: <5841_1563943794_5D37E372_5841_105_1_9E32478DFA9976438E7A22F69B08FF924D9C373E@OPEXCAUBMA3.corporate.adroot.infra.ftgroup>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/ZBdK8EmdGaetSUyuAa65gyUn-Jk>
Subject: Re: [Lsr] Dynamic flow control for flooding

Stephane –

There is much we agree on.

There is something to be said for simply “flooding fast” and not worrying about flow control at all (regardless of whether TX or RX mechanisms would be used). Some packets would be dropped, but retransmission timers will ensure that the flooding eventually succeeds, and retransmit timers are long (5 seconds by default). (I am not the only one mentioning this BTW…)

But most important to me is to recognize that flow control (however done) is not fixing anything – nor making the flooding work better. The network is compromised and flow control won’t fix it.
If you accept that, then it makes sense to look for the simplest way to do flow control, and that is decidedly not from the RX side. (I expect Tony Li to disagree with that 😊 – but I have already outlined why it is more complex to do it from the RX side.)

   Les


From: stephane.litkowski@orange.com <stephane.litkowski@orange.com>
Sent: Tuesday, July 23, 2019 9:50 PM
To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; Tony Li <tony.li@tony.li>; lsr@ietf.org
Subject: RE: [Lsr] Dynamic flow control for flooding

Hi Les,

I agree that flooding is a global thing and not a local mechanism if we consider that the ultimate goal is to get the LSDB in-sync as fast as we can on all the nodes.

I just want to highlight three things:

  *   Link delay (due to transmission link distance) already affects the flooding speed (especially when we need to cross links that have 100 msec of RTD), so the flooding speed is already not equal on each link.
  *   I put this one in parentheses as it may be controversial ☺ (To converge a path after a topology change, we do not always require all the nodes to have the LSDB in sync (I mean from a forwarding point of view). That’s a tricky topic because it depends heavily on the network topology: on the one hand, flooding one or two hops away may be enough to converge the path, while on the other hand it may create microloops with another network design.)
  *   I’m really wondering how much difference there is between the different routers we have in a single area today. Even if we still have some legacy routers deployed, they are more powerful than what existed when the ISO spec was written. Are we expecting a difference of hundreds of msec or of tens of msec between the latest generation of routers deployed and the legacy ones? In addition, in our case, we try to create a consistent design, which means that we try to avoid having legacy routers in transit between latest-generation routers, and we push the legacy ones to the edge or try to remove them. There may be transient situations where this happens, but that’s not a design goal. This is to say that it does not bother me to have a very fast flooding value on my core and latest-generation edges while keeping a more conservative value for legacy edges. And I’m not expecting much difference between the two (at least not really more than the link delay that may already exist and impact flooding).

Another point is that I would be really glad to see how much the flooding time impacts the convergence time in real networks, taking into account that the FIB rewrite is usually the biggest contributor (unfortunately we don’t really have instrumentation today to measure flooding). I’m not saying that there is nothing to do; of course the default flooding time we have had for years could be improved, and I fully agree. I’m just always interested in having some measurement of the potential gain.

Flow control is required in any case: we can always find a case where the IS-IS process does not get enough CPU time because the CPU is busy doing other things and IS-IS can’t process the input PDUs (as an example).


Brgds,

From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Tuesday, July 23, 2019 16:30
To: Tony Li; lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding

Tony –

Thanx for picking up the discussion.
Thanx also for doing the math to show that bandwidth is not a concern. I think most/all of us knew that – but it is good to put that small question behind us.

I also think we all agree on the goal - which is to flood significantly faster than many implementations do today to handle deployments like the case you mention below.

Beyond this point, I have a different perspective.

As network-wide convergence depends upon fast propagation of LSP changes – which in turn requires consistent flooding rates on all interfaces enabled for flooding – a properly provisioned network MUST be able to sustain a consistent flooding rate or the operation of the network will suffer. We therefore need to view flow control issues as indicative of a problem.

It is a mistake to equate LSP flooding with a set of independent P2P “connections” – each of which can operate at a rate independent of the other.

If we can agree on this, then I believe we will have placed the flow control problem in its proper perspective – in which case it will become easier to agree on the best way to implement flow control.

   Les



From: Lsr <lsr-bounces@ietf.org> On Behalf Of Tony Li
Sent: Tuesday, July 23, 2019 6:34 AM
To: lsr@ietf.org
Subject: [Lsr] Dynamic flow control for flooding


Hi all,

I’d like to continue the discussion that we left off with last night.

The use case that I posited was a situation where we had 1000 LSPs to flood. This is an interesting case that can happen if there was a large network that partitioned and has now healed.  All LSPs from the other side of the partition are going to need to be updated.

Let’s further suppose that the LSPs have an average size of 1KB.  Thus, the entire transfer is around 1MB.

Suppose that we’re doing this on a 400Gb/s link. If we were to transmit the whole batch of LSPs at once, it takes a whopping 20us.  Not milliseconds, microseconds.  2x10^-5s.  Clearly, we are not going to be rate limited by bandwidth.

Note that 20us is an unreasonable lower bound: we cannot reasonably expect a node to absorb 1k PDUs back to back without loss today, in addition to all of its other responsibilities.

At the opposite end of the spectrum, suppose we transmit one PDU every 33ms.  That’s then going to take us 33 seconds to complete. Unreasonably slow.
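
For anyone who wants to re-run the two bounds above with other numbers, here is a minimal sketch of the arithmetic. It assumes "1KB" means 1000 bytes and ignores framing overhead; the function and constant names are only illustrative.

```python
# Back-of-the-envelope check of the two bounds discussed above.
# Assumptions: 1KB = 1000 bytes, no link-layer framing overhead.

LSP_COUNT = 1000             # LSPs to flood after the partition heals
LSP_SIZE_BYTES = 1000        # average LSP size ("1KB")
LINK_RATE_BPS = 400e9        # 400 Gb/s link
PACING_INTERVAL_S = 0.033    # one PDU every 33 ms


def burst_time_s(count, size_bytes, rate_bps):
    """Serialization time if the whole batch is sent back to back."""
    return count * size_bytes * 8 / rate_bps


def paced_time_s(count, interval_s):
    """Total time if exactly one PDU is sent per pacing interval."""
    return count * interval_s


burst = burst_time_s(LSP_COUNT, LSP_SIZE_BYTES, LINK_RATE_BPS)
paced = paced_time_s(LSP_COUNT, PACING_INTERVAL_S)

print(f"back-to-back burst: {burst * 1e6:.0f} us")
print(f"one PDU per 33 ms:  {paced:.0f} s")
```

The gap between the two results is about six orders of magnitude, which is the range any flow control scheme has to work within.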

How can we then maximize our goodput?  We know that the receiver has a set of buffers and a processing rate that it can support. The processing rate will vary, depending on other loads.

What we would like the transmitter to do is to transmit enough to create a small processing queue on the receiver and then transmit at the receiver’s processing rate.
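
To make that goal concrete, here is a toy discrete-time sketch. It is not a proposed mechanism: it assumes the transmitter has perfect, instantaneous knowledge of the receiver's queue depth and that the receiver's rate is constant, neither of which holds in practice. The names `simulate`, `rx_rate` and `target_queue` are hypothetical.

```python
def simulate(total_lsps, rx_rate, target_queue):
    """Toy model of 'keep a small queue on the receiver'.

    Each tick the transmitter tops the receiver's queue up to
    rx_rate + target_queue PDUs, then the receiver processes up to
    rx_rate PDUs. Returns (ticks needed, largest queue depth seen).
    Assumes idealized feedback: the sender sees the queue directly.
    """
    sent = processed = ticks = 0
    queue = 0
    max_queue = 0
    while processed < total_lsps:
        # Transmitter: send only what is needed to keep the receiver busy
        # for this tick plus a small reserve of target_queue PDUs.
        want = rx_rate + target_queue - queue
        burst = max(0, min(want, total_lsps - sent))
        sent += burst
        queue += burst
        max_queue = max(max_queue, queue)

        # Receiver: drain at its processing rate.
        done = min(rx_rate, queue)
        queue -= done
        processed += done
        ticks += 1
    return ticks, max_queue


ticks, max_queue = simulate(total_lsps=1000, rx_rate=10, target_queue=5)
print(f"ticks: {ticks}, max receiver queue: {max_queue}")
```

In this model the transfer finishes in the minimum number of ticks the receiver allows, while the receiver never has to buffer more than `rx_rate + target_queue` PDUs. How the transmitter would learn the receiver's rate in a real implementation is exactly the open question.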

Can we agree on this goal?

Tony

