Re: [Lsr] Dynamic flow control for flooding

"Les Ginsberg (ginsberg)" <ginsberg@cisco.com> Wed, 24 July 2019 20:56 UTC

From: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
To: "tony.li@tony.li" <tony.li@tony.li>
CC: "lsr@ietf.org" <lsr@ietf.org>
Date: Wed, 24 Jul 2019 20:56:22 +0000
Message-ID: <BYAPR11MB363873CFAF558329DE1AEB7CC1C60@BYAPR11MB3638.namprd11.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/FaKvizHJ4weH7gYIuucPP-VrIcM>
Subject: Re: [Lsr] Dynamic flow control for flooding

Tony –

This thread reminds me of how easy it is to miscommunicate – and I bear some of the responsibility for that.
Inline.

From: Tony Li <tony1athome@gmail.com> On Behalf Of tony.li@tony.li
Sent: Wednesday, July 24, 2019 1:07 PM
To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
Cc: lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding

Les,

Ok, let me reset.  I’ve re-read your slides.

I don’t see anything in there about changing the PSNP signaling rate.  From your comments to Henk, I infer that you’re open to changing that rate.
[Les:] The proposal in the slides is simply an example/straw man. I did not spend a lot of time on it – in fact in the first draft of the slides I did not even provide a proposal. It certainly needs more refinement.
It is meant only to illustrate how we can do things w/o requiring the receive side to do calculations for which the raw data may be difficult and w/o requiring new TLVs.


As soon as you do that, you’re now providing receiver-based feedback and creating flow control.  You’re accepting that rates will vary per interface.

[Les:] Yes – but only when we know that continuing to send at a high rate isn’t useful. It isn’t meant to fix things (as I keep emphasizing) and in a network that works as intended it should never be necessary.

What you’re NOT doing is providing information about the receiver’s input queue and requested input rate.  With less information, the transmitter can only approximate the optimal rate and your proposal seems like a Newton’s method approach to determining that rate.

[Les:] For all of the implementations I have worked on (5 now – across 3 different vendors – not all still available 😊 ) such information is not easily determined. Buffer pools are shared among many components, and input queues may have multiple stages, not all of which are visible to the routing protocol. Plus, since by the time flow control is needed there is already a problem, this isn’t fixing things – it is just trying to get by.

A solution which depends on current receiver state “all the time” is hard – and hard to optimize. And I think we don’t need that degree of precision for optimal operation.
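Tony's "Newton's method" remark describes a sender that lacks the receiver's queue state and must search for a workable rate using only acknowledgement feedback. This sketch uses bisection, a simpler relative of that kind of search. It is an illustration only and is not the proposal in Les's slides. It rests on two hypothetical assumptions: after each measurement interval the sender can tell whether its backlog of unacknowledged LSPs grew, and that growth is monotone in the rate. The name `find_sustainable_rate` and the 37-LSPs-per-interval receiver are made up for the example.

```python
from typing import Callable, Tuple


def find_sustainable_rate(backlog_grew: Callable[[int], bool],
                          min_rate: int, max_rate: int) -> Tuple[int, int]:
    """Highest rate in [min_rate, max_rate] that did not grow the backlog.

    Returns (rate, probes). Assumes backlog growth is monotone in the rate
    and that min_rate is sustainable. Each probe stands for one measurement
    interval of flooding at that rate.
    """
    probes = 1
    if not backlog_grew(max_rate):
        return max_rate, probes
    lo, hi = min_rate, max_rate  # invariant: lo is sustainable, hi is not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1
        if backlog_grew(mid):
            hi = mid
        else:
            lo = mid
    return lo, probes


# Hypothetical receiver that can absorb 37 LSPs per interval. The sender
# never sees this number, only whether its unacknowledged backlog grew.
rate, probes = find_sustainable_rate(lambda r: r > 37, min_rate=1, max_rate=1000)
```

Each probe costs one interval of flooding at a rate that may be wrong. Roughly log2(max_rate - min_rate) intervals therefore pass before the sender settles. That is the price of inferring what the receiver can absorb, rather than being told.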


Your proposal depends on two constants: Usafe and Umax.  How do you know what those are?

[Les:] Not yet.

That’s information about the receiver.

[Les:] Happy to agree to that.

I infer that you propose to hard-code some conservative values for these.  In my mind, that implies that you will be going more slowly than you could if you had more accurate data.  And pretty much what we’re proposing is that the receiver advertise this type of information so that we don’t have to assume the worst case.  This also is nice because an implementation only has to know about its own capabilities.

[Les:] I expect the values to be aggressive – because the downside of flooding LSPs too fast for (say) a few seconds is small.
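For contrast, here is a minimal sketch of the receiver-advertised alternative Tony describes. Each neighbor states what it can absorb, and the sender floods each interface at the lower of its own target and that figure. The `Neighbor` type, its `advertised_rate` field, and all the numbers are hypothetical, and no TLV or encoding is implied. A neighbor that advertises nothing falls back to a hard-coded assumption. That fallback is exactly where the choice between aggressive and conservative constants reappears.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Neighbor:
    name: str
    # Hypothetical: receive rate (LSPs/s) the neighbor advertises, or None if
    # it advertises nothing. No particular TLV or encoding is implied.
    advertised_rate: Optional[int] = None


def per_interface_rates(neighbors: Iterable[Neighbor], local_target: int,
                        assumed_rate: int) -> Dict[str, int]:
    """Flood each neighbor at min(local target, what it can absorb).

    Without an advertisement, the sender must fall back on a hard-coded
    assumption about the receiver, like the constants discussed above.
    """
    rates = {}
    for n in neighbors:
        limit = n.advertised_rate if n.advertised_rate is not None else assumed_rate
        rates[n.name] = min(local_target, limit)
    return rates


rates = per_interface_rates(
    [Neighbor("A", 500), Neighbor("B", 2000), Neighbor("C")],
    local_target=1000, assumed_rate=100)
```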


Tony



On Jul 24, 2019, at 12:31 PM, Les Ginsberg (ginsberg) <ginsberg@cisco.com> wrote:

Tony –

I have NEVER proposed that the flooding rate be determined by the slowest node.
Quite the opposite.

Flooding rate should be based on the target convergence time and should be aggressive because most topology changes involve far fewer than 1000 LSPs (arbitrary number). So even with a slow node, fast flooding won’t be an issue for the vast majority of changes.

When we get a topology change with enough LSPs to expose the slowest node limitations we (in decreasing order of importance):

1) Continue to flood fast to those nodes/links which can handle it
2) Report the slow node to the operator (so they can address the limitation)
3) Do what we can to limit the overload on the slow node/link
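Les's three priorities can be written as a per-neighbor decision. This is a minimal sketch with three hypothetical ingredients. The signal is the count of LSPs sent to a neighbor but not yet acknowledged. The threshold for calling a neighbor slow and the reduced rate are both made up. The 1000-LSP burst and 2 s target are illustrative and echo Les's "arbitrary number". The names `required_rate` and `plan_flooding` are also hypothetical.

```python
import math
from typing import Dict, List, Tuple


def required_rate(burst_lsps: int, target_seconds: float) -> int:
    """LSPs/s needed to flood a burst of this size within the target time."""
    return math.ceil(burst_lsps / target_seconds)


def plan_flooding(target_rate: int, unacked: Dict[str, int],
                  slow_threshold: int, slow_rate: int) -> Tuple[Dict[str, int], List[str]]:
    """Per-neighbor rates, plus the list of neighbors to report as slow."""
    rates: Dict[str, int] = {}
    report: List[str] = []
    for nbr, backlog in sorted(unacked.items()):
        if backlog > slow_threshold:
            report.append(nbr)         # 2) report the slow node to the operator
            rates[nbr] = slow_rate     # 3) limit the overload on that link only
        else:
            rates[nbr] = target_rate   # 1) keep flooding fast where it works
    return rates, report


rate = required_rate(1000, 2.0)  # hypothetical design point: 500 LSPs/s
rates, slow = plan_flooding(rate, {"r2": 900, "r4": 3},
                            slow_threshold=200, slow_rate=100)
```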

Hope this helps.

   Les


From: Tony Li <tony1athome@gmail.com> On Behalf Of tony.li@tony.li
Sent: Wednesday, July 24, 2019 12:04 PM
To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
Cc: lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding


Les,


Optimizing the throughput through a slow receiver is pretty low on my list because the ROI is low.


Ok, I disagree. The slow receiver is the critical path to convergence.  Only when the slow receiver has absorbed all changes and SPFed do we have convergence.


First, the rate that you select might be too fast for one neighbor and not for the others.  Real flow control would help address this.

[Les:] At the cost of convergence. Not a good tradeoff.
I am arguing that we do want to flood at the same rate on all interfaces used for flooding. When we cannot, flow control does not help with convergence. It may decrease some wasted bandwidth – but as we all agree that bandwidth isn’t a significant limitation this isn’t a great concern.


Rate limiting flooding delays convergence.  Please consider the following topology:


1 —————— 2 —————— 3
|        |        |
|        |        |
4 —————— 5 —————— 6
|        |        |
|        |        |
7 —————— 8 —————— 9


Suppose that we have 1000 LSPs injected at router 1.  Suppose further that router 2 runs at half the rate of router 4.  [How router 1 knows this requires $DEITY and is out of scope for the moment.]

Router 1 now floods at the optimal rate for router 2.  Router 1 uses that same rate to flood to router 4.  Suppose that it takes time T for this to complete.

When does the network converge?

Option 1: All nodes use the same flooding rate.

Router 2 will flood to router 3 concurrently with receiving updates from router 1. Thus, router 3 will receive all updates in time T + delta, where delta is router 2’s processing time.  For now, let’s approximate delta as zero.

Similarly, all routers will use the same rate, so router 4 will flood to 7 in time T + delta, and so on, with router 9 receiving everything in time T + 3 * delta.

Assuming no nodes SPF during the process, the network converges nearly simultaneously in about time T.

Option 2: We flood a bit faster where we can.

Suppose that router 1 now floods at the full rate to router 4.  The full update now takes time T/2.  Because all of the other nodes in the network are fast, router 4 floods in time T/2 + delta to nodes 5 and 7.  Carrying this forward, router 9 gets a full update in time T/2 + 3 * delta.  Even router 3 has full updates in T/2 + 3 * delta.

With the exception of node 2, the network has converged in half the time.  Even node 2 converges in time T.
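Tony's arithmetic can be checked with a small model of the 3x3 grid. The model is an assumption chosen to match the numbers in the thread. Flooding is pipelined, so along any path the last LSP arrives after n_lsps divided by the slowest rate on that path. Each intermediate router then adds delta. The rates of 100 and 200 LSPs/s and delta = 0.1 s are illustrative values, which give T = 1000 / 100 = 10 s.

```python
from collections import deque

# The 3x3 grid from the example; routers are numbered row by row.
EDGES = [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8), (8, 9),
         (1, 4), (4, 7), (2, 5), (5, 8), (3, 6), (6, 9)]


def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hop_counts(adj, source, allowed):
    """BFS hop counts from source, traversing only routers in `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def completion_times(adj, source, n_lsps, rate_into, delta):
    """Earliest time at which each router holds all n_lsps.

    Assumed pipelined model: a path delivers the last LSP after
    n_lsps / (slowest rate on the path) + delta per intermediate router.
    rate_into[v] is the rate used on links that flood into router v.
    """
    best = {source: 0.0}
    for r in set(rate_into.values()):
        # Consider only paths on which every hop runs at rate r or faster.
        allowed = {v for v, rate in rate_into.items() if rate >= r}
        for v, hops in hop_counts(adj, source, allowed).items():
            if v != source:
                t = n_lsps / r + (hops - 1) * delta
                best[v] = min(best.get(v, float("inf")), t)
    return best


adj = build_adjacency(EDGES)
capacity = {v: 200 for v in adj}  # LSPs/s each router can absorb (illustrative)
capacity[2] = 100                 # router 2 runs at half the rate of the others
uniform = {v: 100 for v in adj}   # Option 1: everyone floods at router 2's rate

option1 = completion_times(adj, 1, 1000, uniform, delta=0.1)
option2 = completion_times(adj, 1, 1000, capacity, delta=0.1)
```

With these numbers, Option 1 finishes between 10.0 s (routers 2 and 4) and 10.3 s (router 9). Option 2 brings every router except router 2 in by 5.3 s, and router 2 still finishes at 10.0 s. These results match the T/2 + 3 * delta and T figures above.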

Key points:

1) Yes, the slow node delays convergence and causes micro-loops as everyone around it SPFs.  The point here (and I think you agree) is that slow nodes need to be upgraded.

2) There is no way for us to know how fast a node can go without some form of flow control, other than to go absurdly slowly.

3) There are many folks who want to converge quickly.  It is mission critical for them.  They will address slow nodes. They will not accept pessimal timing to avoid micro-loops.




[Les:] I do not see how flow control improves things.


Flow control allows the transmitter to transmit at the optimal rate for the receiver.
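One common form of flow control is a window. The sender keeps at most a fixed number of LSPs outstanding per neighbor and sends more only as PSNPs acknowledge earlier ones, so its pace tracks the receiver's. The `WindowedFlooder` class is a hypothetical sketch. In a receiver-driven scheme the window size would come from the receiver. Nothing here describes a specific draft's encoding or timers.

```python
from collections import deque
from typing import Iterable, List


class WindowedFlooder:
    """Hypothetical per-neighbor sender: at most `window` LSPs unacknowledged."""

    def __init__(self, window: int) -> None:
        self.window = window    # could be advertised by the receiver
        self.unacked = set()    # LSP IDs sent but not yet acknowledged
        self.pending = deque()  # LSP IDs waiting to be sent

    def enqueue(self, lsp_ids: Iterable[int]) -> None:
        self.pending.extend(lsp_ids)

    def sendable(self) -> List[int]:
        """Release as many pending LSPs as the window currently allows."""
        out = []
        while self.pending and len(self.unacked) < self.window:
            lsp = self.pending.popleft()
            self.unacked.add(lsp)
            out.append(lsp)
        return out

    def on_psnp(self, acked_ids: Iterable[int]) -> None:
        """A PSNP acknowledged these LSPs, which frees room in the window."""
        self.unacked.difference_update(acked_ids)


flooder = WindowedFlooder(window=3)
flooder.enqueue(range(10))
first = flooder.sendable()    # [0, 1, 2]; the window is now full
stalled = flooder.sendable()  # [] until acknowledgements arrive
flooder.on_psnp([0, 1])
second = flooder.sendable()   # [3, 4]
```

A slow receiver acknowledges slowly, so the sender slows down on that interface only, while a fast receiver keeps its own window moving.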




Dropping down to the least common denominator CPU speed in the entire network is going to be unworkable without an oracle, and absurdly slow even with that.

[Les:] Never advocated that – please do not put those words in my mouth.


How is that different from what you’ve proposed?  Router 1 can only flood at the rate that it gets PSNPs from router 2.  That paces its flooding to router 4.  Following that logic, you somehow want router 4 to run at the same rate, forcing a uniformly slow rate.

Tony