Re: [Lsr] Dynamic flow control for flooding

"Les Ginsberg (ginsberg)" <ginsberg@cisco.com> Wed, 24 July 2019 19:32 UTC

From: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
To: "tony.li@tony.li" <tony.li@tony.li>
CC: "lsr@ietf.org" <lsr@ietf.org>
Date: Wed, 24 Jul 2019 19:31:55 +0000

Tony –

I have NEVER proposed that the flooding rate be determined by the slowest node.
Quite the opposite.

Flooding rate should be based on the target convergence time and should be aggressive because most topology changes involve far fewer than 1000 LSPs (an arbitrary number). So even with a slow node, fast flooding won’t be an issue for the vast majority of changes.

When we get a topology change with enough LSPs to expose the slowest node's limitations, we do the following (in decreasing order of importance):

1) Continue to flood fast to those nodes/links which can handle it
2) Report the slow node to the operator (so they can address the limitation)
3) Do what we can to limit the overload on the slow node/link

Hope this helps.

   Les


From: Tony Li <tony1athome@gmail.com> On Behalf Of tony.li@tony.li
Sent: Wednesday, July 24, 2019 12:04 PM
To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
Cc: lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding


Les,


Optimizing the throughput through a slow receiver is pretty low on my list because the ROI is low.


Ok, I disagree. The slow receiver is the critical path to convergence.  Only when the slow receiver has absorbed all changes and SPFed do we have convergence.


First, the rate that you select might be too fast for one neighbor and not for the others.  Real flow control would help address this.

[Les:] At the cost of convergence. Not a good tradeoff.
I am arguing that we do want to flood at the same rate on all interfaces used for flooding. When we cannot, flow control does not help with convergence. It may decrease some wasted bandwidth – but as we all agree that bandwidth isn’t a significant limitation this isn’t a great concern.


Rate limiting flooding delays convergence.  Please consider the following topology:


1 —————— 2 —————— 3
|        |        |
|        |        |
4 —————— 5 —————— 6
|        |        |
|        |        |
7 —————— 8 —————— 9


Suppose that we have 1000 LSPs injected at router 1.  Suppose further that router 2 runs at half the rate of router 4.  [How router 1 knows this requires $DEITY and is out of scope for the moment.]

Router 1 now floods at the optimal rate for router 2.  Router 1 uses that same rate to flood to router 4.  Suppose that it takes time T for this to complete.

When does the network converge?

Option 1: All nodes use the same flooding rate.

Router 2 will flood to router 3 concurrent with receiving updates from router 1. Thus, router 3 will receive all updates in time T + delta, where delta is router 2’s processing time.  For now, let’s approximate delta as zero.

Similarly, all routers will use the same rate, so router 4 will flood to 7 in time T + delta, and so on, with router 9 receiving everything in time T + 3 * delta.

Assuming no nodes SPF during the process, the network converges nearly simultaneously in about time T.

Option 2: We flood a bit faster where we can.

Suppose that router 1 now floods at the full rate to router 4.  The full update now takes time T/2.  Because all of the other nodes in the network are fast, router 4 floods in time T/2 + delta to nodes 5 and 7.  Carrying this forward, router 9 gets a full update in time T/2 + 3 * delta.  Even router 3 has full updates in T/2 + 3 * delta.

With the exception of node 2, the network has converged in half the time.  Even node 2 converges in time T.
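
The two options above can be checked with a small simulation. This is only a sketch under assumed numbers: T = 1000 and delta = 1 are arbitrary illustrative units, the function name `full_update_times` is hypothetical, and the model simply says that router 1's first hop costs the full transfer time, every later hop costs delta, and router 2 can never hold the full set before time T.

```python
import heapq

# Links of the 3x3 grid from the message (routers 1..9).
EDGES = [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8), (8, 9),
         (1, 4), (4, 7), (2, 5), (5, 8), (3, 6), (6, 9)]
SLOW = 2  # router 2 absorbs LSPs at half the rate of the others


def full_update_times(t_to_2, t_to_4, delta, slow_floor):
    """Return {router: time at which it holds the full set of LSPs}.

    t_to_2 / t_to_4: time router 1 needs to send everything to 2 / 4.
    delta: per-hop processing delay between the other routers (assumed).
    slow_floor: earliest time the slow router can have absorbed everything.
    """
    adj = {n: [] for n in range(1, 10)}
    for a, b in EDGES:
        adj[a].append(b)
        adj[b].append(a)
    first_hop = {2: t_to_2, 4: t_to_4}
    done = {}
    heap = [(0, 1)]  # the LSPs are injected at router 1 at time 0
    while heap:
        t, node = heapq.heappop(heap)
        if node in done:
            continue
        done[node] = t
        for nbr in adj[node]:
            if nbr in done:
                continue
            hop = first_hop[nbr] if node == 1 else delta
            arrival = t + hop
            if nbr == SLOW:
                # The slow router is limited by its own rate, whoever sends.
                arrival = max(arrival, slow_floor)
            heapq.heappush(heap, (arrival, nbr))
    return done


T, DELTA = 1000, 1  # illustrative units, not measurements
same_rate = full_update_times(T, T, DELTA, T)           # Option 1
per_neighbor = full_update_times(T, T // 2, DELTA, T)   # Option 2

for router in (2, 3, 9):
    print(router, same_rate[router], per_neighbor[router])
```

Under these assumptions router 9 finishes at T + 3 * delta in Option 1 and at T/2 + 3 * delta in Option 2, while router 2 finishes at T in both cases.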

Key points:

1) Yes, the slow node delays convergence and causes micro-loops as everyone around it SPFs.  The point here (and I think you agree) is that slow nodes need to be upgraded.

2) There is no way for us to know how fast a node can go without some form of flow control, other than to go absurdly slowly.

3) There are many folks who want to converge quickly.  It is mission critical for them.  They will address slow nodes. They will not accept pessimal timing to avoid micro-loops.



[Les:] I do not see how flow control improves things.


Flow control allows the transmitter to transmit at the optimal rate for the receiver.
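
As a rough illustration of that sentence, here is a toy acknowledgment-paced sender. It is not any proposal from this thread: the tick-based model, the fixed window of unacknowledged LSPs, and the name `ticks_to_deliver` are all assumptions made for the sketch. The point is only that with per-receiver pacing the sender ends up running at whatever rate each receiver acknowledges.

```python
def ticks_to_deliver(num_lsps, receiver_rate, window):
    """Count ticks until every LSP has been acknowledged.

    Each tick the sender tops up to at most `window` unacknowledged LSPs,
    then the receiver processes and acknowledges up to `receiver_rate`.
    All parameters are hypothetical units for illustration.
    """
    sent = 0
    acked = 0
    ticks = 0
    while acked < num_lsps:
        in_flight = sent - acked
        sent += min(window - in_flight, num_lsps - sent)
        acked += min(receiver_rate, sent - acked)
        ticks += 1
    return ticks


# Same sender, same window; only the receiver's processing rate differs.
slow = ticks_to_deliver(1000, receiver_rate=10, window=20)
fast = ticks_to_deliver(1000, receiver_rate=20, window=20)
print(slow, fast)
```

In this model the receiver that processes twice as fast is done in half the ticks, without the sender being configured with either rate in advance.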



Dropping down to the least common denominator CPU speed in the entire network is going to be undoable without an oracle, and absurdly slow even with that.

[Les:] Never advocated that – please do not put those words in my mouth.


How is that different from what you’ve proposed?  Router 1 can only flood at the rate that it gets PSNPs from router 2.  That paces its flooding to router 4.  Following that logic, you somehow want router 4 to run at the same rate, forcing a uniformly slow rate.

Tony