Re: [Lsr] Dynamic flow control for flooding

tony.li@tony.li Wed, 24 July 2019 20:06 UTC

Sender: Tony Li <tony1athome@gmail.com>
From: tony.li@tony.li
Message-Id: <C748D21A-26EF-4AA4-B5C8-307016E0638B@tony.li>
Content-Type: multipart/alternative; boundary="Apple-Mail=_37175BB5-0B2A-4231-B4DC-E6958BC7A8A9"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Date: Wed, 24 Jul 2019 13:06:38 -0700
In-Reply-To: <BYAPR11MB3638734DA7246449F68FB7F2C1C60@BYAPR11MB3638.namprd11.prod.outlook.com>
Cc: "lsr@ietf.org" <lsr@ietf.org>
To: Les Ginsberg <ginsberg@cisco.com>
References: <CAMj-N0LdaNBapVNisWs6cbH6RsHiXd-EMg6vRvO_U+UQsYVvXw@mail.gmail.com> <BYAPR11MB36382C89363202D1B5659614C1C70@BYAPR11MB3638.namprd11.prod.outlook.com> <593D6ED8-A568-4B41-8882-3D32A6D0111F@tony.li> <BYAPR11MB36381F5B3EC20BC8BE2217D5C1C60@BYAPR11MB3638.namprd11.prod.outlook.com> <63EC078F-795D-4A20-9EBC-F87EE28C5EAB@tony.li> <BYAPR11MB3638734DA7246449F68FB7F2C1C60@BYAPR11MB3638.namprd11.prod.outlook.com>
X-Mailer: Apple Mail (2.3445.104.11)
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/V3FIHBfhuA2LRvxBJxi61NC4HzM>
Subject: Re: [Lsr] Dynamic flow control for flooding

Les,

Ok, let me reset.  I’ve re-read your slides.
 
I don’t see anything in there about changing the PSNP signaling rate.  From your comments to Henk, I infer that you’re open to changing that rate.

As soon as you do that, you’re now providing receiver based feedback and creating flow control.  You’re accepting that rates will vary per interface.

What you’re NOT doing is providing information about the receiver’s input queue and requested input rate.  With less information, the transmitter can only approximate the optimal rate and your proposal seems like a Newton’s method approach to determining that rate.  

Your proposal depends on two constants: Usafe and Umax.  How do you know what those are?

That’s information about the receiver. 

I infer that you propose to hard code some conservative values for these.  In my mind, that implies that you will be going more slowly than you could if you had more accurate data.  And pretty much what we’re proposing is that the receiver advertise this type of information so that we don’t have to assume the worst case.  This also is nice because an implementation only has to know about its own capabilities.
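
To make the cost of assuming the worst case concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the constants, the function names, and the idea that the advertisement is a single LSPs-per-second number are all hypothetical and do not come from either proposal. It just compares how long a batch of LSPs takes at a hard-coded conservative rate versus a rate the receiver advertises.

```python
from typing import Optional

# Hypothetical numbers for illustration only; neither proposal specifies them.
CONSERVATIVE_RATE = 100.0  # LSPs/second assumed when nothing is known about the receiver
SENDER_MAX_RATE = 1000.0   # LSPs/second the sender itself is able to transmit


def choose_rate(advertised_rate: Optional[float]) -> float:
    """Pick a per-interface flooding rate in LSPs/second.

    With no advertisement, fall back to the conservative constant.
    With one, use it, capped by what the sender can actually do.
    """
    if advertised_rate is None:
        return min(CONSERVATIVE_RATE, SENDER_MAX_RATE)
    if advertised_rate <= 0:
        raise ValueError("advertised rate must be positive")
    return min(advertised_rate, SENDER_MAX_RATE)


def flood_time(num_lsps: int, rate: float) -> float:
    """Seconds needed to send num_lsps at a steady rate."""
    return num_lsps / rate


if __name__ == "__main__":
    for advertised in (None, 500.0):
        rate = choose_rate(advertised)
        print(f"advertised={advertised}: rate={rate}, "
              f"1000 LSPs take {flood_time(1000, rate)} s")
```

With these made-up numbers, the receiver that says it can take 500 LSPs/second gets the batch five times sooner than one the sender has to guess about.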

Tony


> On Jul 24, 2019, at 12:31 PM, Les Ginsberg (ginsberg) <ginsberg@cisco.com> wrote:
> 
> Tony –
>  
> I have NEVER proposed that the flooding rate be determined by the slowest node.
> Quite the opposite.
>  
> Flooding rate should be based on the target convergence time and should be aggressive because most topology changes involve far fewer than 1000 LSPs (arbitrary number). So even with a slow node, fast flooding won’t be an issue for the vast majority of changes.
>  
> When we get a topology change with enough LSPs to expose the slowest node limitations we (in decreasing order of importance):
>  
> 1) Continue to flood fast to those nodes/links which can handle it
> 2) Report the slow node to the operator (so they can address the limitation)
> 3) Do what we can to limit the overload on the slow node/link
>  
> Hope this helps.
>  
>    Les
>  
>  
> From: Tony Li <tony1athome@gmail.com> On Behalf Of tony.li@tony.li
> Sent: Wednesday, July 24, 2019 12:04 PM
> To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
> Cc: lsr@ietf.org
> Subject: Re: [Lsr] Dynamic flow control for flooding
>  
>  
> Les,
>  
>  
> Optimizing the throughput through a slow receiver is pretty low on my list because the ROI is low.
>  
>  
> Ok, I disagree. The slow receiver is the critical path to convergence.  Only when the slow receiver has absorbed all changes and SPFed do we have convergence.
>  
>  
> First, the rate that you select might be too fast for one neighbor and not for the others.  Real flow control would help address this.
>  
> [Les:] At the cost of convergence. Not a good tradeoff.
> I am arguing that we do want to flood at the same rate on all interfaces used for flooding. When we cannot, flow control does not help with convergence. It may decrease some wasted bandwidth – but as we all agree that bandwidth isn’t a significant limitation this isn’t a great concern.
>  
>  
> Rate limiting flooding delays convergence.  Please consider the following topology:
>  
>  
> 1 —————— 2 —————— 3
> |        |        |
> |        |        |
> 4 —————— 5 —————— 6
> |        |        |
> |        |        |
> 7 —————— 8 —————— 9
>  
>  
> Suppose that we have 1000 LSPs injected at router 1.  Suppose further that router 2 runs at half the rate of router 4.  [How router 1 knows this requires $DEITY and is out of scope for the moment.]
>  
> Router 1 now floods at the optimal rate for router 2.  Router 1 uses that same rate to flood to router 4.  Suppose that it takes time T for this to complete.
>  
> When does the network converge?
>  
> Option 1: All nodes use the same flooding rate.
>  
> Router 2 will flood to router 3 concurrent with receiving updates from router 1. Thus, router 3 will receive all updates in time T + delta, where delta is router 2’s processing time.  For now, let’s approximate delta as zero.
>  
> Similarly, all routers will use the same rate, so router 4 will flood to 7 in time T + delta, and so on, with router 9 receiving everything in time T + 3 * delta.
>  
> Assuming no nodes SPF during the process, the network converges nearly simultaneously in about time T.
>  
> Option 2: We flood a bit faster where we can.
>  
> Suppose that router 1 now floods at the full rate to router 4.  The full update now takes time T/2.  Because all of the other nodes in the network are fast, router 4 floods in time T/2 + delta to nodes 5 and 7.  Carrying this forward, router 9 gets a full update in time T/2 + 3 * delta.  Even router 3 has full updates in T/2 + 3 * delta.
>  
> With the exception of node 2, the network has converged in half the time.  Even node 2 converges in time T.
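
The two options above can be checked with a small model. This is only a sketch under stated assumptions: each router needs a fixed time to absorb the whole batch (T for a router flooded at the slow rate, T/2 at the full rate), re-flooding is pipelined so each hop adds only delta, and the values of T and delta are arbitrary. The names `GRID` and `completion_times` are made up for this example.

```python
import heapq

# The 3x3 topology from the example: router -> neighbors.
GRID = {
    1: [2, 4], 2: [1, 3, 5], 3: [2, 6],
    4: [1, 5, 7], 5: [2, 4, 6, 8], 6: [3, 5, 9],
    7: [4, 8], 8: [5, 7, 9], 9: [6, 8],
}


def completion_times(transfer_time, delta, source=1):
    """Time at which each router holds the full batch of LSPs.

    transfer_time[v] is how long the batch takes to arrive at v at the
    rate used toward v. A router finishes no sooner than that, and no
    sooner than delta after its fastest upstream neighbor finished.
    This is a Dijkstra-style search over that rule.
    """
    done = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > done[u]:
            continue  # stale heap entry
        for v in GRID[u]:
            candidate = max(t + delta, transfer_time[v])
            if candidate < done.get(v, float("inf")):
                done[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return done


T, DELTA = 10.0, 0.125  # arbitrary illustrative values

# Option 1: everyone is flooded at router 2's rate.
option1 = completion_times({r: T for r in GRID}, DELTA)

# Option 2: only router 2 is flooded slowly; everyone else at twice the rate.
option2 = completion_times({r: (T if r == 2 else T / 2) for r in GRID}, DELTA)

if __name__ == "__main__":
    for r in sorted(GRID):
        print(r, option1[r], option2[r])
```

In this model router 9 finishes at T + 3 * delta under option 1 and at T/2 + 3 * delta under option 2, while router 2 finishes at T either way, which matches the reasoning above.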
>  
> Key points: 
>  
> 1) Yes, the slow node delays convergence and causes micro-loops as everyone around it SPFs.  The point here (and I think you agree) is that slow nodes need to be upgraded.
>  
> 2) There is no way for us to know how fast a node can go without some form of flow control, other than to go absurdly slowly.
>  
> 3) There are many folks who want to converge quickly.  It is mission critical for them.  They will address slow nodes. They will not accept pessimal timing to avoid micro-loops.
>  
> 
> 
> [Les:] I do not see how flow control improves things.
>  
>  
> Flow control allows the transmitter to transmit at the optimal rate for the receiver.
>  
> 
> 
> Dropping down to the least common denominator CPU speed in the entire network is going to be undoable without an oracle, and absurdly slow even with that.
>  
> [Les:] Never advocated that – please do not put those words in my mouth.
>  
>  
> How is that different than what you’ve proposed?  Router 1 can only flood at the rate that it gets PSNPs from router 2.  That paces its flooding to router 4.  Following that logic, you somehow want router 4 to run at the same rate, forcing a uniformly slow rate.
>  
> Tony