Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Tony Przygienda <tonysietf@gmail.com> Wed, 11 March 2020 01:18 UTC

From: Tony Przygienda <tonysietf@gmail.com>
Date: Tue, 10 Mar 2020 18:17:03 -0700
Message-ID: <CA+wi2hOfMTmt7u_Nyr+Yp7uWmwqQ3HZ1tcUBcYGxdUPAmotGkA@mail.gmail.com>
To: Christian Hopps <chopps@chopps.org>
Cc: lsr@ietf.org, Tony Li <tony.li@tony.li>, Tony Li <tony1athome@gmail.com>, "Peter Psenak (ppsenak)" <ppsenak@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/qRH4Q_VwfFGLH7rlmDdFefdV6AI>
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
List-Id: Link State Routing Working Group <lsr.ietf.org>

well, everything can be improved. I found that what I suggested when Les asked
works well enough ... as they say, your mileage may vary

whatever you suggest as an absolute value for MaxLSPTx will not stand the
test of time; 10/sec was a great value in the 80s ...
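The point can be made concrete with a tiny sketch (all names and constants here are invented, not from the draft): rather than pinning an absolute MaxLSPTx, adapt the pacing rate from feedback, AIMD-style, so the number is discovered at runtime instead of hard-coded.

```python
# Rough sketch of adaptive LSP pacing: additive-increase /
# multiplicative-decrease instead of a fixed MaxLSPTx constant.
# All names and constants are invented for illustration only.

class AdaptiveLspPacer:
    def __init__(self, initial_rate=100.0, floor=10.0, ceiling=100_000.0):
        self.rate = initial_rate   # current estimate, LSPs per second
        self.floor = floor         # never pace below this
        self.ceiling = ceiling     # sanity cap, not a protocol constant

    def on_acks_keeping_up(self):
        # receiver is acking promptly: probe upward additively
        self.rate = min(self.ceiling, self.rate + 10.0)

    def on_retransmit_needed(self):
        # unacked LSPs piling up: back off multiplicatively
        self.rate = max(self.floor, self.rate / 2.0)

    def interval(self):
        # seconds to wait between consecutive LSP transmissions
        return 1.0 / self.rate
```

Whatever the exact numbers, the shape is what matters here: the rate tracks the deployment it runs in, so it does not rot the way a hard-coded 10/sec did.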

--- tony

On Tue, Mar 10, 2020 at 3:39 PM Christian Hopps <chopps@chopps.org> wrote:

>
>
> > On Mar 10, 2020, at 11:22 AM, Tony Przygienda <tonysietf@gmail.com>
> wrote:
> >
> > Hey Christian, MaxTX is not that hard to derive since it's basically
> > limited by the local system and its CPU/prioritization/queuing architecture.
>
> Well, so the value is "fast as you can" then? It's a specific value,
> "MaxLSPTx", in the algorithm in the document though, and it doesn't say "fast
> as you can" :)
>
> Maybe "Fast as you can" is OK, as long as we have some way to quickly
> adjust that and not keep returning to it so much that we lose more than we
> gain when the receiver has no way to handle it. This flooding traffic can
> be very bursty so taking too long to reach optimal isn't very good I would
> guess.
>
> For the rest, I wasn't actually being serious about using TCP, and your
> points are valid. I think there's some good stuff worth paying attention
> to in those TCP-friendly algorithms, but they don't fit our needs closely
> enough to copy. FWIW I did add the necessary on-wire data and
> documentation to IPTFS (ipsecme) to actually implement the DCCP-style
> congestion control, and will probably end up coding it, hopefully in less
> than 10k LOC -- I'll let you know. :)
>
> My main reason for the mail was that I do think the algorithm in the
> document can be improved, and the first and last parts of my mail were
> some questions in that direction.
>
> Thanks,
> Chris.
> [as wg member]
>
> > For the rest of your email, in short, you have my observations in the
> > previous email on what I think is useful and can be done ... BTW, timestamps
> > are useless unless you synchronize clocks, and with all the queuing that
> > ISIS normally does through the system to get stuff done, it is usually very
> > hard to account for the delay between a packet being generated (or rx'ed on
> > an interface) and the last queue it's pulled from. More musings below,
> > backed by a good amount of work & empirical experience ;-)
>
>
>
> >
> > If we try to punt to TCP (like BGP did in its time, which I argued wasn't
> > the optimal idea and has bitten us back an endless number of times for the
> > shortcut it was ;-) then de facto no real heavy-duty IP box is using a
> > stock TCP stack, at least in the scope of experience I have across a bunch
> > of leading vendors. If you worked on mod'ing TCP for convergence speed with
> > BGP and cute little things like GR/NSR you will know the practical problems
> > and also why stock TCP is actually fairly underwhelming when it comes to
> > pushing large amounts of control data around (mod' distro, mod rusty 2c,
> > mod etc. but that's my data).
> >
> > And agreed, control theory is a wonderful thing, and transfer windowing
> > protocols etc. are long-standing research if you know where to look; lots
> > of the stuff is e.g. present in TCP, QUIC or
> > https://tools.ietf.org/html/rfc4340 and so on. All of them are quite a lot
> > of stuff to put into ISIS/link-state, and they mostly do not do precisely
> > what we need or precondition things we can't afford under heavy load (very
> > fast, non-slip timers, which are absolutely non-trivial if you're not in
> > kernel). On top of that you'd need to drag 2 protocol transports around
> > now: old ISIS flooding with RE-TX and the new thing that should be doing
> > the stuff by itself (and negotiate transport on top and so on). To give
> > you a rough idea, DCCP, which is probably the smallest, is ~10 KLOC of C
> > in user space, in beta, with zero docs ;-) I looked @ the practically
> > existing stuff 2+ years ago in detail when doing RIFT ;-) and with all
> > that I found I ended up carving out the pieces we need for fast flooding
> > without introducing fast acks, which IMO would be a non-starter for
> > high-scale link-state. Or rather, if we really want that, the loop closes
> > and we should practically speaking go to TCP (or 4340, which looked like
> > a better choice to me, just like e.g. Open-R did) and be done with it, or
> > maybe wait for the mythical QUIC all-singing-all-dancing public domain
> > implementation. For many reasons I do not think it would be a
> > particularly good development to entangle a control protocol again with a
> > user transport in the whole ball of yarn that IP already is.
> >
> > kind of all I had to say, next thing ;-)
> >
> > --- tony
> >
> > On Tue, Mar 10, 2020 at 7:48 AM Christian Hopps <chopps@chopps.org>
> wrote:
> >
> > Les Ginsberg (ginsberg) <ginsberg@cisco.com> writes:
> >
> > > Tony –
> > >
> > > If you have a suggestion for a Tx back-off algorithm, please feel free
> > > to share.
> > > The proposal in the draft is just a suggestion.
> > > As this is a local matter there is no interoperability issue, but
> > > certainly documenting a better algorithm is worthwhile.
> >
> > [as WG member]
> >
> > The main thing I'm afraid of is that we're just making up some new,
> > overly simple congestion control algorithm (are there CC experts reviewing
> > this?); maybe we simulate it a few ways, deploy it, and have it work
> > poorly or make things worse. In any case, damn the torpedoes...
> >
> > In this current algorithm, how does MaxLSPTx get set? What happens if
> > MaxLSPTx is too high? If it's too low we could be missing a much faster
> > convergence capability.
> >
> > What if we had more quality information from the receiver, could we do a
> > better job here? Maybe faster ACKs, or could we include a timestamp
> > somehow to calculate RTT? This is the type of data that is used by
> > existing CC algorithms (https://tools.ietf.org/html/rfc4342,
> > https://tools.ietf.org/html/rfc5348). Of course going through these
> > documents (which I've had to do in another area) can start making one
> > think "punt to TCP" :)
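For the RTT part specifically, one classic trick (a sketch, not anything in the draft; the field names are invented) is to have the receiver echo the sender's timestamp back in its ack, so the RTT is measured against the sender's clock alone and no clock synchronization is needed:

```python
import time

# Sketch: RTT from an echoed timestamp (cf. the TCP timestamp option).
# The receiver copies the sender's opaque timestamp back into its ack,
# so the RTT is computed on the sender's own monotonic clock -- no
# clock synchronization between the two systems is required.

def make_probe() -> dict:
    # timestamp carried opaquely in the outgoing PDU (field name invented)
    return {"tx_ts": time.monotonic()}

def rtt_from_ack(echoed_tx_ts: float) -> float:
    # seconds elapsed on the sender's monotonic clock since transmission
    return time.monotonic() - echoed_tx_ts
```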
> >
> > What would be nice, if we're going to attempt CC, is that the algorithm
> > be good enough to send relatively fast to start, adjust quickly if
> > need be, and allow for *increasing* the send rate. The increasing part,
> > I think, is important if we take this work on, and I don't think it's
> > currently covered.
> >
> > I also don't have a good feel for how quickly the currently suggested
> > algorithm adjusts its send rate when it needs to. The correct value for
> > Usafe seems very much dependent on the receiver's partialSNPInterval. It's
> > so dependent that one might imagine it would be smart for the receiver to
> > signal the value to the transmitter so that the transmitter can set Usafe
> > correctly.
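That dependency can be illustrated with a toy formula (Usafe and partialSNPInterval are the draft's names; the formula and margin are invented here): if the receiver signaled its partialSNPInterval, the transmitter could size Usafe as the number of LSPs it expects to have legitimately unacknowledged during one ack delay.

```python
# Toy illustration, not the draft's algorithm: size Usafe (the max
# unacknowledged LSPs treated as "safe") from the receiver's
# partialSNPInterval, i.e. how long acks may legitimately be delayed.
# The safety margin is an invented constant.

def usafe_from_psnp_interval(tx_rate_lsps_per_s: float,
                             partial_snp_interval_s: float,
                             margin: float = 2.0) -> int:
    # LSPs sent during one PSNP ack window, padded by a safety margin;
    # anything within this count is expected-unacked rather than lost
    return max(1, int(tx_rate_lsps_per_s * partial_snp_interval_s * margin))
```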
> >
> > Thanks,
> > Chris.
> > [as WG member]
> >
> >
> >
> > >
> > >    Les (claws in check 😊 )
> > >
> > >
> > > From: Tony Przygienda <tonysietf@gmail.com>
> > > Sent: Wednesday, February 19, 2020 11:25 AM
> > > To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
> > > Cc: Peter Psenak (ppsenak) <ppsenak@cisco.com>; Tony Li <
> tony1athome@gmail.com>; lsr@ietf.org; tony.li@tony.li
> > > Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> > >
> > > Having worked for the last couple of years on an implementation of
> > > flooding speeds that converge LSDBs some orders of magnitude above
> > > today's speeds ;-) here's a bunch of observations
> > >
> > > 1. TX side is easy and useful. My observation, having gone quickly over
> > > the -ginsberg- draft, is that you really want a better hysteresis there;
> > > it's a bit too vertical and you will generate oscillations rather than
> > > walk around the equilibrium ;-)
> > > 2. Queue per interface is fairly trivial with modern implementation
> techniques and memory sizes if done correctly. Yes, very memory constrained
> platforms are a mildly different game and kind of precondition a different
> discussion.
> > > 3. RX side is possible and somewhat useful but much harder to do well,
> > > depending on flavor. If we're talking about the RX advertising a very
> > > static value to cap the flooding speed, that's actually a useful knob to
> > > have IMO/IME. Trying to cleverly communicate a window size to the TXer
> > > is not only fiendishly difficult and incurs back-propagation delay (not
> > > negligible @ those rates IME), but can easily lead to subtle flood
> > > starvation behaviors and lots of slow starts due to the mixture of
> > > control-loop dynamics and implementation complexity of such a scheme.
> > > Though, giving the TXer some hint that backpressure is desired is not a
> > > bad thing IME, and it can be derived fairly easily without needing to
> > > check queue sizes and so on; it's observable by looking @ some standard
> > > stats on what the productive incoming rate on the interface is. Anything
> > > smarter needs new TLVs on packets & then you have an under/oversampling
> > > problem based on hellos (too low a frequency), ACKs (too bursty, too
> > > batchy) and flooded-back LSPs (too unpredictable)
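The hysteresis point in (1) can be sketched like this (thresholds and rates are invented, not from any draft): back off at a high-water mark but recover only below a lower one, so the rate holds steady in the dead band instead of oscillating around a single cut-off.

```python
# Sketch of hysteresis in Tx back-off: two thresholds with a dead band
# between them, so the rate does not flip back and forth around one
# "vertical" cut-off.  All names and constants are invented.

class HysteresisBackoff:
    def __init__(self, high_water=1000, low_water=400,
                 fast=5000.0, slow=500.0):
        assert low_water < high_water
        self.high = high_water   # unacked LSPs: at or above this, slow down
        self.low = low_water     # unacked LSPs: at or below this, speed up
        self.fast = fast         # LSPs/sec when the receiver keeps up
        self.slow = slow         # LSPs/sec while backing off
        self.rate = fast

    def update(self, unacked: int) -> float:
        # inside the dead band (low..high) the current rate is held:
        # that is what prevents oscillation around a single threshold
        if unacked >= self.high:
            self.rate = self.slow
        elif unacked <= self.low:
            self.rate = self.fast
        return self.rate
```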
> > >
> > > For more details I can recommend rift draft of course ;-)
> > >
> > > otherwise I'm staying out from this mildly feline spat ;-)
> > >
> > > --- tony
> > >
> > > On Wed, Feb 19, 2020 at 9:59 AM Les Ginsberg (ginsberg)
> > > <ginsberg@cisco.com> wrote:
> > > Tony -
> > >
> > > Peter has done a great job of highlighting that "single queue" is an
> > > oversimplification - I have nothing to add to that discussion.
> > >
> > > I would like to point out another aspect of the Rx based solution.
> > >
> > > Since you need to send signaling based upon dynamic receiver state,
> > > this signaling is contained in unreliable PDUs (hellos), and to be
> > > useful it needs to be sent ASAP - you cannot wait for the next periodic
> > > hello interval (default 10 seconds) to expire. So you are going to have
> > > to introduce extra hello traffic at a time when protocol input queues
> > > are already stressed.
> > >
> > > Given that hellos are unreliable, the question arises of how many
> > > transmissions of the updated flow info are enough. You could make this
> > > more deterministic by enhancing the new TLV to include information
> > > received from the neighbor, so that each side would know when the
> > > neighbor had received the updated info. This then requires additional
> > > hellos to be sent in both directions - which exacerbates the queue
> > > issues on both receiver and transmitter.
> > >
> > > It is true (of course) that hellos should be treated with higher
> priority than other PDUs, but this does not mean that the additional hellos
> have no impact on the queue space available for LSPs/SNPs.
> > >
> > > Also, it seems like you are proposing interface-independent logic, so
> > > you will be adjusting flow information on all interfaces enabled for
> > > IS-IS, which means that additional hello traffic will occur on all
> > > interfaces. At scale this is concerning.
> > >
> > >    Les
> > >
> > >
> > >> -----Original Message-----
> > >> From: Peter Psenak <ppsenak@cisco.com>
> > >> Sent: Wednesday, February 19, 2020 2:49 AM
> > >> To: Tony Li <tony1athome@gmail.com>
> > >> Cc: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; tony.li@tony.li;
> > >> lsr@ietf.org
> > >> Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> > >>
> > >> Tony,
> > >>
> > >> On 19/02/2020 11:37, Tony Li wrote:
> > >> > Peter,
> > >> >
> > >> >> I'm aware of the PD layer and that is not the issue. The problem
> > >> >> is that there is no common value to report across different PD
> > >> >> layers, as each architecture may have a different number of queues
> > >> >> involved, etc. Trying to find a common value to report to IGPs
> > >> >> across various PDs would involve some PD-specific logic, and that
> > >> >> is the part I'm referring to and would like NOT to get into.
> > >> >
> > >> >
> > >> > I'm sorry that scares you. It would seem like an initial
> > >> > implementation might be to take the min of the free space of the
> > >> > queues leading from the interface to the CPU. I grant you that some
> > >> > additional sophistication may be necessary, but I suspect that this
> > >> > is not going to become more complicated than polynomial evaluation.
> > >>
> > >> I'm not scared of polynomial evaluation, but of the fact that my IGP
> > >> implementation would be dependent on the PD specifics, which are not
> > >> generally available and need to be custom-built for each PD. I always
> > >> thought a good IGP implementation is PD-agnostic.
> > >>
> > >> thanks,
> > >> Peter
> > >>
> > >> >
> > >> > Tony
> > >> >
> > >> > _______________________________________________
> > >> > Lsr mailing list
> > >> > Lsr@ietf.org
> > >> > https://www.ietf.org/mailman/listinfo/lsr
> > >> >
> > >> >
> > >
>
>