Re: [Lsr] Dynamic flow control for flooding

tony.li@tony.li Wed, 24 July 2019 15:52 UTC

Sender: Tony Li <tony1athome@gmail.com>
From: tony.li@tony.li
Date: Wed, 24 Jul 2019 08:52:01 -0700
In-Reply-To: <CAOj+MMHjOQY3Rp3ovHWdmCPsqW=T2LFWJyp5wpcdZL04RshFpg@mail.gmail.com>
Cc: Henk Smit <henk.ietf@xs4all.nl>, Robert Raszuk <rraszuk@gmail.com>, Les Ginsberg <ginsberg@cisco.com>, lsr@ietf.org
To: Robert Raszuk <robert@raszuk.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/8F2Du5BkcyvBRTofIGLgwTAa8Zo>
Subject: Re: [Lsr] Dynamic flow control for flooding

Robert,

Nothing has changed about the probability of network partitioning. That was simply a use case selected to motivate the discussion about flooding speed.

The entire discussion is almost orthogonal to dynamic flooding.  Let’s please take that out of the discussion.

Tony


> On Jul 24, 2019, at 7:38 AM, Robert Raszuk <robert@raszuk.net> wrote:
> 
> Hi,
> 
> Yes, indeed. While reading your richly connected node restart problem, it struck me that use of the overload-bit should be explored, proposed, and implemented.
> 
> For the partition problem I have two general comments: 
> 
> a) If network partitions are likely to happen more often with dynamic flooding, then perhaps, as already said, we should increase the maximum number of times a given LSP arrives at a flooding-optimized node. Two may not be enough.
> 
> b) If protocol extensions help mitigate the effects of a network partition through much faster repair, some folks may treat network partitions as a normal operational model instead of re-architecting the network to make sure partition events are as rare as possible.
> 
> Thx,
> R.
> 
> On Wed, Jul 24, 2019 at 4:12 PM Henk Smit <henk.ietf@xs4all.nl> wrote:
> 
> Hello Robert,
> 
> Tony brought up the example of a partitioned network.
> But there are more examples.
> 
> E.g. in a network there is a router with 1000 neighbors.
> (When discussing distributed vs centralized flooding-topology
>   reduction algorithms, I've been told these network designs exist).
> When such a router reboots/crashes/comes back up, all 1000 neighbors
> will create a new version of their own LSP. This causes 1000 different
> LSPs to be flooded through the network at the same time, impacting every
> router in the network.
> 
> The case I was thinking of myself was when a router in a large network
> boots. When it brings up a number of adjacencies, each neighbor will
> try to synchronize its LSPDB with the newly booted router. As the newly
> booted router will send empty CSNPs to each of its neighbors, each
> neighbor will start sending the full LSPDB. If such a network has 10k
> LSPs, and such a router has 100 neighbors, that router will receive
> 100 * 10k = 1 million LSPs. Having a faster and more efficient flooding
> transport, with flow-control, will make a reboot in such a topology
> less painful.
> 
> (In that last case, creative use of the overload-bit could prevent
> black-holing or microloops while IS-IS synchronizes its LSPDB after a
> reboot. Just like we used the overload-bit to solve the problem of slow
> convergence of BGP after a reboot, 22 years ago. I have no idea if there
> are any implementations that use the overload-bit to alleviate slow
> convergence of IS-IS after a reboot.)
> 
> henk.
> 
> 
> Robert Raszuk schreef op 2019-07-24 15:33:
> > Hey Henk & all,
> > 
> > If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP), or even if the
> > full flooding takes 33 sec, as Tony mentioned - is this really a
> > problem?
> > 
> > Remember we are not talking about protocol convergence after a link flap
> > or a node going down. We are talking about serious network partitioning
> > which itself may have lasted for minutes, hours or days. While just
> > considering absolute numbers yields a desire to go faster and faster, if
> > we put things in overall perspective, is there really a problem to
> > be solved in the first place?
> > 
> > Would there still be a problem if the LSR WG recommended faster acking,
> > maybe not for each LSP but for, say, 20 or 30 max?
> > 
> > Thx,
> > R.
> 
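
The back-of-envelope figures quoted in this thread can be reproduced with a short sketch. The function names are hypothetical, and the 33 ms per-LSP pacing interval is an assumption inferred only from the "1000 LSPs may take 33 sec" figure above. The last calculation further assumes that each neighbor paces its own transmissions independently and in parallel, which the thread does not state.

```python
import math


def psnps_needed(lsp_count: int, acks_per_psnp: int = 66) -> int:
    """Number of PSNPs needed to acknowledge lsp_count LSPs.

    66 acknowledgements per PSNP is the maximum quoted in the thread.
    """
    return math.ceil(lsp_count / acks_per_psnp)


def lsps_received_on_boot(neighbors: int, lspdb_size: int) -> int:
    """LSPs a freshly booted router receives if every neighbor
    sends its full LSPDB in response to an empty CSNP."""
    return neighbors * lspdb_size


def paced_flood_seconds(lsp_count: int, interval_ms: int = 33) -> float:
    """Time for one sender to transmit lsp_count LSPs at a fixed pacing
    interval. 33 ms is an assumption derived from 33 s / 1000 LSPs."""
    return lsp_count * interval_ms / 1000


if __name__ == "__main__":
    print(psnps_needed(1000))                   # 16
    print(lsps_received_on_boot(100, 10_000))   # 1000000
    print(paced_flood_seconds(1000))            # 33.0
    print(paced_flood_seconds(10_000))          # 330.0 (one neighbor, full 10k LSPDB)
```

Under these assumptions, the reboot case delivers a million LSPs to a single router, while each individual neighbor still needs several minutes at the assumed pacing to send its copy of the database.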