Re: [Lsr] Dynamic flow control for flooding

Robert Raszuk <robert@raszuk.net> Wed, 24 July 2019 14:39 UTC

MIME-Version: 1.0
References: <CAMj-N0LdaNBapVNisWs6cbH6RsHiXd-EMg6vRvO_U+UQsYVvXw@mail.gmail.com> <BYAPR11MB36382C89363202D1B5659614C1C70@BYAPR11MB3638.namprd11.prod.outlook.com> <5841_1563943794_5D37E372_5841_105_1_9E32478DFA9976438E7A22F69B08FF924D9C373E@OPEXCAUBMA3.corporate.adroot.infra.ftgroup> <BYAPR11MB363856BB026992DFBB3BB224C1C60@BYAPR11MB3638.namprd11.prod.outlook.com> <8376a87831ffa6f5298c5122907c6e66@xs4all.nl> <CA+b+ER=LOZxoyoonPtC7VKppSNcQohGQdx+n8D3+LndnHdsofQ@mail.gmail.com> <cc806e622ad77ab73263cd9dc7eecad8@xs4all.nl>
In-Reply-To: <cc806e622ad77ab73263cd9dc7eecad8@xs4all.nl>
From: Robert Raszuk <robert@raszuk.net>
Date: Wed, 24 Jul 2019 16:38:49 +0200
Message-ID: <CAOj+MMHjOQY3Rp3ovHWdmCPsqW=T2LFWJyp5wpcdZL04RshFpg@mail.gmail.com>
To: Henk Smit <henk.ietf@xs4all.nl>
Cc: Robert Raszuk <rraszuk@gmail.com>, "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>, Tony Li <tony.li@tony.li>, lsr@ietf.org
Content-Type: multipart/alternative; boundary="0000000000003cc90a058e6e4513"
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/L0_nx7CQ0y9YZ6kW_BgPe5B4jS4>
Subject: Re: [Lsr] Dynamic flow control for flooding

Hi,

Yes indeed. As I was reading your richly connected node restart problem,
it struck me that use of the overload-bit should be explored, proposed,
and implemented.
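
To make the overload-bit idea a bit more concrete, here is a minimal sketch
of the behaviour I have in mind. It is not taken from any implementation;
the class name, the method names and the 300 second safety timer are all
hypothetical. The restarting router keeps the overload-bit set in its own
LSP until it has synchronized its LSPDB with every neighbor, or until the
timer runs out.

```python
from dataclasses import dataclass, field


@dataclass
class RestartingRouter:
    """Hypothetical model of a router holding the overload-bit after boot."""

    neighbors: set
    max_hold_seconds: float = 300.0  # assumed safety timer, not a standard value
    synced: set = field(default_factory=set)
    elapsed: float = 0.0

    def mark_synced(self, neighbor):
        """Record that LSPDB synchronization with this neighbor is complete."""
        if neighbor in self.neighbors:
            self.synced.add(neighbor)

    def tick(self, seconds):
        """Advance the time since boot."""
        self.elapsed += seconds

    @property
    def overload_bit(self):
        """True while the router should still advertise itself as overloaded."""
        all_synced = self.synced >= self.neighbors
        timed_out = self.elapsed >= self.max_hold_seconds
        return not (all_synced or timed_out)


router = RestartingRouter(neighbors={"A", "B"})
router.mark_synced("A")
print(router.overload_bit)  # still waiting for B
router.mark_synced("B")
print(router.overload_bit)  # all neighbors synchronized
```

The timer is there so that one slow or broken neighbor cannot keep the
router out of transit paths forever.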

For the partition problem I have two general comments:

a) If network partitions are likely to happen more often in the case of
dynamic flooding, perhaps, as already said before, we should increase the
maximum number of copies of a given LSP that arrive at a flooding-optimized
node. Two may not be enough.

b) If protocol extensions help to mitigate the effects of network
partition via much faster repair, some folks may treat network partitions
as a normal operational model instead of re-architecting the network to
make sure network partition events are as rare as possible.
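
For reference, the back-of-the-envelope numbers quoted further down in this
thread can be reproduced with a few lines of Python. This is only a sketch:
the function names are mine, and the 33 ms per-LSP pacing interval is my
assumption about where the 33 sec figure comes from, not something stated
in the thread.

```python
import math


def psnps_needed(lsp_count, entries_per_psnp=66):
    """PSNPs required to acknowledge lsp_count LSPs at 66 entries per PSNP."""
    return math.ceil(lsp_count / entries_per_psnp)


def flooding_seconds(lsp_count, interval_ms=33):
    """Time to send lsp_count LSPs when paced one per interval_ms (assumed)."""
    return lsp_count * interval_ms / 1000


def lsps_received_on_boot(lspdb_size, neighbor_count):
    """LSPs a booting router receives if every neighbor sends the full LSPDB."""
    return lspdb_size * neighbor_count


print(psnps_needed(1000))                  # 16
print(flooding_seconds(1000))              # 33.0
print(lsps_received_on_boot(10_000, 100))  # 1000000
```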

Thx,
R.

On Wed, Jul 24, 2019 at 4:12 PM Henk Smit <henk.ietf@xs4all.nl> wrote:

>
> Hello Robert,
>
> Tony brought up the example of a partitioned network.
> But there are more examples.
>
> E.g. in a network there is a router with 1000 neighbors.
> (When discussing distributed vs centralized flooding-topology
>   reduction algorithms, I've been told these network designs exist).
> When such a router reboots/crashes/comes back up, all 1000 neighbors
> will create a new version of their own LSP. This causes 1000 different
> LSPs to be flooded through the network at the same time, impacting every
> router in the network.
>
> The case I was thinking of myself was when a router in a large network
> boots. When it brings up a number of adjacencies, each neighbor will
> try to synchronize its LSPDB with the newly booted router. As the newly
> booted router will send empty CSNPs to each of its neighbors, each
> neighbor will start sending the full LSPDB. If such a network has 10k
> LSPs, and such a router has 100 neighbors, that router will receive
> 100 * 10k = 1 million LSPs. Having a faster and more efficient flooding
> transport, with flow-control, will make a reboot in such a topology
> less painful.
>
> (In that last case, creative use of the overload-bit could prevent
> black-holing or microloops while ISIS synchronizes its LSPDB after a
> reboot. Just like we used the overload-bit to solve the problem of slow
> convergence of BGP after a reboot, 22 years ago. I have no idea if there
> are any implementations that use the overload-bit to alleviate slow
> convergence of IS-IS after a reboot).
>
> henk.
>
>
> Robert Raszuk schreef op 2019-07-24 15:33:
> > Hey Henk & all,
> >
> > If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP), or even if the
> > full flooding takes the 33 sec Tony mentioned, is this really a
> > problem?
> >
> > Remember we are not talking about protocol convergence after a link flap
> > or a node going down. We are talking about serious network partitioning
> > which itself may have lasted for minutes, hours or days. While just
> > considering absolute numbers yields a desire to go faster and faster, if
> > we put things in overall perspective, is there really a problem to
> > be solved in the first place?
> >
> > Would there still be a problem if the LSR WG recommended faster acking,
> > maybe not for each LSP but for, say, every 20 or 30 at most?
> >
> > Thx,
> > R.
>