Re: [Lsr] Dynamic flow control for flooding

tony.li@tony.li Wed, 24 July 2019 23:18 UTC

Sender: Tony Li <tony1athome@gmail.com>
From: tony.li@tony.li
Message-Id: <56141168-4FD5-432E-9DEB-433E5F7A8506@tony.li>
Content-Type: multipart/alternative; boundary="Apple-Mail=_F4F61A2C-9C3B-4ACE-AB6F-69E2FC56A9F6"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Date: Wed, 24 Jul 2019 16:18:13 -0700
In-Reply-To: <BYAPR11MB3638B0B52786E0D3C837F21EC1C60@BYAPR11MB3638.namprd11.prod.outlook.com>
Cc: "stephane.litkowski@orange.com" <stephane.litkowski@orange.com>, "lsr@ietf.org" <lsr@ietf.org>
To: Les Ginsberg <ginsberg@cisco.com>
References: <CAMj-N0LdaNBapVNisWs6cbH6RsHiXd-EMg6vRvO_U+UQsYVvXw@mail.gmail.com> <BYAPR11MB36382C89363202D1B5659614C1C70@BYAPR11MB3638.namprd11.prod.outlook.com> <5841_1563943794_5D37E372_5841_105_1_9E32478DFA9976438E7A22F69B08FF924D9C373E@OPEXCAUBMA3.corporate.adroot.infra.ftgroup> <BYAPR11MB363856BB026992DFBB3BB224C1C60@BYAPR11MB3638.namprd11.prod.outlook.com> <7D53FA6A-8072-4FC5-ABC9-5791F139C011@tony.li> <BYAPR11MB3638CD7EDAD8185BC4A788AEC1C60@BYAPR11MB3638.namprd11.prod.outlook.com> <5182CD7E-EBE2-4402-926D-24D427217D10@tony.li> <BYAPR11MB3638B0B52786E0D3C837F21EC1C60@BYAPR11MB3638.namprd11.prod.outlook.com>
X-Mailer: Apple Mail (2.3445.104.11)
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/BKEcfkpufmmR5nr0JnT8xNElViI>
Subject: Re: [Lsr] Dynamic flow control for flooding

Les,


> Very true.  It would not be unreasonable for an implementation to report free space in the FIFO (in number of PDUs) divided by the number of active adjacencies.  Everyone gets their fair share.
>  
> [If dynamic flooding is enabled, this could be based on the number of adjacencies that should be actively flooding. That should be a much smaller number.]
> [Les:] So you are agreeing that when a receiver wants to “dial back” it will need to do so on all interfaces enabled for flooding?


Certainly.  There’s one CPU, one input queue.  If there’s a flooding event in progress, then it’s likely to arrive from all sides.  Dialing back on all interfaces makes sense as self-preservation.
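
To make the fair-share idea quoted above concrete, here is a minimal sketch. The function name and parameters are hypothetical and not taken from any draft or implementation; it only assumes the receiver can read the free space in its input FIFO, counted in PDUs.

```python
def per_adjacency_share(free_pdus: int, flooding_adjacencies: int) -> int:
    """Return the number of PDUs each flooding neighbor may send.

    free_pdus: free slots in the shared input FIFO (hypothetical counter).
    flooding_adjacencies: adjacencies expected to flood toward us.
    """
    if free_pdus < 0:
        raise ValueError("free_pdus must be non-negative")
    if flooding_adjacencies <= 0:
        # Nobody to share with, so there is nothing to advertise.
        return 0
    # Integer division rounds down so the shares never exceed the free space.
    return free_pdus // flooding_adjacencies


# 120 free slots shared across 40 adjacencies.
print(per_adjacency_share(120, 40))
# With dynamic flooding, only a few adjacencies flood, so each share grows.
print(per_adjacency_share(120, 4))
```

Because there is one input queue, the same value would be advertised on every flooding interface, which is why dialing back applies to all of them at once.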


> LSPs may be dropped at lower layers – IS-IS receiver may be unaware that the overload condition exists
>  
> That’s an implementation problem. The implementation NEEDS to be able to see its input queue plus input drops.
>  
> [Les:] And you want to ship this feature when…? 😊
> I think this is a difficult ask.
> Before we decide this is what is required we should explore the path of monitoring the unacknowledged Tx queue.


That’s the tail wagging the dog.  

I’m less concerned about when.  The people who want this feature are going to pay for it.  They will set the agenda and schedule, not us.  We need to figure out how to do it properly.

 
> Updating hellos dynamically to alter flooding transmission rate is an OOB signaling mechanism consuming resources at a time when routers are the most busy
> Consistent flooding rates will require updated hellos be sent to all neighbors – exacerbating the cost on both sender and receiver
>  
> This is why I suggest sending the feedback in PSNPs as well as in IIHs.  Regardless of the details, we need to consider sending PSNPs back more frequently.  I concur that optimizing the rate and triggers for sending more PSNPs is an open issue.
>  
> Strictly speaking, sending a TLV inside of our protocol PDUs is an in-band signaling mechanism.
> [Les:] I agree – PSNP would be better since we need to send it anyway in order to ACK. Still does not convince me this is the preferred approach – but I agree it is better than hellos.


Please note that I prefer BOTH.  Sending it in IIHs is useful because it allows adjacencies that are not expecting a PSNP to still get some feedback. It would also prevent problems that we’ve seen with TCP and connections getting stuck with a zero window.
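
A rough sketch of the zero-window concern, under loudly stated assumptions: the class, the "window" semantics, and the idea that the same feedback value arrives in either a PSNP or an IIH are all hypothetical illustrations, not protocol definitions.

```python
class FloodSender:
    """Toy transmitter that sends LSPs only while it has window credit."""

    def __init__(self, window: int) -> None:
        self.window = window  # PDUs we may still send (hypothetical credit)
        self.sent = 0

    def on_feedback(self, advertised_window: int) -> None:
        # Feedback may arrive in a PSNP or in a periodic IIH (assumption).
        self.window = advertised_window

    def try_send(self, pending: int) -> int:
        # Send as many pending LSPs as the current window allows.
        n = min(pending, self.window)
        self.window -= n
        self.sent += n
        return n


sender = FloodSender(window=0)
# With a zero window nothing is sent, so no PSNP will come back as an ACK.
print(sender.try_send(10))
# A periodic IIH still carries feedback and reopens the window.
sender.on_feedback(5)
print(sender.try_send(10))
```

If feedback rode only on PSNPs, the first call would leave the sender with no event that could ever reopen the window; the periodic IIH is what breaks that deadlock in this sketch.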


> The resources consumed by maintaining a running count of a queue in silicon or in process space is effectively zero.
>  
> [Les:] It is not about counting – it is about how a given queue might be used. It isn’t reasonable to mandate that a dataplane-to-control-plane queue be dedicated to IS-IS. What other control plane entities are using the queue and how they empty it will introduce new variables. And the implementation cost comes in providing “real time updates” on the current queue space to clients that need it.


Well, that’s up to your implementation.  Even if it is a shared queue, it has a finite depth and the transmitter should not overrun it. Scale your numbers accordingly.


> I really think monitoring the unacknowledged TX queue will give us what we need and make the solution completely contained within the IS-IS implementation.
> Guess I need to work on more details on that approach.


Please focus on how the transmitter distinguishes between packets lost and packets queued at the receiver.

Both imply that the TX rate was too high.  One suggests that retransmissions should be expedited.  The other suggests that they should not.
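
The distinction can be sketched as follows. This assumes the transmitter has already classified the cause somehow, which is exactly the open question; the enum, the function, and the halving factor are hypothetical choices for illustration.

```python
from enum import Enum


class Cause(Enum):
    LOST = "lost"      # the PDU was dropped before IS-IS saw it
    QUEUED = "queued"  # the PDU is waiting in the receiver's input queue


def react(cause: Cause, rate_pps: float, backoff: float = 0.5):
    """Return (new_rate_pps, expedite_retransmission).

    Either cause means the TX rate was too high, so the rate drops in
    both cases. Only a loss justifies retransmitting sooner; resending
    a PDU that is merely queued would add load to the busy receiver.
    """
    new_rate = rate_pps * backoff
    expedite = cause is Cause.LOST
    return new_rate, expedite


print(react(Cause.LOST, 100.0))
print(react(Cause.QUEUED, 100.0))
```

Monitoring only the unacknowledged TX queue yields the same observation in both cases (an LSP that has not been acknowledged yet), so some extra signal is needed to pick the right branch.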

Tony