Re: [Lsr] Benjamin Kaduk's Discuss on draft-ietf-lsr-isis-rfc5306bis-07: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Thu, 19 September 2019 22:53 UTC

Return-Path: <kaduk@mit.edu>
X-Original-To: lsr@ietfa.amsl.com
Delivered-To: lsr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8E4F212084F; Thu, 19 Sep 2019 15:53:02 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.199
X-Spam-Level:
X-Spam-Status: No, score=-4.199 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id t4JKOnTQQrBW; Thu, 19 Sep 2019 15:52:58 -0700 (PDT)
Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 7294A120805; Thu, 19 Sep 2019 15:52:58 -0700 (PDT)
Received: from kduck.mit.edu ([24.16.140.251]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.14.7/8.12.4) with ESMTP id x8JMqj37030553 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 19 Sep 2019 18:52:47 -0400
Date: Thu, 19 Sep 2019 17:52:44 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
Cc: The IESG <iesg@ietf.org>, "draft-ietf-lsr-isis-rfc5306bis@ietf.org" <draft-ietf-lsr-isis-rfc5306bis@ietf.org>, Uma Chunduri <umac.ietf@gmail.com>, "aretana.ietf@gmail.com" <aretana.ietf@gmail.com>, "lsr-chairs@ietf.org" <lsr-chairs@ietf.org>, "uma.chunduri@huawei.com" <uma.chunduri@huawei.com>, "lsr@ietf.org" <lsr@ietf.org>
Message-ID: <20190919225244.GF48975@kduck.mit.edu>
References: <156887619712.4539.18342580285245294855.idtracker@ietfa.amsl.com> <BYAPR11MB36381855CE71F8FF2E3BCB6BC1890@BYAPR11MB3638.namprd11.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <BYAPR11MB36381855CE71F8FF2E3BCB6BC1890@BYAPR11MB3638.namprd11.prod.outlook.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/saBQ-UVIWckFoK9PtEjS6K6t6JA>
Subject: Re: [Lsr] Benjamin Kaduk's Discuss on draft-ietf-lsr-isis-rfc5306bis-07: (with DISCUSS and COMMENT)
X-BeenThere: lsr@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Link State Routing Working Group <lsr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/lsr>, <mailto:lsr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/lsr/>
List-Post: <mailto:lsr@ietf.org>
List-Help: <mailto:lsr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/lsr>, <mailto:lsr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 19 Sep 2019 22:53:05 -0000

On Thu, Sep 19, 2019 at 09:39:22PM +0000, Les Ginsberg (ginsberg) wrote:
> Ben -
> 
> Thanx for the detailed review.
> I have posted V8 of the draft in response to your comments.
> 
> More inline.
> 
> > -----Original Message-----
> > From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
> > Sent: Wednesday, September 18, 2019 11:57 PM
> > To: The IESG <iesg@ietf.org>
> > Cc: draft-ietf-lsr-isis-rfc5306bis@ietf.org; Uma Chunduri
> > <umac.ietf@gmail.com>; aretana.ietf@gmail.com; lsr-chairs@ietf.org;
> > uma.chunduri@huawei.com; lsr@ietf.org
> > Subject: Benjamin Kaduk's Discuss on draft-ietf-lsr-isis-rfc5306bis-07: (with
> > DISCUSS and COMMENT)
> > 
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-lsr-isis-rfc5306bis-07: Discuss
> > 
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> > 
> > 
> > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > for more information about IESG DISCUSS and COMMENT positions.
> > 
> > 
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-ietf-lsr-isis-rfc5306bis/
> > 
> > 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > Section 3.2 notes that "the presence of [the Restart] TLV indicates that
> > the sender supports the functionality defined in this document".  But,
> > while that was true for RFC 5306, I don't see how it can be true for the
> > extended functionality that's new in this document.  It seems on first
> > look that the need for a PR to be acknowledged by a PA means that the
> > routers in question will be able to properly determine full feature
> > support at runtime, in which case this is essentially an editorial
> > issue, but I would like to make sure it's received enough thought, so am
> > raising it as a Discuss point to get the needed discussion.
> > (We will still need to change this text to reflect reality, though.)
> > It's also unclear to me if we need to give more description of the
> > behavior when a router planning to restart does not receive (all) the
> > necessary PAs -- does it cancel the planned restart?  Fall back to the
> > regular RR usage?
> > 
> 
> [Les:] I have revised the text to indicate that the absence of the TLV implies that none of the functionality defined in the document is supported. 
> The requirement to always include the TLV (if supported of course) exists because the absence of the TLV is used by the receiver to immediately conclude that none of the functionality is supported.

The new text looks great; thanks!

> In all cases where the TLV is present, there are timers/retries specified when waiting for a response flag so that the (re)starting router will not wait forever.

I trust you on that :)

> > It looks like there's an internal inconsistency between Section 3.2
> > ("[w]hen transmitting a TLV multiple flags MUST NOT be set.") and
> > Section 3.3.2 ("the IIH is retransmitted with both RR and SA bits set").
> >  
> 
> [Les:] Yes - good catch. The text about legal combinations was recently added - but I overlooked the one case where multiple flags may be set. The text has been revised.

I'll clear my discuss regardless (and thanks for the update), but
pedantically one might argue that having a single flag set and all other
flags clear remains a "flag combination", and we don't want to prohibit
that.  So perhaps "other flag combinations involving more than one flag" is
safer.

> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > Abstract
> > 
> > Four paragraphs is perhaps pushing it a bit, for RFC style.  It might
> > also be nice to avoid starting two sentences with "This document
> > additionally describes", e.g., as Barry suggests.
> > 
> 
> [Les:] Apologies, but I am choosing to leave the text as is.

No need to apologize!  If Heather wants to take up the cause, then you will
have to convince her, though.

> > It also seems a little mismatched to me to differentiate between
> > "restart while maintaining forwarding plane state" and "restarting",
> > since in Section 2 we make the convention that "restarting" means that
> > forwarding state is maintained definitionally.
> > 
> 
> [Les:] It does not seem appropriate to have the abstract depend on definitions which are found in the body of the document - hence the explicit mention of forwarding state in the abstract.

That's fair.

> > Section 1
> > 
> >    In the case of a restarting router process, the first of these is
> >    highly undesirable, but the second is essential in order to ensure
> >    synchronization of the LSP database.
> > 
> > Is the convention introduced in section 2 about preserving forarding
> > state intended to apply here?
> > 
> 
> [Les:] Yes.

I'd consider a forward reference to the definition (depending on how
unusual/topic-specific the definition is), but you will know better than I
whether the reader will benefit from the extra hint.

> > Section 3.2.3
> > 
> >    Neighbors of the router which has signaled planned restart SHOULD
> >    maintain the adjacency in a planned restart state until it receives
> >    an IIH with the RR bit set, receives an IIH with both PR and RR bits
> >    clear, or the adjacency holding time expires - whichever occurs
> >    first.
> > 
> > When might a router want to ignore the SHOULD (and how so)?
> > 
> 
> [Les:] This is at the discretion of the implementors/deployment. 
> There are many choices/variables here. For example, if there are a robust # of alternate paths it might not be of much benefit to maintain the adjacency across the planned restart.
> The document is only defining a mechanism which allows for the opportunity to maintain the adjacency.
> I would expect that in most cases the request to maintain the adjacency will be honored (hence SHOULD) but if a customer wants to disable this I think this is also fine.

I'm coming at this from the angle of, as Barry has reminded me recently for
a different document, the guidance in BCP 14: SHOULD is used when “there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.”  Do we want to give
the reader any help in understanding those circumstances and implications?

> >    c.  on a Point-to-Point circuit, transmission of LSPs, CSNPs, and
> >        PSNPs MAY be suppressed.  It is expected that the PDUs will not
> >        be received.
> > 
> > Are there any security considerations here (e.g., a spoofed PR flag
> > would cause suppression of updates and potentially DoS)?
> > 
> 
> [Les:] Security consideration are discussed in Section 6 - more on that below.
> The point being made here is that since the control plane of the restarting router is offline for an extended period sending LSPs etc. is of no use - there is nobody home to process them.

I understand that, when operating as intended, the control plane is gone.
As SEC AD I have to ask whether we are at risk of mistakenly assuming the
control plane is gone when it actually isn't (but more below).

> > Section 4.1
> > 
> > I'm a little surprised to see "RX PA" under the "Running Router" table,
> > since I thought the "running" categorization was supposed to imply that
> > it was not going to restart (which would be a "restarting router").  If
> > the categorizations are exclusive, then receiving PA would be an error
> > condition.
> > 
> 
> [Les:] Another good catch. I have moved this to the Restarting Router table.
> 
> > Section 6
> > 
> > I think we should go a bit further about the false-IIH with PR bit set,
> > since there is not a protocol mechanism to apply any sanity check to the
> > hold time that the recipient is obligated to apply (overriding local
> > policy).  Such spoofed values could be unreasonably large, and it may be
> > prudent to have a policy filter that rejects implausible values.
> > Even if such a policy is present, it seems that a series of spoofed IIHs
> > with PR set could force an adjacency to be maintained (absent topology
> > changes), by cancelling the PR-bit and then sending a new PR-bit in
> > quick succession.
> > 
> 
> [Les:] The draft currently says:
> 
> "If the PR bit is set in a false IIH, neighbors who receive such an
>    IIH could modify the holding time of an existing adjacency
>    inappropriately."
> 
> I believe that covers all of your points above.

If you think that "modify" covers "to an arbitrary value and lasting
indefinitely" then I cannot disagree.  Whether or not to call out the
extreme case is basically an editorial matter, for your discretion.

> > Relatedly, Section 3.2.1's description of the RR/RA bits notes that:
> >       "This procedure allows the restarting router to cause the neighbor
> >        to maintain the adjacency long enough for restart to successfully
> >        complete, while also preventing repetitive restarts from
> >        maintaining an adjacency indefinitely."
> > It doesn't seem that this property (preventing repetitive restarts from
> > maintaining an adjacency indefinitely) still holds for PR/PA with a
> > neighbor router that is (mis)behaving in a certain way.  Explicitly
> > noting this disparity seems reasonable to me.
> > 
> 
> [Les:] If a router sends PR and then does not actually restart, the correct behavior on the neighbor is covered in Section 3.2.3:
> 
> "Neighbors of the router which has signaled planned restart SHOULD
>    maintain the adjacency in a planned restart state until it receives
>    an IIH with the RR bit set, receives an IIH with both PR and RR bits
>    clear, or the adjacency holding time expires - whichever occurs
>    first."

I guess I was trying to double check on how this prescribed behavior
interacts with restarts that occur at an interval between the normal
(configured) hold time of the peer and the hold time advertised with the
PR-bit IIH.  That is, if a router is sending PR and actually restarting,
comes up and sends a normal IIH, but then shortly thereafter sends another
PR and restarts again, do we want to have the adjacency to that router go
away?  My guess is that we have no reason to discard the adjacency so long
as the forwarding state is maintained (and the topology doesn't change),
but maybe there are considerations about excessive load from needing to
repeatedly transfer state back to the restarting router.  (I'm also not
even 100% certain that the scenario I'm describing is fully analogous to
the one considered by the text I quote regarding RR/RA.)

Thanks,

Ben