Re: [Lsr] Benjamin Kaduk's Discuss on draft-ietf-lsr-isis-rfc5306bis-07: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Fri, 20 September 2019 03:28 UTC

Return-Path: <kaduk@mit.edu>
X-Original-To: lsr@ietfa.amsl.com
Delivered-To: lsr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id AC41A12008A; Thu, 19 Sep 2019 20:28:47 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.199
X-Spam-Level:
X-Spam-Status: No, score=-4.199 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id sSuC8IAVNmMT; Thu, 19 Sep 2019 20:28:44 -0700 (PDT)
Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 540DE12002F; Thu, 19 Sep 2019 20:28:44 -0700 (PDT)
Received: from kduck.mit.edu ([24.16.140.251]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.14.7/8.12.4) with ESMTP id x8K3STb7023217 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 19 Sep 2019 23:28:32 -0400
Date: Thu, 19 Sep 2019 22:28:29 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
Cc: The IESG <iesg@ietf.org>, "draft-ietf-lsr-isis-rfc5306bis@ietf.org" <draft-ietf-lsr-isis-rfc5306bis@ietf.org>, Uma Chunduri <umac.ietf@gmail.com>, "aretana.ietf@gmail.com" <aretana.ietf@gmail.com>, "lsr-chairs@ietf.org" <lsr-chairs@ietf.org>, "uma.chunduri@huawei.com" <uma.chunduri@huawei.com>, "lsr@ietf.org" <lsr@ietf.org>
Message-ID: <20190920032828.GI48975@kduck.mit.edu>
References: <156887619712.4539.18342580285245294855.idtracker@ietfa.amsl.com> <BYAPR11MB36381855CE71F8FF2E3BCB6BC1890@BYAPR11MB3638.namprd11.prod.outlook.com> <20190919225244.GF48975@kduck.mit.edu> <BYAPR11MB36385E619E0C560AD8CCDF7CC1880@BYAPR11MB3638.namprd11.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <BYAPR11MB36385E619E0C560AD8CCDF7CC1880@BYAPR11MB3638.namprd11.prod.outlook.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/P73mszrzg0Uk6DvGyoSHk2A7D1g>
Subject: Re: [Lsr] Benjamin Kaduk's Discuss on draft-ietf-lsr-isis-rfc5306bis-07: (with DISCUSS and COMMENT)
X-BeenThere: lsr@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Link State Routing Working Group <lsr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/lsr>, <mailto:lsr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/lsr/>
List-Post: <mailto:lsr@ietf.org>
List-Help: <mailto:lsr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/lsr>, <mailto:lsr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 20 Sep 2019 03:28:48 -0000

On Fri, Sep 20, 2019 at 12:51:12AM +0000, Les Ginsberg (ginsberg) wrote:
> Ben -
> 
> You are making me work. 😊
> I posted V9 with a couple of additional changes.

Thanks, I appreciate the updates in V9.

> Responses inline.

Also inline.

> > -----Original Message-----
> > From: Benjamin Kaduk <kaduk@mit.edu>;
> > Sent: Thursday, September 19, 2019 3:53 PM
> > To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>;
> > Cc: The IESG <iesg@ietf.org>;; draft-ietf-lsr-isis-rfc5306bis@ietf.org; Uma
> > Chunduri <umac.ietf@gmail.com>;; aretana.ietf@gmail.com; lsr-
> > chairs@ietf.org; uma.chunduri@huawei.com; lsr@ietf.org
> > Subject: Re: Benjamin Kaduk's Discuss on draft-ietf-lsr-isis-rfc5306bis-07:
> > (with DISCUSS and COMMENT)
> > 
> > On Thu, Sep 19, 2019 at 09:39:22PM +0000, Les Ginsberg (ginsberg) wrote:
> > > Ben -
> > >
> > > Thanx for the detailed review.
> > > I have posted V8 of the draft in response to your comments.
> > >
> > > More inline.
> > >
> > > > -----Original Message-----
> > > > From: Benjamin Kaduk via Datatracker <noreply@ietf.org>;
> > > > Sent: Wednesday, September 18, 2019 11:57 PM
> > > > To: The IESG <iesg@ietf.org>;
> > > > Cc: draft-ietf-lsr-isis-rfc5306bis@ietf.org; Uma Chunduri
> > > > <umac.ietf@gmail.com>;; aretana.ietf@gmail.com; lsr-chairs@ietf.org;
> > > > uma.chunduri@huawei.com; lsr@ietf.org
> > > > Subject: Benjamin Kaduk's Discuss on draft-ietf-lsr-isis-rfc5306bis-07:
> > (with
> > > > DISCUSS and COMMENT)
> > > >
> > > > Benjamin Kaduk has entered the following ballot position for
> > > > draft-ietf-lsr-isis-rfc5306bis-07: Discuss
> > > >
> > > > When responding, please keep the subject line intact and reply to all
> > > > email addresses included in the To and CC lines. (Feel free to cut this
> > > > introductory paragraph, however.)
> > > >
> > > >
> > > > Please refer to https://www.ietf.org/iesg/statement/discuss-
> > criteria.html
> > > > for more information about IESG DISCUSS and COMMENT positions.
> > > >
> > > >
> > > > The document, along with other ballot positions, can be found here:
> > > > https://datatracker.ietf.org/doc/draft-ietf-lsr-isis-rfc5306bis/
> > > >
> > > >
> > > >
> > > > ----------------------------------------------------------------------
> > > > DISCUSS:
> > > > ----------------------------------------------------------------------
> > > >
> > > > Section 3.2 notes that "the presence of [the Restart] TLV indicates that
> > > > the sender supports the functionality defined in this document".  But,
> > > > while that was true for RFC 5306, I don't see how it can be true for the
> > > > extended functionality that's new in this document.  It seems on first
> > > > look that the need for a PR to be acknowledged by a PA means that the
> > > > routers in question will be able to properly determine full feature
> > > > support at runtime, in which case this is essentially an editorial
> > > > issue, but I would like to make sure it's received enough thought, so am
> > > > raising it as a Discuss point to get the needed discussion.
> > > > (We will still need to change this text to reflect reality, though.)
> > > > It's also unclear to me if we need to give more description of the
> > > > behavior when a router planning to restart does not receive (all) the
> > > > necessary PAs -- does it cancel the planned restart?  Fall back to the
> > > > regular RR usage?
> > > >
> > >
> > > [Les:] I have revised the text to indicate that the absence of the TLV implies
> > that none of the functionality defined in the document is supported.
> > > The requirement to always include the TLV (if supported of course) exists
> > because the absence of the TLV is used by the receiver to immediately
> > conclude that none of the functionality is supported.
> > 
> > The new text looks great; thanks!
> > 
> > > In all cases where the TLV is present, there are timers/retries specified
> > when waiting for a response flag so that the (re)starting router will not wait
> > forever.
> > 
> > I trust you on that :)
> > 
> > > > It looks like there's an internal inconsistency between Section 3.2
> > > > ("[w]hen transmitting a TLV multiple flags MUST NOT be set.") and
> > > > Section 3.3.2 ("the IIH is retransmitted with both RR and SA bits set").
> > > >
> > >
> > > [Les:] Yes - good catch. The text about legal combinations was recently
> > added - but I overlooked the one case where multiple flags may be set. The
> > text has been revised.
> > 
> > I'll clear my discuss regardless (and thanks for the update), but
> > pedantically one might argue that having a single flag set and all other
> > flags clear remains a "flag combination", and we don't want to prohibit
> > that.  So perhaps "other flag combinations involving more than one flag" is
> > safer.
> > 
> [Les:2] Modified the text.
> 
> > > >
> > > > ----------------------------------------------------------------------
> > > > COMMENT:
> > > > ----------------------------------------------------------------------
> > > >
> > > > Abstract
> > > >
> > > > Four paragraphs is perhaps pushing it a bit, for RFC style.  It might
> > > > also be nice to avoid starting two sentences with "This document
> > > > additionally describes", e.g., as Barry suggests.
> > > >
> > >
> > > [Les:] Apologies, but I am choosing to leave the text as is.
> > 
> > No need to apologize!  If Heather wants to take up the cause, then you will
> > have to convince her, though.
> > 
> > > > It also seems a little mismatched to me to differentiate between
> > > > "restart while maintaining forwarding plane state" and "restarting",
> > > > since in Section 2 we make the convention that "restarting" means that
> > > > forwarding state is maintained definitionally.
> > > >
> > >
> > > [Les:] It does not seem appropriate to have the abstract depend on
> > definitions which are found in the body of the document - hence the explicit
> > mention of forwarding state in the abstract.
> > 
> > That's fair.
> > 
> > > > Section 1
> > > >
> > > >    In the case of a restarting router process, the first of these is
> > > >    highly undesirable, but the second is essential in order to ensure
> > > >    synchronization of the LSP database.
> > > >
> > > > Is the convention introduced in section 2 about preserving forarding
> > > > state intended to apply here?
> > > >
> > >
> > > [Les:] Yes.
> > 
> > I'd consider a forward reference to the definition (depending on how
> > unusual/topic-specific the definition is), but you will know better than I
> > whether the reader will benefit from the extra hint.
> > 
> 
> [Les2:] I would prefer reversing the order of Sections 1 and 2 - but that seems to be inconsistent with the template - which is why I did not do it.

That would be nice, if we were able to get away with it, yes.

> 
> > > > Section 3.2.3
> > > >
> > > >    Neighbors of the router which has signaled planned restart SHOULD
> > > >    maintain the adjacency in a planned restart state until it receives
> > > >    an IIH with the RR bit set, receives an IIH with both PR and RR bits
> > > >    clear, or the adjacency holding time expires - whichever occurs
> > > >    first.
> > > >
> > > > When might a router want to ignore the SHOULD (and how so)?
> > > >
> > >
> > > [Les:] This is at the discretion of the implementors/deployment.
> > > There are many choices/variables here. For example, if there are a robust #
> > of alternate paths it might not be of much benefit to maintain the adjacency
> > across the planned restart.
> > > The document is only defining a mechanism which allows for the
> > opportunity to maintain the adjacency.
> > > I would expect that in most cases the request to maintain the adjacency
> > will be honored (hence SHOULD) but if a customer wants to disable this I
> > think this is also fine.
> > 
> > I'm coming at this from the angle of, as Barry has reminded me recently for
> > a different document, the guidance in BCP 14: SHOULD is used when “there
> > may exist valid reasons in particular circumstances to ignore a
> > particular item, but the full implications must be understood and
> > carefully weighed before choosing a different course.”  Do we want to give
> > the reader any help in understanding those circumstances and implications?
> >
> [Les2:] In such situations I always worry that if I say "something", it will be interpreted too literally as implying that these are the ONLY conditions to be considered.
> By saying nothing I don’t rule anything out. 😊
> But I added some text.
> 
>  
> > > >    c.  on a Point-to-Point circuit, transmission of LSPs, CSNPs, and
> > > >        PSNPs MAY be suppressed.  It is expected that the PDUs will not
> > > >        be received.
> > > >
> > > > Are there any security considerations here (e.g., a spoofed PR flag
> > > > would cause suppression of updates and potentially DoS)?
> > > >
> > >
> > > [Les:] Security consideration are discussed in Section 6 - more on that
> > below.
> > > The point being made here is that since the control plane of the restarting
> > router is offline for an extended period sending LSPs etc. is of no use - there
> > is nobody home to process them.
> > 
> > I understand that, when operating as intended, the control plane is gone.
> > As SEC AD I have to ask whether we are at risk of mistakenly assuming the
> > control plane is gone when it actually isn't (but more below).
> > 
> 
> [Les2:] Having announced the intent to restart (PR set) the sender has no right to expect LSP updates to take place until they either rescind the PR (by sending a hello w PR clear) or they actually restart and send RR.
> The neighbor doesn’t really have to know when the actual restart is initiated.
> Spoofing is possible, but that is covered in the Security section.

Right.  (I am thinking of the spoofing case.)

> > > > Section 4.1
> > > >
> > > > I'm a little surprised to see "RX PA" under the "Running Router" table,
> > > > since I thought the "running" categorization was supposed to imply that
> > > > it was not going to restart (which would be a "restarting router").  If
> > > > the categorizations are exclusive, then receiving PA would be an error
> > > > condition.
> > > >
> > >
> > > [Les:] Another good catch. I have moved this to the Restarting Router
> > table.
> > >
> > > > Section 6
> > > >
> > > > I think we should go a bit further about the false-IIH with PR bit set,
> > > > since there is not a protocol mechanism to apply any sanity check to the
> > > > hold time that the recipient is obligated to apply (overriding local
> > > > policy).  Such spoofed values could be unreasonably large, and it may be
> > > > prudent to have a policy filter that rejects implausible values.
> > > > Even if such a policy is present, it seems that a series of spoofed IIHs
> > > > with PR set could force an adjacency to be maintained (absent topology
> > > > changes), by cancelling the PR-bit and then sending a new PR-bit in
> > > > quick succession.
> > > >
> > >
> > > [Les:] The draft currently says:
> > >
> > > "If the PR bit is set in a false IIH, neighbors who receive such an
> > >    IIH could modify the holding time of an existing adjacency
> > >    inappropriately."
> > >
> > > I believe that covers all of your points above.
> > 
> > If you think that "modify" covers "to an arbitrary value and lasting
> > indefinitely" then I cannot disagree.  Whether or not to call out the
> > extreme case is basically an editorial matter, for your discretion.
> > 
> 
> [Les2:]  I added some text here - though not about the value of the holding time.
> 
> > > > Relatedly, Section 3.2.1's description of the RR/RA bits notes that:
> > > >       "This procedure allows the restarting router to cause the neighbor
> > > >        to maintain the adjacency long enough for restart to successfully
> > > >        complete, while also preventing repetitive restarts from
> > > >        maintaining an adjacency indefinitely."
> > > > It doesn't seem that this property (preventing repetitive restarts from
> > > > maintaining an adjacency indefinitely) still holds for PR/PA with a
> > > > neighbor router that is (mis)behaving in a certain way.  Explicitly
> > > > noting this disparity seems reasonable to me.
> > > >
> > >
> > > [Les:] If a router sends PR and then does not actually restart, the correct
> > behavior on the neighbor is covered in Section 3.2.3:
> > >
> > > "Neighbors of the router which has signaled planned restart SHOULD
> > >    maintain the adjacency in a planned restart state until it receives
> > >    an IIH with the RR bit set, receives an IIH with both PR and RR bits
> > >    clear, or the adjacency holding time expires - whichever occurs
> > >    first."
> > 
> > I guess I was trying to double check on how this prescribed behavior
> > interacts with restarts that occur at an interval between the normal
> > (configured) hold time of the peer and the hold time advertised with the
> > PR-bit IIH.  That is, if a router is sending PR and actually restarting,
> > comes up and sends a normal IIH, but then shortly thereafter sends another
> > PR and restarts again, do we want to have the adjacency to that router go
> > away?  My guess is that we have no reason to discard the adjacency so long
> > as the forwarding state is maintained (and the topology doesn't change),
> > but maybe there are considerations about excessive load from needing to
> > repeatedly transfer state back to the restarting router.  (I'm also not
> > even 100% certain that the scenario I'm describing is fully analogous to
> > the one considered by the text I quote regarding RR/RA.)
> > 
> 
> [Les2:] Sending multiple PRs in a short time isn’t the problem. Actually restarting multiple times in a short interval could be a problem as it may indicate instability.
> There are implementations which refuse to honor an RR if it comes too soon after an earlier restart. This is an implementation choice. We never specified this.
> Section 3.2.1a does prohibit updating the holding time a second time until after the complete restart sequence has been completed.

I think the "complete restart sequence" could maybe be spoofed (without
actually doing some of the key bits), but I don't need to ask you to put
more text around the spoofing considerations.

Thanks,

Ben