[Lsr] AD Review of draft-ietf-lsr-isis-rfc5306bis-02

Alvaro Retana <aretana.ietf@gmail.com> Fri, 02 August 2019 15:56 UTC

Return-Path: <aretana.ietf@gmail.com>
X-Original-To: lsr@ietfa.amsl.com
Delivered-To: lsr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id AF0DE120614; Fri, 2 Aug 2019 08:56:40 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.754
X-Spam-Level:
X-Spam-Status: No, score=-0.754 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, NORMAL_HTTP_TO_IP=0.001, NUMERIC_HTTP_ADDR=1.242, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id wnRRaSL1rNyk; Fri, 2 Aug 2019 08:56:36 -0700 (PDT)
Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id B50FB120628; Fri, 2 Aug 2019 08:56:19 -0700 (PDT)
Received: by mail-ed1-x534.google.com with SMTP id e3so72809915edr.10; Fri, 02 Aug 2019 08:56:19 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:mime-version:date:message-id:subject:to:cc; bh=YAAno79QrwqMm9A1jcj2KzxzlCGPK0LtyX/Zuz7RNio=; b=lsBuOGrMgkvNv4lE4V8hRp7/0GSyh+hsb4+KWu4lG5wqeXXSirDV78FwA2dgViUg1T zosXnsAIkBxjqUF1ZxE2fsjXTwVp7CpArraeOd/IdaeVTY+9bO/fErfk1hK9d8+Flh1A rWNRpt+Yvtz/EbITsjw2+xAnGk5tfmZVx7f5fMdVWEhBKVOZ9XtII47dWW3U1NALrSew K97kO4rE0ZrzS7XzrdNnfYDKAwNU/x5rB0UsOZhxS8zvVZNosb+euWQtg0z092lfl9A8 F0k2ehWh+3NAa2Kz5ImgN7wn1oP9+bS2VfcnKcqcDclURAe6Mn1yExZalz3wxExmDcCR l3WA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:mime-version:date:message-id:subject:to:cc; bh=YAAno79QrwqMm9A1jcj2KzxzlCGPK0LtyX/Zuz7RNio=; b=qbnOifpIOxmNVMOd9cdny915/246k88iridJM9HDEm1QeAhEGeNOgVINF12n2T/gu2 m0ZeUt0d0feKSEnzCWHSVKCFXefIO9bJlhA9AvQF/usawqKUw+zE/ud79t0nvyOPZhbx E/lB1WJ+S+TyVBvOI1R20AdgCc70kYzadtb6p/ijb186xnhh2zEe+/Klsa7GfRBgis2o hUem3LKnGpfp8F5m9YEHgB/liSo2Bz/3baUBRUMTJW8VnTDnRk46DAiPMADqrNVh4c6H iGEQJt96QA28KBWEzsoM/v4Oz1hEafm1jOwXXkhqPD/2VZ6193TqSfU3YbSn496aKH3A klkw==
X-Gm-Message-State: APjAAAUfU/Xx1QxVGq9dUqk0D4lKsIR7zAlLiOX/qbh0dSyQcD55gIne MLhG3o2NBQAO+eTPcRN/FR6529pfFbxFsvHNo1zD6HkN
X-Google-Smtp-Source: APXvYqwXT178v2CYe5rxO2kS3doJEDfezPMsmVpf5D2jTQZwhxJ3Hyr0Ah0vq0P1rBdd5BLIxAy5lf3h2vJ9VpMJMD4=
X-Received: by 2002:aa7:c313:: with SMTP id l19mr108282430edq.258.1564761377772; Fri, 02 Aug 2019 08:56:17 -0700 (PDT)
Received: from 1058052472880 named unknown by gmailapi.google.com with HTTPREST; Fri, 2 Aug 2019 08:56:16 -0700
From: Alvaro Retana <aretana.ietf@gmail.com>
MIME-Version: 1.0
Date: Fri, 02 Aug 2019 08:56:16 -0700
Message-ID: <CAMMESswNLuG1FxSPhtL7eR7MveAHMWvmC0zLaL2iHsVVyVe6Bw@mail.gmail.com>
To: draft-ietf-lsr-isis-rfc5306bis@ietf.org
Cc: Uma Chunduri <umac.ietf@gmail.com>, lsr-chairs@ietf.org, lsr@ietf.org
Content-Type: multipart/alternative; boundary="000000000000149c50058f246645"
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/5SQk0bUJdlRAR8OAM6oaBdkZvyo>
Subject: [Lsr] AD Review of draft-ietf-lsr-isis-rfc5306bis-02
X-BeenThere: lsr@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Link State Routing Working Group <lsr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/lsr>, <mailto:lsr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/lsr/>
List-Post: <mailto:lsr@ietf.org>
List-Help: <mailto:lsr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/lsr>, <mailto:lsr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Aug 2019 15:56:41 -0000

Dear authors:

I just finished reading this document.  Thank you for your work on it.

Why was it decided to create a bis document and not just an Update to
rfc5306?  Either way works for me, I'm just curious.

Knowing that the change in this document, with respect to rfc5306, is new
functionality, I tried very hard to not make comments about the unchanged
text, I really did.  However, there are a couple of significant items that
I *have* to comment on (please see in-line)...mostly related to lack of
clarity, or opportunities that we should take to improve.  In general, I
think the text could use a bit more clarity, but I won't insist on it to
avoid further changes.

The result of the elimination of §2 (Conventions Used in This Document) is
that explanations of what restart/restarting and start/starting mean were
removed. I think those helped frame the expectations of the operation.  In
the current document, there is no formal definition of those terms.  I
strongly believe that the text should be restored, perhaps at the end of
the Overview.

I will wait until the major items (in-line) are addressed before I start
the IETF Last Call.

Thanks!!

Alvaro.

[Line numbers from idnits.]


...
11 Abstract
...
18   This document additionally describes a mechansim for a router to
19   signal its neighbors that it is preparing to initiate a restart while
20   maintaining forwarding plane state.  This allows the neighbors to
21   maintain their adjacencies until the router has restarted, but also
22   allows the neighbors to bring the adjacencies down in the event of
23   other topology changes.

[nit] s/mechansim/mechanism


...
100 1.  Overview
...
158   It is assumed that the three-way handshake [RFC5303] is being used on
159   Point-to-Point circuits.

[major] If only assumed, will the mechanism work if 3-way handshake is not
used?  Why it is not mandated?

161 2.  Approach

163 2.1.  Timers
...
171   An instance of the timer T1 is maintained per interface, and
172   indicates the time after which an unacknowledged (re)start attempt
173   will be repeated.  A typical value might be 3 seconds.

[minor] s/A typical value might be 3 seconds./A typical value is 3 seconds.

175   An instance of the timer T2 is maintained for each LSP database
176   (LSPDB) present in the system, i.e., for a Level 1/2 system, there
177   will be an instance of the timer T2 for Level 1 and an instance for
178   Level 2.  This is the maximum time that the system will wait for
179   LSPDB synchronization.  A typical value might be 60 seconds.

[minor] s/A typical value might be 60 seconds./A typical value is 60
seconds.

181   A single instance of the timer T3 is maintained for the entire
182   system.  It indicates the time after which the router will declare
183   that it has failed to achieve database synchronization (by setting
184   the overload bit in its own LSP).  This is initialized to 65535
185   seconds, but is set to the minimum of the remaining times of received
186   IS-IS Hellos (IIHs) containing a restart TLV with the Restart
187   Acknowledgement (RA) set and an indication that the neighbor has an
188   adjacency in the "UP" state to the restarting router.

[nit] I think a comma would help in differentiating the two things in the
last sentence:  s/and/, and

[major] Maybe it's just me, but I don't know what the "the remaining times
of...an indication that the neighbor has an adjacency in the "UP" state to
the restarting router" is.  Please clarify.

190   NOTE: The timer T3 is only used by a restarting router.

192 2.2.  Restart TLV
...
200 Type 211

202      Length: Number of octets in the Value field (1 to (3 + ID Length))
203      Value

[minor] s/Length:...(1 to (3 + ID Length))/Length:...(1 or (3 + ID Length))

[nit] Add a line before "Value".

205                                       No. of octets
206        +-----------------------+
207        |   Flags               |     1
208        +-----------------------+
209        | Remaining Time        |     2
210        +-----------------------+
211        | Restarting Neighbor ID|     ID Length
212        +-----------------------+

214    Flags (1 octet)

216         0  1  2  3  4  5  6  7
217        +--+--+--+--+--+--+--+--+
218        |Reserved|PA|PR|SA|RA|RR|
219        +--+--+--+--+--+--+--+--+

[minor] Do you intend for IANA to assign the remaining bits?

221        RR - Restart Request
222        RA - Restart Acknowledgement
223        SA - Suppress adjacency advertisement
224        PR - Restart is planned
225        PA - Planned restart acknowledgement

[major] The functionality in this document (old and new) depends on a
single bit being set.  What should a receiver do if more than one bit is
set?  Are there valid combinations of multiple bits?

227   (Note: Remaining fields are )

[nit] Leftover text...


...
245        Note: Implementations based on earlier drafts of RFC 5306
246        may not include this field in the TLV when the RA bit is set.
247        In this case, a router that is expecting an RA on a LAN circuit
248        SHOULD assume that the acknowledgement is directed at the local
249        system.

[major] I think it is time we stop catering to old/non-compliant
implementations.  The last time that a draft didn't include the System ID
was draft-ietf-isis-restart-03 in March 2003: more than 15 years ago!!


...
360 2.2.3.  Use of PR and PA Bits
...
376   a.  if this is the first IIH with the PR bit set that this system has
377       received associated with this adjacency, then the adjacency is
378       marked as being in "Planned Restart state" and the adjacency
379       holding time is refreshed -- otherwise, the holding time is not
380       refreshed.  The holding time SHOULD be set to the "remaining
381       time" specified in the received IIH with PR set.  The "remaining
382       time" transmitted according to (b) below MUST reflect the actual
383       time after which the adjacency will now expire.  Receipt of a
384       normal IIH with the PR bit reset will clear the "Planned Restart
385       mode" state and cause the receiving router to set the adjacency
386       hold time to the locally configured value.  This procedure allows
387       the router planning a restart to cause the neighbor to maintain
388       the adjacency long enough for restart to successfully complete.
389       Whether or not an adjacency is marked as being in "Planned
390       Restart mode" has no effect on adjacency state transitions.

[minor] Is it "Planned Restart state" or "Planned Restart mode" state?

392   b.  immediately (i.e., without waiting for any currently running
393       timer interval to expire, but with a small random delay of a few
394       tens of milliseconds on LANs to avoid "storms") transmit over the
395       corresponding interface an IIH including the restart TLV with the
396       PR bit clear and the PA bit set.  The "Remaining Time" MUST be
397       set to the current time (in seconds) before the holding timer on
398       this adjacency is due to expire.  If the corresponding interface
399       is a LAN interface, then the Restarting Neighbor System ID SHOULD
400       be set to the System ID of the router from which the IIH with the
401       PR bit set was received.  This is required to correctly associate
402       the acknowledgement and holding time in the case where multiple
403       systems on a LAN are planning a restart at approximately the same
404       time.

[major] "System ID SHOULD be set to the System ID of the router from which
the...PR bit set was received.  This is required to correctly associate the
acknowledgement...where multiple systems on a LAN are planning a restart at
approximately the same time."  IOW, if the System ID is not set correctly,
then the sender of the IIH+PR won't necessarily know the reply is to it...
When is it ok to not set the System ID that way?  IOW, why is MUST not
used?  [BTW, I know this is the same text as in 2.2.1...this comment
applies there too.]


...
428   While the adjacency is in planned restart state the following actions
429   MAY be taken:

[major] The steps below are optional (MAY)...that is ok.  (a) talks about a
compromised ability to flood updates on a LAN is adjacencies are not reset,
which sound very serious to me.  What are the considerations for an
implementation/deployment to decide using these optional actions?  Should
they be used on LANs, but not necessarily in other link types (just as an
example)?

[minor] The actions below are independent of each other, right?  It may be
worth mentioning that.

431   a.  If additional topology changes occur, the adjacency which is in
432       planned restart state MAY be brought down even though the hold
433       time has not yet expired.  Given that the neighbor which has
434       signaled a planned restart is not expected to update its
435       forwarding plane in response to signaling of the topology changes
436       (since it is restarting) traffic which transits that node is at
437       risk of being improperly forwarded.  On a LAN circuit, if the
438       router in planned restart state is the DIS at any supported
439       level, the adjacency(ies) SHOULD be brought down whenever any LSP
440       update is either generated or received so as to trigger a new DIS
441       election.  Failure to do so will compromise the reliability of
442       the Update Process on that circuit.  What other criteria are used
443       to determine what topology changes will trigger bringing the
444       adjacency down is a local implementation decision.

[nit] s/received so as to/received, so as to

[major] "adjacency(ies) SHOULD be brought down...Failure to do so will
compromise the reliability of the Update Process on that circuit."  When is
it ok to not bring these adjacencies down?  If the operator already decided
to deploy these optional steps, why wouldn't the adjacencies be always
brought down?  IOW, why is MUST not used?

446   b.  If a BFD session to the neighbor which signals a planned restart
447       is in the UP state and subsequently goes DOWN, the event MAY be
448       ignored since it is possible this is an expected side effect of
449       the restart.  Use of the Control Plane Independent state as
450       signalled in BFD control packets [RFC5880] SHOULD be considered
451       in the decision to ignore a BFD Session DOWN event

[nit] Move the reference to rfc5880 to the first mention of BFD.

[nit] s/Session DOWN event/Session DOWN event.


...
467 2.3.1.  Adjacency Reacquisition during Restart

469   The restarting router explicitly notifies its neighbor that the
470   adjacency is being reacquired, and hence that it SHOULD NOT
471   reinitialize the adjacency.  This is achieved by setting the RR bit
472   in the restart TLV.  When the neighbor of a restarting router
473   receives an IIH with the restart TLV having the RR bit set, if there
474   exists on this interface an adjacency in state "UP" with the same
475   System ID, and in the case of a LAN circuit, with the same source LAN
476   address, then the procedures described in Section 3.2.1 are followed.

[minor] s/3.2.1/2.2.1


...
492   On a LAN circuit, the LAN-ID assigned to the circuit SHOULD be the
493   same as that used prior to the restart.  In particular, for any
494   circuits for which the restarting router was previously DIS, the use
495   of a different LAN-ID would necessitate the generation of a new set
496   of pseudonode LSPs, and corresponding changes in all the LSPs
497   referencing them from other routers on the LAN.  By preserving the
498   LAN-ID across the restart, this churn can be prevented.  To enable a
499   restarting router to learn the LAN-ID used prior to restart, the LAN-
500   ID specified in an IIH with RR set MUST be ignored.

[major] "the LAN-ID assigned to the circuit SHOULD be the same as that used
prior to the restart...churn can be prevented."  Are there cases when
(without knowledge of other changes) the LAN-ID would be better off being
something else?  IOW, why would a MUST not fit here?


...
516   On a Point-to-Point link, receipt of an IIH not containing the
517   restart TLV is also treated as an acknowledgement, since it indicates
518   that the neighbor is not restart capable.  However, since no CSNP is
519   guaranteed to be received over this interface, the timer T1 is
520   cancelled immediately without waiting for a complete set of CSNPs.
521   Synchronization may therefore be deemed complete even though there
522   are some LSPs which are held (only) by this neighbor (see
523   Section 3.4).  In this case, we also want to be certain that the
524   neighbor will reinitialize the adjacency in order to guarantee that
525   the SRMflags have been set on its database, thus ensuring eventual
526   LSPDB synchronization.  This is guaranteed to happen except in the
527   case where the Adjacency Three-Way State in the received IIH is "UP"
528   and the Neighbor Extended Local Circuit ID matches the extended local
529   circuit ID assigned by the restarting router.  In this case, the
530   restarting router MUST force the adjacency to reinitialize by setting
531   the local Adjacency Three-Way State to "DOWN" and sending a normal
532   IIH.

[minor] s/3.4/2.4


...
567   If an interface is active, but does not have any neighboring router
568   reachable over that interface, the timer T1 would never be cancelled,
569   and according to Section 3.4.1.1, the SPF would never be run.
570   Therefore, timer T1 is cancelled after some predetermined number of
571   expirations (which MAY be 1).

[minor] s/3.4.1.1/2.4.1.1

573 2.3.2.  Adjacency Acquisition during Start
...
592   Upon receipt of an IIH with the SA bit set, the behavior described in
593   Section 3.2.2 is followed.

[minor] s/3.2.2/2.2.2


...
642   When the T2 timer(s) are cancelled or expire, transmission of
643   "normal" IIHs (with RR, RA, and SA bits clear) will begin.

[minor] A "normal" IIH is mentioned before, but this is the first time that
it is described ("with RR, RA, and SA bits clear").  Shouldn't this
definition be expanded to include the PR/PA bits?  Also, it would be nice
if the term was defined earlier in the text: move the parenthesis to the
first mention.


...
659 2.4.  Database Synchronization
...
670   When (re)starting, a router starts an instance of timer T2 for each
671   LSPDB as described in Section 3.3.1 or Section 3.3.2.  In addition to
672   normal processing of the CSNPs, the set of LSPIDs contained in the
673   first complete set of CSNPs received over each interface is recorded,
674   together with their remaining lifetime.  In the case of a LAN
675   interface, a complete set of CSNPs MUST consist of CSNPs received
676   from neighbors that are not restarting.  If there are multiple
677   interfaces on the (re)starting router, the recorded set of LSPIDs is
678   the union of those received over each interface.  LSPs with a
679   remaining lifetime of zero are NOT so recorded.

[minor] s/Section 3.3.1 or Section 3.3.2/Section 2.3.1 or Section 2.3.2


...
710 2.4.1.1.  Restarting
...
766   Once the other level's SPF has run and any inter-level propagation
767   has been resolved, the router's own LSPs can be generated and
768   flooded.  Any own LSPs that were previously ignored, but that are not
769   part of the current set of own LSPs (including pseudonodes), MUST
770   then be purged.  Note that it is possible that a Designated Router
771   change may have taken place, and consequently the router SHOULD purge
772   those pseudonode LSPs that it previously owned, but that are now no
773   longer part of its set of pseudonode LSPs.

[major] This is a Normative conflict: "...(including pseudonodes), MUST
then be purged...the router SHOULD purge those pseudonode LSPs"   How is it
possible to always (MUST) purge the LSPs, but not always (SHOULD)?  Perhaps
s/SHOULD/MUST


...
793 2.4.1.2.  Starting
...
815   To avoid the possible formation of temporary blackholes, the starting
816   router sets the SA bit in the restart TLV (as described in
817   Section 3.3.2) in all IIHs that it sends.

[minor] s/3.3.2/2.3.2


...
967 4.  IANA Considerations

969   This document defines the following IS-IS TLV that is listed in the
970   IS-IS TLV codepoint registry:

972      Type        Description                            IIH   LSP   SNP
973      ----        -----------------------------------    ---   ---   ---
974      211         Restart TLV                              y     n     n

[major] You're missing the Purge column.

[major] This section should also ask IANA to change the reference from
rfc5306 to this document.

976 5.  Security Considerations
...
1000   If an SA bit is set in a false IIH, this could cause suppression of
1001   the advertisement of an IS neighbor, which could either continue for
1002   an indefinite period or occur intermittently with the result being a
1003   possible loss of reachability to some destinations in the network
1004   and/or increased frequency of LSP flooding and SPF calculation.

[major] Please add similar considerations as above for the new bits.