Re: [Rift] Shepherd review of draft-ietf-rift-applicability

wei.yuehua@zte.com.cn Fri, 22 October 2021 09:45 UTC

Date: Fri, 22 Oct 2021 17:44:32 +0800
X-Zmail-TransId: 2af96172880061e2c70f
X-Mailer: Zmail v1.0
Message-ID: <202110221744327731090@zte.com.cn>
In-Reply-To: <CA+wi2hPDa00S0mkXU0PK3LORUyYB8t75OKJuvQvusMCHe4BjuA@mail.gmail.com>
References: CA+wi2hOedScQFr3RoskF6uqb39OtBuyeWT_3jsecMRCZoJRLPA@mail.gmail.com, 202110220942126173826@zte.com.cn, CA+wi2hPDa00S0mkXU0PK3LORUyYB8t75OKJuvQvusMCHe4BjuA@mail.gmail.com
Mime-Version: 1.0
From: wei.yuehua@zte.com.cn
To: tonysietf@gmail.com
Cc: zzhang=40juniper.net@dmarc.ietf.org, prz=40juniper.net@dmarc.ietf.org, rift@ietf.org
Content-Type: multipart/mixed; boundary="=====_001_next====="
X-MAIL: mse-eu.zte.com.cn 19M9iUZA044986
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/bXOeiHJtV2--GeHupAwkQbd7Xjc>
Subject: Re: [Rift] Shepherd review of draft-ietf-rift-applicability

Dear Toni,


Thanks, please see my comments inline 



Yuehua Wei



Original Message



From: TonyPrzygienda
To: 魏月华00019655;
Cc: Jeffrey (Zhaohui) Zhang;Antoni Przygienda;rift@ietf.org;
Date: 2021-10-22 15:33
Subject: Re: [Rift] Shepherd review of draft-ietf-rift-applicability






On Fri, Oct 22, 2021 at 3:42 AM <wei.yuehua@zte.com.cn> wrote:


Dear Toni, Jeffrey, and RIFTers,




Follow up comment resolutions:






   *  They tend to need extensive configuration or provisioning during
      bring up and re-dimensioning.

<YuehuaWei5> So, shall I replace "re-dimensioning" with "scaling"? >



something like this or just say "adding or removing switching elements from the fabric" 

<YuehuaWei5-1> By "switching elements", do you mean "RIFT nodes"? >






   *  Can utilize all paths through fabric without looping
   ...
   *  Supports non-equal cost multipath ...

Aren't the above two the same/related?


<YuehuaWei7> keep  "*  Can utilize all paths through fabric without looping", remove " *  Supports non-equal cost multipath ..."



as I wrote "all paths without looping". take any variation you like that says that 

 <YuehuaWei7-1>OK >






The "but" wording in the following is a bit strange:

   ... it is
   recommended to configure the level of all the nodes but those that
   are forced as leaves to avoid an undesirable interaction between ZTP
   and the manual configuration.

Do you mean "those that are forced as leaves" do not need configuration? But then how are they "forced as leaves" w/o configuration?


 <YuehuaWei20>  will delete " but those that  are forced as leaves ". >




it should read more "forced as leaves to avoid formation of undesirable topologies" 

  <YuehuaWei20-1> Jeffrey's confusion is that  "configure the level of all the nodes" has already included "nodes that are forced as leaves". "nodes that are forced as leaves" is achieved by configuration as well. So I propose to delete " but those that  are forced as leaves ">








Best Regards,


 


Yuehua Wei


M: +86 13851460269 E: wei.yuehua@zte.com.cn










Original Message


From: TonyPrzygienda
To: 魏月华00019655;
Cc: Jeffrey (Zhaohui) Zhang;Antoni Przygienda;rift@ietf.org;
Date: 2021-10-21 23:35
Subject: Re: [Rift] Shepherd review of draft-ietf-rift-applicability


inline




On Thu, Oct 21, 2021 at 10:07 AM <wei.yuehua@zte.com.cn> wrote:

Dear Jeffrey and Toni,

Please see my comment resolutions inline, starting with "<YuehuaWei".

Would like Toni to look at <YuehuaWei5>, <YuehuaWei6> as well.

Thank you!





Best Regards,


Yuehua Wei


ZTE Corporation






M: +86 13851460269 E: wei.yuehua@zte.com.cn










Original Message


From: Jeffrey(Zhaohui)Zhang
To: draft-ietf-rift-applicability@ietf.org;
Cc: 'rift@ietf.org';
Date: 2021-09-25 09:02
Subject: [Rift] Shepherd review of draft-ietf-rift-applicability

Hi,
 
As part of the shepherd review, I have the following questions/comments. Some are minor editorial nits.
 
   Node TIE should NOT be confused with a North
   TIE since "node" defines the type of TIE rather than its direction.
 
Should this be the following?
 
   N-TIE should not be confused with a Node TIE - the "N-" denotes
   "North-" not "Node-".
 <YuehuaWei0>Yes>




For the following:
 
   This is an acronym for a "Prefix Topology Information Element" and it
   contains all prefixes directly attached to this node in case of a
   North TIE and in case of South TIE the necessary default routes the
   node advertises southbound.
 
Should "default routes" be "default and disaggregated routes"?

<YuehuaWei1> The description of this acronym is copied from rift-rift; shall I keep it the same as in rift-rift? >


   .... RIFT can employ to ultimately
   calculate routes of which Dijkstra algorithm is a possible one.
 
The above does not read well to me. Is some wording missing?
 
<YuehuaWei2> The description of this acronym is also copied from rift-rift; shall I keep it the same as in rift-rift? >


   Clos [CLOS] topologies (called commonly a fat tree/network in modern
   IP fabric considerations as homonym to the original definition of the
   term Fat Tree [FATTREE])have gained prominence in today's networking,
 
Missing a " " in the last line.
 <YuehuaWei3> Accepted >




   Today's current routing protocols were geared towards a network with
   an irregular topology with isotropic properties, and low degree of
   connectivity.
 
"Today's current ... were" sounds strange. How about "Other routing protocols are"?
 <YuehuaWei4>How about "The current routing protocols are......." ?>




   *  They tend to need extensive configuration or provisioning during
      bring up and re-dimensioning.
 
"re-dimensioning" lacks context. What does it mean? "reconfiguration"?
 <YuehuaWei5> It comes from Toni's presentation. I think it means "scaling". I would like to ask Toni for confirmation, thanks. >










well, when you add or remove switches/PoD etc

 




   The N-TIEs contain a link-state topology description of lower levels
   and S-TIEs carry simply default routes for the lower levels.
 
"default and disaggregated routes"?
 <YuehuaWei5> the same as YuehuaWei1>










correct

 




   RIFT also eliminates major disadvantages of link-state and distance-
   vector with:
   *  Reduced and balanced flooding
   *  Automatic neighbor detection
 
Does link-state routing not have auto neighbor detection?
 <YuehuaWei6> How about changing it to "Level constrained automatic neighbor detection"? I would like to ask for Toni's comments. >










agreed

 



   *  Can utilize all paths through fabric without looping
   ....
   *  Supports non-equal cost multipath ...
 
Aren't the above two the same/related?
 <YuehuaWei7> I think the two above are nearly the same. Since "loop free" is already stated elsewhere in the document, I would like to delete "Can utilize all paths through fabric without looping". >










yes, in a sense it's the same, I would just stick to "utilize all paths in loop-free fashion"

 




4.2.1.  Horizontal Links
 
   RIFT is not limited to pure Clos divided into PoD and multi-planes
   but supports horizontal (East-West) links below the top of fabric
   level.  Those links are used only for last resort northbound routes
   when a spine loses all its northbound links or cannot compute a
   default route through them.
 
   A possible configuration is a "ring" of horizontal links at a level.
   In presence of such a "ring" in any level (except Top of Fabric (ToF)
   level) neither North SPF (N-SPF) nor South SPF (S-SPF) will provide a
   "ring-based protection" scheme since such a computation would have to
   deal necessarily with breaking of "loops" in Dijkstra sense; an
   application for which RIFT is not intended.
 
   A full-mesh connectivity between nodes on the same level can be
   employed and that allows N-SPF to provide for any node loosing all
   its northbound adjacencies (as long as any of the other nodes in the
   level are northbound connected) to still participate in northbound
   forwarding.
 
I struggled a bit with the second paragraph above. Is the following understanding of all three paragraphs correct?
 
1. east-west links below TOF provide last resort northbound routes (1st paragraph) - BTW should it be "northbound forwarding" instead of "northbound route"?
2. a full mesh east-west links between nodes at the same level provides last resort northbound forwarding for all the nodes (3rd paragraph) (extending on #1)
3. however, a ring of horizontal links does not provide ring-based protection (2nd paragraph)
 
If so, it may be better to move the 3rd paragraph up, and change the previous 2nd paragraph to the following:
 
   Note that a "ring" of horizontal links at any level below ToF does
   not provide a "ring-based protection" scheme since the SPF computation
   would have to deal necessarily with breaking of "loops" in Dijkstra sense - an
   application for which RIFT is not intended.
 <YuehuaWei8>  reasonable,  Accepted.  will replace "northbound forwarding" with "northbound route", and switch 2nd paragraph and 3rd paragraph.>










in fact such forwarding can be implemented but accepted, we don't want to go down that route ;-) 

 




In the following:
 
   *  Southbound, RIFT operates as a distance-vector protocol, whereby
      the control packets are flooded only one-hop, interpreted, and the
      consequence of that computation is what gets flooded one more hop
      south.  In the most common use-cases, a ToF node can reach most of
      the prefixes in the fabric.  If that is the case, the ToF node
      advertises the fabric default and disaggregates the prefixes that
      it cannot reach.
 
Perhaps changing "consequence" to "result"?

<YuehuaWei9> Reasonable, accepted. >

In the last sentence, perhaps say "negatively disaggregates" (if I understand it correctly)?

<YuehuaWei10> Reasonable, accepted. >


      In the general case, what gets advertised south is in more
      details:
 
Perhaps change to "... what get advertised south are:"? (remove the "in more details")

 <YuehuaWei11> Reasonable,accepted >


4.2.3.  Generalizing to any Directed Acyclic Graph
 
   RIFT is an anisotropic routing protocol, meaning that it has a sense
   of direction (northbound, southbound, east-west) and that it operates
   differently depending on the direction.
   ...
   A Directed Acyclic Graph (DAG) provides a sense of north (the
   direction of the DAG) and of south (the reverse), which can be used
   to apply RIFT.  For the purpose of RIFT, an edge in the DAG that has
   only incoming vertices is a ToF node.
 
I initially struggled a bit with the second paragraph above. Connecting to the section title, perhaps the following wording is better:
 
   Since a Directed Acyclic Graph (DAG) provides a sense of north (the
   direction of the DAG) and of south (the reverse), it can be used to
   apply RIFT — an edge in the DAG that has only incoming vertices is a
   ToF node.
  <YuehuaWei12> Reasonable,accepted.  will change the En dash (–) to Em dash(—)>
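
A minimal sketch of the ToF identification described in the paragraph above, assuming the DAG is given as a list of (south, north) edges that point in the direction of the DAG. The function name and the node names are hypothetical and do not come from the draft:

```python
def find_tof_nodes(edges):
    """Return the nodes that have only incoming edges.

    `edges` is an iterable of (south, north) pairs; every edge points
    north, so a node that never appears as `south` has no outgoing edge.
    """
    has_outgoing = set()
    has_incoming = set()
    for south, north in edges:
        has_outgoing.add(south)
        has_incoming.add(north)
    # ToF candidates: reached by at least one edge, origin of none.
    return sorted(has_incoming - has_outgoing)


# Hypothetical three-level fabric used only for illustration.
example_edges = [
    ("leaf1", "spine1"), ("leaf1", "spine2"),
    ("leaf2", "spine1"), ("leaf2", "spine2"),
    ("spine1", "tof1"), ("spine2", "tof1"), ("spine1", "tof2"),
]
print(find_tof_nodes(example_edges))
```

With the example edges, only tof1 and tof2 have no northbound edge, so they are the nodes that would be treated as ToF.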

In the following:


   RIFT is not strictly limited to Clos topologies.  The protocol only
   requires a sense of "compass rose directionality" either achieved
   through configuration or derivation of levels.  So, conceptually,
   shortcuts between levels could be included.  Figure 2 depicts an
   example of a shortcut between levels.  In this example, sub-optimal
   routing will occur when traffic is sent from L0 to L1 via S0's
   default route and back down through A0 or A1.  In order to ensure
   that, only default routes from A0 or A1 are used, all leaves would be
   required to install each others routes.
 
Should "ensure" be "avoid"?
  <YuehuaWei13> Right,accepted.  >

   Commercial edifices are often cabled in topologies that are either

   Clos or its isomorphic equivalents.  The Clos can grow rather high
   with many floors.
 
If I understand it correctly, better change "floors" to "levels". We're talking about commercial buildings here, so "floor" may be confused as "building floors", which I don't think it is referring to.
  <YuehuaWei14> Reasonable,accepted. >

   RIFT is neither IP specific and hence any link addressing

   connecting internal device subnets is conceivable.
 
s/neither/not/
  <YuehuaWei15> Right,accepted.  >

   *  RIFT negotiates automatically BFD per link allowing this way for

      IP and micro-BFD [RFC7130] to replace Link Aggregation Groups
      (LAGs) which do hide bandwidth imbalances in case of constituent
      failures.




I find it a bit hard to parse the above. Perhaps break it down a bit?


 <YuehuaWei16> Would like to change to the following wording:


      RIFT MAY incorporate BFD [RFC5881] to react quickly to link failures. After RIFT ThreeWay hello adjacency convergence a BFD session MAY


      be formed automatically between the RIFT endpoints without further


      configuration using the exchanged discriminators. >


   Without disaggregation mechanism, when linkSL6 fails, the packet from
   leaf121 to prefix122 will probably go up through linkSL5 to linkTS3
   then go down through linkTS4 to linkSL8 to Leaf122 or go up through
   linkSL5 to linkTS6 then go down through linkTS4 and linkSL8 to
   Leaf122 based on pure default route.  It's the case of suboptimal
   routing or bow-tieing.
 
I think it should be changed to the following:
 
   Without disaggregation mechanism, when linkSL6 fails, the packet from
   leaf121 to prefix122 *may* go up through linkSL5 to linkTS3
   then go down through linkTS4 to linkSL8 to Leaf122 or go up through
   linkSL5 to linkTS6 then go down through *linkTS8* and linkSL8 to
   Leaf122 based on pure default route.  *This is* the case of suboptimal
   routing or bow-tieing.
 
The '*' mark the changes. The second change is to fix a mistake (I think).
  <YuehuaWei17> linkTS3 connects to ToF21, so the downlinks are linkTS4 + linkSL8. You were thinking of linkTS6 to ToF22. I think both are right.

I will change *may* and *This is*. >

It's the case of black-holing.


s/It's/This is/
  <YuehuaWei18> OK,accepted.  >

   ... that is, on the one

   hand, the SystemID of the node that must be unique in the RIFT
   network, and on the other hand the level of the node in the Fat Tree,
   which determines which peers are northwards "parents" and which are
   southwards "children".
 
Perhaps change to the following:
 
   ... including SystemID of the node that must be unique in the RIFT
   network and the level of the node in the Fat Tree,
   which determines which peers are northwards "parents" and which are
   southwards "children".
  <YuehuaWei19> OK,accepted.  >




The "but" wording in the following is a bit strange:
 
   ... it is
   recommended to configure the level of all the nodes but those that
   are forced as leaves to avoid an undesirable interaction between ZTP
   and the manual configuration.
 
Do you mean "those that are forced as leaves" do not need configuration? But then how are they "forced as leaves" w/o configuration?
  <YuehuaWei20> When "zte" [presumably ZTP] is enabled, do leaf nodes need to have their level configured? >










??? ;-) 




   A RIFT node may also be configured to confine it to the leaf role

   with the LEAF_ONLY flag.  A leaf node can also be configured to
   support leaf-2-leaf procedures with the LEAF_2_LEAF flag.  In either
   case the node cannot be TOP_OF_FABRIC and its level cannot be
   configured.  RIFT will fully configure the node's level after it is
   attached to the topology and ensure that the node is at the "bottom
   of the hierarchy" (southernmost).
 
s/fully configure/fully determine/
  <YuehuaWei21> Reasonable,accepted. >
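
The constraint in the quoted paragraph can be sketched as a small configuration check. This is an illustration only: the `NodeConfig` class, its field names and the error strings are assumptions made for the example and are not part of RIFT or of any implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeConfig:
    # Hypothetical representation of a node's manual configuration.
    leaf_only: bool = False
    leaf_2_leaf: bool = False
    top_of_fabric: bool = False
    configured_level: Optional[int] = None


def validate(cfg: NodeConfig) -> List[str]:
    """Return the conflicts implied by the quoted paragraph, if any."""
    errors = []
    if cfg.leaf_only or cfg.leaf_2_leaf:
        if cfg.top_of_fabric:
            errors.append("leaf flag conflicts with TOP_OF_FABRIC")
        if cfg.configured_level is not None:
            errors.append("leaf flag conflicts with a configured level")
    return errors


print(validate(NodeConfig(leaf_only=True, configured_level=2)))
```

A node with LEAF_ONLY set and no configured level passes the check; its level is then left for RIFT to determine.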




   ... So the ToF nodes can exchange the full list of
   prefixes that exist in the fabric and figure when a ToF node lacks
   reachability and to existing prefix.
 
Is an "out" needed after "figure", and should "and to existing prefix" be "to some prefixes"?
  <YuehuaWei22> Right,accepted.  >
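
As a rough sketch of the comparison described above, assuming each ToF node's reachable prefixes are already known as a set (how they are learned is out of scope here). The function name and the sample data are hypothetical, with node and prefix names modelled on those used in the draft's examples:

```python
def missing_prefixes(reachable_by_tof):
    """Map each ToF node to the fabric prefixes it cannot reach.

    `reachable_by_tof` maps a ToF name to the set of prefixes it reaches.
    The full prefix list is taken as the union over all ToF nodes.
    """
    all_prefixes = set().union(*reachable_by_tof.values())
    result = {}
    for tof, reachable in reachable_by_tof.items():
        missing = all_prefixes - reachable
        if missing:
            result[tof] = sorted(missing)
    return result


# Hypothetical data: ToF22 has lost reachability to prefix122.
example = {
    "ToF21": {"prefix121", "prefix122"},
    "ToF22": {"prefix121"},
}
print(missing_prefixes(example))
```

In this example only ToF22 lacks a prefix, which is the situation in which the other ToF nodes would need to disaggregate.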




   ... In the case of Negative Disaggregation, the last ToF
   node(s) that injects the route may also incur an incast issue; this
   problem would occur if a prefix that becomes totally unreachable is
   disaggregated, but doing so is mostly useless and is not recommended.
 
What does "so" refer to, in the last sentence above?
  <YuehuaWei23> since this paragraph is talking about “note”, I would like to delete the last sentence.  >




   It is not envisioned in the short term that the average fabric
   supports a Precision Time Protocol [IEEEstd1588], and the precision
   that may be available with the Network Time Protocol [RFC5905], in
   the order of 100 to 200ms, may not be necessarily enough to cover,
   e.g., the fast mobility of a Virtual Machine.
 
I struggled with the above paragraph. Perhaps reword to the following?
  <YuehuaWei24> Reasonable,accepted. >

   It is not envisioned that an average fabric supports Precision Time Protocol

   [IEEEstd1588] in the short term, nor that the precision available with
   the Network Time Protocol [RFC5905] (in the order of 100 to 200ms)
   may not be necessarily enough to cover, e.g., the fast mobility of a Virtual Machine.
 
Maybe even change the double negative.
 
   RIFT doesn't precondition that nodes of the fabric have reachable
   addresses.  But the operational purposes to reach the internal nodes
   may exist.
 
s/purposes/reasons/?
  <YuehuaWei25> OK,accepted.  >




   In a fully connected ToF, in case of failure between ToF2 and spine
   nodes, ToF2's loopback address must be disaggregated recursively all
   the way to the leaves.
 
   In a partitioned ToF, a TOF node is only reachable within its Plane,
   and the disaggregation to the leaves is also required.  A possible
   alternative is to use the ring that interconnects the ToF nodes to
   transmit packets between them for their loopback addresses only...
 
If I understand it correctly, the above two paragraphs should be as following (the ring alternative to recursive disaggregation applies to both fully connected ToF and partitioned ToF):
  <YuehuaWei26> Better,accepted.  >




   In case of failure between ToF2 and spine
   nodes, ToF2's loopback address must be disaggregated recursively all
   the way to the leaves. In a partitioned ToF, even with recursive disaggregation
   a ToF node is only reachable within its plane.
 
   A possible alternative to recursive disaggregation is to use a ring
   that interconnects the ToF nodes to
   transmit packets between them for their loopback addresses only...
 
For the following:
 
   If a controller is attaching to the RIFT domain from ToF, it usually
   uses dual-homing connections.  The loopback prefix of the controller
   should be advertised down by the ToF and spine to leaves.  If the
   controller loses link to ToF, make sure the ToF withdraw the prefix
   of the controller(use different mechanisms).
 
What does "(use different mechanisms)" mean?
  <YuehuaWei27> The interworking between the controller and the ToF is outside of the RIFT domain, so the ToF will withdraw the controller's prefix using mechanisms other than RIFT. >

5.12.  Internet Connectivity With Underlay


s/With/Within/?
  <YuehuaWei28> Better,accepted.  >




   In case that an internet access request comes from a leaf and the
   internet gateway is another leaf ...
 
Where else could the internet access request come from? With the request from a non-leaf, don't you also need the default route to be advertised by the internet gateway?
 <YuehuaWei29>"5.12.2.  Internet Default on the ToFs" is the other case. >










yeah, it's complex, people will get confused. you have to stop to advertise in RIFT a default but use a "fabric-default" or have preference rules. 

 




   If the traffic comes from ToF to Leaf111 or Leaf121 which has anycast
   prefix PrefixA.  RIFT can deal with this case well.
 
s/. RIFT/, RIFT/
  <YuehuaWei30> OK,accepted.  >




   The adds huge
   capabilities for leaf-2-leaf ECMP paths, but additional complexity
   with the need to disaggregate.  Also RIFT uses Link State flooding
   northwards, and is not designed for low-power operation.
 
s/The/This/

 <YuehuaWei31> OK,accepted.  >

Why is low-power operation mentioned at all? It's not like we'll run RIFT on IoT devices? The following only talks about IoT being attached to leaves:
  <YuehuaWei32> You are right, but I think the first two paragraphs of this section talk about RIFT design principles and applicability in general. The third paragraph starts to talk about IoT being attached to leaves. >

   Still nothing prevents that the IP devices connected at the Leaf are

   IoT (Internet of Things) devices, which typically expose their
   address using WiND - which is an upgrade from 6LoWPAN ND [RFC6775].  
 
And are the following specific to IOT?
  <YuehuaWei33> The following is not specific to IoT, but it doesn't stray from the topic of this section, does it? >

   A network that serves high speed/ high power IoT devices should

   typically provide deterministic capabilities for applications such as
   high speed control loops or movement detection.  The Fat Tree is
   highly reliable, and in normal condition provides an equilatent
   multipath operation; but the ECMP doesn't provide hard guarantees for
   either delivery or latency.  As long as the fabric is non-blocking
   the result is the same; but there can be load unbalances resulting in
   incast and possibly congestion loss that will prevent the delivery
   within bounded latency.
 
   This could be alleviated with Packet Replication, Elimination and
   Reordering (PREOF) [RFC8655] leaf-2-leaf but PREOF is hard to provide
   at the scale of all flows, and the replication may increase the
   probability of the overload that it attempts to solve.
 
Thanks!
Jeffrey
_______________________________________________
RIFT mailing list
RIFT@ietf.org
https://www.ietf.org/mailman/listinfo/rift









