Re: [trill] AD review of draft-ietf-trill-cmt-06

Donald Eastlake <d3e3e3@gmail.com> Tue, 18 August 2015 17:43 UTC

In-Reply-To: <CAG4d1rewzPvLwPJcsWyKe_MnzhnnSX9MZTT5ZnJmVF_YQeV7pA@mail.gmail.com>
References: <CAG4d1rce6spmBWq3ONVRStQnsJptwCABePjzyrLi3g5siFgKWA@mail.gmail.com> <CAF4+nEEaC-Ws5ps_RZZFe_GR6pQa9VP70s1tFsDBeEERMcKdZQ@mail.gmail.com> <CAG4d1rewzPvLwPJcsWyKe_MnzhnnSX9MZTT5ZnJmVF_YQeV7pA@mail.gmail.com>
From: Donald Eastlake <d3e3e3@gmail.com>
Date: Tue, 18 Aug 2015 13:43:29 -0400
Message-ID: <CAF4+nEEa1B30QRo08om+RkTFiuLmaEkzuh+gRdbmQ-zCCHOp+g@mail.gmail.com>
To: Alia Atlas <akatlas@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Archived-At: <http://mailarchive.ietf.org/arch/msg/trill/2vEu5exT4a9qjlxPXxh3YsfM7sI>
Cc: Tissa Senevirathne <tsenevir@gmail.com>, draft-ietf-trill-cmt@ietf.org, "trill@ietf.org" <trill@ietf.org>
Subject: Re: [trill] AD review of draft-ietf-trill-cmt-06

Hi Alia,

My apologies for the delay. I believe the authors and I have good
resolutions for those of your comments that were not fully resolved
previously. Please see below (deleting some older text in this thread).

Thanks,
Donald

PS: Although I will be somewhat in contact I'm on vacation at the
WorldCon (www.sasquan.org) from tomorrow through next Monday.
=============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 d3e3e3@gmail.com


On Fri, Jul 31, 2015 at 1:24 PM, Alia Atlas <akatlas@gmail.com> wrote:
> Hi Donald,
>...
> On Fri, Jul 31, 2015 at 1:01 PM, Donald Eastlake <d3e3e3@gmail.com> wrote:
>> Hi Alia,
>> On Fri, Jul 31, 2015 at 11:22 AM, Alia Atlas <akatlas@gmail.com> wrote:
>> > ...
>>...
>>
>> > Major Issues:
>> >
>> > 1) In Sec 5.3, it says "If an RBridge RB1 advertises an Affinity sub-TLV
>> > with an AFFINITY RECORD that's ask for nickname RBn to be its child in
>> > any tree and RB1 is not adjacent to a real or virtual RBridge RBn,
>> > that AFFINITY RECORD is in conflict with the campus topology and MUST
>> > be ignored."  How does an RBridge determine the connectivity of a
>> > virtual RBridge RBn?  I can see that a Designated RBridge announces
>> > the pseudo-nickname for the vDRB to the other RBridges in the LAALPs
>> > (as described in draft-ietf-trill-pseudonode-nickname) but I don't see
>> > any specific way that the connectivity of a virtual RBridge is
>> > described and known by the other RBridges.  What am I missing?
>>
>> Well, a virtual RBridge represents an edge group of RBridges that all
>> have ports to links that are part of an active-active group of links.
>> The RBridges in the edge group advertise in IS-IS an ID for that edge
>> group which is unique in that TRILL network (campus) -- usually pretty
>> easy as they just use the MC-LAG (or DRNI) ID. So all the members of
>> the edge group can see each other in the link state and the highest
>> priority member obtains and advertises in IS-IS the nickname for the
>> virtual RBridge representing that group. All RBridges in the edge group
>> are assumed to be "connected" to that virtual RBridge.
>
> Right - so I agree that an RBridge can announce the tuple of
> (pseudo-rbridge id, LAALP ID) to indicate that the RBridge is
> attached to that virtual RBridge for the particular LAALP ID.
> Probably, it isn't even necessary to include the LAALP ID.
>
> However, I don't see any text in
> draft-ietf-trill-pseudonode-nickname-04 that causes this
> announcement to happen.  For instance, in Sec 9.2 where the handling
> of a PN-RBv APPsub-TLV is discussed, at the end all that is done by
> the receiver is:
>
> "On receipt of such a sub-TLV, if RBn is not an LAALP related edge
> RBridge, it ignores the sub-TLV. Otherwise, if RBn is also a member
> RBridge of the RBv identified by the list of LAALPs, it associates
> the pseudo-nickname with the ports of these LAALPs and downloads the
> association to data plane fast path logic."
>
> So it sounds like what you are saying is that
>
> a) RBridges announce the LAALP IDs that they are part of
> (PN-LAALP-Membership APPsub-TLV) and the Designated RBridge
> announces the nickname for the associated virtual RBridge.
>
> and then something unspecified is done that is described in Sec 9.2
> and contradicts the claim that the sub-TLV is ignored.
>
> I'm not saying the technology as implemented doesn't work - just that there
> are clearly some missing details here that need to be written down.

So, the problem here is that the text in draft-ietf-trill-cmt is based
on the older mindset that the "virtual RBridge" representing the
active-active edge group would be a pseudonode in the IS-IS sense;
that is, it would have a seven byte IS-IS System ID, have link state,
appear in the topology as a separate node adjacent to all the real
edge RBridges in the group, etc. (This original view is still reflected
in the file name of draft-ietf-trill-pseudonode-nickname.)

However, for a variety of reasons, this is no longer true. In the
scheme in draft-ietf-trill-pseudonode-nickname, the "virtual RBridge"
representing the edge group has a nickname but each of the real
RBridges in the edge group advertises that nickname as one of their
own nicknames, as well as advertising their "real" nickname.  (The base
TRILL protocol standard RFC 6325 provides for RBridges to hold
multiple nicknames. The original motivation for this was that, if an
RBridge was the source and/or sink for a lot of multi-destination
traffic, you might want multiple different least cost trees to be
rooted at that RBridge to spread the load and, since trees are
represented by the nickname of their root, that RBridge would need to
be identified by multiple nicknames.)

There is no need for any RBridge that is not part of the edge group to
know what nickname is the semi-permanent "real" nickname for an
RBridge and what nickname identifies the virtual RBridge. These
nicknames are all advertised in the same way. Adjacencies only exist
between real RBridges that are advertising link state notwithstanding
that such real RBridges may sometimes advertise that one of their
nicknames is a nickname that happens to identify a virtual RBridge.

When AFFINITY is used to associate a virtual RBridge RBv with tree t
at edge group RBridge RB1, RB1 will be advertising RBv as one of its
nicknames so the AFFINITY advertised by RB1 will be referring to
itself.

Two changes appear to be required to bring draft-ietf-trill-cmt up to
date with draft-ietf-trill-pseudonode-nickname:

OLD
   Each RBridge that desires to be the parent RBridge for child Rbridge
   RBy in a multi-destination distribution tree x announces the desired
   association using an Affinity sub-TLV. The child RBridge RBy is
   specified by its nickname (or one of its nicknames if it holds more
   than one).

NEW
   Each RBridge that desires to be the parent RBridge for child
   RBridge RBy in a multi-destination distribution tree x announces
   the desired association using an Affinity sub-TLV. The child is
   specified by its nickname. If an RBridge RB1 advertises an AFFINITY
   sub-TLV designating one of its own nicknames N1 as its “child” in some
   distribution tree, the effect is that that nickname N1 is ignored
   when constructing other distribution trees. Thus the RPF check will
   enforce that only RB1 can use nickname N1 to do ingress/egress on
   tree x. (This has no effect on least cost path calculations for
   unicast traffic.)

OLD
   If an RBridge RB1 advertises an Affinity sub-TLV with an AFFINITY
   RECORD that's ask for nickname RBn to be its child in any tree and
   RB1 is not adjacent to a real or virtual RBridge RBn, that AFFINITY
   RECORD is in conflict with the campus topology and MUST be ignored.

NEW
   If an RBridge RB1 advertises an Affinity sub-TLV with an AFFINITY
   RECORD that asks for nickname RBn to be its child in any tree and
   RB1 is neither adjacent to RBn nor does nickname RBn identify RB1
   itself, that AFFINITY RECORD is in conflict with the campus
   topology and MUST be ignored.
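
As an illustration only (not proposed draft text), the check in the NEW
text above might be sketched as follows. The RBridge class, the lsdb
dictionary, and the function name are hypothetical; the sketch assumes
each real RBridge's link state lists all nicknames it holds, with no
distinction between its "real" nickname and a virtual RBridge nickname.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RBridge:
    """Hypothetical link-state view of one real RBridge."""
    system_id: str
    # Every nickname this RBridge advertises, "real" or virtual alike.
    nicknames: Set[int] = field(default_factory=set)
    # System IDs of the real RBridges it is adjacent to.
    neighbors: Set[str] = field(default_factory=set)


def affinity_record_is_valid(lsdb: Dict[str, RBridge],
                             advertiser: str,
                             child_nickname: int) -> bool:
    """Return False when the AFFINITY RECORD must be ignored."""
    rb1 = lsdb[advertiser]
    # Case 1: the nickname identifies the advertiser itself.
    if child_nickname in rb1.nicknames:
        return True
    # Case 2: the nickname is held by an adjacent real RBridge.
    return any(child_nickname in lsdb[n].nicknames
               for n in rb1.neighbors if n in lsdb)


# Example campus: RB1 and RB2 form an edge group and both advertise
# the virtual RBridge nickname 0x2000; RB3 is a transit RBridge.
lsdb = {
    "RB1": RBridge("RB1", {0x1001, 0x2000}, {"RB3"}),
    "RB2": RBridge("RB2", {0x1002, 0x2000}, {"RB3"}),
    "RB3": RBridge("RB3", {0x1003}, {"RB1", "RB2"}),
}
```

Here RB1 naming 0x2000 as its child is accepted because RB1 itself
holds that nickname, while RB1 naming 0x1002 would be ignored since RB1
is not adjacent to RB2 in this example topology.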

>> > Minor Issues:
>> > ...
>
>> > 3) In Sec 5.1, could you clarify which RBridges are doing the
>> > Distribution Tree provisioning?  I'm sure it's my lack of deep
>> > familiarity, but until I got to Sec 5.2, it wasn't at all clear to me.
>>
>> I'm not sure that "Distribution Tree Provisioning" is the best name
>> for that section. In TRILL, every RBridge in the campus independently
>> computes the same set of distribution trees for the campus and each
>> tree reaches every RBridge (right now; this will change with
>> multi-topology, etc.). This section is about the assignment of edge
>> group RBridges to trees. They all know about all the trees due to how
>> tree calculation works in TRILL and they all know about all the
>> members of the edge group. So each member of the edge group does the
>> calculations described in 5.1 and they will all come up with the same
>> assignment of edge group RBridges to trees.
>
> Right - but this section doesn't specify that the intended behavior
> is just for the edge group RBridges.  IMHO, that would be useful for
> clarity.

See change below that consists of adding a reference on how tree
numbers are determined and adding "edge group" as a qualifier before
"RBridge" in two places.

OLD
   If n >= k

     Let's assume edge RBridges are sorted in numerically ascending
     order by IS-IS SystemID such that RB1 < RB2 < RBk. Each Rbridge in
     the numerically sorted list is assigned a monotonically increasing
     number j such that; RB1=0, RB2=1, RBi=j and RBi+1=j+1.

     Assign each tree to RBi such that tree number { (tree_number) %
     k}+1 } is assigned to RBridge i for tree_number from 1 to n. where
     n is the number of trees, k is the number of RBridges considered
     for tree allocation, and ''%'' is the integer division remainder
     operation.

   If n < k

     Distribution trees are assigned to RBridges RB1 to RBn, using the
     same algorithm as n >= k case. RBridges RBn+1 to RBk do not
     participate in active-active forwarding process on behalf of RBv.

NEW
   If n >= k

     Let's assume edge RBridges are sorted in numerically ascending
     order by IS-IS SystemID such that RB1 < RB2 < RBk. Each RBridge
     in the numerically sorted list is assigned a monotonically
     increasing number j such that RB1=0, RB2=1, RBi=j, and
     RBi+1=j+1. (See Section 4.5 of [RFC6325] as modified by Section
     3.4 of [RFC7180bis] for how tree numbers are determined.)

     Assign each tree to RBi such that tree number ((tree_number %
     k) + 1) is assigned to edge group RBridge i for tree_number from
                            ^^^^^^^^^^
     1 to n, where n is the number of trees, k is the number of edge
                                                                ^^^^
     group RBridges considered for tree allocation, and "%" is the
     ^^^^^
     integer division remainder operation.

   If n < k

     Distribution trees are assigned to edge group RBridges RB1 to
                                        ^^^^^^^^^^
     RBn, using the same algorithm as in the n >= k case. RBridges
     RBn+1 to RBk do not participate in the active-active forwarding
     process on behalf of RBv.
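
To make the intent concrete, here is a rough sketch of the assignment
as I read it; it is not draft text. It assumes that tree_number t goes
to the edge group RBridge at 1-based position (t % k) + 1 in the list
sorted by System ID, and that in the n < k case only the first n
RBridges take part, so the modulus becomes n. The function name and the
use of plain integers for System IDs are made up for the example.

```python
from typing import Dict, List


def assign_trees(system_ids: List[int], n_trees: int) -> Dict[int, int]:
    """Map each tree number (1..n) to the System ID of an edge group RBridge."""
    members = sorted(system_ids)          # RB1 < RB2 < ... < RBk
    k = len(members)
    if k == 0 or n_trees <= 0:
        return {}
    # If n < k, only RB1..RBn participate (assumed reading of the text).
    participants = members[:min(k, n_trees)]
    m = len(participants)
    assignment: Dict[int, int] = {}
    for tree_number in range(1, n_trees + 1):
        i = (tree_number % m) + 1         # 1-based position in sorted list
        assignment[tree_number] = participants[i - 1]
    return assignment


# Three edge group RBridges, five trees (n >= k).
five_trees = assign_trees([0x30, 0x10, 0x20], 5)
# Three edge group RBridges, two trees (n < k): 0x30 gets no tree.
two_trees = assign_trees([0x30, 0x10, 0x20], 2)
```

Since every edge group member sees the same System IDs and the same
tree numbers in the link state, each one running this calculation
independently arrives at the same assignment.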

>> > 4) In Sec 5.6, it says "Timer T_j SHOULD be at least < T_i/2" Do
>> > you mean that timer T_j should be no more than T_i/2 or that
>> > timer T_j should be no less than T_i/2.  The "<" makes this
>> > unclear to me because the "at least" contradicts it; is it T_j <
>> > T_i/2 or T_i/2 < T_j.
>>
>> I am less familiar with that provision so I'm not sure what the
>> correct interpretation is. It should probably be clarified.
>
> Oh dear - if neither you nor I are certain, it definitely needs to
> be clarified.

The purpose of these timers is to minimize multi-destination packet
loss and/or duplication. Proposed OLD and NEW text is as follows:

OLD
   RBi upon start-up, starts advertising its presence through IS-IS
   LSPs and starts a timer T_i. Member RBridges detecting the presence
   of RBi start a timer T_j. Timer T_j SHOULD be at least < T_i/2.
   (Please see note below)

   Upon expiry of timer T_j, member RBridges recalculate the multi-
   destination tree assignment and advertised the related trees using
   Affinity sub-TLV.

   Upon expiry of timer T_i, RBi recalculate the multi-destination tree
   assignment and advertises the related trees using Affinity TLV.

   Note: Timers T_i and T_j are designed so as to minimize traffic down
   time and avoid multi-destination packet duplication.

NEW
   RBi, upon start-up, advertises its presence through IS-IS LSPs and
   starts a timer T_i. Other member RBridges of the edge group,
   detecting the presence of RBi, start a timer T_j.

   Upon expiry of timer T_j, other member RBridges recalculate the
   multi-destination tree assignment and advertise the related trees
   using the Affinity sub-TLV. Upon expiry of timer T_i, RBi
   recalculates the multi-destination tree assignment and advertises
   the related trees using the Affinity sub-TLV.

   If the new RBridge in the edge group calculates trees and starts to
   use one or more before the existing RBridges in the edge group
   recalculate, there could be duplication of packets (for example,
   more than one edge group RBridge could decapsulate and forward a
   multi-destination frame on links into the active-active group) or
   loss of packets (for example, due to the Reverse Path Forwarding
   Check in the rest of the campus, if two edge group RBridges are
   trying to forward on the same tree, those from one will be
   discarded).  Alternatively, if the new RBridge in the edge group
   calculates trees and starts to use one or more after the existing
   RBridges recalculate, there could be loss of data due to frames
   arriving at the new RBridge being black holed. Timers T_i and T_j
   should be initialized to values designed to minimize these problems
   keeping in mind that, in general, duplicating is a more serious
   problem than dropping. It is RECOMMENDED that T_j be less than T_i
   and a reasonable default is 1/2 of T_i.
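
The trade-off between the two timers can be sketched as below. This is
a simplification for discussion, not draft text: it assumes both timers
start at the same instant and that recalculation takes effect
immediately on expiry, ignoring LSP flooding delay. The function names
and the risk labels are invented for the example.

```python
from typing import Tuple


def default_t_j(t_i: float) -> float:
    """Suggested default: T_j is one half of T_i."""
    return t_i / 2.0


def transition_risk(t_i: float, t_j: float) -> Tuple[str, float]:
    """Classify the window between the two timer expiries.

    Returns a (risk, window_length) pair under the simplifying
    assumptions stated above.
    """
    gap = abs(t_i - t_j)
    if t_j < t_i:
        # Existing members recalculate first; frames reaching the new
        # RBridge may be black holed until T_i expires.
        return ("loss", gap)
    if t_i < t_j:
        # New RBridge starts using trees first; duplication (or RPF
        # discards) is possible until T_j expires.
        return ("duplication-or-rpf-loss", gap)
    return ("none", 0.0)


recommended = transition_risk(10.0, default_t_j(10.0))
reversed_order = transition_risk(5.0, 10.0)
```

With the default, the window carries a risk of dropping rather than
duplicating, which matches the preference stated in the NEW text.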


I hope the above resolutions of your comments are satisfactory.

> Thanks,
> Alia
>
>> Thanks,
>> Donald
>> =============================
>>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>  155 Beaver Street, Milford, MA 01757 USA
>>  d3e3e3@gmail.com
>>
>> > Thanks again,
>> > Alia