[bess] Genart last call review of draft-ietf-bess-evpn-etree-12

Dale Worley <worley@ariadne.com> Tue, 08 August 2017 02:29 UTC
From: Dale Worley <worley@ariadne.com>
To: gen-art@ietf.org
Cc: draft-ietf-bess-evpn-etree.all@ietf.org, ietf@ietf.org, bess@ietf.org
Message-ID: <150215935959.12343.2397423105077446380@ietfa.amsl.com>
Date: Mon, 07 Aug 2017 19:29:19 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/KGhOgcqwaJkRTNTUc12dUReK7hM>
Subject: [bess] Genart last call review of draft-ietf-bess-evpn-etree-12
Reviewer: Dale Worley
Review result: On the Right Track

I am the assigned Gen-ART reviewer for this draft.  The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document:  draft-ietf-bess-evpn-etree-12
Reviewer:  Dale R. Worley
Review Date:  2017-08-07
IETF LC End Date:  2017-08-09
IESG Telechat date:  2017-08-17

Summary:

This draft is on the right track but has open issues, described in the
review.  A few of the issues are directly technical.

Reading this draft, I have the sense that it isn't so much a
specification as the description of an idea, which is that EVPN can be
used to implement E-Tree functionality.  It reads as if someone who is
extremely knowledgeable about EVPN is outlining the idea to someone
similar, given that various details don't seem to be worked out
completely and that in several places there are alternative
implementation methods that are mentioned but do not seem to be
rigorously enumerated.  The document seems more to describe a *class*
of ways of implementing E-Trees, and not a rigidly delimited class.

As far as I can tell, the idea works, but I suspect that an
implementor would not just be following the specification but
completing it in many respects.  Given that the document seems to
extend the mechanisms of RFC 7432, I suspect that an implementor would
have to carefully work out the details of all the BGP announcements,
and that not all implementations would interoperate.  E.g., "This Leaf
label is advertised to other PE devices, using the E-TREE Extended
Community" sounds to me like it's very under-specified.

The way forward seems to be clear:  The draft needs to be edited
carefully, filling in the missing details and making more explicit and
rigid the various implementation alternatives.  It might be worth
enumerating all of the mentioned implementation choices in one place,
as successful interoperation requires that all devices in a VPN are
using the same choices.  And I think interoperation needs to be
emphasized -- two devices that implement this draft should
interoperate if they are configured with the same selections among the
implementation choices enumerated in the draft.  Otherwise, this draft
is just the outline for a dozen vendors' similar-but-not-interoperable
proprietary products.

Abstract

   This document
   discusses how those functional requirements can be easily met with
   Ethernet VPN (EVPN) and how EVPN offers a more efficient
   implementation of these functions.

"More efficient" than what?  The abstract reads as if this document is
an alternative method of implementing E-Tree, but I read well into the
draft before it became clear that this draft is an alternative to RFC
7387, rather than that this draft and RFC 7387 are an alternative to
something else.  The abstract does not specifically state all of the
relationships between the specifications it mentions.

And although this mechanism is described as "more efficient", there
doesn't seem to be any discussion of how it is more efficient.  You
don't need a lot of detail for this, but it would be helpful if there
were at least a brief description of the way in which it is more
efficient and some indication of the degree.

Also, what is the relationship between "EVPN" and this document?  Is EVPN
a widely-known technology whose name need not be footnoted?  Is it
something defined in this document?  Importantly, is the usage in this
document a subset within some defined EVPN specification, or is it a
modification/extension of EVPN?

   This document makes use of the
   most significant bit of the scope governed by the IANA registry
   created by RFC7385, and hence updates RFC7385 accordingly.

The use of "scope" is peculiar here.  Generally, "scope" refers to a
region of some sort of space, so one would say "scope value" or "scope
value field" to refer to a group of bits that designate a scope.  But
checking with RFC 7385 and the IANA registry page for "Border Gateway
Protocol (BGP) Parameters", I am unable to find any occurrence of the
word "scope".  And other than a very similar passage in section 1,
"scope" only appears in this document in the phrase "scope of this
document".  Perhaps "scope" should be "tunnel type"?

Table of Contents

What is the capitalization rule you're using for section titles?
E.g., sections 3.2.{1,2,3,4} are capitalized in a decidedly different
way than other sections.

1  Introduction

   The Metro Ethernet Forum (MEF) has defined a rooted-multipoint
   Ethernet service known as Ethernet Tree (E-Tree) [MEF6.1]. In an E-
   Tree service, Attachment Circuits (ACs) are labeled as either Root or
   Leaf ACs. Root ACs can communicate with all other ACs. Leaf ACs can
   communicate with Root ACs but not with other Leaf ACs. 

Given the use of "rooted-multipoint", it seems that there is to be
exactly one Root AC per virtual Ethernet, as rooted trees have exactly
one root node.  Whether or not that is true is very important for the
conceptual model of the service, but is not stated clearly
here. Perhaps there is a terminology problem, as when I see "rooted"
and "tree" in a sentence, I assume that the "tree" is "rooted", i.e.,
it has exactly one root.  (Similarly in Wikipedia, "Rooted-MultiPoint"
[sic] redirects to "Point-to-multipoint", which says "providing
multiple paths from a *single* [my emphasis] location to multiple
locations".)  However, the last sentences of section 2.1 suggest that
an E-Tree might have more than one Root.

Subtle point:  If there are multiple Roots, the text implies that all
Roots can communicate with all other Roots.  It might be worth
mentioning that explicitly, as the naive reader might overlook it.
Indeed, it is possible within this model that all endpoints are Roots,
in which case the VPN is completely connected.
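
To make the rule concrete, here is a minimal sketch of the connectivity
model as I read it from the paragraph quoted above.  The names Role and
may_communicate are mine, not the draft's, and the sketch assumes that
multiple Roots are allowed.

```python
from enum import Enum


class Role(Enum):
    ROOT = "root"
    LEAF = "leaf"


def may_communicate(src: Role, dst: Role) -> bool:
    """Return True unless both endpoints are Leafs (the only forbidden pair)."""
    return not (src is Role.LEAF and dst is Role.LEAF)
```

Under this reading, Root-to-Root traffic is permitted like any other
pair involving a Root, which is the point I think deserves an explicit
sentence.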

Also, it seems that in scenario 3 of section 2 a single AC could have
multiple endpoints on it, some of which are Root and some of which are
Leaf, which doesn't fit within the description in this paragraph.

(In general, meticulously editing the Introduction section is
extremely helpful to readers who aren't already thoroughly familiar
with the subject.)

   [RFC7387] proposes the solution framework for supporting E-Tree
   service in MPLS networks.

I think this should be "a solution framework".  If one says "the solution
framework", then there is in some way only one solution framework, and
the RFC would "state" it.  Saying that the RFC "proposes" a framework
shows that there could be others that could be proposed.  Similarly,
the next sentence would read "... of an overall solution ...".

   The document identifies the functional
   components of the overall solution to emulate E-Tree services in
   addition to Ethernet LAN (E-LAN) services on an existing MPLS
   network.

The relationship of E-LAN to E-Tree is not clear, and how the phrase
"on an existing MPLS network" attaches to either is not clear.  I
think you mean at least:

   The document identifies the functional components of the overall
   solution to emulate E-Tree services on an existing MPLS network.

and that the document does the same for E-LAN.  I suspect that there
is some implied relationship between E-Tree and E-LAN, other than that
solution frameworks are described in RFC 7387.

However, the term "E-LAN" does not appear anywhere else in this
document; is it important to include it?  And given that E-Tree gets
its own reference to describe its functional specification, shouldn't
there be a reference giving the functional specification of "E-LAN"?
(As opposed to the implementation specification in RFC 7387.)

   [RFC7432] is a solution for multipoint L2VPN services, with advanced
   multi-homing capabilities, using BGP for distributing customer/client
   MAC address reach-ability information over the MPLS/IP network.
   [RFC7623] combines the functionality of EVPN with [802.1ah] Provider
   Backbone Bridging (PBB) for MAC address scalability.

The structure of this paragraph suggests that "EVPN" is defined by RFC
7432, but I had to look at 7432 to verify that.  Perhaps "[RFC7432]
defines EVPN, a solution for ..."?

"[802.1ah]" appears to be a reference, but there is no entry in
section 9 for it.

   This document discusses how the functional requirements for E-Tree
   service can be met with (PBB-)EVPN and how (PBB-)EVPN offers a more
   efficient implementation of these functions.

This paragraph has some of the same problems as the abstract.  But I
am now starting to suspect that the document is proposing using EVPN
as a way to provide E-Tree service *instead of* using the RFC 7387
proposal.  If that is so, this sentence needs to end "a more efficient
implementation of these functions than RFC7387.", and the third
sentence of the Abstract should start "This document discusses how the
functional requirements for E-Tree can be ..." (making clear that it
logically succeeds the first sentence, not the second), and the use of
"EVPN" in the Abstract needs to reference RFC 7432.

   This document makes use
   of the most significant bit of the scope governed by the IANA
   registry created by RFC7385, and hence updates RFC7385 accordingly.

The "scope" business is still unresolved.

Also, this sentence doesn't say what the new purpose of the bit is.
Perhaps something like, "This document repurposes the most significant
bit of the tunnel type byte governed by the IANA registry created by
RFC7385 to ...".  But that is still opaque, as few people will
immediately know the purpose of a registry referenced only by RFC
number.  Perhaps "... the Tunnel Type byte in the BGP PMSI Tunnel
attribute ...".

   Section 2 discusses E-TREE scenarios.

What is the proper capitalization of "E-Tree"?

1.1  Terminology

The text uses many acronyms which may not be widely known by people
not deeply conversant in these technologies.  It may help to define
many of them in section 1.1.  (E.g., "EVI" is defined well down
in the second paragraph of section 2.1, whereas it is used in the
first paragraph of that section.)

2  E-Tree Scenarios

The enumeration of cases is unclear.  I think you mean for scenario 1,
"PE is exclusively Leaf or Root site(s)", etc.  But "OR" doesn't carry
that meaning unambiguously.  You could s/OR/XOR/ to make it
unambiguous, but that would be awkward and very geekish.

Also, I think "MAC address" is more correct than "MAC" here.

2.1 Scenario 1: Leaf OR Root site(s) per PE

   In this scenario, a PE may receive traffic from either Root ACs OR
   Leaf ACs for a given MAC-VRF/bridge table, but not both concurrently.

There's a problem with "receive traffic from", because one can say a
PE "receives traffic from" any endpoint in the VPN -- if the traffic
is egressing from the VPN through the PE.  I think you want to phrase
it in terms of "all endpoints attached to the ACs attached to the PE
are Root endpoints or they are all Leaf endpoints".  Or you could
rephrase it in terms of ingressing traffic.  But really, Root and Leaf
are properties of endpoints much more than properties of traffic, and
it's best to use the terms accordingly.

Also, the meaning of "concurrently" is probably not what you want --
strictly, it means that two things cannot happen at the same time, but
it implies that they can happen at different times.  I don't think
that is what you mean.
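
The condition I think is intended can be stated as a simple check.
This is only a sketch of my reading of scenario 1; the function name
and the data shape (a mapping from PE name to the roles of its ACs for
one EVI) are my own assumptions.

```python
def is_scenario_1(ac_roles_by_pe):
    """Return True if no PE mixes Root and Leaf ACs for the EVI.

    ac_roles_by_pe maps a PE name to the list of roles ('root' or
    'leaf') of the ACs attached to that PE for one EVI.
    """
    return all(len(set(roles)) <= 1 for roles in ac_roles_by_pe.values())
```

Note that the check has no notion of time, which is why "concurrently"
seems to me to be the wrong word.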

                   +---------+            +---------+
                   |   PE1   |            |   PE2   |
    +---+          |  +---+  |  +------+  |  +---+  |            +---+
    |CE1+---AC1----+--+   |  |  | MPLS |  |  |   +--+----AC2-----+CE2|
    +---+  (Root)  |  |MAC|  |  |  /IP |  |  |MAC|  |   (Leaf)   +---+
                   |  |VRF|  |  |      |  |  |VRF|  |
                   |  |   |  |  |      |  |  |   |  |            +---+
                   |  |   |  |  |      |  |  |   +--+----AC3-----+CE3|
                   |  +---+  |  +------+  |  +---+  |   (Leaf)   +---+
                   +---------+            +---------+

The figure is useful in showing the relationship of PE, AC, and CE.
(I see now that what I call an "endpoint" is generally known as a
"CE".)  But there is a shortage of connection lines.  I think
something like this would be clearer:

                   +---------+            +---------+
                   |   PE1   |            |   PE2   |
    +---+          |  +---+  |  +------+  |  +---+  |            +---+
    |CE1+---AC1----+--+   |  |  | MPLS |  |  |   +--+----AC2-----+CE2|
    +---+  (Root)  |  |MAC|  |  |  /IP |  |  |MAC|  |   (Leaf)   +---+
                   |  |VRF+------------------+VRF|  |
                   |  |   |  |  |      |  |  |   |  |            +---+
                   |  |   |  |  |      |  |  |   +--+----AC3-----+CE3|
                   |  +---+  |  +------+  |  +---+  |   (Leaf)   +---+
                   +---------+            +---------+


   In such scenario, using tailored BGP Route Target (RT) import/export
   policies among the PEs belonging to the same EVI, can be used to
   restrict the communications among Leaf PEs.

Presumably, "In such a scenario ..." but I prefer "In this scenario
...".  But the grammar has a problem, since if you elide the clause
"using ... the same EVI", it reads "In such [a] scenario ... can be
used to ...".  I think you want "In such a scenario, tailored BGP
... policies ... can be used to ...".

I think you want "prevent" rather than "restrict", since by the
definition of E-Tree service, there is to be no communication between
Leaf ACs at all -- restrict has an implication that the limitation is
not absolute (and indeed, is somehow configurable).
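
For what it's worth, here is one way such "tailored" policies could
work, sketched under my own assumptions: each EVI has two RTs (section
2.2 mentions "one for Root and another for Leaf"), Root PEs import
both, and Leaf PEs import only the Root RT.  The RT values and function
names are placeholders, not taken from the draft.

```python
ROOT_RT = "RT-root"  # placeholder value, not from the draft
LEAF_RT = "RT-leaf"  # placeholder value, not from the draft


def export_rts(pe_role):
    """A PE tags its routes with the RT matching its own role."""
    return {ROOT_RT} if pe_role == "root" else {LEAF_RT}


def import_rts(pe_role):
    """Assumed policy: Root PEs import both RTs, Leaf PEs only the Root RT."""
    return {ROOT_RT, LEAF_RT} if pe_role == "root" else {ROOT_RT}


def learns_routes_from(receiver_role, sender_role):
    """True if the receiver's import policy accepts the sender's routes."""
    return bool(export_rts(sender_role) & import_rts(receiver_role))
```

With this policy a Leaf PE never learns another Leaf PE's routes, so
Leaf-to-Leaf communication is prevented outright rather than merely
restricted.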

   from one Leaf AC to another Leaf AC on a MAC-VRF for a given E-TREE
   EVI.

Across the whole document, it's clear that the operation of the
proposed E-Tree mechanism is absolutely independent for each EVI.
E.g., here you note that scenario 1 is "for any PE, *for any EVI*, all
CEs connected to that PE are either all Roots or all Leafs".  But you
could simply state this separation of each EVI from every other EVI at
the top of the document and not have to keep repeating it at various
places throughout the document.

That is, I think it's true.  If there are ways in which the
implementation of one EVI interacts with the implementation of another
EVI, that needs to be prominently flagged and carefully assessed for
causing subtle problems.  E.g., in section 3.2.1, it seems like the
advertisements of "Ethernet A-D per ES routes" may bundle information
about multiple EVIs.

2.2 Scenario 2: Leaf OR Root site(s) per AC

There is a technical question in that scenario 1 is more restrictive
than scenario 2, so any solution for scenario 2 can be used in
scenario 1.  But only alternative A is presented for scenario 1, even
though alternative B must necessarily work.  Presumably there is a
reason why alternative B is not presented for scenario 1, and I think
it should be stated.

This fact caused me some confusion when I was reading the document:
The mechanism described in 2.1 seems to be satisfactory for fulfilling
the requirements stated in paragraph 2, but paragraph 2 introduces
"coloring" of MAC addresses.  I think that the expositions of the
three scenarios could better be done if they were all coordinated with
each other.  Perhaps the choice between alternatives A and B for
coloring MAC addresses is actually common across the three scenarios,
and what really changes between the scenarios is the efficiency
tradeoffs between the two alternatives.

   Approach (A) would require the same data plane enhancements as
   approach (B) if MAC-VRF and bridge tables used per VLAN, are to
   remain consistent with [RFC7432] (section 6).

What is the subject of "are"?

   In order to avoid data-
   plane enhancements for approach (A), multiple bridge tables per VLAN
   may be considered; 

It's not clear what the difference is between "MAC-VRF and bridge
tables used per VLAN" and "multiple bridge tables per VLAN".  Can this
be described in a way that is clearer to people who are not highly
familiar with the subject?

   then two RTs (one for Root and another for Leaf)
   can still be used with approach (B)

This seems to be proposing some sort of mixture of alternatives A and
B.  What exactly are the alternatives that are being specified that
the implementor chooses between?

2.3 Scenario 3: Leaf OR Root site(s) per MAC

   This scenario is
   not covered in both [RFC7387] and [MEF6.1]

Literally, this means "It is not true that this scenario is covered in
RFC 7387 and in MEF6.1."  But I think you mean "This scenario is not
covered in either ...".

   the Designated Forwarding (DF)
   filtering per [RFC7432] would not be compatible with the required
   egress filtering

Interestingly, "designated forwarding" is not mentioned anywhere else
in this draft.  Does it appear elsewhere or is it implied in the
mechanics of RFC 7432 that are used throughout this draft?

There seems to be no description of the techniques to be used in this
case.

And given that it seems to be contemplated to support scenario 3,
there are numerous places in the draft where "a Root AC" and "a Leaf
AC" are not correct.  You could use "a Root CE" or "a Root site", etc.

3 Operation for EVPN

   In other words,
   [RFC7432] has inherent capability to support E-TREE services without
   defining any new BGP routes but by just defining a new BGP Extended
   Community for leaf indication as shown later in this document
   (section 5.1).

And by implication, the addition of various processing of that leaf
indication.

But this does show that this draft is not just an application of RFC
7432 but an extension of it.

3.1 Known Unicast Traffic

   sending ... traffic over MPLS/IP core to be filtered at the egress
   PE ...

The implication seems to be that this "sending" that is avoided is
really multicast, from the ingress PE to all egress PEs.  Naively,
describing this as "over MPLS/IP core" doesn't seem to capture the
meaning, since *all* traffic is going to be sent "over the MPLS/IP
core".

   To provide such ingress filtering for known unicast traffic, a PE
   MUST indicate to other PEs what kind of sites (root or leaf) its MAC
   addresses are associated with by advertising a leaf indication flag
   (via an Extended Community) along with each of its MAC/IP
   Advertisement routes. The lack of such flag indicates that the MAC
   address is associated with a root site. This scheme applies to all
   scenarios described in section 2. 

Read literally, this paragraph says that a leaf indication flag MUST
be present on each advertisement, but then says the absence of the
flag means that it is a root site (implying that the absence of the
flag is legitimate).  There are two alternatives -- a) the flag is a
1/0 flag, with one value meaning Leaf and one meaning Root or b) the
flag is some optional field whose presence means Leaf and whose
absence means Root -- but this wording isn't quite correct for either.

Also, this paragraph seems to be saying that the extended community is
always used to indicate Leaf/Root status (though perhaps it specifies
status by its absence).  But the descriptions in section 2 seem to be
saying that alternative A doesn't require the extended community,
which contradicts the MUST in this paragraph.
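
To show what I mean by alternative (b), here is a sketch of the ingress
check when presence of the flag means Leaf and absence means Root.  The
function name and the representation of the learned routes as a set of
MAC addresses are my assumptions, not the draft's.

```python
def ingress_permits(src_is_leaf, dst_mac, leaf_macs):
    """Decide at the ingress PE whether a known unicast frame may be sent.

    leaf_macs holds the MAC addresses whose advertisements carried the
    leaf indication; any other known MAC is treated as a Root.
    """
    dst_is_leaf = dst_mac in leaf_macs
    return not (src_is_leaf and dst_is_leaf)
```

In this form there is nothing that "MUST" accompany every route; the
requirement would fall only on routes for Leaf MAC addresses.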

3.2 BUM Traffic

   This specification does not provide support for filtering BUM
   (Broadcast, Unknown, and Multicast) traffic on the ingress PE because
   it is not possible to perform filtering of BUM traffic on the ingress
   PE, as is the case with known unicast described above, due to the
   multi-destination nature of BUM traffic.

This sentence is quite awkward.  I think you can carry all of the
meaning with "This specification does not provide support for
filtering BUM (Broadcast, Unknown, and Multicast) traffic on the
ingress PE because it is not possible to do so, due to the
multi-destination nature of BUM traffic."

But the phrase "does not provide support" is not correct -- the
ingress PE fully *supports* BUM traffic (except in scenario 3), it's
just that the support doesn't include *filtering* by the ingress PE.

   the MPLS-encapsulated frames MUST be tagged with an
   indication that they originated from a Leaf AC - i.e., to be tagged
   with a Leaf label as specified in section 5.1. 

As written, the sentence requires "... tagged with an indication
whether they originated ...".  The use of "that" states that all the
frames originated from a Leaf AC.

Looking ahead to section 5.1, I see that the Leaf label is
considerably longer than the 1 bit that would seem to be necessary for
this function given its description here.  And the next paragraph
suggests that the assignment of Leaf labels is complex.  It would be
helpful if you clarified here all of the functionality of Leaf labels
so the reader has context for following paragraphs.  I suspect the
meaning here is "all MPLS-encapsulated frames are tagged with labels,
and to distinguish Leaf-originated frames, they must be tagged with
labels which are known to be Leaf labels".  Also, is there only one
leaf label (for an EVI), or can there be many, perhaps one for every
Leaf AC?
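
If my guess about the meaning is right, the egress behavior would look
roughly like the sketch below.  It assumes the egress PE keeps a set of
labels known to be Leaf labels (possibly more than one per EVI, which
is the open question above); the names are hypothetical.

```python
def egress_targets(frame_label, leaf_labels, local_acs):
    """Return the local ACs (sorted by name) that should receive a BUM frame.

    leaf_labels is the set of labels known to mark Leaf-originated
    frames; local_acs maps an AC name to 'root' or 'leaf'.
    """
    from_leaf = frame_label in leaf_labels
    return sorted(
        name for name, role in local_acs.items()
        if not (from_leaf and role == "leaf")
    )
```

The sketch makes visible why one bit would seem sufficient for the
function as described, and hence why the extra structure of the Leaf
label needs explaining.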

   The main difference between
   downstream and upstream assigned Leaf label is that in case of
   downstream assigned not all egress PE devices need to receive the
   label just like ESI label for ingress replication procedures defined
   in [RFC7432].

This sentence isn't clear to me.  I suspect that it just needs a few
words adjusted.  Also, I suspect that it would help to move the final
clause, "just like ESI label ..." to a separate sentence.

   On the ingress PE, the PE needs to place all its Leaf ACs for a given
   bridge domain in a single split-horizon group in order to prevent
   intra-PE forwarding among its Leaf ACs.

The phrase "bridge domain" appears only twice in this document.  I
suspect it is synonymous with some other term you are using.

My belief is that a "split-horizon group" means a group of ACs that
are visible to each other.  If that is correct, this sentence needs to
be rephrased to something like "... the PE needs to place each of its
Leaf ACs for a given bridge domain into separate split-horizon groups
...".

   Other mechanisms
   for identifying root or leaf (e.g., on a per MAC address basis) is
   beyond the scope of this document.

Scenario 3 in section 2.3 posits that a single AC may support both
Leaf and Root endpoints.  So there must be a known method of
performing this identification on a per-MAC address basis, or else
scenario 3 cannot be implemented at present.  That doesn't require
that this document specify or describe a method of doing so, but it
seems that at this point, the document should either identify one or
more ways in which it can be done, or admit that there are no known
mechanisms at this time.  (Or perhaps that there are no known *good*
mechanisms at this time.)  Otherwise the inclusion of scenario 3 is
purely hypothetical.

The acronym "ES" is used a lot in this section, meaning "Ethernet
segment".  Is "Ethernet segment" related to "AC"?

3.2.1 BUM traffic originated from a single-homed site on a leaf AC

What is the rule for capitalizing or not the words "leaf" and "root"?

   This Leaf label is
   advertised to other PE devices, using the E-TREE Extended Community
   (section 5.1) along with an Ethernet A-D per ES route with ESI of
   zero and a set of Route Targets (RTs) corresponding to all EVIs on
   the PE with at least one leaf site per EVI.

I'm not sure, but I think you mean s/all EVIs on the PE with at least
one leaf site per EVI/all EVIs which have at least one leaf site on
the PE/.

   The set of Ethernet A-D
   per ES routes may be needed if the number of Route Targets (RTs) that
   need to be sent exceed the limit on a single route per [RFC7432].

I suspect from the text that "Ethernet A-D per ES route" is a single
term, but I have no idea what it is referring to.  And I suspect that
someone has coined a shorter term to use.

I suspect that one or more words in this sentence aren't quite right
and I can't parse it.  For instance "The set of ... routes may be
needed if ..." doesn't make sense -- in what way can a set of routes
be needed only under some conditions?  Perhaps it should be "A set of
... routes", if the meaning is that in some situations only one would
be needed.  Reading this again, I think you mean "Multiple Ethernet
A-D per ES routes will need to be advertised if the number of Route
Targets needed to carry the EVIs exceeds the limit on a single route."
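
The reading I am proposing amounts to the following sketch.  The
per-route limit is a parameter here because I do not know its value;
the function name is hypothetical.

```python
def split_route_targets(route_targets, max_per_route):
    """Group RTs into chunks, one chunk per Ethernet A-D per ES route."""
    if max_per_route < 1:
        raise ValueError("max_per_route must be at least 1")
    return [
        route_targets[i:i + max_per_route]
        for i in range(0, len(route_targets), max_per_route)
    ]
```

When all the RTs fit in one route, the result is a single chunk, which
is why "A set of ... routes" reads better to me than "The set".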

   The
   ESI for the Ethernet A-D per ES route is set to zero to indicate
   single-homed sites.

This sentence seems to be talking about some attribute of a route
(which is an operation in the control plane) while the section is
theoretically talking about BUM traffic (which is in the data plane).
This seems to be a general organizational problem, where the details
of the data plane operations are mixed with general descriptions of
the control plane operations needed to support the data plane
operations.  It would be clearer if the data plane specifications and
the control plane specifications were separated (e.g., into adjacent
paragraphs), making it more explicit what information is passed from
the control plane to the data plane.

3.2.3 BUM traffic originated from a multi-homed site on a leaf AC

The first paragraph introduces various matters that aren't discussed
in the rest of the document, or at least aren't much discussed.  If I
understand it right, these considerations don't introduce anything
unexpected other than that an AC which is multi-homed (i.e., connects
to more than one PE) must, for any single EVI, be consistently a Root
or a Leaf on all of those PEs.  If I'm correct, this paragraph could
be made much easier to understand by stating just that and avoiding
details.

However, there may be more involved than that.  E.g., "there is no
forwarding among subnets" may be an additional requirement.  It's hard
to tell, because there are only two occurrences of "forwarding among
subnets", and it's not at all clear what that term is intended to
cover.

3.2.4 BUM traffic originated from a multi-homed site on a root AC

I think the critical point of this paragraph is omitted: "... but no
Leaf label is added".  Compare with section 3.2.2.

3.3 E-TREE Traffic Flows for EVPN

        - Ethernet known unicast from Root to Roots & Leaf
        - Ethernet known unicast from Leaf to Root
        - Ethernet BUM traffic from Root to Roots & Leafs
        - Ethernet BUM traffic from Leaf to Roots

The grammar of these is not good.  I think you mean "Known unicast
Ethernet from ... to ...", or better "Known unicast traffic from
... to ...".

   In the case where unicast flows need not be supported,
   the L2VPN PEs can avoid performing any MAC learning function. 

I think you want s/unicast/known unicast/ -- there may be situations
where unicast traffic exists, and it could be handled as known unicast
traffic, but the implementation can tolerate handling all of it as
unknown traffic to avoid MAC learning.

OTOH, the choice of supporting scenario 3 requires the choice of MAC
learning, since scenario 3 prevents handling should-be-known unicast
traffic as BUM traffic.  That dependency should be noted somewhere.
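
As an illustration of the kind of dependency I mean, a configuration
check might look like the sketch below.  The function and its
arguments are hypothetical; the only rule encoded is the one stated
above.

```python
def check_choices(scenario, mac_learning):
    """Return a list of problems with a (scenario, MAC learning) choice."""
    problems = []
    if scenario not in (1, 2, 3):
        problems.append("unknown scenario %r" % (scenario,))
    if scenario == 3 and not mac_learning:
        problems.append("scenario 3 requires MAC learning")
    return problems
```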

3.3.1 E-Tree with MAC Learning

I see here much use of "Ethernet Segments".  It seems that it has the
same functional meaning as "ACs".  If so, only one term should be used
consistently.  If not, does the distinction need to be explained?

   For the scenario
   described in section 2.1 (or possibly section 2.2), these routes are
   imported only by PEs with at least one Root site in the EVI ...

What is the condition on "possibly section 2.2"?  (Is this a
distinction between alternatives A and B?)

   To support multicast/broadcast from Leaf to Root sites, ingress
   replication should be sufficient for most scenarios where there are
   only a few Roots (typically two).

This is introducing one of a pair of alternatives (the other one being
described in the following paragraphs).  I think the reader should be
warned in advance what the two alternatives are and what it depends on
(viz., the number of Roots).  That is, a top-down structure.

   If the number of Roots is large, P2MP tunnels originated at the PEs
   with Leaf sites may be used and thus there will be no need to use the
   modified PMSI tunnel attribute in section 5.2 for composite tunnel
   type.

The wording here is not quite right; I think it allows some ambiguity.
Perhaps

   If the number of Roots is large, P2MP tunnels originated at the PEs
   with Leaf sites may be used (and thus there will be no need to use
   the composite tunnel type values of the modified PMSI tunnel
   attribute in section 5.2).

3.3.2 E-Tree without MAC Learning

Similarly, I suggest

   Just as in the previous section, if there are only a few PEs with
   root sites and thus ingress replication is desired from leaf PEs
   to these root PEs, then the composite tunnel values defined in
   section 5.2 should be used.

4 Operation for PBB-EVPN

   In PBB-EVPN, the PE advertises a Root/Leaf indication along with each
   B-MAC Advertisement route, to indicate whether the associated B-MAC
   address corresponds to a Root or a Leaf site. Just like the EVPN
   case, the new E-TREE Extended Community defined in section [5.1] is
   advertised with each MAC Advertisement route.

I don't understand the distinctions here, but is "MAC Advertisement
route" correct?  There are two uses of "MAC" and 12 uses of "B-MAC" in
this section.

4.1 Known Unicast Traffic

   The ingress PE cross-checks this flag with the status of
   the originating site, and if both are a Leaf, then the packet is not
   forwarded.

I think this can be simplified to

   The ingress PE also checks the status of the originating site, and
   if both are a Leaf, then the packet is not forwarded.

4.2 BUM Traffic

   it updates its egress filtering (based on the
   source B-MAC address), as follows:

The stated algorithm has some important properties.  One is that if a
frame arrives, but its B-MAC address is unknown, then the frame is
forwarded.  That is, traffic not from a known Leaf is assumed to be
from a Root.  This may not be a problem, but it seems like it is an
important property of this specification, and the implementor should
be aware of it.  (OTOH, perhaps this is a generic property of
PBB-EVPN, in which case it need not be stated here.)

Another property is that if a B-MAC changes from being a Leaf to being
a Root (by whatever means that might happen), it seems that the new
advertisements of the B-MAC as a Root will *not* remove it from the
list of blocked B-MACs of any Leaf that has seen that B-MAC advertised
as a Leaf.  Unless there is some consideration that I'm not aware of,
I think you want to revise this algorithm in this regard.
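
To illustrate both properties, here is a small sketch of the egress
filter as I understand the stated algorithm, together with the revision
I am suggesting.  The class and method names are hypothetical, and the
draft's actual algorithm may differ in details I have not captured.

```python
class EgressFilter:
    """Tracks source B-MACs whose frames must not reach local Leaf ACs."""

    def __init__(self, revised=False):
        self.blocked = set()
        # revised=True enables the suggested fix: a Root advertisement
        # removes the B-MAC from the blocked list.
        self.revised = revised

    def on_advertisement(self, bmac, is_leaf):
        """Process a B-MAC advertisement with its Root/Leaf indication."""
        if is_leaf:
            self.blocked.add(bmac)
        elif self.revised:
            self.blocked.discard(bmac)

    def forward_to_leaf(self, src_bmac):
        """Unknown B-MACs are not blocked, i.e. they are treated as Roots."""
        return src_bmac not in self.blocked
```

Without the revision, a B-MAC advertised first as a Leaf and later as a
Root stays blocked indefinitely.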

4.3 E-Tree without MAC Learning

   For PBB-EVPN, the handling of such traffic is per
   section 4.2 without C-MAC learning part of it at both ingress and
   egress PEs.  

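This could be phrased better, perhaps as "... is per section 4.2, but
without C-MAC learning at either the ingress or the egress PE."  And
is the "non-C-MAC learning" mode of operation of PBB-EVPN defined in
the PBB-EVPN specification?  If not, I think you might need to
describe it in more detail here.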

5.1 E-Tree Extended Community

   It is used for leaf indication of known unicast and BUM
   traffic.    

It's probably worth specifying that it indicates that the frame's
*origin* is a Leaf.

   The label value is encoded in the high-order 20 bits of the
   Leaf Label field.

This sentence seems to be in the wrong place, since the case described
in this paragraph doesn't use the Leaf Label field.  It seems to me
that it would be better to incorporate this information into the value
layout:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Type=0x06     | Sub-Type=0x05 | Flags(1 Octet)|  Reserved=0   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  Reserved=0   |           Leaf Label                  |0 0 0 0|
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
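
For what it's worth, the layout above can be exercised with a short
Python sketch.  The function names are mine, and the Flags octet is
treated as an opaque value because its bit assignments are not part of
this comment.

```python
import struct

def encode_etree_ext_community(flags: int, leaf_label: int) -> bytes:
    """Build the 8-octet value shown in the layout above."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one octet")
    if not 0 <= leaf_label < (1 << 20):
        raise ValueError("leaf_label must fit in 20 bits")
    # Type, Sub-Type, Flags, then two Reserved octets set to zero.
    header = struct.pack("!BBBH", 0x06, 0x05, flags, 0)
    # The label occupies the high-order 20 bits of the 3-octet field;
    # the low-order 4 bits are zero.
    return header + (leaf_label << 4).to_bytes(3, "big")

def decode_leaf_label(community: bytes) -> int:
    """Recover the 20-bit label from an 8-octet value."""
    if len(community) != 8:
        raise ValueError("extended community must be 8 octets")
    return int.from_bytes(community[5:8], "big") >> 4
```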

5.2 PMSI Tunnel Attribute

   Composite tunnel type is advertised by the root
   PE to simultaneously indicate a non-ingress replication tunnel ...

There's a certain ambiguity here, as I first read it as "... a
(non-ingress) replication tunnel ..." -- but of course, there is no
"egress replication".  What is meant is "... a non-(ingress
replication) tunnel ...", which I think would be better expressed as
"... a non-ingress-replication tunnel ...".

   When this Composite Tunnel bit is set, the "tunnel identifier" field
   would begin with a three-octet label, followed by the actual tunnel
   identifier for the transmit tunnel.

Probably better s/would begin/begins/.  (The clause of the condition
"Composite Tunnel bit is set" does not use the subjunctive mood, so
you don't use the subjunctive mood in the consequent clause.)

It might help to add a figure

         +---------------------------------+
         |  Flags (1 octet)                |
         +---------------------------------+
         |  Tunnel Type (1 octet)          |
         +---------------------------------+
         |  P2MP MPLS Label (3 octets)     |
         +---------------------------------+
         |  Ingress Replication MPLS Label |
         |  (3 octets)                     |
         +---------------------------------+
         |  Tunnel Identifier (variable)   |
         +---------------------------------+

And s/1 octets/1 octet/ -- even though RFC 6514 makes that mistake!

   PEs that don't understand the
   new meaning of the high-order bit would treat the tunnel type as an
   undefined tunnel type and would treat the PMSI tunnel attribute as a
   malformed attribute [RFC6514].

It might be worth noting that this processing is why the composite
tunnel bit is allocated in the Tunnel Type field rather than the Flags
field.
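
As an illustration of the layout in the figure suggested above, here
is a Python sketch of a parser.  It is only my reading of the text:
the function name and dictionary keys are invented, the label fields
are returned as raw 3-octet values, and the attribute contents used to
try it out would be made-up sample data.

```python
def parse_pmsi_tunnel_attribute(value: bytes) -> dict:
    """Split a PMSI Tunnel attribute value into its fields."""
    if len(value) < 5:
        raise ValueError("need Flags, Tunnel Type and a 3-octet label")
    flags, tunnel_type = value[0], value[1]
    # High-order bit of Tunnel Type is the Composite Tunnel bit; the
    # low-order seven bits carry the tunnel type as before.
    composite = bool(tunnel_type & 0x80)
    rest = value[5:]
    ir_label_field = None
    if composite:
        # The tunnel identifier begins with a three-octet label.
        if len(rest) < 3:
            raise ValueError("composite tunnel lacks the 3-octet label")
        ir_label_field, rest = rest[:3], rest[3:]
    return {
        "flags": flags,
        "composite": composite,
        "base_tunnel_type": tunnel_type & 0x7F,
        "p2mp_label_field": value[2:5],
        "ir_label_field": ir_label_field,
        "tunnel_identifier": rest,
    }
```

A receiver that ignores the Composite Tunnel bit would see the whole
octet (0x80 or above) as the tunnel type, which is the situation the
quoted paragraph describes.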

8.1 Considerations for PMSI Tunnel Types

   The registry is to be updated, by removing the entries for 0xFB-0xFE
   and 0x0F, and replacing them by:

   Value          Meaning                            Reference
   0x0B-0x7A      Unassigned
   0x7B-0x7E      Reserved for Experimental Use      this document
   0x7F           Reserved                           this document
   0x80-0xFF      Reserved for Composite Tunnels     this document

   The allocation policy for values 0x00 to 0x7A is IETF Review
   [RFC5226]. The range for experimental use is now 0x7B-0x7E, and value
   in this range are not to be assigned. The status of 0x7F may only be
   changed through Standards Action [RFC5226].  

This structure allows the high-order bit to modify the interpretation
of the other bits.  However, section 5.2 says, "... the high-order bit
of the tunnel type field (Composite Tunnel bit) is set while the
remaining low-order seven bits indicate the tunnel type as before."

I think what you intended is to change the "P-Multicast Service
Interface Tunnel (PMSI Tunnel) Tunnel Types" registry to only specify
the low-order seven bits of the Tunnel Type field, with the high-order
bit of Tunnel Type being the Composite Tunnel bit.  Conceptualized
that way, the registry changes to:

   Value          Meaning                            Reference
   0x00-0x7A      (as before)
   0x7B-0x7E      Reserved for Experimental Use      this document
   0x7F           Reserved                           this document

This implies that the octet values 0x80 to 0xFF are subdivided into
assigned, unassigned, experimental, and reserved groups in a way that
parallels the values 0x00 to 0x7F.
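
Under that reading, classifying a Tunnel Type octet could look like
the following Python sketch.  The function name and the status strings
are mine, and "registry" simply means "look the value up in the
registry as before" (it may be assigned or unassigned).

```python
def classify_tunnel_type(octet: int) -> tuple:
    """Return (composite, low_seven_bits, status) for a Tunnel Type octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("tunnel type is a single octet")
    composite = bool(octet & 0x80)   # Composite Tunnel bit
    low = octet & 0x7F               # value looked up in the registry
    if 0x7B <= low <= 0x7E:
        status = "experimental"
    elif low == 0x7F:
        status = "reserved"
    else:
        status = "registry"          # 0x00-0x7A, as before
    return composite, low, status
```

For example, 0xFB and 0x7B both land in the experimental range and
differ only in the Composite Tunnel bit.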

9.1  Normative References

   [MEF6.1] Metro Ethernet Forum, "Ethernet Services Definitions - Phase
   2", MEF 6.1, April 2008.

Can we get a URL for this?

[EOF]