[Lsvr] AD Review of draft-ietf-lsvr-applicability-07

Alvaro Retana <aretana.ietf@gmail.com> Thu, 10 December 2020 10:35 UTC

Return-Path: <aretana.ietf@gmail.com>
X-Original-To: lsvr@ietfa.amsl.com
Delivered-To: lsvr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 46F863A0C55; Thu, 10 Dec 2020 02:35:21 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.097
X-Spam-Level:
X-Spam-Status: No, score=-2.097 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id h6f-rkg7-5xb; Thu, 10 Dec 2020 02:35:17 -0800 (PST)
Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 267673A0C4D; Thu, 10 Dec 2020 02:35:14 -0800 (PST)
Received: by mail-ed1-x534.google.com with SMTP id p22so4912951edu.11; Thu, 10 Dec 2020 02:35:14 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:mime-version:date:message-id:subject:to:cc :content-transfer-encoding; bh=4l9UItzPJZDwYPc31BDD4ZDfDBcb3B2cA25io3tIaLk=; b=S3S0x+w9CInBIdO5YNgpiuDs9NxDau9J7t164ePagnvSMz82Oa+Rvo1Q/0kIJGTbgK XwfuzyhpgD+/g8on1+/8Yq9wcd5fwm6Tj2igCnQnR/qBTMJtYFfqJiHfWOPwRaat3S0E TrZT2PBk664wfA6VwrIb+E//ijvR7wftv+E7xzrWenro78lPv47gxOsEUvvgp8MUNgGs oxQqtZ4XDrOXNr920UFL4wbqcPbMcqnUffoXO9ZV0qtUc7AswGOs0WmcgJTkNFTbpLq4 OFAVbwdj0n/RBFoNbW747rk2MxBUOo5mArf8H7BTZt7PHqhPdBWUdGbNFCecjZnOymux whBQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:mime-version:date:message-id:subject:to:cc :content-transfer-encoding; bh=4l9UItzPJZDwYPc31BDD4ZDfDBcb3B2cA25io3tIaLk=; b=hdQrLCiNcHt3cQLwEorTTxO0cEiKeCyfewtvXl9/fRzMN4Q2bHUBoh9Uopig2aseUb QpY8W8r+oauLwQ7HYSGxmIWpOvnYFFtEa5sKs8r/qkb45ZELvYVfutYxcAHlLYXCkAR4 WGrhoe3BWHP/RZI8L4vLcjKhi0fOZoRhfW6EmfTouWpPuVh+zqDVpKh/xBNKkpS20riO 9ES8NPRax+d3CDX9usRjDaXIyX8WWeLg7gS68/NFLrrnJDhUYpPMqMQsxY3CH4nu9s3s O1nuZ65Dkf04RY//8qTJ8UQHucGjhItp52ZBlnlBW7Ne1jVkEHrKpAU5AnCeAmGPkjdJ 4R5g==
X-Gm-Message-State: AOAM532r4i6PbHTznW2yvA1DK+Kccl1HBoKVoQTLlFrSjucaDmeeGXXM jHfw2aw6jfkVwwNfViHZLvxh2MJ6zFxNA4l6Gb+YUJB7
X-Google-Smtp-Source: ABdhPJy3jK156axfcgzxKVL0DEN6gfQ/mJQEpc9YmqQCAb912nIywktU8FA3YPIYuS7DdYTo/2CJ4cHc5wZ9W1LfO1A=
X-Received: by 2002:a05:6402:845:: with SMTP id b5mr6159130edz.38.1607596511056; Thu, 10 Dec 2020 02:35:11 -0800 (PST)
Received: from 1058052472880 named unknown by gmailapi.google.com with HTTPREST; Thu, 10 Dec 2020 02:35:10 -0800
From: Alvaro Retana <aretana.ietf@gmail.com>
MIME-Version: 1.0
Date: Thu, 10 Dec 2020 02:35:10 -0800
Message-ID: <CAMMESszYHXL0qq3fgBU4wowE1C9g_UCKLS3nCFRcQg1LLOrMfA@mail.gmail.com>
To: draft-ietf-lsvr-applicability@ietf.org
Cc: "lsvr-chairs@ietf.org" <lsvr-chairs@ietf.org>, "Van De Velde, Gunter (Nokia - BE/Antwerp)" <gunter.van_de_velde@nokia.com>, "lsvr@ietf.org" <lsvr@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsvr/wSalwvCwgY93lAaitw4h2o6Kfmo>
Subject: [Lsvr] AD Review of draft-ietf-lsvr-applicability-07
X-BeenThere: lsvr@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Link State Vector Routing <lsvr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/lsvr>, <mailto:lsvr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/lsvr/>
List-Post: <mailto:lsvr@ietf.org>
List-Help: <mailto:lsvr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/lsvr>, <mailto:lsvr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Dec 2020 10:35:22 -0000

Dear authors:

I just finished reading this document.  As the base specification,
this document also needs more work.  I have some initial comments, and
then specifics in-line below.



(1) Currently, this document reads mostly as "using BGP SPF instead of
what is described in rfc7938", instead of the expected general "using
BGP SPF in a DC".

Not only is the eBGP-based model in rfc7938 just one of many that
could be considered in a DC, but the protocol was selected and the
design developed relying heavily on the characteristics of BGP as a
protocol.  BGP SPF and BGP are *not* the same protocol and don't have
the same characteristics.  A simple example: §5.2.1-5.2.2/rfc7938
describe considerations and a sample ASN scheme based on the
well-known behavior of the AS_PATH.  To contrast, the BGP SPF spec
says that "any attributes that would influence the Decision process
defined in [RFC4271]...are ignored" (§5.1), and there's no mention of
the AS_PATH.  IOW, the considerations from rfc7938, and §6.3 in this
document don't seem to directly match.  [I know that the base spec is
being worked and this point may change; but for now, there is no
normative behavior to stand on.]

Instead of trying to replace rfc7938, please focus on describing how
BGP SPF (specifically draft-ietf-lsvr-bgp-spf) can be used in a DC and
the related operational considerations.

To be clear, referencing rfc7938 is not a bad thing -- but do it in an
Informative way, as an example...not as the guide for deploying BGP
SPF.


(2) It would be very nice, as part of describing the use (without
depending so heavily on rfc7938), if the descriptions included
topology figures for reference.  I know that ASCII-art is not the
best/easiest, but luckily xmltorfc v3 let's you also include SVG
drawings.


(3) There are some sections that don't belong in a "this is how you
use BGP SPF" document.  For example, the justification for BGP SPF and
requirements for future work.  I included specific comments in-line,
but in general, those sections should be removed.


Given that this document depends heavily on the BGP SPF specification,
and that the spec requires a significant amount of work, I am
returning both documents to the WG.

Thanks!

Alvaro.

P.S.  I'm not sure if this review will be fully received by all
mailers.  Please check the archive.  I also put an "[End of Review]"
indicator.


[Line numbers from idnits.]


...
11	  Usage and Applicability of Link State Vector Routing in Data Centers

[] While I personally like the term LSVR more than BGP SPF, the
protocol spec doesn't use LSVR anywhere.  This whole document should
be consistent with it in relation to all the terminology (not just
LSVR vs BGP SPF).  I am not going to repeat this comment everywhere;
please update all the instances in line with the spec.


...
84	1.  Introduction
...
90	   After describing the deployment scenario, Section 5 will describe the
91	   reasons for BGP modifications for such deployments.

[major] As mentioned in I-D.ietf-lsvr-bgp-spf: what is specified is a
new link-state protocol...not just BGP modifications.  IOW, the
motivation (if it was needed) should be about the need for a
link-state protocol based on the BGP transport, and not about
modifying BGP.


...
98	2.  Requirements Language

100	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
101	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
102	   "OPTIONAL" in this document are to be interpreted as described in BCP
103	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
104	   capitals, as shown here.

[major] There's only one place where an rfc2119 keyword is used...and
it doesn't belong in this draft.  I don't think we need this
boilerplate.


106	3.  Recommended Reading

108	   This document assumes knowledge of existing data center networks and
109	   data center network topologies [CLOS].  This document also assumes
110	   knowledge of data center routing protocols like BGP [RFC4271], BGP-
111	   SPF [I-D.ietf-lsvr-bgp-spf], OSPF [RFC2328], as well as, data center
112	   OAM protocols like LLDP [RFC4957] and BFD [RFC5580].

[major] "assumes knowledge of...[CLOS]"  This reference makes the CLOS
document a Normative reference because it "must be read to understand
or implement the technology in the new RFC" [1].  While this shouldn't
be a big issue, I couldn't find a place where the document is freely
available (I did a quick 1-minute search) -- that is an issue!

It seems to me that the contents of this document don't require all
the information in [CLOS].  Consider either (1) finding a link to a
freely available reference, or, (2) include a figure and description
of a general topology that can be used as reference (and mention
[CLOS] as an additional resource).  Note that the description in
rfc7938 could also be another informative reference.  My personal
preference would be #2.

[1] https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/


[minor] "assumes knowledge of data center routing protocols like BGP
[RFC4271], BGP-SPF [I-D.ietf-lsvr-bgp-spf], OSPF [RFC2328]"  Isn't
knowing about BGP SPF enough?  If not, then the references to
rfc4271/rfc2328 should be Normative.


[major] Why does the reader need knowledge of LLDP?  It is not
mentioned anywhere else.


[minor] s/BFD [RFC5580]/BFD [RFC5880]


[major] Why does the reader need knowledge of BFD?  Note that the BGP
SPF spec should be treating BFD as an example, at best.


114	4.  Common Deployment Scenario

116	   Within a Data Center, servers are commonly interconnected the CLOS
117	   topology [CLOS].  The CLOS topology is fully non-blocking and the
118	   topology is realized using Equal Cost Multi-Path (ECMP).  In a CLOS
119	   topology, the minimum number of parallel paths between two servers is
120	   determined by the width of a tier-1 stage as shown in the figure 1.

[nit] s/interconnected the CLOS topology/interconnected using a CLOS topology


[nit] s/topology is fully non-blocking and the topology is
realized/topology is fully non-blocking and is realized


122	   The following example illustrates multi-stage CLOS topology.

[nit] s/illustrates multi-stage/illustrates a multi-stage


...
149	5.  Justification for BGP SPF Extension

[] This section basically says that BGP converges slower than BGP SPF.
That is not a glowing justification to develop a new protocol when
others (traditional link-state protocols, for example) would have
similar characteristics.

Having said that, I don't think it is the job of this document to
justify the existence of BGP SPF -- but just to tell the reader how it
applies to a DC.


...
171	6.  LSVR Applicability to CLOS Networks

173	   With the BGP SPF extensions [I-D.ietf-lsvr-bgp-spf], the BGP best-
174	   path computation and route computation are replaced with OSPF-like
175	   algorithms [RFC2328] both to determine whether an BGP-LS SPF NLRI has
176	   changed and needs to be re-advertised and to compute the BGP routes.
177	   These modifications will significantly improve convergence of the
178	   underlay while affording the operational benefits of a single routing
179	   protocol [RFC7938].

[minor] s/OSPF-like algorithms [RFC2328]/an SPF algorithm


[minor] s/BGP routes/BGP SPF routes


[major] "will significantly improve convergence of the underlay while
affording the operational benefits"   This sounds like nice text for a
marketing brochure.  This document should not be a place to put down
other solutions, and much less to do so without substantiation.  I'm
not asking for tests to show faster convergence (not needed!), just
that the text stay focused on the use of BGP SPF in a DC (not how
other protocols do it or not).


[] "single routing protocol [RFC7938]."  I-D.ietf-lsvr-bgp-spf is not
clear about the use (or not) of multiple AFI/SAFI pairs in the same
peering session.  That is not an issue for this document -- just
pointing at it here so we don't forget to remain consistent.


181	   Data center controllers typically require visibility to the BGP
182	   topology to compute traffic-engineered paths.  These controllers
183	   learn the topology and other relevant information via the BGP-LS
184	   address family [RFC7752] which is totally independent of the underlay
185	   address families (usually IPv4/IPv6 unicast).  Furthermore, in
186	   traditional BGP underlays, all the BGP routers will need to advertise
187	   their BGP-LS information independently.  With the BGP SPF extensions,
188	   controllers can learn the topology using the same BGP advertisements
189	   used to compute the underlay routes.  Furthermore, these data center
190	   controllers can avail the convergence advantages of the BGP SPF
191	   extensions.  The placement of controllers can be outside of the
192	   forwarding path or within the forwarding path.

[nit] s/BGP topology/network topology


[minor] "all the BGP routers will need to advertise their BGP-LS
information independently"  Please add an Informative reference to how
that is done.


[nit] s/BGP advertisements/advertisements


[] "these data center controllers can avail the convergence advantages
of the BGP SPF extensions"  Another nice piece of marketing.  Note
that if each node independently advertises BGP-LS information then
there is no propagation/convergence delay.  In the case of BGP SPF, if
the controller is listening at a specific point in the topology, then
there is propagation delay.


194	   Alternatively, as each and every router in the BGP SPF domain will
195	   have a complete view of the topology, the operator can also choose to
196	   configure BGP sessions in hop-by-hop peering model described in
197	   [RFC7938] along with BFD [RFC5580].  In doing so, while the hop-by-
198	   hop peering model lacks the inherent benefits of the controller-based
199	   model, BGP updates need not be serialized by BGP best-path algorithm
200	   in either of these models.  This helps overall network convergence.

[] This paragraph seems to mix BGP SPF and "normal" BGP peering ala
rfc7938.  I don't follow what the point it -- I'm lost.


202	6.1.  Usage of BGP-LS SPF SAFI

204	   The BGP SPF extensions [I-D.ietf-lsvr-bgp-spf] define a new BGP-LS
205	   SPF SAFI for announcement of BGP SPF link-state.  The NLRI format and
206	   its associated attributes follow the format of BGP-LS for node, link,
207	   and prefix announcements.  Whether the peering model within a CLOS
208	   follows hop-by-hop peering described in [RFC7938] or any controller-
209	   based or route-reflector peering, an operator can exchange BGP SPF
210	   SAFI routes over the BGP peering by simply configuring BGP SPF SAFI
211	   between the necessary BGP speakers.

[] Here again there's a mix between rfc7938 and BGP SPF -- what am I missing?

[major] Route-reflectors don't exist in BGP SPF -- at last not in the
same manner as they do in "normal" BGP.


213	   The BGP-LS SPF SAFI can also co-exist with BGP IP Unicast SAFI which
214	   could exchange overlapping IP routes.  The routes received by these
215	   SAFIs are evaluated, stored, and announced independently according to
216	   the rules of [RFC4760].  The tie-breaking of route installation is a
217	   matter of the local policies and preferences of the network operator.

[major] Co-existence.  The BGP SPF specification is not clear about that.


[minor] "routes...are evaluated, stored, and announced...according to
the rules of [RFC4760]"   s/rfc4760/rfc4271


[minor] "tie-breaking...is a matter of the local policies and
preferences"  True.  This phrase is true regardless of where other
routing information was learned from (including local routes), right?
It feels as if the only case being addressed is one where BGP/BGP SPF
are used for the over/underlay, and not the general case -- maybe the
general case is the one with BGP overlay...describe that (from
scratch) instead of trying to replace a point solution.


219	   Finally, as the BGP SPF peering is done following the procedures
220	   described in [RFC4271], all the existing transport security
221	   mechanisms including [RFC5925] are available for the BGP-LS SPF SAFI.

[major] "peering is done following...[RFC4271]...security mechanisms
including [RFC5925] are available"  This is not accurate because
rfc4271 requires MD5, not TCP-AO.  Also, note that
I-D.ietf-lsvr-bgp-spf doesn't say anything about authentication,
relying on what is defined in rfc4271.

While I have no issues with using TCP-AO, this is not the document to
suggest changes to rfc4271.

This piece of text makes an interesting point (not sure if it was mean
that way) when it points out that TCP-AO is "available [specifically]
for the BGP-LS SPF SAFI".  The interesting part is of course that TCP
authentication is shared by whatever BGP is carrying.  If the BGP SPF
sessions are dedicated to it (i.e. no other AFI/SAFI is negotiated),
then there's no reason why a different authentication mechanism can't
be used.  As I mentioned in my review of the BGP SPF spec, I would
like to see a requirement for isolating the BGP SPF sessions (similar
to rfc7752bis) -- personal opinion, of course.

The other reason for trying to isolate the session is the probable
existence of latency sensitive needs in a datacenter.  Maybe even
going as far as configuring different parameters for the TCP session.


223	6.1.1.  Relationship to Other BGP AFI/SAFI Tuples

225	   Normally, the BGP-LS AFI/SAFI is used solely to compute the underlay
226	   and is given preference over other AFI/SAFIs.  Other BGP SAFIs, e.g.,
227	   IPv6/IPv6 Unicast VPN would use the BGP-SPF computed routes for next
228	   hop resolution.  However, if BGP-LS NLRI is also being advertised for
229	   controller consumption, there is no need to replicate the Node, Link,
230	   and Prefix NLRI in BGP-NLRI.  Rather, additional NLRI attributes can
231	   be advertised in the BGP-LS SPF AFI/SAFI as required (e.g., BGP-LS TE
232	   metric extensions [RFC8571] and BGP-LS segment routing extensions
233	   [I-D.ietf-idr-bgp-ls-segment-routing-ext]).

[major] "BGP-LS AFI/SAFI...is given preference over other AFI/SAFIs"
Preference where?  How?


[major] "...if BGP-LS NLRI is also being advertised for controller consumption,
there is no need to replicate the Node, Link, and Prefix NLRI in BGP-NLRI.
Rather, additional NLRI attributes can be advertised in the BGP-LS SPF AFI/SAFI
as required..."

It is not clear to me which of these two cases you're illustrating
here -- whichever it is, please find a clearer way to explain:

(1) A controller is receiving BGP-LS, and can learn additional details
by listening to BGP SPF (as already suggested in §6).  Why wouldn't
that additional information already be in BGP-LS?  Where is the BGP-LS
information coming from?

(2) A router (not a controller) can use the BGP-LS information
(complemented by additional details carried in BGP SPF).  Again, where
is the BGP-LS information coming from?  Why wouldn't the additional
information be carried by BGP-LS?  If the NLRI information is carried
in BGP-LS (and is to be used in the SPF computation), how is it
validated?


235	6.2.  Peering Models

237	   As previously stated, BGP SPF can be deployed using the existing
238	   peering model where there is a single-hop BGP session on each and
239	   every link in the data center fabric [RFC7938].  This provides for
240	   both the advertisement of routes and the determination of link and
241	   neighboring switch availability.  With BGP SPF, the underlay will
242	   converge faster due to changes to the decision process that will
243	   allow NLRI changes to be advertised faster after detecting a change.

[minor] "provides for both the advertisement of routes and the
determination of link and neighboring switch availability"   What does
this mean?


[] "the underlay will converge faster"  Marketing...


245	6.2.1.  Sparse Peering Model

247	   Alternately, BFD [RFC5580] can be used to swiftly determine the
248	   availability of links and the BGP peering model can be significantly
249	   sparser than the data center fabric.  BGP SPF sessions only need to
250	   be established with enough peers to provide a bi-connected graph.  If
251	   IEBGP is used, then the BGP routers at tier N-1 will act as route-
252	   reflectors for the routers at tier N.

[nit] s/Alternately, BFD/BFD


[minor] "bi-connected graph"  Put a forward reference to §6.2.2,


[major] "IEBGP"   Where is this defined?  The BGP SPF spec doesn't mention it...


254	   The obvious usage of sparse peering is to avoid parallel BGP sessions
255	   on links between the same two switches in the data center fabric.
256	   However, this use case is not very useful since parallel layer-3
257	   links between the same two BGP routers are rare in CLOS or Fat-Tree
258	   topologies.  Additionally, when there are multiple links, they are
259	   often aggregated at the link layer rather than the IP layer.  Two
260	   more interesting scenarios are described below.

[minor] "multiple links, they are often aggregated at the link layer"
I'm not sure if you mean that the link information can be advertised
as a single link.  Where is that covered?  I don't recall the topic of
link aggregation in the BGP SPF spec.


262	   In current data center topologies, there is often a very dense mesh
263	   of links between levels, e.g., leaf and spine, providing 32-way,
264	   64-way, or more Equal-Cost Multi-Path (ECMP) paths.  In these
265	   topologies, it is desirable not to have a BGP session on every link
266	   and techniques such as the one described in Section 6.2.2 can be used
267	   establish sessions on some subset of northbound links.  For example,
268	   in a Spine-Leaf topology, each leaf switch would only peer with a
269	   subset of the spines dependent on the flooding redundancy required to
270	   be reasonably certain that every node within the BGP-LS SPF routing
271	   domain has the complete topology.

[nit] "In current data center topologies..."  A drawing is worth a
thousand words.  Maybe refer to Figure 1 so the reader doesn't have to
make assumptions about what "current" is.


[nit] "be reasonably certain"  That is not enough.


273	   Alternately, controller-based data center topologies are envisioned
274	   where BGP speakers within the data center only establish BGP sessions
275	   with two or more controllers.  In these topologies, fabric nodes
276	   below the first tier (using [RFC7938] hierarchy) will establish BGP
277	   multi-hop sessions with the controllers.  For the multi-hop sessions,
278	   determining the route to the controllers without depending on BGP
279	   would need to be through some other means beyond the scope of this
280	   document.  However, the BGP discovery mechanisms described in
281	   Section 6.5 would be one possibility.

[nit] "first tier (using [RFC7938] hierarchy)"  Using a reference in
this document would be better.


[minor] "route to the controllers"  There is plenty of literature (I
think even some IRTF documents) that deal with the potential issues
with multi-hop, controller based deployments.  Convergence is one of
those issues.  I would like to then see considerations related to the
selection of a peering model: reasons, pros, cons, etc.


283	6.2.2.  Bi-Connected Graph Heuristic

285	   With this heuristic, discovery of BGP peers is assumed, e.g., as
286	   described in Section 6.5.  Additionally, it assumed that the
287	   direction of the peering can be ascertained.  In the context of a
288	   data center fabric, direction is either northbound (toward the
289	   spine), southbound (toward the Top-Of-Rack (TOR) switches) or east-
290	   west (same level in hierarchy.  The determination of the direction is
291	   beyond the scope of this document.  However, it would be reasonable
292	   to assume a technique where the TOR switches can be identified and
293	   the number of hops to the TOR is used to determine the direction.

[nit] s/it assumed/it is assumed


[nit] s/(same level in hierarchy/(same level in hierarchy)


[major] §6.1.1 explains the sparse peering model; the deployment
depends on establishing a bi-connected graph, which depends on both
discovery and the "direction of the peering".  *But*, determining the
direction is out of scope, and discovery is at this point wishful
thinking (from a specification point of view).  IOW, there's no real
explanation of the heuristic.  Also, the BGP SPF spec doesn't cover
either.


295	   In this heuristic, BGP speakers allow passive session establishment
296	   for southbound BGP sessions.  For northbound sessions, BGP speakers
297	   will attempt to maintain two northbound BGP sessions with different
298	   switches (in data center fabrics there is normally a single layer-3
299	   connection anyway).  For east-west sessions, passive BGP session
300	   establishment is allowed.  However, BGP speaker will never actively
301	   establish an east-west BGP session unless it cannot establish two
302	   northbound BGP sessions.

[major] "passive session establishment"   I'm assuming that this
refers to the PassiveTcpEstablishment session attribute from the FSM.
If so, please be specific and point to rfc4271.


[major] This paragraph describes, at a high level, the dynamic
behavior of the peers, but it is not clear how the behavior is
achieved: if automatic, assuming that the direction of the peering is
known, or if there's a configuration expectation to (at least)
indicate the active/passive state of the sessions.


304	6.3.  BGP Spine/Leaf Topology Policy

306	   One of the advantages of using BGP SPF as the underlay protocol is
307	   that BGP policy can be applied at any level.  For example, depending
308	   upon the topology, it may be possible to aggregate prefix
309	   advertisements using existing BGP policy.  In Spine/Leaf topologies,
310	   it is not necessary to advertise BGP-LS NLRI received by leaves
311	   northbound to the spine nodes at the level above.  If a common AS is
312	   used for the spine nodes, this can easily be accomplished with EBGP
313	   and a simple policy to filter advertisements from the leaves to the
314	   spine if the first AS in the AS path is the spine AS.

[major] Yes, "BGP policy can be applied at any level".  However, just
like with mesh-groups in OSPF/ISIS this type of flooding control
mechanism requires careful consideration: filter too much and backup
flooding paths may be lost, filter too little and achieve no
advantage.  The example given (the leaves don't send received routes
up) is straight forward.  Given that that is only an example (not the
only recommendation), I would like to see operational considerations
for the general case.


[major] "aggregate prefix advertisements"  I don't see any mention of
aggregation in the BGP SPF spec.  Maybe by "aggregation" you mean
filtering, or possibly limiting flooding, or...


[minor] "If a common AS is used for the spine nodes..."  What about
the general case?  Is using a common ASN for the spines (or per level)
a recommendation for BGP SPF in a DC, or just an example?


[nit] s/BGP-LS NLRI/BGP SPF NLRI


[minor] "...filter advertisements from the leaves to the spine if the
first AS in the AS path is the spine AS."  The BGP SPF spec says that
the attributes are ignored -- I would then assume that the AS_PATH is
not validated (or used), or is it?


...
319	                +--------+    +--------+             +--------+
320	    AS 64512    |        |    |        |             |        |
321	    for Spine   | Spine 1+----+ Spine 2+- ......... -+ Spine N|
322	    Nodes at    |        |    |        |             |        |
323	    this Level  +-+-+-+-++    ++-+-+-+-+             +-+-+-+-++
324	           +------+ | | |      | | | |                 | | | |
325	           |  +-----|-|-|------+ | | |                 | | | |
326	           |  |  +--|-|-|--------+-|-|-----------------+ | | |
327	           |  |  |  | | |    +---+ | |                   | | |
328	           |  |  |  | | |    |  +--|-|-------------------+ | |
329	           |  |  |  | | |    |  |  | |              +------+ +----+
330	           |  |  |  | | |    |  |  | +--------------|----------+  |
331	           |  |  |  | | |    |  |  +-------------+  |          |  |
332	           |  |  |  | | +----|--|----------------|--|--------+ |  |
333	           |  |  |  | +------|--|--------------+ |  |        | |  |
334	           |  |  |  +------+ |  |              | |  |        | |  |
335	          ++--+--++      +-+-+--++            ++-+--+-+     ++-+--+-+
336	          | Leaf 1|~~~~~~| Leaf 2|  ........  | Leaf X|     | Leaf Y|
337	          +-------+      +-------+            +-------+     +-------+

[nit] Is that squiggly line between Leaf 1 and Leaf 2 meant to be a
link between them?  If so, then they will be transit for each
other...is that what you meant?


339	                   Figure 2: Spine/Leaf Topology Policy

341	6.4.  BGP Peer Discovery Requirements

[major] Peer Discovery is not covered in the BGP SPF spec.  Why are
requirements included here?

[] I didn't include comments on the requirements.


...
386	6.5.1.  BGP Peer Discovery Alternatives

[major] This subject is speculative at this point.  As you start
saying: "peer discovery is not part of [I-D.ietf-lsvr-bgp-spf]".

388	   While BGP peer discovery is not part of [I-D.ietf-lsvr-bgp-spf],
389	   there are, at least, three proposals for BGP peer discovery.  At
390	   least one of these mechanisms will be adopted and will be applicable
391	   to deployments other than the data center.  It is strongly
392	   RECOMMENDED that the accepted mechanism be used in conjunction with
393	   BGP SPF in data centers.  The BGP discovery mechanism should
394	   discovery both peer addresses and endpoints for BFD discovery.
395	   Additionally, it would be great if there were a heuristic for
396	   determining whether the peer is at a tier above or below the
397	   discovering BGP speaker (refer to Section 6.2.2).


...
403	6.5.2.  BGP IPv6 Simplified Peering

405	   In order to conserve IPv4 address space and simplify operations, BGP-
406	   LS SPF routers in CLOS/Fat-Tree deployments can use IPv6 addresses as
407	   peer address.  For IPv4 address families, IPv6 peering as specified
408	   in [RFC5549] can be deployed to avoid configuring IPv4 addresses on
409	   BGP-LS SPF router interfaces.  When this is done, dynamic discovery
410	   mechanisms, as described in Section 6.5, can used to learn the global
411	   or link-local IPv6 peer addresses and IPv4 addresses need not be
412	   configured on these interfaces.  If IPv6 link-local peering is used,
413	   then configuration of IPv6 global addresses is also not required and
414	   these IPv6 link-local addresses must then be advertised in the BGP-LS
415	   Link Descriptor IPv6 Address TLV (262) [RFC7752].

[major] "as specified in [RFC5549]"  This is not covered in the BGP SPF spec.


417	6.5.3.  BGP-LS SPF Topology Visibility for Management

[minor] This sub-section doesn't seem to belong in §6.5: BGP Peer Discovery.


419	   Irrespective of whether or not BGP-LS SPF is used for route
420	   calculation, the BGP-LS SPF route advertisements can be used to
421	   periodically construct the CLOS/FAT Tree topology.  This is
422	   especially useful in deployments where an IGP is not used and the
423	   base BGP-LS routes [RFC7752] are not available.  The resultant
424	   topology visibility can then be used for troubleshooting and
425	   consistency checking.  This would normally be done on a central
426	   controller or other management tool which could also be used for
427	   fabric data path verification.  The precise algorithms and
428	   heuristics, as well as, the complete set of management applications
429	   is beyond the scope of this document.

[nit] s/BGP-LS SPF/BGP SPF/g


[major] "whether or not BGP-LS SPF is used for route calculation, the
BGP-LS SPF route advertisements can be used"  The BGP SPF spec doesn't
cover a mode where routes are advertised, but route calculation is not
done.  Maybe you mean that the routes are not used (as in having a
less-preferred admin distance), or maybe you meant that BGP-LS (not
BGP SPF) is used -- but the text also says that "the base BGP-LS
routes [RFC7752] are not available" (base routes?).


431	6.5.4.  Data Center Interconnect (DCI) Applicability

433	   Since BGP SPF is to be used for the routing underlay and DCI gateway
434	   boxes typically have direct or very simple connectivity, BGP external
435	   sessions would typically not include the BGP SPF SAFI.

[minor] "would typically not include"  That is not a strong don't use.
What are the considerations for using BGP SPF, or not?


437	7.  Non-CLOS/FAT Tree Topology Applicability

439	   The BGP SPF extensions [I-D.ietf-lsvr-bgp-spf] can be used in other
440	   topologies and avail the inherent convergence improvements.
441	   Additionally, sparse peering techniques may be utilized Section 6.2.
442	   However, determining whether or to establish a BGP session is more
443	   complex and the heuristic described in Section 6.2.2 cannot be used.
444	   In such topologies, other techniques such as those described in
445	   [I-D.ietf-lsr-dynamic-flooding] may be employed.  One potential
446	   deployment would be the underlay for a Service Provider (SP) backbone
447	   where usage of a single protocol, i.e., BGP, is desired.

[major] This document is about applicability of BGP SPF in a DC, so
this section seems out of place.  Even if it was to remain, the text
doesn't provide any useful information (beyond the fact that BGP SPF
could be used elsewhere).


449	8.  Non-Transit Node Capability

451	   In certain scenarios, a BGP node wishes to participate in the BGP SPF
452	   topology but never be used for transit traffic.  These in include
453	   situations where a server wants to make application services
454	   available to clients homed at subnets throughout the BGP SPF domain
455	   but does not ever want to be used as a router (i.e., carry transit
456	   traffic).  Another specific instance is where a controller is
457	   resident on a server and direct connectivity to the controller is
458	   required throughout the entire domain.  This can readily be
459	   accomplished using the BGP-LS Node NLRI Attribute SPF Status TLV as
460	   described in [I-D.ietf-lsvr-bgp-spf].

[nit] "a BGP node wishes"  The operator wishes sometimes, I doubt the node does.


[nit] s/in include/include


[minor] For the case where a "server...but does not ever want to be
used as a router", please be a little more descriptive on the setup.
It seems that you're assuming that the server itself is running BGP
SPF (vs being connected through a router), right?  Of course, a server
might only be used as transit if multiple interfaces are used with BGP
SPF...  Please be explicit.


[minor] Same comment for the "controller is resident on a server" case.


462	9.  BGP Policy Applicability

464	   Existing BGP policy including aggregation and prefix filtering may be
465	   used in conjunction with the BGP-LS SPF SAFI.  When aggregation
466	   policy is used, BGP-LS SPF prefix NLRI will be originated for the
467	   aggregate prefix and BGP-LS SPF prefix NLRI for components will be
468	   filtered.  Additionally, link and node NLRI may be filtered and the
469	   abstracted by the prefix NLRI.

[major] Again, I didn't see a specification for aggregation in the BGP
SPF draft.


[minor] Is "abstracted by the prefix NLRI" the same thing as aggregation?


471	   When BGP policy is used with the BGP-LS SPF SAFI, BGP speakers in the
472	   BGP-LS SPF routing domain will not all have the same set of NLRI and
473	   will compute a different BGP local routing table.  Consequently, care
474	   must be taken to assure routing is consistent and blackholes or
475	   routing loops do not ensue.  However, this is no different than if
476	   tradition BGP routing using the IPv4 and IPv6 address families were
477	   used.

[major] Yes, it is different!  A link-state protocol is definitely
different than a path vector one.  Among other things, SPF depends on
everyone having a view of the topology.  Please include
considerations/consequences when applying filters.


...
483	11.  Security Considerations

485	   This document introduces no new security considerations above and
486	   beyond those already specified in the [RFC4271] and
487	   [I-D.ietf-lsvr-bgp-spf].

[major] True.  For the benefit of other readers, please include a
quick description of what this document is about:

For example>
   This document discusses the use of BGP SPF in a DC.  As a description of its
   use, it introduces no new...


...
513	13.2.  Informative References

515	   [CLOS]     "A Study of Non-Blocking Switching Networks",  The Bell
516	              System Technical Journal, Vol. 32(2), DOI
517	              10.1002/j.1538-7305.1953.tb01433.x, March 1953.

...
548	   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
549	              DOI 10.17487/RFC2328, April 1998,
550	              <https://www.rfc-editor.org/info/rfc2328>.

552	   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
553	              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
554	              DOI 10.17487/RFC4271, January 2006,
555	              <https://www.rfc-editor.org/info/rfc4271>.

[major] Because of §3, these 3 references must be Normative.  Please
see my comments there.


...
593	   [RFC7938]  Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
594	              BGP for Routing in Large-Scale Data Centers", RFC 7938,
595	              DOI 10.17487/RFC7938, August 2016,
596	              <https://www.rfc-editor.org/info/rfc7938>.

[major] The current text relies heavily on rfc7938.  As is, that would
require this reference to be Normative.  See my comments at the start
of this message.

[End of Review]