Re: [Tsv-art] Tsvart last call review of draft-ietf-bfd-vxlan-07

"Carlos Pignataro (cpignata)" <cpignata@cisco.com> Thu, 20 June 2019 00:14 UTC

From: "Carlos Pignataro (cpignata)" <cpignata@cisco.com>
To: Greg Mirsky <gregimirsky@gmail.com>
CC: Olivier Bonaventure <Olivier.Bonaventure@uclouvain.be>, "tsv-art@ietf.org" <tsv-art@ietf.org>, rtg-bfd WG <rtg-bfd@ietf.org>, "draft-ietf-bfd-vxlan.all@ietf.org" <draft-ietf-bfd-vxlan.all@ietf.org>, IETF list <ietf@ietf.org>
Date: Thu, 20 Jun 2019 00:14:39 +0000
Message-ID: <14822B96-D3C6-495E-8661-198068F72ABA@cisco.com>
References: <155933149484.6565.7386019489022348116@ietfa.amsl.com> <CA+RyBmXu-F0cWDkBydE_aJaVpUv=k1otqUCc7NdRW4pnBK3tgA@mail.gmail.com>
In-Reply-To: <CA+RyBmXu-F0cWDkBydE_aJaVpUv=k1otqUCc7NdRW4pnBK3tgA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/aWXsCp8RRqqC5LKCsTEWtbaPw9c>

Hi,

I have not reviewed this draft before, but this email prompted me to scan briefly through a couple of sections, and it is unclear to me how some of the mechanics work.

There are some major issues with the MAC usage and association, as Joel Halpern mentioned in his Rtg Dir review.

And, additionally, please consider the following comments and questions:

1. Underspecification for initialization and initial demultiplexing.

This document allows multiple BFD sessions between a single pair of VTEPs:

   An implementation that supports this specification MUST be able to
   control the number of BFD sessions that can be created between the
   same pair of VTEPs.

The implication of this is that BFD single-hop initialization procedures will not work. Instead, there is a need to map the initial demultiplexing.

This issue is explained in RFCs 5882 and 5883: https://tools.ietf.org/html/rfc5883#section-4 and https://tools.ietf.org/html/rfc5882#section-6

Section 5.1 says:

   For such packets, the BFD session MUST be identified
   using the inner headers, i.e., the source IP, the destination IP, and
   the source UDP port number present in the IP header carried by the
   payload of the VXLAN encapsulated packet. The VNI of the packet
   SHOULD be used to derive interface-related information for
   demultiplexing the packet.

But this does not really explain how to do the initial demultiplexing. Does each BFD session need to have a separate inner source IP address? Or source UDP port? And how often are they recycled or kept as state? How are these mapped?
Equally importantly, which side is Active?
And what if there’s a race condition with both sides being Active and setting up redundant sessions?
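
To make the ambiguity concrete, here is a minimal sketch of a receiver-side session table. The class names, the key layout, and the use of the VNI in the key are illustrative assumptions, not something the draft defines; the sketch only shows that two sessions between the same pair of VTEPs must differ in at least one of the inner fields to be told apart while Your Discriminator is zero.

```python
from typing import Dict, NamedTuple, Optional


class InitialKey(NamedTuple):
    """Hypothetical lookup key built from the fields Section 5.1 names."""
    vni: int
    inner_src_ip: str
    inner_dst_ip: str
    inner_src_port: int


class SessionTable:
    """Illustrative demultiplexing table (an assumption, not the draft's design)."""

    def __init__(self) -> None:
        self._by_discriminator: Dict[int, str] = {}
        self._by_initial_key: Dict[InitialKey, str] = {}

    def add(self, name: str, my_discriminator: int, key: InitialKey) -> None:
        # Two sessions with the same initial key cannot be distinguished
        # before discriminators have been exchanged.
        if key in self._by_initial_key:
            raise ValueError(f"ambiguous initial demultiplexing key: {key}")
        self._by_discriminator[my_discriminator] = name
        self._by_initial_key[key] = name

    def demux(self, your_discriminator: int, key: InitialKey) -> Optional[str]:
        # A nonzero Your Discriminator echoes our own My Discriminator,
        # so it selects the session directly.
        if your_discriminator != 0:
            return self._by_discriminator.get(your_discriminator)
        # Otherwise fall back to the inner headers (plus the VNI).
        return self._by_initial_key.get(key)


table = SessionTable()
table.add("session-a", 11, InitialKey(100, "192.0.2.1", "192.0.2.2", 49152))
table.add("session-b", 12, InitialKey(100, "192.0.2.1", "192.0.2.2", 49153))
print(table.demux(0, InitialKey(100, "192.0.2.1", "192.0.2.2", 49153)))
```

In this sketch the two sessions differ only in the inner source UDP port, which is why the questions about how ports are allocated, kept, and recycled matter.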

1.b. By the way, based on this, using S-BFD [RFC 7880] might be easier to demux.

2. Security

This document says that the TTL in the inner packet carrying BFD is set to 1. However, RFC 5881 says to use GTSM [RFC 5082], i.e., a value of 255.

Why is GTSM not used here?
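
The reasoning behind GTSM [RFC 5082] can be sketched as follows. The function name and the hop model are illustrative assumptions: each router on the path decrements the TTL by one, and a sender may choose any initial TTL up to 255. Whether the same reasoning carries over to the inner header of a VXLAN-encapsulated packet is exactly the open question.

```python
MAX_TTL = 255  # largest value the 8-bit TTL / Hop Limit field can carry


def spoofable(expected_ttl: int, hops_away: int) -> bool:
    """Return True if a sender `hops_away` routers away can choose an
    initial TTL that arrives equal to `expected_ttl`."""
    needed_initial_ttl = expected_ttl + hops_away
    return needed_initial_ttl <= MAX_TTL


# A receiver that checks for TTL == 1 can be reached from five hops away
# (the sender simply starts at 6) ...
print(spoofable(expected_ttl=1, hops_away=5))
# ... whereas a receiver that checks for TTL == 255 cannot.
print(spoofable(expected_ttl=255, hops_away=5))
```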

3. ECMP and fate-sharing under-specification:

Section 4.1 says:

   The Outer IP/UDP
   and VXLAN headers MUST be encoded by the sender as defined in
   [RFC7348].

And RFC 7348 says:

   -  Source Port: It is recommended that the UDP source port number
      be calculated using a hash of fields from the inner packet --
      one example being a hash of the inner Ethernet frame's headers.
      This is to enable a level of entropy for the ECMP/load-
      balancing of the VM-to-VM traffic across the VXLAN overlay.
      When calculating the UDP source port number in this manner, it
      is RECOMMENDED that the value be in the dynamic/private port
      range 49152-65535 [RFC6335].

Based on this, depending on the hashing calculation, the outer source UDP port can differ between BFD and data packets, leading to different ECMP treatment. Does something else need to be specified here with regard to the outer UDP source port?
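
A minimal sketch of the concern, assuming a CRC32 hash over the inner headers (RFC 7348 does not mandate any particular hash function, and the function name and sample byte strings below are hypothetical):

```python
import zlib

DYNAMIC_PORT_MIN = 49152
DYNAMIC_PORT_MAX = 65535


def outer_source_port(inner_headers: bytes) -> int:
    """Map a hash of the inner headers into the dynamic/private port range.

    CRC32 is an assumption made for this sketch only.
    """
    span = DYNAMIC_PORT_MAX - DYNAMIC_PORT_MIN + 1
    return DYNAMIC_PORT_MIN + zlib.crc32(inner_headers) % span


# Placeholder byte strings standing in for the inner headers of a data
# frame and of a BFD control packet between the same pair of VTEPs.
data_headers = b"inner headers of a VM-to-VM data frame"
bfd_headers = b"inner headers of a BFD control packet"

print(outer_source_port(data_headers), outer_source_port(bfd_headers))
```

Because the inner headers differ, nothing in this calculation ties the BFD packets to the outer source port, and therefore to the ECMP path, taken by the data packets.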

4. Section 7 says that “Support for echo BFD is outside the scope of this document”.

Assuming this means “BFD Echo mode”, why is this out of scope? If this is a single logical hop underneath VXLAN, what’s preventing the use of Echo? Echo’s benefits are huge.

5. Terminology

   Implementations SHOULD ensure that the BFD
   packets follow the same lookup path as VXLAN data packets within the
   sender system.

What is a “lookup path within a sender system”?

6. Deployment scenarios

Section 3 says:

   Figure 1 illustrates the scenario with two servers, each of them
   hosting two VMs. The servers host VTEPs that terminate two VXLAN
   […]
   Figure 1: Reference VXLAN Domain

However, Figure 3 of RFC 7348 presents that as one deployment scenario, not as “the scenario” and “The Reference VXLAN Domain”.

Best,

Carlos.

On Jun 17, 2019, at 12:58 AM, Greg Mirsky <gregimirsky@gmail.com> wrote:

Hi Olivier,
thank you for your thorough review and your clear and detailed questions. My apologies for the delay in responding. Please find my answers below, in-line, tagged GIM>>.

Regards,
Greg

On Fri, May 31, 2019 at 12:38 PM Olivier Bonaventure via Datatracker <noreply@ietf.org> wrote:
Reviewer: Olivier Bonaventure
Review result: Ready with Issues

This document has been reviewed as part of the transport area review team's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the document's
authors and WG to allow them to address any issues raised and also to the IETF
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@ietf.org if you reply to or forward this review.

I have only limited knowledge of VXLAN and do not know all subtleties of BFD.
This review is thus more from a generalist than a specialist in this topic.

Major issues

Section 4 requires that "Implementations SHOULD ensure that the BFD
packets follow the same lookup path as VXLAN data packets within the
sender system."

Why is this requirement only relevant for the lookup path on the sender
system? What does this sentence really imply?
GIM>> RFC 5880 sets the scope of fault detection by the BFD protocol as
   ... the bidirectional path between two forwarding engines, including
   interfaces, data link(s), and to the extent possible the forwarding
   engines themselves ...
The requirement is aimed at the forwarding engine of a BFD system that transmits BFD control packets over a VXLAN tunnel.

Is it a requirement that the BFD packets follow the same path as the data
packet for a given VXLAN ? I guess so. In this case, the document should
discuss how Equal Cost Multipath could affect this.
GIM>> I think that an ECMP environment is more likely to be experienced by a transit node in the underlay. If the BFD session is used to monitor a specific underlay path, then, I agree, we should explain that using the VXLAN payload information to derive path entropy may cause data and BFD packets to follow different underlay routes. But, on the other hand, that is the case for OAM and fault detection in all overlay networks in general.

Minor issues

Section 1

You write "The asynchronous mode of BFD, as defined in [RFC5880],
can be used to monitor a p2p VXLAN tunnel."

Why do you use the word "can"? Is it a possibility or a requirement?
GIM>> In principle, BFD Demand mode may be used to monitor p2p paths as well. I agree and will reword the sentence to be more assertive:
The asynchronous mode of BFD, as defined in [RFC5880],
is used to monitor a p2p VXLAN tunnel.

NVE has not been defined before and is not in the terminology.
GIM>> Will add to the Terminology and expand as:
NVE Network Virtualization Endpoint

This entire section is not easy to read for an outsider.

Section 3

VNI has not been defined
GIM>> Will add to the Terminology section:
VNI VXLAN Network Identifier (or VXLAN Segment ID)

Figure 1 could take less space
GIM>> Yes, I can make it a bit denser. Would the following be an improvement?

+------------+-------------+
|         Server 1         |
| +----+----+  +----+----+ |
| |VM1-1    |  |VM1-2    | |
| |VNI 100  |  |VNI 200  | |
| |         |  |         | |
| +---------+  +---------+ |
| Hypervisor VTEP (IP1)    |
+--------------------------+
             |
             |   +-------------+
             |   |   Layer 3   |
             +---|   Network   |
                 +-------------+
                        |
                        +-----------+
                                    |
                       +------------+-------------+
                       | Hypervisor VTEP (IP2)    |
                       | +----+----+  +----+----+ |
                       | |VM2-1    |  |VM2-2    | |
                       | |VNI 100  |  |VNI 200  | |
                       | |         |  |         | |
                       | +---------+  +---------+ |
                       |         Server 2         |
                       +--------------------------+

Section 4

I do not see the benefits of having one paragraph in Section 4 followed by only
Section 4.1
GIM>> Will merge Section 4.1 into 4 with minor required re-wording:
4. BFD Packet Transmission over VXLAN Tunnel

BFD packet MUST be encapsulated and sent to a remote VTEP as
explained in this section. Implementations SHOULD ensure that the
BFD packets follow the same lookup path as VXLAN data packets within
the sender system.

BFD packets are encapsulated in VXLAN as described below. The VXLAN
packet format is defined in Section 5 of [RFC7348]. The Outer IP/UDP
and VXLAN headers MUST be encoded by the sender as defined in
[RFC7348].

Section 4.1

The document does not specify when a dedicated MAC address or the MAC address
of the destination VTEP must be used. This could affect the interoperability of
implementations. Should all implementations support both the dedicated MAC
address and the destination MAC address ?
GIM>> After further discussion, the authors decided to remove the request for the dedicated MAC address allocation. Only the MAC address of the remote VTEP must be used as the destination MAC address in the inner Ethernet frame. Please check the attached working version of the draft, or the attached diff between -07 and the working version.

It is unclear from this section whether IPv4 inside IPv6 and the opposite
should be supported or not.
GIM>> Any combination of outer IPvX and inner IPvX is possible.

Section 5.

If the received packet does not match the dedicated MAC address nor the MAC
address of the VTEP, should the packet be silently discarded or treated
differently ?
GIM>> As I mentioned earlier, the authors have decided to remove the use of the dedicated MAC address for BFD over VXLAN.

Section 5.1

Is this a modification to section 6.3 of RFC5880 ? This is not clear
GIM>> I think that this section is not a modification but the definition of an application-specific procedure that is outside the scope of RFC 5880:
The method of demultiplexing the initial packets (in which Your
Discriminator is zero) is application dependent, and is thus outside
the scope of this specification.

Section 9

The sentence "Throttling MAY be relaxed for BFD packets
based on port number." is unclear.
GIM>> Yes, thank you for pointing this out. The updated text of the whole paragraph is as follows:
NEW TEXT:
The document requires setting the inner IP TTL to 1, which could be
used as a DDoS attack vector. Thus the implementation MUST have
throttling in place to control the rate of BFD control packets sent
to the control plane. On the other hand, over aggressive throttling
of BFD control packets may become the cause of the inability to form
and maintain BFD session at scale. Hence, throttling of BFD control
packets SHOULD be adjusted to permit BFD to work according to its
procedures.
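
For illustration only, the trade-off in the new text can be sketched with a token-bucket limiter in front of the control plane. The token-bucket choice, the class and function names, and the sizing rule (number of sessions divided by the receive interval) are assumptions made for this sketch; the draft does not prescribe a throttling mechanism.

```python
class TokenBucket:
    """Illustrative rate limiter for BFD control packets punted to the
    control plane (an assumption, not a mechanism the draft defines)."""

    def __init__(self, rate_pps: float, burst: int) -> None:
        self.rate_pps = rate_pps
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(now - self.last, 0.0)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def required_rate_pps(sessions: int, rx_interval_s: float) -> float:
    """Lowest packet rate that lets every session receive one control
    packet per receive interval."""
    return sessions / rx_interval_s


limiter = TokenBucket(rate_pps=required_rate_pps(100, 0.5), burst=100)
print(limiter.allow(now=0.0))
```

A limit set below `required_rate_pps` for the configured sessions is what "over aggressive throttling" would look like in practice: legitimate control packets get dropped and sessions fail to come up or stay up.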
<draft-ietf-bfd-vxlan-08.txt><Diff_ draft-ietf-bfd-vxlan-07.txt - draft-ietf-bfd-vxlan-08.txt.html>
