Re: Tsvart last call review of draft-ietf-bfd-vxlan-07

Greg Mirsky <gregimirsky@gmail.com> Thu, 20 June 2019 01:09 UTC

From: Greg Mirsky <gregimirsky@gmail.com>
Date: Thu, 20 Jun 2019 10:09:20 +0900
Message-ID: <CA+RyBmUMbW=B3FNmqiQNmMLM27f9G+MeRL5MrAnCd04EP3vmrQ@mail.gmail.com>
Subject: Re: Tsvart last call review of draft-ietf-bfd-vxlan-07
To: "Carlos Pignataro (cpignata)" <cpignata@cisco.com>
Cc: Olivier Bonaventure <Olivier.Bonaventure@uclouvain.be>, "tsv-art@ietf.org" <tsv-art@ietf.org>, rtg-bfd WG <rtg-bfd@ietf.org>, "draft-ietf-bfd-vxlan.all@ietf.org" <draft-ietf-bfd-vxlan.all@ietf.org>, IETF list <ietf@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/sBlLEW8zCozn8BVdkL1zdAMh28M>

Hi Carlos,
thank you for reminding us of our continuing discussion with Joel. We are
seeking comments from VXLAN experts and would much appreciate any
insights on VXLAN you can share.
I have some clarifying questions before I can respond. Which stage of the
three-way handshake do you refer to as "initial demultiplexing"? I
couldn't find this term in RFC 5880.
Regarding the applicability of the Echo mode, thank you for pointing out the
need for stricter terminology. The Echo mode, as defined in RFC 5880, is
underspecified and will require additional standardization. Future
drafts may explore and define how the Echo mode of BFD is used over VXLAN
tunnels.

I will review and respond to the remaining questions soon.

Regards,
Greg


On Thu, Jun 20, 2019 at 9:14 AM Carlos Pignataro (cpignata) <
cpignata@cisco.com> wrote:

> Hi,
>
> I have not reviewed this draft before, but triggered by this email, and
> briefly scanning through a couple of sections, it is unclear to me how some
> of the mechanics work.
>
> There are some major issues with the MAC usage and association, as Joel
> Halpern mentioned in his Rtg Dir review.
>
> And, additionally, please consider the following comments and questions:
>
>
> 1. Underspecification for initialization and initial demultiplexing.
>
> This document allows multiple BFD sessions between a single pair of VTEPs:
>
>    An
>    implementation that supports this specification MUST be able to
>    control the number of BFD sessions that can be created between the
>    same pair of VTEPs.
>
> The implication of this is that BFD single-hop initialization procedures
> will not work. Instead, there is a need to map the initial demultiplexing.
>
> This issue is explained in RFCs 5882 and 5883:
> https://tools.ietf.org/html/rfc5883#section-4 and
> https://tools.ietf.org/html/rfc5882#section-6
>
> Section 5.1 says:
>
>    For such packets, the BFD session MUST be identified
>    using the inner headers, i.e., the source IP, the destination IP, and
>    the source UDP port number present in the IP header carried by the
>    payload of the VXLAN encapsulated packet.  The VNI of the packet
>    SHOULD be used to derive interface-related information for
>    demultiplexing the packet.
>
> But this does not really explain how to do the initial demultiplexing.
> Does each BFD session need to have a separate inner source IP address? Or
> source UDP port? And how often are they recycled or kept as state? How are
> these mapped?
> Equally importantly, which side is Active?
> And what if there’s a race condition with both sides being Active and
> setting up redundant sessions?
>
> 1.b. By the way, based on this, using S-BFD [RFC 7880] might be easier to
> demux.
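
To make the demultiplexing question concrete, here is a minimal sketch of a receive-side session table. It is an illustration only and is not taken from the draft: the class name, the method names, and the choice to key initial packets on (VNI, inner source IP, inner destination IP, inner source UDP port) are assumptions based on the Section 5.1 text quoted above. How a receiver would learn those values in advance is exactly the open question.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical key for packets whose Your Discriminator is zero:
# (VNI, inner source IP, inner destination IP, inner source UDP port),
# all as seen in the received packet.
InitialKey = Tuple[int, str, str, int]


@dataclass
class SessionTable:
    by_discriminator: Dict[int, str] = field(default_factory=dict)
    by_initial_key: Dict[InitialKey, str] = field(default_factory=dict)

    def add(self, name: str, local_discr: int, vni: int,
            src_ip: str, dst_ip: str, src_port: int) -> None:
        """Register one session under both lookup keys."""
        self.by_discriminator[local_discr] = name
        self.by_initial_key[(vni, src_ip, dst_ip, src_port)] = name

    def lookup(self, your_discr: int, vni: int,
               src_ip: str, dst_ip: str, src_port: int) -> Optional[str]:
        """Return the session name for a received BFD packet, or None."""
        if your_discr != 0:
            # RFC 5880: a nonzero Your Discriminator selects the session.
            return self.by_discriminator.get(your_discr)
        # Initial packet: fall back to the inner headers plus the VNI.
        return self.by_initial_key.get((vni, src_ip, dst_ip, src_port))
```

With two sessions between the same pair of VTEPs, the initial packets are only distinguishable if at least one element of the key differs per session, here the VNI and the inner source port.
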
>
>
> 2. Security
>
> This document says that the TTL in the inner packet carrying BFD is set to
> 1. However, RFC 5880 says to use GTSM [RFC 5082], i.e., a value of 255.
>
> Why is GTSM not used here?
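
The difference between the two TTL values can be sketched as follows. This is a simplified model, not text from the draft or from RFC 5082: the function names are hypothetical, and `passes_ttl_one` is an assumed receive-side check shown only for comparison. The point is that 255 is the largest possible TTL, so a packet arriving with 255 cannot have crossed any hop that decrements it, while a sender several hops away can pick an initial TTL that arrives as 1.

```python
from typing import Optional

GTSM_TTL = 255  # maximum TTL; the value GTSM (RFC 5082) senders use


def arrival_ttl(initial_ttl: int, router_hops: int) -> Optional[int]:
    """TTL seen by the receiver, or None if the packet expired in transit."""
    remaining = initial_ttl - router_hops
    return remaining if remaining > 0 else None


def passes_gtsm(received_ttl: Optional[int]) -> bool:
    """GTSM-style check: only a sender zero hops away can deliver 255."""
    return received_ttl == GTSM_TTL


def passes_ttl_one(received_ttl: Optional[int]) -> bool:
    """Hypothetical check for TTL 1; a distant sender can also satisfy it."""
    return received_ttl == 1
```

In this model a sender five hops away that transmits with TTL 6 satisfies `passes_ttl_one`, whereas no initial TTL lets it satisfy `passes_gtsm`.
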
>
>
> 3. ECMP and fate-sharing under-specification:
>
> Section 4.1. says:
>
>    The Outer IP/UDP
>    and VXLAN headers MUST be encoded by the sender as defined in
>    [RFC7348].
>
>
> And RFC 7348 says:
>
>       -  Source Port:  It is recommended that the UDP source port number
>          be calculated using a hash of fields from the inner packet --
>          one example being a hash of the inner Ethernet frame's headers.
>          This is to enable a level of entropy for the ECMP/load-
>          balancing of the VM-to-VM traffic across the VXLAN overlay.
>          When calculating the UDP source port number in this manner, it
>          is RECOMMENDED that the value be in the dynamic/private port
>          range 49152-65535 [RFC6335].
>
>
> Based on this, depending on the hashing calculation, the outer source UDP
> port can be different leading to different ECMP treatment. Does something
> else need to be specified here in regards to the outer UDP source port?
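
The RFC 7348 recommendation quoted above can be sketched as below. This is an assumption-laden illustration: RFC 7348 does not mandate a hash function, so CRC-32 is used only as a stand-in, and `outer_source_port` is a hypothetical name. The sketch shows why BFD and data packets may get different outer source ports: the port depends only on the inner header bytes fed to the hash.

```python
import zlib

DYNAMIC_PORT_MIN = 49152  # start of the dynamic/private range (RFC 6335)
DYNAMIC_PORT_MAX = 65535


def outer_source_port(inner_headers: bytes) -> int:
    """Map a hash of inner-frame header bytes into 49152-65535."""
    span = DYNAMIC_PORT_MAX - DYNAMIC_PORT_MIN + 1  # 16384 ports
    # CRC-32 is a placeholder hash; real implementations choose their own.
    return DYNAMIC_PORT_MIN + zlib.crc32(inner_headers) % span
```

Because the inner headers of a BFD control packet differ from those of VM-to-VM frames, the two hash inputs differ, and a transit node that load-balances on the outer UDP source port may place them on different ECMP paths.
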
>
>
> 4. Section 7 says that “ Support for echo BFD is outside the scope of this
> document”.
>
> Assuming this means “BFD Echo mode”, why is this out of scope? If this is
> a single logical hop underneath VXLAN, what’s preventing the use of Echo?
> Echo’s benefits are huge.
>
>
> 5. Terminology
>
>    Implementations SHOULD ensure that the BFD
>    packets follow the same lookup path as VXLAN data packets within the
>    sender system.
>
> What is a “lookup path within a sender system”?
>
>
> 6. Deployment scenarios
>
> S3 says:
>    Figure 1 illustrates the scenario with two servers, each of them
>    hosting two VMs.  The servers host VTEPs that terminate two VXLAN
> […]
>                      Figure 1: Reference VXLAN Domain
>
>
> However, RFC 7348 Figure 3 lists that as one deployment scenario, not as
> “the scenario” and “The Reference VXLAN Domain”.
>
> Best,
>
> Carlos.
>
> On Jun 17, 2019, at 12:58 AM, Greg Mirsky <gregimirsky@gmail.com> wrote:
>
> Hi Olivier,
> thank you for your thorough review and your clear and detailed questions. My
> apologies for the delay in responding. Please find my answers below in-line
> tagged GIM>>.
>
> Regards,
> Greg
>
> On Fri, May 31, 2019 at 12:38 PM Olivier Bonaventure via Datatracker <
> noreply@ietf.org> wrote:
>
>> Reviewer: Olivier Bonaventure
>> Review result: Ready with Issues
>>
>> This document has been reviewed as part of the transport area review
>> team's
>> ongoing effort to review key IETF documents. These comments were written
>> primarily for the transport area directors, but are copied to the
>> document's
>> authors and WG to allow them to address any issues raised and also to the
>> IETF
>> discussion list for information.
>>
>> When done at the time of IETF Last Call, the authors should consider this
>> review as part of the last-call comments they receive. Please always CC
>> tsv-art@ietf.org if you reply to or forward this review.
>>
>> I have only limited knowledge of VXLAN and do not know all subtleties of
>> BFD.
>> This review is thus more from a generalist than a specialist in this
>> topic.
>>
>> Major issues
>>
>> Section 4 requires that " Implementations SHOULD ensure that the BFD
>>    packets follow the same lookup path as VXLAN data packets within the
>>    sender system."
>>
>> Why is this requirement only relevant for the lookup path on the sender
>> system? What does this sentence really imply?
>>
> GIM>> RFC 5880 sets the scope of fault detection by the BFD protocol as
>    ... the bidirectional path between two forwarding engines, including
>    interfaces, data link(s), and to the extent possible the forwarding
>    engines themselves ...
> The requirement is aimed at the forwarding engine of a BFD system that
> transmits BFD control packets over a VXLAN tunnel.
>
>>
>> Is it a requirement that the BFD packets follow the same path as the data
>> packet for a given VXLAN ? I guess so. In this case, the document should
>> discuss how Equal Cost Multipath could affect this.
>>
> GIM>> I think that an ECMP environment is more likely to be encountered by a
> transit node in the underlay. If the BFD session is used to monitor a
> specific underlay path, then, I agree, we should explain that using the
> VXLAN payload information to draw path entropy may cause data and BFD
> packets to follow different underlay routes. But, on the other hand, that
> is the case for OAM and fault detection in all overlay networks in general.
>
>>
>> Minor issues
>>
>> Section 1
>>
>> You write "The asynchronous mode of BFD, as defined in [RFC5880],
>>  can be used to monitor a p2p VXLAN tunnel."
>>
>> Why do you use the word "can"? Is it a possibility or a requirement?
>>
> GIM>> In principle, BFD Demand mode may be used to monitor p2p paths as
> well. I agree and will re-word to the more assertive:
>  The asynchronous mode of BFD, as defined in [RFC5880],
>  is used to monitor a p2p VXLAN tunnel.
>>
>>
>> NVE has not been defined before and is not in the terminology.
>>
> GIM>> Will add to the Terminology section and expand as:
> NVE        Network Virtualization Endpoint
>
>>
>> This entire section is not easy to read for an outsider.
>>
>> Section 3
>>
>> VNI has not been defined
>>
> GIM>> Will add to the Terminology section:
> VNI    VXLAN Network Identifier (or VXLAN Segment ID)
>
>>
>> Figure 1 could take less space
>>
> GIM>> Yes, I can make it a bit denser. Would the following be an improvement?
>
>
>       +------------+-------------+
>       |        Server 1          |
>       | +----+----+  +----+----+ |
>       | |VM1-1    |  |VM1-2    | |
>       | |VNI 100  |  |VNI 200  | |
>       | |         |  |         | |
>       | +---------+  +---------+ |
>       | Hypervisor VTEP (IP1)    |
>       +--------------------------+
>                             |
>                             |   +-------------+
>                             |   |   Layer 3   |
>                             +---|   Network   |
>                                 +-------------+
>                                     |
>                                     +-----------+
>                                                 |
>                                          +------------+-------------+
>                                          |    Hypervisor VTEP (IP2) |
>                                          | +----+----+  +----+----+ |
>                                          | |VM2-1    |  |VM2-2    | |
>                                          | |VNI 100  |  |VNI 200  | |
>                                          | |         |  |         | |
>                                          | +---------+  +---------+ |
>                                          |      Server 2            |
>                                          +--------------------------+
>
>
>> Section 4
>>
>> I do not see the benefits of having one paragraph in Section 4 followed
>> by only
>> Section 4.1
>>
> GIM>> Will merge Section 4.1 into 4 with minor required re-wording:
> 4.  BFD Packet Transmission over VXLAN Tunnel
>
>    BFD packet MUST be encapsulated and sent to a remote VTEP as
>    explained in this section.  Implementations SHOULD ensure that the
>    BFD packets follow the same lookup path as VXLAN data packets within
>    the sender system.
>
>    BFD packets are encapsulated in VXLAN as described below.  The VXLAN
>    packet format is defined in Section 5 of [RFC7348].  The Outer IP/UDP
>    and VXLAN headers MUST be encoded by the sender as defined in
>    [RFC7348].
>
>>
>> Section 4.1
>>
>> The document does not specify when a dedicated MAC address or the MAC
>> address
>> of the destination VTEP must be used. This could affect the
>> interoperability of
>> implementations. Should all implementations support both the dedicated MAC
>> address and the destination MAC address ?
>>
> GIM>> After further discussion, the authors decided to remove the request for
> the dedicated MAC address allocation. Only the MAC address of the remote
> VTEP must be used as the destination MAC address in the inner Ethernet
> frame. Please check the attached diff between the -07 and the working
> versions, or the working version of the draft.
>
>>
>> It is unclear from this section whether IPv4 inside IPv6 and the opposite
>> should be supported or not.
>>
> GIM>> Any combination of outer IPvX and inner IPvX is possible.
>
>>
>> Section 5.
>>
>> If the received packet does not match the dedicated MAC address nor the
>> MAC
>> address of the VTEP, should the packet be silently discarded or treated
>> differently ?
>>
> GIM>> As I mentioned earlier, the authors have decided to remove the use of
> the dedicated MAC address for BFD over VXLAN.
>
>>
>> Section 5.1
>>
>> Is this a modification to section 6.3 of RFC5880 ? This is not clear
>>
> GIM>> I think that this section is not a modification but the definition of
> an application-specific procedure that is outside the scope of RFC 5880:
>    The method of demultiplexing the initial packets (in which Your
>    Discriminator is zero) is application dependent, and is thus outside
>    the scope of this specification.
>
>>
>> Section 9
>>
>> The sentence " Throttling MAY be relaxed for BFD packets
>>    based on port number." is unclear.
>>
> GIM>> Yes, thank you for pointing this out. The updated text of the whole
> paragraph is as follows:
> NEW TEXT:
>    The document requires setting the inner IP TTL to 1, which could be
>    used as a DDoS attack vector.  Thus the implementation MUST have
>    throttling in place to control the rate of BFD control packets sent
>    to the control plane.  On the other hand, over aggressive throttling
>    of BFD control packets may become the cause of the inability to form
>    and maintain BFD session at scale.  Hence, throttling of BFD control
>    packets SHOULD be adjusted to permit BFD to work according to its
>    procedures.
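
One common way to implement the throttling described in the new text is a token bucket. The sketch below is an assumption, not something the draft specifies: the class name, the `rate_pps` and `burst` parameters, and the use of a caller-supplied clock are all illustrative. It shows the trade-off in the paragraph above: the rate has to be sized against the aggregate transmit rate of the expected BFD sessions, or legitimate control packets will be dropped.

```python
class TokenBucket:
    """Rate limiter for BFD control packets punted to the control plane."""

    def __init__(self, rate_pps: float, burst: int) -> None:
        self.rate = rate_pps          # tokens (packets) added per second
        self.capacity = float(burst)  # maximum stored tokens
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` may be punted."""
        elapsed = max(0.0, now - self.last)
        self.last = max(self.last, now)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would pass a monotonic timestamp for each received BFD control packet and drop the packet when `allow` returns False.
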
> <draft-ietf-bfd-vxlan-08.txt><Diff_ draft-ietf-bfd-vxlan-07.txt -
> draft-ietf-bfd-vxlan-08.txt.html>
>
>
>